ITS Service Status Report

Scheduled Maintenance — Resolved

Great Lakes Summer Maintenance

Services Affected: Research HPC

Start Time: 08/08/2022 7:00 am

Maintenance Completed: 08/10/2022 8:00 pm

Issue Symptoms: Outage

We are performing our biannual maintenance on Great Lakes, Armis2, and Lighthouse. This maintenance will be more extensive than usual and includes a comprehensive update:
* Major operating system update from CentOS 7.9 to Red Hat 8.4
* Update of the Great Lakes scratch filesystem from GPFS v4 to GPFS v5. This migration may take 3 days to complete.
* InfiniBand networking updates. Ethernet updates are possible but unlikely, because the required hardware has not arrived.
* Upgrade of our workload manager, Slurm, to 21.08.8-2

Users will not be able to use the clusters during the maintenance.

Who is Impacted? ARC Great Lakes Researchers

Next Update: August 8th

Technical Details

Service Type: Production

Server Name: Great Lakes

Comments:

2022-08-08 07:53 AM: Great Lakes quiesced. Generating backups before starting node updates. Beginning InfiniBand and GPFS system updates.

2022-08-09 06:11 AM: The Great Lakes update is going well, and most nodes are up. We are resolving some networking issues and completing the updates to the /scratch filesystem.

2022-08-10 06:00 PM: Great Lakes /scratch issues remain. We are working with the vendor to resolve them. OS and software loads are complete.

2022-08-10 08:01 PM: Great Lakes has been returned to service. IME is still unavailable; we will send another email to Great Lakes users when it is working as expected.

Report Additional Impacts

Contact the ITS Service Center for more information or to report additional impacts.