ITS Service Status Report

Unscheduled Outage — Resolved

YBRC Maize Tenant outage.

Services Affected: YBRC Maize service

Start Time: 07/06/2020 8:54 am

Service Restored: 07/08/2020 8:00 am

Issue Symptoms: Outage

We are having an issue with memory DIMMs on the Maize cluster that has affected storage. Machines in two enclaves, Med and School of Ed, are down. We are working to resolve these issues.

 

Update: The first rebuild is complete and we are replacing DIMMs at this time. We should be restarting tenants in the next hour or so, and then we'll start bringing machines back up for affected hosts.

Update 5pm Monday:  The rebuild is still ongoing, and we will be working into tomorrow on this issue.  We apologize for the extended outage.

Update 10am Tuesday:  All VMs should be online and the array is still rebuilding.  It should be complete later today.

Update Wednesday: Service is restored, and the vSAN is completely rebuilt.

Who is Impacted? Users of the Secure Enclave Service

Next Update: 7/7 5pm

Technical Details

Service Type: Production

Server Name: maize.ybrc.arc-ts.umich.edu

Comments:

We have had continued DIMM failures on the Maize YB cluster that have affected the SAN. Currently, two nodes are down, which affects some of the VMs on the cluster. We hope to have things back up and running within the next day.

Groups Notified:

  • ybrc-maize-users

Report Additional Impacts

Contact the ITS Service Center for more information or to report additional impacts.