ITS Service Status Report

Service Degradation — Resolved

Container Service

Start Time: 05/19/2018 8:00 pm

Maintenance Completed: 05/21/2018 11:13 am

Application workloads will be moved to accommodate this maintenance. Hosted applications may experience downtime while they transition from one server to another.

Issue Symptoms: Degradation

Technical Details

Service Type: Production

Server Name: os-node4

Comments:
5/21, 11:41: We booted the server from an ISO containing the updated firmware and applied the update from there. The server is back online with the updated firmware. It has been tested and added back to the cluster.


5/20, 8:20 pm: The server is not booting correctly when power-cycled. At this point we assume it is unsalvageable and are moving forward with plans to replace it.


5/20, 3:15: Updating the hardware drivers has not resolved the issue. The network card is not being recognized by the updated firmware. We have removed the affected server from BlueCat to prevent monitoring problems and to stop users from being sent to an unavailable server. We will meet later this evening to determine our next steps.


5/20, 10 am: One OpenShift server is currently unavailable. Production-stage applications are running on the remaining server.
QA-stage applications have been spun down while we work to restore the other server.

This server would not boot its operating system after the first firmware upgrade was applied. After consulting with ITS Systems Support, we will upgrade the hardware drivers on this server at 1 pm this afternoon. We expect this work to be completed, and the issue resolved, by mid-afternoon.

QA-stage applications will be spun up after completing maintenance.

==== original announcement ====
To address recent problems with our on-premises OpenShift cluster, we will be updating the cluster's firmware. We will also change the underlying storage for one of the core system components (Elasticsearch).

Groups Notified:

  • container-notification
