Incident Report: Service Disruption - Network Hardware Fault
Date/Time of Incident:
- 12/22/2024, 11:06 PM MST: Degraded service reported. Symptoms observed:
  - Error 404 when accessing the iMonnit website
  - Redirection to the sensorcert.com web page
  - Webhook, Notification, and REST API services not operating
  - Gateway service unavailable
- 12/23/2024, 09:03 PM MST: Degraded service reported while database synchronization propagated.
- 12/23/2024, 10:10 PM MST: Load balancers assessed and limited services restored.
- 12/23/2024, 10:45 PM MST: Services functioning intermittently. Administrators
continued to investigate the root cause.
- 12/23/2024, 11:41 PM MST: Network Interface Card (NIC) diagnosed as faulty. Corrupted hard
drive identified during outage.
- 12/24/2024, 12:16 AM MST: All services fully restored.
Actions Taken:
- Hardware (12/23/2024): Faulty Network Interface Card and failed RAID controller replaced.
- Hardware (12/24/2024): Corrupted hard drive replaced.
- Data Restoration (12/24/2024): All affected data was successfully restored to the database.
- Load Balancer (12/24/2024): All servers and services operational in redundant clusters.
Planned Actions from Incident:
- Maintenance Window (01/03/2025)
  - Additional Network Interface Cards to be replaced.
  - Potential operating system install on the backup server.