Gateway Service

Load Balancer communication

Resolved

Incident Report: Service Disruption - Network Hardware Fault

Date/Time of Incident:

  • 12/22/2024, 11:06 PM MST: Degraded service reported. Symptoms observed include:
    • Error 404 while accessing the iMonnit website
    • Redirection to the sensorcert.com web page
    • No Webhook, Notification, or REST API services operating.
    • Gateway service unavailable.
  • 12/23/2024, 09:03 PM MST: Degraded service reported as database synchronization propagates.
  • 12/23/2024, 10:10 PM MST: Load balancers assessed and limited services restored.
  • 12/23/2024, 10:45 PM MST: Services reporting intermittent functionality. Administrators
    continuing to evaluate for root cause.
  • 12/23/2024, 11:41 PM MST: Network Interface Card (NIC) diagnosed as faulty. Corrupted hard
    drive identified during outage.
  • 12/24/2024, 12:16 AM MST: All services fully restored.

Actions Taken:

  • Hardware (12/23/2024): Network Interface Card and failed RAID controller replaced.
  • Hardware (12/24/2024): Corrupted hard drive replaced.
  • Data Restoration (12/24/2024): All affected data was successfully restored to the database.
  • Load Balancer (12/24/2024): All servers and services operational in redundant clusters.

Planned Actions from Incident:

  • Maintenance Window (01/03/2025)
    • Additional Network Interface Cards to be replaced.
    • Potential operating system install on backup server.

Monitoring

We found a failed drive that was preventing consistent operation. We have failed back over to the primary load balancer and are working to resolve the drive issue.

Investigating

We are working on regaining access to the load balancer.

Resolved

All services are up and working. We are working with the load balancer manufacturer's support team to evaluate the root cause.

Assessed

Our load balancer is not able to pass data to the server clusters. Administrators are working on it and have restored some communication, but more work is required to restore service.