Service Disruption - Load Balancer Configuration

Postmortem

Incident Report: Service Disruption - Load Balancer Configuration
Date/Time of Incident:

  • 12/15/2024, 11:06 PM: A host computer component overheated and caused the host's connection to the disk array to degrade.
    • Both the primary and the secondary load balancer in the high-availability pair were running on the affected host.
  • 12/15/2024, 11:18 PM: Host computer rebooted.
    • All load-balanced applications were temporarily unable to receive connections.

Resolution:

  • 12/15/2024, 11:18 PM: Host balancing policies began migrating/restoring affected virtual machines to unaffected hosts.
  • 12/15/2024, 11:28 PM: Virtual machines migrated/restarted and running on other hosts.
  • 12/15/2024, 11:40 PM: Gateway data processing caught up.
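
For reference, the elapsed time between the events listed above can be worked out with a short Python sketch. The timestamps are copied from the timeline; the event labels (`overheat`, `host_reboot`, and so on) are shorthand introduced here for illustration, and no time zone is assumed.

```python
from datetime import datetime

FMT = "%m/%d/%Y, %I:%M %p"

# Timestamps as reported in the timeline; labels are illustrative shorthand.
events = {
    "overheat": "12/15/2024, 11:06 PM",
    "host_reboot": "12/15/2024, 11:18 PM",
    "vms_running": "12/15/2024, 11:28 PM",
    "gateway_caught_up": "12/15/2024, 11:40 PM",
}
times = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def minutes_between(start, end):
    """Whole minutes elapsed between two named events."""
    return int((times[end] - times[start]).total_seconds() // 60)


print(minutes_between("overheat", "host_reboot"))           # 12
print(minutes_between("host_reboot", "vms_running"))        # 10
print(minutes_between("host_reboot", "gateway_caught_up"))  # 22
```

By these timestamps, virtual machines were running on other hosts 10 minutes after the reboot, and gateway data processing had caught up 22 minutes after it.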

Actions Taken:

  1. Reconfigure VM balancing to enforce load balancer separation (12/16/2024):
    • Configure host balancing to prevent load balancers from being assigned to the same host simultaneously.
    • Adjust storage policies so that each load balancer's disks reside on separate storage arrays to remove additional single points of failure.
  2. Offload network processing to CPUs to alleviate load on network cards (12/16/2024):
    • Resolve overheating of 10G network cards.
  3. Enhanced Monitoring (12/17/2024):
    • Review automated monitoring options for host accessories, for immediate acquisition, to provide early notification of changes in load or processing.
  4. Enhanced network card selection (12/17/2024):
    • Purchase upgraded networking hardware to better manage current and future loads.
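
The separation rule in action 1 can be illustrated with a minimal Python sketch that flags any host or storage array shared by members of a high-availability pair. This is not the configuration of the actual virtualization platform: the function `shared_failure_domains`, the inventory layout, and all VM, host, and array names are hypothetical, and the "before" storage placement is assumed for illustration only.

```python
from collections import defaultdict


def shared_failure_domains(placements, group):
    """Return {(kind, name): [vms]} for each host or storage array
    that holds more than one member of the given VM group."""
    seen = defaultdict(list)
    for vm in group:
        placement = placements[vm]
        seen[("host", placement["host"])].append(vm)
        for array in set(placement["arrays"]):
            seen[("array", array)].append(vm)
    return {key: vms for key, vms in seen.items() if len(vms) > 1}


ha_pair = ["lb-primary", "lb-secondary"]

# Hypothetical placement resembling the incident: both load balancers on one host.
before = {
    "lb-primary": {"host": "host-a", "arrays": ["array-1"]},
    "lb-secondary": {"host": "host-a", "arrays": ["array-1"]},
}

# Hypothetical placement after the rule is enforced.
after = {
    "lb-primary": {"host": "host-a", "arrays": ["array-1"]},
    "lb-secondary": {"host": "host-b", "arrays": ["array-2"]},
}

print(shared_failure_domains(before, ha_pair))  # flags host-a and array-1
print(shared_failure_domains(after, ha_pair))   # {}
```

A check along these lines could run periodically against the platform's inventory, so that a migration that co-locates the pair again is noticed before a host failure exposes it.
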
Resolved
Problem Identified

This issue was opened retrospectively.

6 Affected Services: