Incident Report: Service Disruption - Load Balancer Configuration
Date/Time of Incident:
- 12/15/2024, 11:06 PM: A component in the host computer overheated, degrading the host's connection to the disk array.
- Both the primary and secondary load balancers in the high-availability pair were running on the affected host.
- 12/15/2024, 11:18 PM: Host computer rebooted.
- All load-balanced applications were temporarily unable to accept connections.
Resolution:
- 12/15/2024, 11:18 PM: Host balancing policies began migrating or restarting the affected virtual machines on unaffected hosts.
- 12/15/2024, 11:28 PM: The affected virtual machines had been migrated or restarted and were running on other hosts.
- 12/15/2024, 11:40 PM: Gateway data processing had fully caught up.
Actions Taken:
- Reconfigure VM balancing to enforce separation of the load balancers (12/16/2024):
- Configure host balancing rules so that the primary and secondary load balancers are never placed on the same host at the same time.
- Adjust storage policies so that each load balancer's disks reside on a separate storage array, removing an additional single point of failure.
- Offload network processing to the CPUs to reduce load on the network cards (12/16/2024):
- Resolve overheating of the 10G network cards.
- Enhanced monitoring (12/17/2024):
- Review automated monitoring options for host hardware components, for immediate acquisition, that provide early notification of changes in load or processing.
- Improved network card selection (12/17/2024):
- Purchase upgraded network hardware to better handle current and future loads.
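The two placement rules adopted on 12/16/2024 (separate hosts, separate storage arrays) can also be checked independently of the balancing policies. The following is a minimal Python sketch, assuming a hypothetical inventory export that lists each load balancer VM with its current host and storage array. The `Placement` record, the `find_placement_violations` helper, and all VM, host, and array names are illustrative only and are not part of any hypervisor's API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    """Where one load balancer VM currently runs (hypothetical inventory record)."""
    vm: str
    host: str
    storage_array: str


def find_placement_violations(pair: list[Placement]) -> list[str]:
    """Return one message for each host or storage array shared by members of an HA pair."""
    violations = []
    for a, b in combinations(pair, 2):
        if a.host == b.host:
            violations.append(f"{a.vm} and {b.vm} share host {a.host}")
        if a.storage_array == b.storage_array:
            violations.append(f"{a.vm} and {b.vm} share storage array {a.storage_array}")
    return violations


if __name__ == "__main__":
    # Hypothetical layout resembling the incident: both load balancers on one host and one array.
    before = [
        Placement("lb-primary", "host-01", "array-A"),
        Placement("lb-secondary", "host-01", "array-A"),
    ]
    # Hypothetical layout after the 12/16 changes: separate hosts and separate arrays.
    after = [
        Placement("lb-primary", "host-01", "array-A"),
        Placement("lb-secondary", "host-02", "array-B"),
    ]
    for label, pair in (("before", before), ("after", after)):
        print(label, find_placement_violations(pair) or "OK")
```

A check like this could run on a schedule alongside the enhanced monitoring, so that a placement that drifts back onto a single host or array is reported before a hardware fault exposes it.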