The major global Microsoft outage earlier this week was caused by a patch to the company's Wide Area Network. The patch left routers unable to forward packets and also took down the tooling used to monitor the WAN, Microsoft writes in a status update.
In a preliminary post-incident review on the Azure status dashboard, Microsoft gives more details about the major outage that occurred earlier this week. During that outage, a large part of Microsoft's services worldwide, including Office 365, Outlook and Teams, was down, and services hosted on Azure were difficult to reach. According to Microsoft, the outage lasted a total of 5 hours and 40 minutes, although many services were partially accessible again after two hours.
In an earlier status update, Microsoft said that the problem had arisen after a faulty Wide Area Network patch was rolled out, but details were still missing. The company has now provided those details. Microsoft attempted to change an IP address on a WAN router, but the command it issued caused that router to send messages to all other routers on the network. In response, those routers recomputed their forwarding tables. While that was happening, normal packets were forwarded poorly or not at all. The connection problems came in half-hour waves, during which the routers let little traffic through.
According to Microsoft, the command in question had not been properly tested on the specific router on which it was executed. The company acknowledges that the command can behave differently on different hardware and that the relevant quality control process was not properly completed. An additional problem was that the outage also took down the systems used to monitor the network.
Microsoft says it will no longer be possible to execute “high-impact commands” directly on devices. That block is already in place. In addition, the company wants to ensure that, from February, all command execution complies with its safety guidelines.
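Microsoft has not published how this block works. As a rough illustration only, the idea of refusing high-impact commands that are typed directly on a device, or that have not been qualified on the target hardware, could be sketched as follows. The command prefixes, the `ChangeRequest` fields and the `authorize` function are all hypothetical and are not taken from Microsoft's review.

```python
from dataclasses import dataclass

# Hypothetical prefixes treated as high-impact; not taken from Microsoft's review.
HIGH_IMPACT_PREFIXES = ("set interface address", "clear routing")


@dataclass(frozen=True)
class ChangeRequest:
    device: str
    command: str
    tested_on_device_model: bool  # qualification completed on this hardware
    via_change_pipeline: bool     # False means typed directly on the device


def is_high_impact(command: str) -> bool:
    # Normalize case and whitespace before matching against the prefix list.
    normalized = " ".join(command.lower().split())
    return normalized.startswith(HIGH_IMPACT_PREFIXES)


def authorize(request: ChangeRequest) -> tuple[bool, str]:
    # Low-impact commands pass through unchanged.
    if not is_high_impact(request.command):
        return True, "low-impact command"
    # High-impact commands may never be run directly on a device.
    if not request.via_change_pipeline:
        return False, "high-impact command blocked on direct device access"
    # Even through the pipeline, the command must be qualified on this hardware.
    if not request.tested_on_device_model:
        return False, "command not qualified on this hardware"
    return True, "high-impact command approved through pipeline"
```

In this sketch, a request such as `ChangeRequest("wan-router-1", "set interface address 10.0.0.1", True, False)` would be rejected because it bypasses the change pipeline, regardless of whether the command was tested.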