Article Details

Scrape Timestamp (UTC): 2024-07-31 13:04:18.304

Source: https://www.theregister.com/2024/07/31/microsoft_ddos_azure/

Original Article Text

'Error' in Microsoft's DDoS defenses amplified 8-hour Azure outage

A playbook full of strategies and someone fumbles the implementation

Do you have problems configuring Microsoft's Defender? You might not be alone: Microsoft admitted that whatever it's using for its defensive implementation exacerbated yesterday's Azure instability.

No one has blamed the actual product named "Windows Defender," we must note.

According to Microsoft, the initial trigger event for yesterday's outage, which took out great swathes of the web, was a distributed denial-of-service (DDoS) attack. Such attacks are hardly unheard of, and an industry has sprung up around warding them off.

A DDoS attack aims to overwhelm the resources of the targeted system. It usually involves multiple machines infected with malware flooding the victim with network traffic. Admins employ various methods to differentiate real requests from malicious traffic, but according to F5 Labs, there was still an explosive growth in DDoS attacks in 2023.

"Attacks grew so much in fact that, on average, businesses can be expected to deal with a DDoS attack around eleven times a year, almost once a month," the security vendor said.

Microsoft has published its strategy to defend against network-based DDoS attacks, noting it was unique due to the global footprint of the company. Microsoft said it was able to "utilize strategies and techniques that are unavailable to most other organizations" thanks to that footprint, as well as draw from the collective knowledge of an extensive threat network.

"This intelligence, along with information gathered from online services and Microsoft's global customer base, continuously improves Microsoft's DDoS defense system that protects all of Microsoft online services' assets."

This is assuming Microsoft actually implemented that strategy correctly. For yesterday's event, Microsoft's DDoS protection mechanisms were indeed triggered correctly. However, the response did not go so well.

"Initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it," the Windows giant admitted last night.

The problem was global and affected a subset of customers attempting to connect to services, including Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, the Azure portal itself, and a subset of Microsoft 365 and Microsoft Purview services.

According to Microsoft the incident lasted from approximately 1145 UTC to 1943 UTC, although the company reckoned the majority of the impact was successfully mitigated by 1410 UTC. The problem wasn't, however, declared over until 2048 UTC.

We contacted Microsoft to learn more about the implementation of its DDoS defenses, but the company has yet to respond. A Preliminary Post Incident Review (PIR) is due in approximately 72 hours, and the company will publish a Final PIR in around two weeks.

Daily Brief Summary

DDOS // Microsoft Fault in DDoS Defense Causes Extended Azure Outage

Microsoft's Azure platform experienced an 8-hour outage due to a DDoS attack that was exacerbated by an error in Microsoft's defensive implementation.

The attack comes after what F5 Labs described as explosive growth in DDoS attacks in 2023, with businesses on average facing such an attack around eleven times a year, almost once a month.

Microsoft says its global footprint and extensive threat intelligence network let it use DDoS defense strategies and techniques unavailable to most other organizations.

Microsoft's DDoS protection mechanisms triggered correctly, but initial investigations suggest an error in the implementation of its defenses amplified the attack's impact rather than mitigating it.

The outage affected various services including Azure App Services, Azure IoT Central, and parts of Microsoft 365 and Microsoft Purview.

Microsoft mitigated most of the impact by 1410 UTC, but the incident ran until approximately 1943 UTC and was not declared over until 2048 UTC.

A Preliminary Post Incident Review (PIR) is due in approximately 72 hours, with a Final PIR to follow in around two weeks.