Article Details
Scrape Timestamp (UTC): 2024-08-07 00:23:40.915
Source: https://www.theregister.com/2024/08/07/crowdstrike_full_incident_root_cause_analysis/
Original Article Text
Click to Toggle View
CrowdStrike hires outside security firms to review Falcon code. And reveals the small mistake that bricked 8.5 million Windows boxes. CrowdStrike has hired two outside security firms to review the Falcon sensor code that sparked a global IT outage last month – but it may not have an awful lot to find, because CrowdStrike has identified the simple mistake that caused the incident. News of the review emerged in a root causes analysis [PDF] published on Tuesday. As we learned from CrowdStrike's first post incident review of the flawed code – which bricked millions of Windows machines worldwide – the problem began back in February. That was when the security vendor added a sensor to Falcon to help the software detect novel attack techniques that abuse named pipes and other Windows interprocess communication (IPC) mechanisms. The update went through the usual development and testing, and then CrowdStrike pushed a new "Template Type" including the IPC-related info to its Falcon sensors in a "Channel File" numbered 291. That file was updated periodically with updates that added new "Template Instances" to improve Falcon's attack-detection prowess. Info in Template Instances is processed by a "Content Interpreter" when Falcon swoops into action. The root causes analysis provided a deeper look at what went wrong next: The new IPC Template Type defined 21 input parameter fields, but the integration code that invoked the Content Interpreter with Channel File 291's Template Instances supplied only 20 input values to match against. This parameter count mismatch evaded multiple layers of build validation and testing, as it was not discovered during the sensor release testing process, the Template Type (using a test Template Instance) stress testing or the first several successful deployments of IPC Template Instances in the field. In part, this was due to the use of wildcard matching criteria for the 21st input during testing and in the initial IPC Template Instances. Then, as CrowdStrike also previously explained, two further IPC-related Template Instances were automatically deployed to Falcon users on July 19. One of these used a non-wildcard matching criterion for the 21st input. This resulted in a new version of the Channel File that required Falcon sensors to inspect the 21 inputs – but another piece of software called the Content Interpreter expected only 20 values. "Therefore, the attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash," the security shop explained in the root cause analysis. CrowdStrike has coded a fix to ensure that mismatches of the number of inputs validated versus number of actual inputs doesn't happen again. It's a patch for the Sensor Content Compiler – this is the function that validates the number of inputs provided by the template type – and it went into production July 27. CrowdStrike also wrote that it has added runtime input array bounds checks to the Content Interpreter for Rapid Response updates, to ensure the size of the input array matches the number of expected inputs. These fixes are currently being backported to all Windows sensor versions 7.11 and above with a sensor software hotfix. The release will be generally available by August 9. Additionally, the chastened security vendor is doing more tests – including some that test non-wildcard matching criteria for each field across all template types, and new checks to ensure that flawed files aren't pushed to Falcon customers in the future. Further, as CrowdStrike had noted in its earlier analysis, every Template Instance will henceforth be deployed to customers in a staged rollout, rather than being pushed to all users all at once. It's worth noting that the company is being sued by investors for not originally using this type of phased approach in sending updates to customers. "Looking ahead, CrowdStrike is focused on using the lessons learned from this incident to better serve our customers," a spokesperson declared. "CrowdStrike remains steadfast in our mission to protect customers and stop breaches." But not so steadfast that it’s naming the partners it hired to review its code. Those reviews have commenced, and are focused on the code and processes that led to the July 19 fiasco. "We are not providing information on the vendors who are doing work for us beyond what is referenced in the RCA," the CrowdStrike spokesperson told The Register.
Daily Brief Summary
CrowdStrike has engaged external security firms to audit the Falcon sensor code after a coding error caused a major global IT outage.
The fault, traced back to a February update, led to incorrect implementation in the IPC Template, causing system crashes when mismatched input fields were processed.
The coding mishap passed unnoticed through various levels of testing, resulting in the deployment of the defective update to Falcon users, which eventually bricked 8.5 million Windows machines.
CrowdStrike has developed a fix to prevent input validation mismatches and added runtime input array bounds checks to its Content Interpreter.
These amendments and fixes are being integrated back into all compatible Windows sensor versions and will be generally available soon.
Additionally, the security provider is enhancing its update deployment processes, including staged rollouts and enhanced validation checks to prevent similar incidents.
Despite the efforts to correct the flaws, details about the third-party firms conducting the code review remain undisclosed.