Speed is crucial in defending against advanced attackers, who have a proven ability to make significant headway in a victim’s environment before the defender’s response workflow even begins. This delay often relegates the investigator to the role of historian, documenting the attacker’s success rather than defending against the compromise. Too often in our (Mandiant’s) investigations of victims who learned of their breach through a third party, we find security alerts and event logs that reveal warning signs of the intrusion, warning signs whose significance was not initially understood.
Recognizing the early stages of a breach is difficult, particularly when the analyst (often a Tier-1 resource working an evening, weekend, or holiday shift) receives little more than a single alarm and has to answer the question, “Do I escalate?” Every analyst knows that they lose credibility if they consistently escalate false positives or non-issues. But failing to properly identify and escalate a significant alarm could have dire consequences.
Increasing the certainty and confidence of an escalation decision requires pairing the initial security event or alarm with forensic information from the associated endpoint. We call this “triage.” Just as emergency medical response teams determine a patient’s priority based on the severity of his or her condition, a Tier-1 analyst needs to prioritize the events and alarms arriving from an increasing number of sources. As with medical triage, time matters.
In order to mitigate damage, it’s important to collect forensics from an endpoint as close as possible to the time the endpoint triggers an alarm. Fortunately, with the right tools, methodology, and configuration, this process can be automated, meaning that by the time an analyst is alerted, he or she is also presented with relevant data necessary to make the escalation decision.
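The automation described above can be illustrated with a minimal sketch in which an alert is enriched with endpoint data before it ever reaches the analyst's queue. Everything here is hypothetical: the `Alert` class, the `enrich_alert` function, and the `fake_collector` stand-in are illustrative names, not part of any Mandiant product, and a real collector would query the endpoint rather than return canned values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    """A security alert as it might arrive from a detection source (hypothetical shape)."""
    host: str
    signature: str
    raised_at: datetime
    triage: Dict[str, List[str]] = field(default_factory=dict)
    collected_at: Optional[datetime] = None


# A collector takes a host name and returns named lists of forensic artifacts.
Collector = Callable[[str], Dict[str, List[str]]]


def enrich_alert(alert: Alert, collect: Collector) -> Alert:
    """Run the collector as soon as the alert fires and attach the results."""
    alert.triage = collect(alert.host)
    alert.collected_at = datetime.now(timezone.utc)
    return alert


def fake_collector(host: str) -> Dict[str, List[str]]:
    """Stand-in for a real endpoint query; returns canned data for illustration."""
    return {
        "running_processes": ["explorer.exe", "pwdump.exe"],
        "network_connections": [],
    }


alert = enrich_alert(
    Alert("ws-042", "PWDump detected", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    fake_collector,
)
print(alert.triage["running_processes"])
```

The design point is ordering: collection happens inside the alert pipeline, so the analyst's clock starts only after the supporting data is already attached.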
Analysts must carefully design the technical mechanics of collecting the necessary forensic artifacts from a suspect endpoint, so that investigative demands on the endpoint and on the network do not negatively impact the business. Tier-1 analysts must be confident that their decisions to request supporting data from endpoints as part of an alarm response won’t disrupt service. They must also ensure that triage forensic investigations yield complete information, so that the data returned in response to a specific alert will support a confident decision.
Consider the simple example of a Tier-1 analyst receiving an anti-virus alert indicating the presence of restricted or malicious software on an endpoint; say, PWDump. PWDump is often used for malicious purposes, but it can also be used as part of a valid security exercise to test password strength, so this alert by itself may not provide enough evidence to signal an active compromise. The Tier-1 analyst should act on this principle: if you find malicious software on an endpoint, assume that it has been used, and investigate the conditions under which the restricted software was installed and used.
The triage step in this instance may include a preplanned audit of the host in question to provide 1) a history of recently executed programs, 2) changes to the file system, registry, or event logs near the onset of the alert, 3) a review of currently running processes and open network connections to confirm that the malicious software has been run (and when it was run), and 4) system information from the endpoint, including the user name the software ran under. Upon review of this information, the analyst can determine the escalation path. This information collection and review should take only minutes and should occur as close to the time of the initial alert as possible, allowing the analyst to detect, confirm, and escalate an active breach.
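To make the review step concrete, here is a minimal sketch of how collected artifacts might be narrowed to those near the alert and checked for evidence of execution. The `Artifact` record, the function names, the 30-minute window, and the sample data are all assumptions made for illustration; they do not describe any particular tool's output format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Artifact:
    """One collected item (hypothetical shape)."""
    kind: str          # e.g. "execution", "file_change", "registry_change"
    detail: str        # path, key, or log message
    timestamp: datetime
    user: str = ""     # account associated with the activity, if known


def artifacts_near_alert(
    artifacts: List[Artifact],
    alert_time: datetime,
    window: timedelta = timedelta(minutes=30),  # assumed window, tune per environment
) -> List[Artifact]:
    """Keep artifacts within +/- window of the alert, oldest first."""
    lo, hi = alert_time - window, alert_time + window
    hits = [a for a in artifacts if lo <= a.timestamp <= hi]
    return sorted(hits, key=lambda a: a.timestamp)


def was_executed(artifacts: List[Artifact], program_name: str) -> List[Artifact]:
    """Return execution records whose detail mentions the program name."""
    name = program_name.lower()
    return [a for a in artifacts if a.kind == "execution" and name in a.detail.lower()]


# Invented sample data for the PWDump scenario.
alert_time = datetime(2024, 1, 1, 2, 0)
collected = [
    Artifact("execution", r"C:\Temp\pwdump.exe", datetime(2024, 1, 1, 1, 55), "svc_backup"),
    Artifact("file_change", r"C:\Temp\hashes.txt", datetime(2024, 1, 1, 1, 56)),
    Artifact("execution", r"C:\Windows\notepad.exe", datetime(2023, 12, 30, 9, 0), "alice"),
]

nearby = artifacts_near_alert(collected, alert_time)
runs = was_executed(nearby, "pwdump")
for run in runs:
    print(f"{run.detail} ran at {run.timestamp} as {run.user}")
```

In this invented data set, the filter drops the unrelated activity from two days earlier and leaves the analyst with the execution record, the account it ran under, and a file written a minute later, which is the kind of evidence that turns a single alarm into an escalation decision.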
In this example, the severity of the alarm could only be truly determined through examination of the relevant host. And the escalation decision may not be trivial if, say, the Tier-1 analyst is alerted on a holiday shift and involving senior staff requires calling people in. Triage provides the justification for initiating a response workflow, and collecting relevant artifacts as close to the time of the alert as possible preserves volatile evidence. Organizations that automate this triage step benefit because the human time in the response workflow starts only after the data needed to support a decision has been collected, saving precious time.
About the guest contributor: Tim Gifford is the Federal Account Manager for Mandiant and a regular blogger on M-Unition.