Intrusion detection scanners can miss up to 90% of threats. In the January 2016 GAO report (1), a test of the Department of Homeland Security's (DHS) National Cybersecurity Protection System (NCPS) found that, across five major client applications, the NCPS detection system provided even partial coverage for a meager 4 of the 439 known vulnerabilities selected for review. These numbers are not encouraging. The GAO was more emphatic, stating that signature-based detection of known threats is simply not adequate to address threats that exploit common vulnerabilities. We can no longer rely on signatures; they just don't work against unknown threats. The best way to detect those unknown threats is with machine learning, a key component of artificial intelligence (AI).
A Brief History of AI
The idea of artificial intelligence goes back to ancient civilization with myths, stories and rumors of artificial beings endowed with intelligence or consciousness. In modern history, the term artificial intelligence was coined at a conference at Dartmouth College in 1956, which led to the development of The Logic Theory Machine (1956) and the General Problem Solver (1957) software. But AI really took off when Marvin Minsky and his colleague John McCarthy started the MIT Artificial Intelligence lab in 1958. This is where the programming language LISP was created, which is still an important language for AI. The golden years for AI were from 1956–1974. Initially, researchers believed a fully intelligent machine capable of passing the Turing test would be built within twenty years, and DARPA was just one of the agencies pouring money into the AI lab.
However, an intelligent machine was nowhere on the horizon, and researchers didn’t appreciate the difficulty of what they were up against. Funding suddenly dried up and AI became a dirty word. The golden years of AI were over and the first AI winter occurred from 1974–1980. But AI never really went away. It just went underground.
A resurgence happened from 1980–1984. The Japanese got involved in AI, and with them came the "boom years." The Japanese applied AI to everything. Fuzzy logic chips ensured that rice was cooked perfectly and that subway trains ran smoothly and on time without operators. But in the US, nobody talked about AI. Instead, it was all about expert systems. Companies like Digital Equipment led the knowledge revolution, building expert systems that could answer questions or solve problems in specific domains of knowledge.
The Japanese Ministry of Trade poured millions into AI and DARPA was back in the game, funding the Strategic Computing Initiative. This was the age of neural networks and speech and character recognition. Only a few diehards talked about true AI and passing the Turing test. Once again, funding dried up and the second AI winter began in 1987, lasting until 1993. But as before, research on AI never stopped; people just stopped talking about it.
Are We There Yet?
Today, AI is sexy again. Thanks to Moore's Law, compute power has increased exponentially. Network bandwidth and storage have followed suit, doubling every 18–24 months as well. Combined with the rise of big data analytics, AI is now being used in everything from robotics to retail. Machine learning (ML), a major component of AI, is transforming our world. According to the SAS Institute, machine learning is defined as:
“…a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.” (2)
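To make that definition concrete, here is a minimal, purely illustrative sketch of what "iteratively learning from data" looks like in code. The data points, learning rate, and variable names are all made up for the example; the point is that the relationship is discovered from the data rather than programmed in.

```python
# Toy illustration of iterative learning: fit a line y = w * x by repeatedly
# nudging w toward whatever reduces the error, instead of hard-coding the rule.

data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]  # (x, y) pairs, roughly y = 2x

w = 0.0              # initial guess for the slope
learning_rate = 0.01

for step in range(1000):
    # gradient of the mean squared error with respect to w
    gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * gradient          # adjust w to lower the error

print(f"learned slope: {w:.2f}")           # ends up near 2.0, learned from the data
```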
ML is ubiquitous. Doctors use ML to improve medical image analysis. Airbnb is a huge proponent of ML, using it every day to recommend the perfect properties for your next getaway. And where would Netflix be without ML? Every time you browse, Netflix is using ML to sort through your past preferences and those of your friends to predict which movie you should watch next. But the most amazing application of ML is that Google is using it to teach cars to drive. The future has arrived!
Machine Learning Applied to Cybersecurity
If machine learning is so successful in medicine, retail, and entertainment, then why are we not using it to solve one of the greatest problems facing our nation today—cybersecurity?
Some cybersecurity experts believe that machines simply can’t replace what they do. Humans must make the decisions—they simply don’t trust a computer to do it. Some people think that AI is science fiction. It’s too good to be true, or just a pipe dream, not something worth wasting precious time on. Maybe it’s OK for marketing, but with cybersecurity, we’re talking about people’s lives.
And then there are those who don't understand what machine learning is. They confuse it with Asimov's I, Robot or Skynet: fully sentient machines. In reality, ML is just a set of statistical algorithms applied by computers, nothing fancier than what Amazon does with one-click shopping. As Arthur C. Clarke's third law states, "any sufficiently advanced technology is indistinguishable from magic," and cybersecurity professionals don't have time for magic.
Machine learning has actually been around for a long time, centuries before Google and Amazon. Most machine-learning algorithms are based on traditional statistics, a discipline that began in 1662 when John Graunt first calculated census demographic data. Most of our basic statistical formulas were developed in the mid-1700s. By 1950, we had created the discipline of modern statistics, which coincided with the beginning of machine computing. Computers helped statisticians perform calculations that were difficult using a slide rule.
As for the field of machine learning, many of the primary predictive statistics, like linear and logistic regression, were invented in the 1920s. Clustering and more advanced predictive algorithms like k-nearest neighbors, k-means, and naïve Bayes were developed in the 1950s and 1960s, around the first golden age of AI, with decision-tree learners such as ID3 and its successor C4.5 following later. While there have been some newer algorithms, such as Random Forests™ in 1995, most progress now comes from novel applications of these older, trusted algorithms. Interestingly, some of the latest work at the NSA on Bayesian networks is all based on Bayes' theorem, articulated by Thomas Bayes (1702–1761).
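Since Bayes' theorem underpins those Bayesian methods, and since probabilities of "bad code" come up again later in this article, a quick worked example (with made-up numbers) shows what the 250-year-old formula actually computes:

```python
# Bayes' theorem, P(malware | alert) = P(alert | malware) * P(malware) / P(alert),
# with purely illustrative numbers.

p_malware = 0.001               # assume 0.1% of scanned files are actually malicious
p_alert_given_malware = 0.99    # the detector flags 99% of real malware
p_alert_given_benign = 0.01     # ...but also flags 1% of benign files

p_alert = (p_alert_given_malware * p_malware
           + p_alert_given_benign * (1 - p_malware))

p_malware_given_alert = p_alert_given_malware * p_malware / p_alert
print(f"P(malware | alert) = {p_malware_given_alert:.1%}")   # about 9% with these numbers
```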
There are three main types (3) of machine learning: supervised, unsupervised, and reinforcement. Some people classify semi-supervised learning as a fourth type. Different algorithms are used for each type, but the distinction basically comes down to how much input a human gives to the process and how much the computer works out on its own from the data. Different types of data require different processes and different algorithms.
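As a rough, hypothetical illustration of the first two types, the sketch below uses the open-source scikit-learn library: the supervised model learns from human-provided labels, while the unsupervised one finds structure in the same data without any labels. The feature vectors and parameters are placeholders, not a real detection model.

```python
# Minimal sketch of supervised vs. unsupervised learning with scikit-learn.
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Toy feature vectors; in practice these would be measurements extracted from code or traffic.
features = [[0.1, 200], [0.9, 15], [0.2, 180], [0.8, 30]]

# Supervised: every example comes with a human-supplied label (0 = benign, 1 = malicious).
labels = [0, 1, 0, 1]
clf = RandomForestClassifier(n_estimators=10).fit(features, labels)
print(clf.predict([[0.85, 20]]))     # the model predicts a label for unseen data

# Unsupervised: no labels at all; the algorithm groups similar examples on its own.
km = KMeans(n_clusters=2, n_init=10).fit(features)
print(km.labels_)                    # cluster assignments discovered from the data
```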
So if machine learning and its statistical algorithms have been around for so long, why are we just now getting around to using them? Marvin Minsky had ambitious plans for artificial intelligence. The problem was that he was ahead of his time and didn't have the necessary compute power. Just like full-text indexing, which has been around for decades but only became feasible in the last few years with cheap compute power. The same is true for voice recognition, which was widely showcased at the MIT Media Lab in the 1980s. Thanks to Moore's Law and new distributed technology from Google, we now have an abundance of cheap compute power. This is what has made ML a reality.
Machine Learning: A New Hope for Cybersecurity?
A year ago, industry analysts suddenly became bullish on the idea of using machine learning for cybersecurity. Yet even they were still predicting it would take another five years to be viable.
So has that happened yet? Has anyone applied ML to cybersecurity? And if so, does it work? The answer is yes. But I should clarify something. I am not talking about using advanced AI machines like Watson, which can beat any human at Jeopardy, or Deep Blue, which beat Kasparov at chess.
Let’s be clear. I am not suggesting that we build a scanner that can read network traffic at line speed and learn in real-time to block malicious code. That’s not how machine learning works. What I am suggesting is something very different, and there are companies already doing it today. Here’s how it works.
If you give a computer a picture of an apple and a picture of an orange, it can't tell the difference. But with machine learning you can teach it to distinguish between them. You give the computer a specific algorithm and keep showing it pictures. With enough trials, it can learn to tell the difference with amazing accuracy.
For cybersecurity, you start off by collecting a large sample of malware and a large sample of good code. You apply machine-learning algorithms and slowly teach the computer to distinguish malware from good code. As the computer goes through its iterations, it gets better and better, until finally it produces a set of classifiers. These classifiers tell the computer what to look for. The program is now ready to look at streams of incoming data in real time and say, "Yes, that's good code," or "No, that code has a 98% probability of being malicious." The computer can never really know, of course, but it can tell you the probability that a given piece of code is bad or malicious.
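Here is a hedged sketch of that training-and-scoring loop, again using scikit-learn. The feature vectors (say, file entropy, size, and import count) and the samples themselves are invented for illustration; a real system would extract far richer features from far more data.

```python
# Sketch: train on labeled malicious and benign samples, then score new code.
from sklearn.ensemble import RandomForestClassifier

# Each row is a feature vector for one code sample, e.g. [entropy, size_kb, import_count].
X_train = [
    [7.8, 120, 3],    # known malware
    [7.5,  90, 2],    # known malware
    [4.2, 800, 45],   # known good code
    [3.9, 650, 60],   # known good code
]
y_train = [1, 1, 0, 0]    # 1 = malicious, 0 = benign

model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

new_sample = [[7.6, 110, 4]]                        # features from incoming code
p_malicious = model.predict_proba(new_sample)[0][1] # probability of the "malicious" class
print(f"probability of being malicious: {p_malicious:.0%}")
```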
Now imagine putting that machine on a network and scanning data in real time. Remember that the computer is now just doing pattern matching at very high speed, something computers are really good at. When the computer finds something with a high probability of being suspicious, you have several options (a minimal triage sketch follows the list below):
- Automatically route it to Cisco’s Threat Intelligence Services.
- Create a workflow to notify an analyst immediately.
- Automatically route the code to a post-analyzer that tests it in a sandbox to verify that it is indeed malware.
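A minimal triage sketch of those options, with hypothetical thresholds and stub functions standing in for whatever threat-intelligence, ticketing, and sandbox integrations an organization actually runs:

```python
# Hypothetical triage logic for a scored sample; thresholds and helpers are placeholders.

def route_to_threat_intel(sample_id):
    print(f"{sample_id}: forwarded to threat-intelligence service")

def notify_analyst(sample_id, score):
    print(f"{sample_id}: analyst notified (score {score:.0%})")

def send_to_sandbox(sample_id):
    print(f"{sample_id}: sent to sandbox for confirmation")

def triage(sample_id, p_malicious):
    """Route a scored sample; a human analyst still makes the final call."""
    if p_malicious >= 0.98:
        route_to_threat_intel(sample_id)
        notify_analyst(sample_id, p_malicious)
    elif p_malicious >= 0.80:
        send_to_sandbox(sample_id)
    # below the sandbox threshold, the sample passes but could still be logged for review

triage("sample-001", 0.985)
```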
The key here is not to take human experts out of the equation, but to let computers do what they do best: process massive amounts of information quickly so analysts can do their jobs better. ML makes computers smarter by enhancing their capacity to recognize patterns. When the computer finds something, it simply presents a probability. A human still needs to decide what to do next.
Time, ML, and Hunting Malware
Today, most malware is used to get in the front door. Cybersecurity experts say that malware infiltrating your network typically goes undiscovered for 100–250 days. That gives cybercriminals plenty of time to escalate privileges and exfiltrate valuable data. By using machine learning to enhance malware detection, we break the kill chain. We turn the tables on threat actors by taking away the advantage of time. With time on our side, we can determine whether code is malicious and, if it is, hunt down the attacker.
As machine learning advances in cybersecurity, we will see it applied to many things: better identification of malware in end-point security systems, web firewalls, and IPSs, all in real-time. Someday we may even be able to track down where cyber criminals are hiding in the network, by analyzing network traffic. But while we are waiting, machine learning is already turning the tables on cybercriminals by detecting malware early in the cycle and breaking the kill chain.
In my next article, I will review a cutting edge technology that is using ML today to thwart cyber attacks.
(1) GAO (2016). Information Security. U.S. Government Accountability Office, January 2016.
(2) SAS Institute (2016). Machine Learning. http://www.sas.com/en_id/insights/analytics/machine-learning.html
(3) Thompson, W. and Bucheli, H. (2014). Statistics and Machine Learning at Scale. SAS, p. 2.
Nathaniel Rushfinn Crocker is the CIO and co-founder of the Crocker Institute, a benefit corporation in South Carolina dedicated to helping organizations meet their mission through Enterprise Architecture and Program Evaluation. He is passionate about cybersecurity and protecting our online identities. Interested in hearing more of what Nate has to share? You can find him on Twitter and connect with him on LinkedIn.