Automated Threat Hunting. Is it AI?

Posted on 2022-09-14 by Pedram Amini

For years we’ve known that truly stopping cyber attackers requires collecting every possible piece of data, organizing it in a manner that man and machine can assimilate, analyzing it, separating signal from noise, and taking corrective action without disrupting business continuity, all before calamity strikes. Let’s assume for a moment that we have this utopian defense. That means we can lean 100% on prevention, because everything of importance will have been discovered and a corresponding block, allow, quarantine, or similar rule can be applied. Of course, the world is not utopian. We don’t have access to 100% of the data we need. It is non-trivial to organize data into information even slowly, let alone at speed. Further, even if we could do so for a moment, the picture is far from static.

This is where the dream of artificial intelligence always comes to the fore, so let’s talk about that. In the video, I made light of the term ‘artificial intelligence’. So, what was I really trying to convey? First, let’s consider very basic definitions of artificial intelligence (AI) and machine learning (ML). AI is a technology that enables a machine to simulate human behavior, or more specifically, human decision-making. ML is a subset of AI that allows a machine to learn automatically from past data without being explicitly programmed to do so. The goal of AI is to make a computer think and act like a human to solve complex problems.

Now, how does this relate to threat hunting? More specifically, is artificial intelligence ready to do the complex work of threat hunting? Let’s consider a set of realities. Network, application, and data deployments across millions of ‘snowflake’ organizations remain extraordinarily complex, diverse, and implemented to a wide range of standards. Threat hunters, while they may describe their work similarly at a conceptual level, do not individually follow a documented, repeatable process, let alone a shared one. Each investigation is situational, shaped by available data, the hunter’s hunches, skill set, tools, workload, and more. So, to answer the question posed in this post’s title: no, automated threat hunting is not AI. For the foreseeable future, one should be highly skeptical of any vendor claiming an AI solution in the cybersecurity space, at least one purported to simulate human decision-making with any acceptable level of fidelity.

So where does that leave us? It leaves us with ML. As with many in the security space, we actively employ ML here at InQuest. Now that we’ve made a (hopefully) clear distinction between AI and ML above, let’s delve into what can be reasonably achieved with ML.

It’s covered in greater depth in this whitepaper, but for now let’s rely upon a ‘primer’ of sorts:

  1. Malware data collection methods are becoming increasingly sophisticated
  2. ML is one of many big data techniques for analyzing malware data lakes to discover useful patterns and other findings
  3. In order for ML to evaluate data properly, good baseline data models are required
  4. InQuest File Detection and Response (FDR) ‘good data’ is generated by our proprietary Deep File Inspection™ (DFI) technology, a static-analysis engine that inspects well beyond OSI Layer 7, essentially automating the work of a typical SOC analyst / security researcher
  5. Once DFI post-processed data is gathered, it is passed to our classifiers. Our current ML approach uses supervised and unsupervised ensembles for classifying malware based on pattern recognition and anomaly detection, respectively: three supervised classifiers (logistic regression, random forests, and gradient boosting) and five unsupervised techniques (TLSH, SSDeep, K-means, DBSCAN, and OPTICS).
  6. ML results are unquestionably impressive. That said, models still struggle to stay relevant against an ever-changing gallery of malicious threats
  7. The problem is exacerbated by adversaries who deliberately work to weaken dataset integrity by inserting “red herring” samples that bias models towards irrelevant criteria
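To make the unsupervised, anomaly-detection half of item 5 a little more concrete, here is a minimal pure-Python sketch of DBSCAN, one of the listed techniques, which treats any point it cannot place in a dense cluster as an anomaly. The two-dimensional feature vectors, the `eps` and `min_samples` values, and the `dbscan` and `region_query` helpers are hypothetical illustrations; this is not InQuest's pipeline, and real DFI-derived feature vectors would have far more dimensions.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that fall in no dense region (our "anomalies")


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[idx], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
    # Every point is visited above; this only narrows Optional[int] to int.
    return [NOISE if lab is None else lab for lab in labels]


# Hypothetical 2-D feature vectors: two tight groups plus one outlier.
samples = [
    [0.10, 0.20], [0.12, 0.22], [0.11, 0.19],  # group A
    [0.90, 0.85], [0.88, 0.87], [0.91, 0.86],  # group B
    [0.50, 0.05],                              # resembles neither group
]
labels = dbscan(samples, eps=0.05, min_samples=3)
anomalies = [i for i, lab in enumerate(labels) if lab == NOISE]
print(labels)     # [0, 0, 0, 1, 1, 1, -1]
print(anomalies)  # [6]
```

The O(n²) neighbor search keeps the sketch short; a real deployment would more likely use a library implementation such as scikit-learn's `DBSCAN`, which can use a spatial index. Note that this also illustrates item 7: an adversary who seeds enough "red herring" samples near a malicious file can make it look dense, and therefore normal, to a density-based method.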

Fortunately for our customers, InQuest Labs analysts intervene in the process of assigning a threat score (we call it IQScore) to each discovered malware artifact. So, while we use ML extensively, the final actionable intelligence has been assimilated and vetted through a man+machine approach. You wouldn’t enter a fight against a human-machine opponent (and that is exactly what the modern adversary is) as only a human, which is why we fight using the same dual-powered approach.

Let’s bring this home. The outputs from InQuest DFI fuel 80% of the features that drive our machine learning models, yet ML remains but a single "lens" leveraged by FDR to apply a malicious or sensitive label to a file. This leads to another key FDR differentiator. Not only does FDR label files as malicious with high fidelity (derived from severity, confidence, and observed frequency), it also directly informs the analyst / threat hunter about which malicious files are interesting and therefore worthy of further human analysis. In an environment with hundreds of thousands of files, this is the automation that ultimately focuses precious human intellect where it is most valuable.
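The triage idea above, surfacing the handful of files worth a human's time out of hundreds of thousands, can be sketched as a top-k selection. Everything here is a hypothetical assumption for illustration: the `FileVerdict` fields, the `priority` weighting, and especially the choice to treat frequently observed (commodity) files as less novel. This is not how IQScore or FDR actually compute fidelity.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileVerdict:
    """Hypothetical per-file result; field names are illustrative only."""
    file_id: str
    malicious: bool
    severity: float    # assumed range 0.0-1.0
    confidence: float  # assumed range 0.0-1.0
    seen_count: int    # times this file has been observed; must be >= 1


def priority(v: FileVerdict) -> float:
    """Hypothetical weighting: severe, confident, rarely seen files rank highest."""
    novelty = 1.0 / (1.0 + math.log(v.seen_count))  # 1.0 for a first sighting
    return v.severity * v.confidence * novelty


def top_for_review(verdicts: Iterable[FileVerdict], k: int) -> List[FileVerdict]:
    """Keep only malicious verdicts and return the k highest-priority ones, best first."""
    malicious = (v for v in verdicts if v.malicious)
    return heapq.nlargest(k, malicious, key=priority)


verdicts = [
    FileVerdict("file-a", True, 0.9, 0.9, 1),     # severe, confident, first sighting
    FileVerdict("file-b", True, 0.9, 0.9, 5000),  # same verdict, but seen everywhere
    FileVerdict("file-c", False, 1.0, 1.0, 1),    # not malicious: never queued
    FileVerdict("file-d", True, 0.5, 0.8, 2),
]
print([v.file_id for v in top_for_review(verdicts, k=2)])  # ['file-a', 'file-d']
```

`heapq.nlargest` keeps only k items in memory while streaming over the input, which suits a queue fed by hundreds of thousands of verdicts better than sorting everything.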

Want to learn more? Check out our FDR overview here. Prefer a quick video on this topic? Click here or watch below.

