AI for Cybersecurity, machine learning under attack.
Cybersecurity was one of the first domains to deploy machine learning at production scale and is still one of the hardest. The defender's data is generated by adversaries who change their behaviour the moment they notice you're watching. Labels are scarce and noisy. Class imbalance is extreme — millions of benign events for every malicious one. Models can be evaded, poisoned, stolen, and turned against their operators. Building on the conceptual foundations of Ch 06 (the CIA triad, threat modelling, the layered defence architecture, the SOC), this chapter develops the methodology — the major application areas, the engineering constraints that distinguish security ML from generic ML, the specific adversarial-robustness machinery that security applications require, and the deployment realities of putting AI into a security organisation that already runs on signature-based detection.
Prerequisites & orientation
This chapter builds directly on Ch 06 (Intro to Cybersecurity) — the conceptual material there is the prerequisite for everything below, and the chapter cross-references back to specific Ch 06 sections rather than redeveloping the concepts. On the ML side, the chapter assumes supervised classification (Part IV Ch 02), neural-network fundamentals (Part V Ch 01–02), and the anomaly-detection methodology of Part XIII Ch 02 (which is foundational to most of the intrusion-detection and behavioural-analytics work). Graph-neural-network material (Part XIII Ch 05) supports the lateral-movement and IAM sections; NLP fundamentals (Part VI Ch 01) support the phishing and malware-script sections; the federated-learning chapter (Part XIII Ch 10) is relevant to the cross-organisation threat-intelligence work.
Two threads run through the chapter. The first is adversarial drift: any ML model that becomes a defensive control will be probed and evaded by attackers, and the methodology of security ML is largely about building and operating models that survive this pressure. The second is integration with existing operations: security ML rarely replaces the SOC and the SIEM (Ch 06 Section 9); it augments them, and the deployment patterns are shaped by alert fatigue, analyst workflow, and the regulatory constraints Ch 06 Section 10 surveyed. The chapter is organised by application area, with adversarial-robustness machinery developed in detail in its own section (Section 7) and woven into the others as relevant.
Why AI for Cybersecurity Is Distinctive
Ch 06 explained why cybersecurity is its own discipline; this chapter is about why ML in cybersecurity is its own discipline. The conceptual properties Ch 06 introduced — adversarial attackers, defender's asymmetry, defence-in-depth, the SOC's operational reality — translate into specific methodological constraints that make security ML different from generic supervised learning. This section maps each property to its ML implication; the rest of the chapter develops the resulting methodology.
From adversarial attackers to model drift on purpose
Most ML domains assume the data-generating process is approximately stationary, with the deployment distribution close to the training distribution. Security ML cannot assume this. The moment a model becomes a defensive control, attackers begin probing it, learn what it flags, and modify their behaviour to evade it. This is the standard ML problem of distribution shift, but with a crucial twist — the shift is intentional, adaptive, and continuous. A malware classifier with 99.5% accuracy on a static benchmark may produce 30% accuracy in production three months later because the malware authors have adapted. The methodology that follows is constant retraining, behavioural rather than signature-based features, and explicit defensive thinking about what an attacker would change next.
From defender's asymmetry to extreme class imbalance
Ch 06 Section 1's defender's-dilemma framing has a sharp ML implication: the base rate of malicious events is extremely low. In a typical enterprise, less than 0.01% of network connections are malicious; less than 0.1% of emails are phishing; less than 0.01% of files are malware. The class-imbalance problem this creates is more severe than the comparable financial-fraud regime (Ch 04 Section 8) because the cost asymmetry is also extreme — false negatives can cost millions, false positives produce alert fatigue that drives compliance to zero. The methodology — calibrated probabilities, careful threshold selection, ranking-style losses, focused negative sampling — is shaped throughout by this asymmetry.
From defence-in-depth to ML as one layer
Ch 06 Section 1's defence-in-depth principle has an underappreciated implication for security ML: the model is not a complete defence and should not be evaluated as one. A malware classifier with 90% recall is not an embarrassment if it is the third layer behind hash-based blocking and YARA rules — it catches most of the long tail those layers miss, and the residual goes to human SOC analysts. The right framing for security ML is "raise the cost to the attacker by some increment" rather than "be uncrackable," and the right metrics measure the model's marginal contribution to the layered stack rather than its absolute performance.
From SOC operations to deployment under workflow constraints
Ch 06 Section 9 covered the SOC and the SIEM. Security ML almost always deploys into this workflow rather than replacing it: model outputs become alerts that flow into the SIEM, get triaged by analysts, escalate to investigations. The deployment constraint is severe — alert fatigue means models that produce many false alarms are quickly disabled or ignored, and the operational discipline is to tune thresholds for analyst-tolerable false-positive rates rather than for benchmark optima. Section 8 develops the operational machinery; the conceptual point here is that security ML is not deployed in isolation, and the methodology has to produce outputs that SOC analysts can actually use.
From threat modelling to evaluation under attack
Ch 06 Section 3 introduced threat modelling. The same discipline applies to ML models themselves. Every security ML model should be evaluated not just on benign data but against an explicit threat model — what would an attacker do to evade this model? The standard adversarial-ML attacks (covered in Section 7) are part of the picture, but security-specific evasion patterns matter at least as much: malware that detects sandbox environments, phishing that varies linguistic style, lateral movement that mimics legitimate admin activity. The 2024 MLSecOps standards (MITRE ATLAS, the OWASP ML Top 10) formalise this evaluation discipline.
Most ML problems are about prediction in a stable world. Security ML is about prediction in a world where adversaries actively change the data to defeat the predictor. The methodology of the chapter is shaped throughout by this constraint — by adversarial drift, extreme class imbalance, deployment as part of a layered stack, and evaluation under attack. Every section that follows is a domain where these constraints reshape what works.
Network Intrusion Detection
The network is where attacks travel — initial access, command-and-control, lateral movement, and exfiltration all leave traces. Ch 06 Section 5 introduced the classical signature-based IDS/IPS architecture; this section develops what ML adds to it. Modern network intrusion detection combines signature databases with ML-based anomaly detection, behavioural analysis, and graph methods, with the ML components catching attacks that signature-based systems miss.
Flow-based features and the NIDS architecture
The standard data substrate for ML-based NIDS is the network flow — a summary of a single connection (source IP, destination IP, ports, protocol, duration, bytes transferred, packet counts, TCP flags). Flow records are produced by NetFlow / IPFIX exporters on routers and switches, or extracted from packet captures. A typical enterprise generates billions of flows per day, which makes computational efficiency a first-class concern. Most production NIDS use tree-based models (XGBoost, LightGBM) on hand-crafted flow features for the first-tier detection, with neural models reserved for specific subdomains.
The classical academic benchmarks — KDD Cup 1999 and its successors NSL-KDD, UNSW-NB15, CICIDS — have well-known limitations (artificial traffic, outdated attack types) that produce misleadingly high accuracy. The 2020s industry trend is toward production telemetry from real environments and toward evaluation that explicitly tests against attacker adaptation rather than against fixed labelled datasets.
DGA detection
Many malware families use domain generation algorithms (DGAs) to produce dynamic command-and-control domain names — a long string of pseudo-random domains, only some of which the attacker registers, defeating static blocklists. Detecting DGA-generated domain names is a clean ML problem: train a classifier on labelled domain strings (legitimate vs. DGA), use string-level features (n-grams, character distributions, dictionary-word presence, length), and flag connections to suspected DGA domains in real time. DGArchive and related datasets provide labelled training data; production deployments use character-level CNNs or transformers with substantial success.
Lateral movement detection
Once an attacker has initial access, their activity within the network looks superficially like legitimate admin behaviour — RDP connections, PowerShell remoting, SMB shares, scheduled tasks. Lateral movement detection is the problem of distinguishing attacker traversal from legitimate use, and it is one of the hardest problems in security ML because the action types overlap. The dominant approach uses graph methods on the authentication and access graph (Part XIII Ch 05): nodes are users and machines, edges are authentication or access events with timestamps, and anomalous subgraphs (a user account suddenly accessing many machines it never has before, accounts traversing in unusual sequences) flag potential intrusions. Microsoft's Defender for Identity and the various commercial UEBA products use graph-based approaches at scale.
Encrypted-traffic classification
The 2010s and 2020s have seen near-universal adoption of TLS for network traffic (Ch 06 Section 4). Traditional deep-packet-inspection NIDS that look inside connection payloads can no longer do so. ML offers a partial response: encrypted-traffic classification uses metadata that's still visible (packet sizes, timing, TLS handshake fingerprints, server-name indication) to identify application or threat type without decrypting. The methodology has matured substantially — the JA3/JA4 TLS fingerprinting standards plus ML-based traffic classifiers handle the bulk of the use cases — but it is genuinely harder than the pre-encryption era and represents a real loss of visibility for defenders.
The false-positive problem
The single hardest practical problem in NIDS ML is producing actionable alerts. A model with 99% specificity on billions of flows still produces tens of millions of false positives per day; a SOC cannot triage that volume. Production NIDS combine multiple stages — high-recall ML classifiers feed into a correlation engine that combines signals across hosts and time, which feeds into a final scoring layer that produces analyst-tolerable alert volumes. The pipeline as a whole, not any single classifier, is what determines operational utility.
Malware Classification
Distinguishing malicious from benign software is the canonical security-ML problem and the most-studied. Ch 06 Section 8 introduced malware families and the antivirus-to-EDR transition; this section develops the ML methodology behind modern malware detection — static analysis, dynamic analysis, behavioural classification, and the adversarial pressure that shapes all of them.
Static analysis: features from the binary
Static analysis extracts features from a binary without executing it. The substrate for Windows binaries is the PE (Portable Executable) format, with features including: file metadata (size, sections, imports, exports), header anomalies, string contents, byte n-grams, opcode sequences from disassembly, and increasingly raw byte representations passed through CNNs (the MalConv architecture, Raff et al. 2018, and successors). The Microsoft EMBER dataset (2018) and its successors provide standardised PE-feature benchmarks; production EDR products combine static features with dynamic and behavioural signals.
Static analysis has a fundamental limitation: packers and obfuscators compress, encrypt, or otherwise transform the binary so that its static features look benign while the runtime behaviour is malicious. Modern malware is heavily obfuscated, and pure-static detection is weak as a result. Production pipelines use static analysis as an inexpensive first filter and route ambiguous cases to dynamic or behavioural analysis.
Dynamic analysis: features from sandboxed execution
Dynamic analysis runs the suspected binary in an isolated sandbox (Cuckoo, Falcon Sandbox, Microsoft's commercial offerings) and collects behavioural traces — system calls made, files accessed, registry keys modified, network connections attempted, API calls into the OS. The resulting feature space is much richer than static features and much harder for malware authors to obfuscate without breaking the malware's actual behaviour.
Dynamic analysis has its own limitation: malware often detects sandboxed environments (the VM-aware or sandbox-evading family) and refuses to execute, or executes only benign behaviour, when it senses analysis. The countermeasure is increasingly realistic sandboxes — anti-anti-VM techniques, time-distortion to simulate real-system uptime, integration with real user-activity simulators. This is itself an arms race that mirrors the broader adversarial dynamic.
Behavioural detection at the endpoint
The 2010s shift from signature-based AV to EDR (Ch 06 Section 6) made behavioural detection the dominant deployment pattern. EDR agents monitor live process behaviour: a Microsoft Office process spawning PowerShell, downloading content, and connecting to an unusual IP is suspicious in combination even if no individual step is. The ML on top of EDR telemetry classifies behavioural sequences rather than file-level binaries, with the underlying methodology drawing on sequence models (RNNs and increasingly transformers) over event traces.
The 2024 generation of behavioural-detection models increasingly uses MITRE ATT&CK technique mapping (Ch 06 Section 3) as the labelling structure. Rather than just classifying "malicious vs. benign," the model produces per-technique attribution ("this looks like T1059.001 PowerShell execution combined with T1071.001 Web Protocols command-and-control"), which gives analysts the structured starting point for investigation that pure binary classification does not.
Adversarial evasion in malware detection
Malware classifiers face the strongest adversarial pressure of any security-ML domain. Authors test their malware against commercial AV/EDR products, modify it until it evades, and ship the modified version. The result is observable as concept drift in production: a model that was 99% accurate at deployment is 80% accurate six months later not because the world changed but because attackers adapted to it. Section 7 develops the adversarial-ML machinery in detail; here the empirical reality is that production malware classifiers must be retrained continuously, with substantial telemetry-collection infrastructure to gather the adapted samples.
The script-malware and LLM-augmented detection
A 2024 frontier worth flagging: increasing portion of malware is script-based rather than compiled binaries — PowerShell, VBScript, JavaScript, Python deployed in the form of "living off the land" attacks that use legitimate system tools. The static and dynamic analyses developed for compiled binaries don't transfer cleanly. The 2024 generation of script-malware detection uses LLM-based code analysis: prompt the LLM with the script and ask whether it implements malicious behaviour, with surprisingly good results on novel obfuscated scripts that signature-based tools miss. The cost-quality trade-off is substantial but real, and production deployments at major endpoint vendors increasingly include LLM-based components.
Phishing and Email Security
Email is the single most-exploited initial-access vector in modern intrusions. Ch 06 Section 8 introduced phishing, business email compromise (BEC), and the social-engineering family of attacks. This section develops the ML methodology behind email security — URL classification, content classification, sender authentication, and the rapidly growing problem of AI-generated phishing.
URL classification
The single highest-value detection in email security is identifying suspicious URLs. URL classification models extract features from the URL string itself — domain age, certificate properties, character patterns, URL structure, lexical similarity to legitimate brands — plus content fetched from the destination (page DOM, form fields, JavaScript behaviour) when the link is followed in a sandbox. The dominant production architectures combine character-level CNNs on URL strings with separate classifiers on the fetched-content features.
Modern phishing campaigns aggressively rotate URLs to defeat blocklists — a single campaign may use thousands of unique URLs in a few hours. ML-based detection that generalises from features rather than memorising URLs is the only viable approach at scale. Google's Safe Browsing, Microsoft's SmartScreen, and the various commercial email-security products all run URL classification at production scale.
Content classification and BEC detection
Beyond URLs, email content itself can be classified for phishing intent — the linguistic register of urgency-inducing requests, the use of authority figures, requests for unusual actions (wire transfers, gift card purchases). Business email compromise (BEC) is particularly hard because the attacks impersonate executives or trusted vendors using grammatically correct, contextually appropriate text. The 2020s generation of BEC detection uses NLP models trained on labelled phishing corpora plus sender-context features (did this address ever send mail before? does the conversation history make sense?).
Sender authentication
Several non-ML defences underpin email security and ML detection sits on top of them. SPF (Sender Policy Framework) lets domain owners specify which IPs may send their mail. DKIM (DomainKeys Identified Mail) signs outgoing email with cryptographic keys verifiable via DNS. DMARC ties them together with policy directives. Email that fails these checks is flagged; email that passes is still subject to ML analysis but with a stronger trust prior. Production email security treats authentication failures as a strong feature for the downstream classifier rather than as an absolute block, because legitimate mail also fails authentication often enough.
AI-generated phishing
The 2023–2026 wave of LLMs has substantially reshaped the phishing threat landscape. AI-generated phishing is grammatically perfect, personalised at scale, and contextually aware — defeating the linguistic-cue-based classifiers that worked on earlier-generation attacks. The 2024 generation of email-security products responds with several adaptations: sender-relationship features (you've never received mail from this address before), behavioural anomaly detection (this CEO never asked for wire transfers via email previously), and increasingly LLM-based detectors that can recognise other LLMs' output style. The arms race here is moving fast.
Voice cloning and deepfake threats
Phishing has expanded beyond email. Vishing (voice phishing) increasingly uses LLM-driven conversational bots and voice-cloning to call targets in convincing impersonation of executives, family members, or vendors. The 2024 cases of CFOs wiring millions to attacker accounts after voice-cloned calls are no longer rare. The defensive response combines authentication procedures (call-back verification through known channels), voice-cloning detection ML (still unreliable but improving), and user training that increasingly assumes voice can be faked at low cost. Section 9 develops the AI-as-attack-tool framing; the conceptual point here is that the email-security boundary has expanded to cover all forms of social-engineering communication.
User and Entity Behaviour Analytics
Most modern attacks don't break security technology — they steal credentials and use them legitimately, as Ch 06 Section 7 noted. User and Entity Behaviour Analytics (UEBA) is the security-ML response: model what each user, service account, and device normally does, and flag deviations. The methodology connects directly to the anomaly-detection material of Part XIII Ch 02, applied to identity- and access-graph data.
The UEBA architecture
The standard UEBA pipeline ingests authentication events, file accesses, application uses, network destinations, and time-of-day patterns from across the environment, attributing each event to a user or entity. Per-entity baselines model normal behaviour over a multi-week window; live events are scored against the baseline; large deviations produce risk scores that feed into the SIEM. The dominant production architectures use ensembles — multiple complementary anomaly detectors (statistical baselines, isolation forests, autoencoders, sequence models on event streams) — with score aggregation at the entity level.
The hard problem in UEBA is the same as in any anomaly detection: legitimate behaviour varies, and not all anomalies are malicious. A user travelling to a conference logs in from an unusual country; a developer running an unusual command may be doing legitimate troubleshooting. Production UEBA systems address this with contextual features (HR data on travel, change-management ticket integration, peer-group comparisons) that distinguish "anomalous and explicable" from "anomalous and unexplained."
Insider-threat detection
The most-distinctive UEBA application is insider-threat detection — identifying employees, contractors, or partners abusing legitimate access. The classical examples are data exfiltration (downloading customer databases before leaving for a competitor), sabotage (deleting code repositories), and fraud (making unauthorised financial transactions). Detection focuses on volumetric anomalies (this user is downloading 10× their usual volume), access anomalies (accessing files unrelated to their role), and timing anomalies (off-hours activity for users without on-call roles).
Insider-threat ML faces a sharp empirical reality: insider-threat events are rare enough that any single organisation has too few labelled cases to train a supervised model. Production approaches use unsupervised anomaly detection (no positive labels needed), curated rule-based detections for high-confidence patterns, and increasingly cross-organisation federated training (Part XIII Ch 10) to pool insider-threat signals across multiple organisations without sharing the underlying data.
Account compromise detection
Account compromise — an external attacker using stolen credentials — looks similar to insider threat but with subtly different patterns. Compromised accounts often show geographic impossibility (logins from two distant locations close in time), device anomalies (login from never-before-seen device), concurrent session anomalies (the legitimate user is also logged in), and behavioural deviations (the attacker doesn't know the user's usual activities). Production UEBA detects these with the same general machinery as insider threat but with rules and features tuned for the external-attacker case.
Graph methods for IAM analytics
Identity and access in modern environments form a graph — users, groups, roles, services, resources, with edges representing access grants and authentication events. Graph neural networks (Part XIII Ch 05) increasingly drive the more sophisticated UEBA detections: sub-graph anomalies that single-entity baselines miss (a privilege-escalation chain across multiple accounts, a pattern of access expansion over weeks), structural detections of permission sprawl, and identification of high-risk paths that an attacker could exploit. The 2024 generation of identity-security products (Microsoft Defender for Identity, the various IAM-analytics startups) increasingly use graph-based analysis as the primary detection layer.
The privacy and ethics layer
UEBA monitors employees in detail, which raises substantial privacy concerns. The methodology must navigate works-council requirements in Europe, employee-privacy laws in various jurisdictions, and reputational concerns about surveillance. Production deployments typically include explicit data-minimisation (collect only security-relevant signals), retention limits (delete old behaviour data), access controls (only authorised investigators can review individual employee data), and transparency commitments. The chapter does not develop the policy questions in detail but the practitioner should be aware that UEBA carries policy weight that other security-ML applications do not.
Vulnerability Discovery and Prioritisation
Beyond detection, ML increasingly enters the offensive-vulnerability and defensive-prioritisation pipelines. Ch 06 Section 6 covered vulnerability management as an operational discipline; this section develops where ML adds value — finding new vulnerabilities, predicting which known ones will be exploited, and supporting code review at scale.
Fuzzing-with-ML
Fuzzing — generating random or semi-random inputs to find crashes — is the dominant vulnerability-discovery technique. Classical fuzzers (AFL, libFuzzer) use coverage-guided mutation: mutate inputs that reach new code paths more aggressively than those that don't. The 2018–2024 wave of ML-guided fuzzing uses neural networks to predict which mutations are most likely to produce new coverage, dramatically accelerating bug discovery on hard targets. Google's OSS-Fuzz infrastructure and Microsoft's project-level fuzzing pipelines incorporate ML-guided components at scale.
The 2023–2026 frontier is LLM-guided fuzzing — prompt the LLM with the target source code and ask it to generate inputs likely to trigger interesting behaviour. The empirical results are real: LLM-generated harnesses and seeds find bugs that classical coverage-guided fuzzing missed, and several published 2024 results show CVE-grade bugs discovered by LLM-augmented fuzzers in widely-used software.
Vulnerability prediction in code
Vulnerability prediction models classify code regions as more or less likely to contain security vulnerabilities. Inputs include syntactic features, complexity metrics, change history, and increasingly learned representations from pretrained code models (CodeBERT, Code Llama). Production deployments at major tech companies use vulnerability prediction to prioritise security review effort — the model can't find specific bugs but it can identify which files to look at first. The empirical accuracy is modest (precision in the 30–60% range on serious benchmarks) but useful as a triage signal.
EPSS and exploit-likelihood prediction
Once vulnerabilities are public (Ch 06 Section 6 covered the CVE/CVSS framework), defenders face the prioritisation problem: thousands of new CVEs per year, only a fraction will actually be exploited in the wild. The Exploit Prediction Scoring System (EPSS, Jacobs et al. 2019, FIRST.org) is an ML model that predicts the probability a given CVE will be exploited within 30 days, using features like vendor, vulnerability type, public-exploit availability, and historical exploitation patterns. EPSS scores have become a standard input to vulnerability-management workflows, complementing CVSS with empirical exploitation likelihood.
LLM-based code review
The 2023–2026 generation of LLMs is increasingly used for security-focused code review. The pattern: prompt the LLM with code plus a taxonomy of vulnerability classes (the OWASP Top 10, CWE entries) and ask it to identify potential issues. The empirical evidence is mixed — LLMs find some classes of bugs reliably (SQL injection, missing authorisation checks, hardcoded secrets) but produce many false positives and miss subtle logic bugs that require deep program-state reasoning. Production deployments at major tech companies use LLM-based review as one signal among many, with human reviewers handling the final triage.
The supply-chain context
Most enterprise software is built on hundreds of open-source dependencies. Software composition analysis (SCA) identifies these dependencies and matches them against known-vulnerability databases; ML increasingly augments the matching with semantic-similarity-based detection that catches modified-but-still-vulnerable forks. The 2021 Log4Shell incident (Ch 06 Section 6) spotlighted how a vulnerability in a deeply-nested dependency could affect everyone; the 2024 XZ-utils backdoor showed that supply-chain attacks can be deliberately introduced. ML-based supply-chain analysis is increasingly a board-level concern, with substantial venture investment in the space.
Adversarial Robustness in Security Contexts
Every ML model deployed as a security control becomes a target. Attackers probe, evade, poison, steal, and otherwise turn defensive ML against its operators. The general field of adversarial machine learning studies these attacks and the defences against them; this section develops the security-specific machinery — what evasion looks like in malware classification or NIDS, why poisoning is particularly dangerous for security pipelines, and what robustness actually means in this context.
Evasion attacks
Evasion attacks craft inputs that the model classifies incorrectly. The classical academic version — adversarial examples in image classification, where imperceptible perturbations flip the prediction — translates to security in modified form. A malware author modifies their binary so a classifier rates it benign; a phishing attacker modifies their email to evade content classifiers; a network attacker shapes their traffic to look like benign flows. Crucially, the perturbation budget in security applications is not "imperceptible to a human" — it is "preserves the malicious functionality." This is much looser than the academic budget, which is one reason security ML is so hard to make robust.
The standard evasion methodologies include: gradient-based attacks (PGD, C&W) where the attacker has white-box access; transfer attacks where the attacker uses a surrogate model to craft examples that transfer to the target; query-based black-box attacks where the attacker only sees model outputs. In production security, white-box access is rare but not impossible (insider threats, leaked models); transfer and black-box attacks are the dominant practical concerns.
Poisoning attacks
Poisoning attacks contaminate the training data so the resulting model has properties the attacker wants. The most-studied variant is backdoor poisoning: insert training examples with a specific trigger pattern such that the model classifies any input with the trigger as the attacker-chosen class. For security ML, the attack surface is severe — feedback loops where flagged samples become training data give attackers an opportunity to inject poisons, and threat-intelligence sharing between organisations magnifies the risk if any one source is compromised.
Defences include: training-data sanitisation (anomaly-detect poisoning attempts before they reach the model), differentially-private training (Part XIII Ch 10) that bounds any single example's influence, robust statistics-based training that reduces sensitivity to outliers, and explicit auditing of training-data sources. Production security-ML pipelines treat the training pipeline as a sensitive asset with explicit access controls and signing for data sources.
Model stealing and extraction
An attacker with API access can submit queries and learn the model's decision boundary, eventually reconstructing a useful copy of the model. Model stealing attacks have been demonstrated against production ML services, and the security-ML implications are particularly severe — a stolen model gives the attacker a perfect oracle for testing evasion strategies offline. Defences include rate limiting, query-pattern detection (legitimate users don't submit synthetic boundary-probing queries), watermarking (embed identifiers in outputs to detect stolen-model usage), and moving the most sensitive models behind authenticated workflows rather than public APIs.
The MITRE ATLAS framework
The 2021 MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) framework organises adversarial-ML attacks into a structure parallel to the MITRE ATT&CK framework Ch 06 Section 3 introduced. Tactics include reconnaissance, resource development, initial access, ML model access, execution, persistence, defence evasion, discovery, collection, ML attack staging, exfiltration, and impact. Each tactic decomposes into specific techniques (membership inference, model inversion, transfer attack, backdoor, etc.) with documented case studies. ATLAS is increasingly used in production security-ML threat modelling, and the AI-system threat-modelling discipline maps closely to the security-system threat-modelling discipline of Ch 06.
What robustness actually means in security
Academic adversarial-ML research has produced certified-robustness frameworks — bounds on how much an attacker can change an input before crossing the decision boundary, with formal guarantees. These translate poorly to security applications because the security-relevant perturbation budget is not bounded by an Lp norm but by "preserves the attacker's objective." Production security-ML increasingly takes a defence-in-depth view of robustness: no single model is robust, but a layered stack with diverse models, behavioural correlation across signals, and human-in-the-loop investigation is harder to evade end-to-end than any single layer. This is the empirical version of Ch 06's defence-in-depth principle, applied to ML rather than to network controls.
AI for SOC Operations
Beyond the detection models of the prior sections, AI is increasingly used for the SOC operational machinery itself — alert triage, investigation, response orchestration. Ch 06 Section 9 introduced the SOC and the SIEM; this section develops where ML enters the analyst workflow, with the security-skills-gap motivation Ch 06 Section 9 highlighted as the dominant practical driver.
Alert triage
The first practical AI-for-SOC application is alert triage — classifying SIEM alerts by likely severity and false-positive probability so analysts can prioritise. The methodology trains models on historical alert dispositions ("analyst confirmed this was a true positive" vs. "analyst marked false positive"), with features extracted from the alert itself, the source system, the affected entity, and historical context. Production deployments at major SOCs report 30–70% reduction in tier-1 analyst workload, which is meaningful given the security skills gap.
The empirical reality is mixed. Alert-triage models that learn from analyst decisions inherit the analysts' biases — including the bias toward closing alerts quickly, which means the training data systematically under-labels true positives that the analysts missed. Production systems address this with periodic auditing and explicit feedback loops that highlight model-analyst disagreements for quality review.
Automated investigation
Beyond triage, the 2023–2026 generation increasingly uses LLMs for automated investigation — given an alert, query the SIEM for related events, summarise the findings, propose hypotheses, suggest next steps. The architecture is exactly the agentic-AI pattern of Part XI: the LLM is the orchestrator, the SIEM and threat-intelligence platforms are tools, and the investigation report is the output. Microsoft's Security Copilot, the various proprietary SOC-LLM products, and several major security vendors' "AI analysts" all operate on this pattern.
The empirical results are encouraging on standard tasks (alert summarisation, basic event correlation, query generation) and more variable on complex investigations that require deep system knowledge or unique organisational context. Production deployments use AI-assisted investigation as a force multiplier for human analysts rather than as autonomous decision-makers, with the analyst still owning the dispatch decisions.
SOAR and automated response
Security Orchestration, Automation, and Response (SOAR) platforms automate response workflows — when alert X matches pattern Y, do action Z. SOAR was rule-based for years; the 2024 generation increasingly uses ML to suggest playbooks based on alert characteristics, with the LLM-driven version producing custom playbooks on demand. The full automation question — should the AI execute responses without human review? — is contested, with most production deployments requiring analyst approval for actions that have business impact (isolating endpoints, disabling user accounts) and allowing automation only for clearly-low-risk actions (logging, additional data collection).
The cost-of-error problem
The cost of errors in SOC AI is high but in different ways than in detection AI. A bad alert-triage decision means a real attack is missed; a bad investigation conclusion means analysts are misled; a bad automated response means business systems are disrupted. The deployment discipline is to bound the cost of errors — keeping high-stakes decisions in human hands, providing explanation and reversibility, and maintaining audit trails for after-the-fact review. The 2024 SEC-rule and DORA regulatory frameworks treat security automation as material risk that requires explicit oversight; production AI-for-SOC operates accordingly.
Where the SOC is heading
The composite picture from major security vendors and large SOCs in 2026: tier-1 alert triage is increasingly AI-driven, with humans handling escalations. Investigation copilots augment tier-2 analysts substantially. Automated response is bounded to low-risk actions plus analyst-approved high-risk ones. Threat hunting and incident response remain dominated by human analysts with AI augmentation. The security skills gap (Ch 06 Section 9) is the primary economic driver — every dollar of analyst time freed by automation goes to higher-value work, and the trajectory is toward more automation rather than less. Whether this fully closes the skills gap remains to be seen; the 2026 evidence is that it helps but does not eliminate the problem.
AI as an Attack Tool
The previous sections covered AI for defence. This section flips the perspective: how do attackers use AI? The 2023–2026 wave of LLM accessibility has substantially lowered the barrier to several attack capabilities, and defenders need to model attacker AI use rather than pretend it isn't happening. The arms race that has shaped security-ML detection (Section 7) increasingly includes AI on both sides.
AI-assisted reconnaissance and social engineering
Attackers use LLMs to scrape victim profiles from social media, draft personalised phishing content, generate convincing pretext for vishing calls, and automate the initial-access workflow. The 2024 reports of AI-orchestrated business-email-compromise campaigns at scale are the visible tip of this. The defensive response combines content-detection (Section 4), authentication-strength requirements (Ch 06 Section 7), and user awareness that voice and text can be cheaply faked.
AI-generated malware
The early generation of LLMs could generate basic malicious code with appropriate prompting. Major LLM providers' safety training has substantially reduced this for hosted models, but open-weight models without those safety measures (deliberately retrained or downloaded uncensored) can produce functional malware. Several 2024 incidents documented LLM-generated ransomware variants and reconnaissance scripts in real attack campaigns. The 2026 reality: AI-generated malware is operationally feasible at low skill levels, which expands the attacker population rather than enabling fundamentally new attacks.
The 2024 frontier of AI-generated polymorphism uses LLMs to continuously vary malicious code so each deployed sample is unique, defeating signature-based detection by construction. ML-based detection (Section 3) mostly handles this because behavioural features are stable across syntactic variants — but the empirical evidence on how well this scales is still accumulating.
Deepfakes and voice cloning
Voice and video deepfakes are the most-publicised AI attack capability. The 2024 cases of multi-million-dollar BEC executions via voice-cloned executive impersonation are now routine enough to be reported as standard fraud rather than novelty. Real-time deepfake video calls (made tractable by 2024-era inference improvements) extend the attack into live communications. Defensive responses include authentication procedures (always confirm via second channel), specialised deepfake-detection ML (still unreliable but improving), and watermarking of legitimate content (the C2PA initiative for media authentication).
Prompt injection and LLM-app exploitation
As organisations deploy LLM-based applications, those applications themselves become attack surfaces. Prompt injection — embedding instructions in data the LLM processes that override the application's intended behaviour — is the canonical LLM-application vulnerability. Indirect prompt injection (instructions embedded in retrieved documents, web pages, or emails) is particularly dangerous for retrieval-augmented systems. The OWASP LLM Top 10 codifies the common attack patterns; defences include input sanitisation, instruction-data separation, output validation, and increasingly, dedicated prompt-injection-detection models. Production LLM application security is rapidly developing as a field in its own right, with substantial overlap with the AI-for-cybersecurity material of this chapter.
Offensive AI capabilities at the high end
The most-watched offensive-AI development is automated vulnerability discovery and exploit generation. The 2024–2025 wave of demonstrations established that LLMs paired with execution environments could find and exploit vulnerabilities in real software at scale, with the 2024 DARPA AI Cyber Challenge a pivotal showcase. Commercial offensive-AI tools (PentestGPT, the various AI-driven red-team platforms) are deployed by ethical security testers; the same capabilities are presumably available to advanced adversaries. The April 2026 release of Anthropic's Claude Mythos Preview moved the conversation decisively — Mythos is the first publicly-evaluated frontier model with autonomous offensive cyber capability at expert-human level, and Section 10 develops the timeline and the field's response in detail. The defensive response is faster patching, better software-supply-chain controls, and increased investment in defensive automated discovery to find and fix vulnerabilities before attackers do.
The cat-and-mouse game
The attacker-defender arms race in AI-augmented security mirrors the broader arms race that has always characterised the field. Each new defensive capability prompts attacker adaptation; each new attacker capability prompts defensive response. The 2026 reality is that AI is shifting the balance in measurable ways — attacker productivity has increased on certain attack types, defender productivity has increased on certain detection and response tasks — without fundamentally changing the structural shape of the contest. The methodology of the chapter is shaped throughout by this expectation: build, deploy, monitor for adaptation, retrain, repeat.
The Cybersecurity Timeline and the Frontier-AI Moment
Cybersecurity has always been shaped by capability inflections — moments when a new attack class, a new defence, or a new actor changed what defenders had to worry about. Mapping the field's trajectory by these inflection points puts the current moment in context: the late-1980s arrival of self-propagating worms, the late-1990s commercialisation of the internet and the security industry that followed, the 2010s nation-state-driven era of advanced persistent threats, the mid-2010s ransomware crisis, the 2020s supply-chain reckoning, and — in April 2026 — the arrival of frontier AI systems with autonomous offensive cyber capability. This section walks the timeline and lands on what has become the field's defining moment: the release of Claude Mythos Preview and the coordinated industry response that followed.
1971–1988: from Creeper to the Morris worm
The first programs that could be called "malware" predate "cybersecurity" as a discipline. Creeper (1971) was a self-replicating ARPANET program written by Bob Thomas as a curiosity; Reaper was written shortly after to delete it. The Morris worm (1988) was the first internet incident with consequences — Robert Tappan Morris's experimental program escaped into the live internet, exploited Unix vulnerabilities, and effectively crashed something like 10% of the connected hosts. The CERT Coordination Center was established in response, the first prosecution under the US Computer Fraud and Abuse Act followed, and the discipline of incident response was born.
1990s: the security industry emerges
The commercial internet of the 1990s produced both the targets and the industry that protected them. Antivirus vendors (McAfee, Norton, Trend Micro) productised signature-based detection. Firewall vendors (Check Point, Cisco) productised the perimeter. Cryptography moved from academic curiosity to web infrastructure with the standardisation of SSL/TLS and the founding of certificate authorities (Ch 06 Section 4). Meanwhile, the underground community institutionalised — the first DEF CON in 1993, the formation of L0pht Heavy Industries and other security-research collectives, the famous L0pht testimony to Congress in 1998 ("we could shut down the internet in 30 minutes"). The era's defining attacks remained relatively unsophisticated — script-kiddie defacements, early viruses, IRC botnets — but the institutional substrate for everything that followed was being assembled.
2000s: worms, organised crime, and the first big breaches
The 2000s brought self-propagating worms at scale. SQL Slammer (2003) saturated the internet's bandwidth in fifteen minutes. Blaster (2003) and MyDoom (2004) followed similar patterns. Conficker (2008) infected millions of machines and remained on Windows networks for years. By the late 2000s, organised criminal groups had taken over the malware ecosystem from individual hobbyists, with banking trojans (Zeus, SpyEye), credit-card-fraud rings, and the first commercial-scale spam botnets. The TJX breach (2007) and Heartland Payment Systems breach (2009) established the modern data-breach pattern — payment-card theft at scale through SQL injection in retail networks.
2010s: nation-states and APTs
The 2010s revealed that nation-states had been using cyber operations as routine instruments of statecraft for years. Stuxnet (publicly identified 2010) was a US-Israel joint operation targeting Iranian uranium-enrichment centrifuges — the first cyberattack to cause documented physical damage. Operation Aurora (2010) compromised Google and dozens of other US firms with Chinese-state attribution. Snowden's disclosures (2013) revealed the scope of NSA capabilities. The Sony Pictures hack (2014) was North Korean retaliation for a film. The OPM breach (2015) exposed clearance records of millions of US federal employees. The discipline of advanced persistent threat (APT) tracking emerged — Mandiant's APT1 report in 2013 was a watershed publication — and the threat-intelligence industry the chapter has cited grew up around it.
The mid-2010s also brought the first widely-deployed AI-based defensive tools — Cylance (founded 2012, acquired by BlackBerry 2019) productised neural-network-based malware classification at scale; Darktrace (founded 2013) productised unsupervised anomaly detection on enterprise networks. The pattern of marketing-versus-substance that has dogged security AI ever since established itself in this period: real underlying methodology, often genuinely useful, surrounded by claims of "magic AI" that the empirical record could not support.
Mid-2010s to 2020s: ransomware and supply-chain reckoning
WannaCry (May 2017) and NotPetya (June 2017) demonstrated the destructive potential of weaponised exploits combined with worm-style propagation. WannaCry caused billions in damage at NHS hospitals, FedEx, Renault, and others; NotPetya, a Russian-state operation against Ukraine that spread globally, produced the largest single financial loss in cyber history (Maersk's recovery alone exceeded $300M). The Colonial Pipeline ransomware incident (May 2021) shut down a quarter of US East Coast fuel supply for a week.
The 2020s also became the era of supply-chain attacks. SolarWinds (December 2020) compromised the build pipeline of an enterprise IT-management vendor, distributing backdoored updates to roughly 18,000 organisations including most of the US federal civilian agencies. Log4Shell (December 2021) was a critical vulnerability in a tiny Java logging library buried inside thousands of enterprise applications — a textbook demonstration of how deeply nested dependencies become security exposure. The XZ-utils backdoor (March 2024) was a deliberate, multi-year insertion of malicious code into a widely-used open-source compression library, caught only because a Microsoft engineer noticed unusually high CPU usage during a benchmarking exercise. The era's lesson: software is not just attacked, it is poisoned — and the controls have to extend through the whole supply chain.
2023–2025: the AI-augmented era
The 2023 release of GPT-4 and the subsequent cycle of LLM-augmented tooling reshaped both sides of the security contest. On the defensive side, the 2023–2025 generation of LLM-based SOC products (Microsoft Security Copilot, the various AI analyst startups, the LLM-augmentation of established SIEM platforms) substantially changed analyst workflows — Section 8 of this chapter covered the methodology. On the offensive side, AI-generated phishing reached production scale, voice-cloning enabled BEC attacks at a fraction of historical costs, and prompt-injection emerged as a new application-layer vulnerability class against LLM-based products. The 2024 DARPA AI Cyber Challenge demonstrated that LLMs paired with execution environments could find and exploit vulnerabilities in real software autonomously, with no human in the loop. The trajectory was clear; what was unclear was when — and whether — the capabilities would cross from "research demonstrations" to "deployment-grade autonomous offensive AI."
April 2026: Mythos
That moment arrived on April 8, 2026, with the release of Claude Mythos Preview. Anthropic, the AI lab whose Claude family had been among the most-deployed assistant models since 2023, announced a new frontier model with deliberately distinctive cybersecurity capability. The published evaluation numbers were striking. On expert-level Capture the Flag tasks — a standard offensive-security benchmark on which no public model had previously scored above single digits — Mythos Preview succeeded 73% of the time. In controlled red-team evaluations with explicit network access, it executed multi-stage attacks autonomously, completing in hours work that had previously required days of effort from skilled human operators. Anthropic's pre-release vulnerability research using Mythos found "thousands of high-severity vulnerabilities" across major operating systems and web browsers — issues subsequently coordinated through standard responsible-disclosure channels.
The technical innovations underlying these results are now matters of public record. Long-context architecture allowed Mythos to ingest entire codebases at once, supporting the kind of whole-system reasoning that vulnerability discovery had long demanded. Recursive self-correction let the model iterate on failed exploit attempts the way a skilled human researcher would, refining its approach until something worked. Native integration with debuggers, container runtimes, and network tools meant Mythos could not just describe what it would do but actually do it — launching processes, observing results, and adapting in real time. The combination produced what the UK AI Security Institute (AISI) characterised in its independent evaluation as a step-change in autonomous offensive capability rather than a gradual improvement.
Project Glasswing and the dual-use response
Anthropic's response to its own discovery of these capabilities was unusual and worth understanding. The company chose not to make Mythos Preview generally available, breaking with the standard "release everything, charge per token" pattern that has dominated the LLM era. Instead, alongside the Mythos announcement, Anthropic launched Project Glasswing — a coordinated effort to use Mythos defensively for vulnerability discovery and remediation in the world's most critical software, with early-access partnerships covering AWS, Apple, Google, JPMorgan Chase, Microsoft, and Nvidia. The project's logic was straightforward: if a frontier AI lab can build Mythos, hostile actors can eventually build something similar; the responsible response is to use the capability defensively first, hardening the systems that matter most before equivalents reach attackers.
The reception of Mythos and Project Glasswing has been the dominant cybersecurity policy discussion of 2026. The CrowdStrike-led Frontier Model Forum response, the AISI's measured-but-alarmed evaluation, the World Economic Forum's framing of "the Mythos moment," and the public commentary from former US National Cyber Director Kemba Walden ("Mythos can hack nearly anything and we aren't ready") together established the consensus reading: we are now in a different phase of the AI-cybersecurity contest, and the institutional, regulatory, and operational frameworks built for the prior phase are not adequate.
What it means for the field
The methodology of the chapter does not change overnight because of a single model release. The intrusion-detection, malware-classification, UEBA, and SOC-automation work of Sections 2–8 continues; the adversarial-robustness disciplines of Section 7 become more rather than less important; the operational integration of Section 8 still dominates real-world value. What changes is the balance and pace of the contest. For defenders, the imperative to use AI defensively, ahead of attackers acquiring equivalent capability, is now urgent rather than aspirational — the central proposition of Project Glasswing. For attackers, the cost of certain attack classes has fallen substantially: vulnerability discovery, reverse engineering, exploit chaining, and target-network mapping are no longer skill-bottlenecked the way they were in 2024. For policymakers, the dual-use governance question is now concrete: a single AI lab's release decisions can move the offensive-defensive balance globally, and the institutional frameworks (export controls, AI safety institutes, voluntary commitments, the EU AI Act's high-risk-system provisions) are being tested under conditions much more demanding than their drafters anticipated.
The post-Mythos question that the field now faces is not whether autonomous offensive AI is possible — Mythos has answered that — but how the contest will play out as such systems proliferate. Subsequent frontier labs will produce comparable or stronger models. Open-weight equivalents will emerge, possibly faster than their proprietary cousins. Defensive deployments will scale to match, or fail to. The fundamental security framings of Ch 06 — defence in depth, threat modelling, the CIA triad, the SOC — remain correct and necessary. But the velocity and capability profile of the contest they are conducted within has changed, and the field is in the early phase of working out what that means in practice.
Earlier shifts in cybersecurity — the worm era, the APT era, the ransomware era, the supply-chain era — each took years to play out. The Mythos moment arrived in a single press release. The capability inflection it represents is not a hypothesis: AISI's independent evaluation, the Project Glasswing partnerships, the thousands of vulnerabilities Anthropic found pre-release, and the public statements of senior cyber officials make clear that something has changed. The methodology of this chapter — and the institutional substrate of Ch 06 — must now operate inside a contest where one side or the other can deploy autonomous offensive AI at any time, and where defensive AI is the only proportionate response.
Applications and Frontier
Beyond the core areas of intrusion detection, malware classification, phishing, and SOC automation, security ML appears in many specialised applications and is rapidly expanding into new territory in 2026. This final section surveys the application landscape and the frontier where modern AI is reshaping security.
Cloud security
Modern enterprises run substantial workloads in cloud environments (AWS, Azure, GCP, the various sovereign-cloud equivalents), and cloud-security ML is a substantial application area. Cloud Security Posture Management (CSPM) products use ML to identify misconfigurations across hundreds of cloud services. Cloud Workload Protection products run ML-based anomaly detection on container and serverless workloads. Cloud Detection and Response (CDR) products extend EDR-style behavioural detection to cloud-native workloads. The major vendors (Wiz, Lacework, Orca, Microsoft Defender for Cloud, the various others) all use ML extensively, with the underlying methodology drawing from the network and endpoint sections of this chapter applied to cloud-specific telemetry.
OT/ICS and embedded security
Industrial control systems, manufacturing networks, and embedded devices have substantial security risk and very different telemetry from IT systems. OT (Operational Technology) security uses ML to detect anomalies in industrial protocols (Modbus, DNP3, PROFINET), in process variables (sensor readings, actuator commands), and in supervisory-control behaviour. The 2017 Triton/Trisis attack on a Saudi petrochemical plant and the various Stuxnet-lineage incidents motivate the field; products like Claroty, Dragos, and Nozomi run ML-based anomaly detection on plant networks at major industrial customers. The methodology is more conservative than IT security ML — false positives in industrial environments can shut down production with substantial cost — and the deployment patterns are correspondingly cautious.
AI red teaming
The discipline of AI red teaming — actively testing AI systems for security vulnerabilities — has emerged as a distinct specialty. The 2023 Executive Order on AI Safety mandated red-team testing for high-risk AI systems; the 2024 OpenAI/Anthropic/Google red-team disclosures formalised the practice; the EU AI Act's high-risk-system provisions impose parallel requirements. AI red teams use the adversarial-ML attacks of Section 7, but they also use prompt injection, jailbreaks, and capability elicitation techniques specific to LLM systems. The skill set blends classical penetration testing with adversarial-ML expertise.
LLM-powered defence at the operational level
The 2024–2026 wave of LLM-powered security products represents the most rapid recent change in the field. Beyond Section 8's SOC applications, LLMs power: threat-intelligence summarisation (turning raw IOC feeds into structured intelligence), policy-as-code generation (writing detection rules from natural-language descriptions), incident-response runbook automation, and security-question-answering for non-specialists. The empirical results are mixed but generally positive on tasks where the LLM is one signal among several rather than the sole decision-maker.
Threat intelligence at scale
Modern threat-intelligence platforms ingest hundreds of thousands of indicators per day from feeds, reports, and analyst observations. ML increasingly powers the deduplication, attribution, and contextualisation that turns raw intelligence into actionable input for detection systems. Graph-based models on the indicator-and-actor graph identify campaign patterns across organisations; LLM-based summarisation produces analyst-readable reports from raw data; cross-organisation federated learning (Part XIII Ch 10) enables intelligence sharing without raw-data disclosure.
Frontier methods
Several frontiers are particularly active in 2026, with the post-Mythos landscape (Section 10) reshaping their priority. Foundation models for security: pretrained models on security-specific corpora (CTI reports, vulnerability databases, security literature) serving as the substrate for downstream detection and analysis tasks. Frontier-model defensive deployment: programmes like Anthropic's Project Glasswing that deploy general-purpose frontier AI for vulnerability discovery and remediation in critical software, racing to harden systems before equivalent capability reaches attackers. Agentic SOCs: multi-step autonomous investigation and response systems with bounded scope and human oversight. Hardware-rooted ML security: using TPMs, secure enclaves, and confidential computing to protect ML models and their training data. Quantum-resistant security ML: adapting cryptographic ML pipelines (federated learning, homomorphic encryption) to post-quantum primitives. AI safety meets cybersecurity: the convergence of AI alignment / safety research with traditional cybersecurity, now particularly urgent as the most-capable AI systems become both targets and potential defenders.
What this chapter does not cover
Several adjacent areas are out of scope. The conceptual cybersecurity foundations are covered in Ch 06 (CIA, threat modelling, cryptography, network/endpoint/identity, attacks, SOC, governance) and assumed here rather than redeveloped. The deeper technical material on adversarial-ML attacks beyond Section 7 — certified robustness, formal verification of ML systems, the academic adversarial-examples literature — is its own substantial discipline. AI-safety research on alignment, interpretability, and evaluation overlaps with security ML but is conventionally treated through a separate lens. Cyber warfare, offensive cyber operations, and the strategic-policy questions around them are policy issues rather than technical ones. And the substantial literature on usable security and the human factors of why people make insecure choices is essential context but its own field.
Further reading
Foundational papers and references for AI in cybersecurity. The Sommer-Paxson critique plus the Anderson-Roth book on adversarial ML plus the MITRE ATLAS framework plus the EMBER dataset paper is the right starting kit for serious security-ML work.
-
Outside the Closed World: On Using Machine Learning for Network Intrusion DetectionThe classic methodological critique. Documents why ML-based NIDS frequently disappoint in production despite strong benchmark results — base-rate problems, semantic gaps, evaluation pitfalls, the difficulty of making ML actionable in security operations. Required reading for anyone deploying security ML, and the reference that defined the field's methodological discipline. The reference for the methodological challenges.
-
EMBER: An Open Dataset for Training Static PE Malware Machine Learning ModelsThe EMBER dataset paper. Establishes the standard public benchmark for static malware classification, with extracted features from a million Windows executables. The substrate of essentially every academic malware-ML paper since 2018, and the right starting point for practical malware-classifier development. The reference dataset for malware ML.
-
Malware Detection by Eating a Whole EXE (MalConv)The MalConv paper. Establishes that CNNs operating on raw bytes of a binary can match feature-engineered detectors, opening the deep-learning approach to malware classification. Pair with Anderson et al. 2018 (gym-malware) for the adversarial-evasion follow-up that demonstrated why static detectors are hard to make robust. The reference for raw-byte malware detection.
-
MITRE ATLASThe Adversarial Threat Landscape for AI Systems framework. Catalogues adversarial-ML tactics and techniques in a structure parallel to MITRE ATT&CK. Required reference for anyone doing threat modelling on ML systems and the de facto standard for security-ML threat modelling in 2026. The reference for adversarial-ML threat modelling.
-
Adversarial Machine LearningThe standard textbook on adversarial machine learning. Comprehensive coverage of the attack and defence landscape, with substantial attention to security-specific applications (spam filtering, intrusion detection, malware classification). The right comprehensive reference for serious adversarial-ML work and a useful complement to the academic-papers literature. The textbook reference for adversarial ML.
-
Exploit Prediction Scoring System (EPSS)The EPSS framework. ML-based prediction of which CVEs are likely to be exploited in the wild within 30 days, complementing CVSS with empirical exploitation likelihood. Free public scoring API and the standard input to modern vulnerability-prioritisation workflows. The reference for vulnerability prioritisation.
-
A Survey on Adversarial Examples in Machine LearningThe Biggio-Roli historical survey. Documents that adversarial examples were studied in the security-ML community a decade before they became famous in the deep-learning community, with substantial attention to malware and spam evasion as the original motivating applications. The right historical reference for understanding the security roots of adversarial ML. The historical reference for adversarial examples.
-
OWASP Top 10 for Large Language Model ApplicationsThe standard list of LLM-application vulnerability classes. Covers prompt injection, insecure output handling, training-data poisoning, model denial-of-service, supply-chain vulnerabilities, sensitive-information disclosure, insecure plugin design, excessive agency, overreliance, and model theft. Required reading for any team deploying LLM-based applications, including the AI-for-SOC and agentic-security applications of this chapter. The reference for LLM application security.
-
NIST AI Risk Management FrameworkThe standard organisational framework for AI risk management, with explicit attention to security and adversarial robustness. Pairs with the NIST Cybersecurity Framework (Ch 06 Section 10) for organisations integrating AI into existing security programmes. The 2024 generative-AI profile and various sector-specific profiles extend the framework with application-specific guidance. The reference for AI risk management at the organisational level.
-
Claude Mythos PreviewThe official Mythos Preview announcement and the UK AI Security Institute's independent evaluation. Together they document the capability inflection of April 2026 — Mythos's expert-level CTF performance, multi-stage autonomous network attacks, and large-scale vulnerability discovery — and the Project Glasswing defensive-deployment programme that accompanied the release. Required reading for anyone tracking the post-Mythos landscape developed in Section 10. The reference for the Mythos moment.