Privacy in ML: when the model leaks the data it was trained on.
Trained machine-learning models are not just predictors of the world; they are compressed records of the data they were trained on, and an attacker who can query the model can often recover non-trivial facts about that data. The threats are concrete: membership inference (did this person's record appear in training?), attribute inference (what's the value of an unobserved sensitive feature?), model inversion (reconstruct a training example from the trained weights), training-data extraction (verbatim recall of texts and images), and model stealing (clone a paid API into a local copy). The defensive toolkit has matured around three pillars: differential privacy as the formal guarantee, privacy-preserving training techniques (DP-SGD, federated learning, secure aggregation, homomorphic encryption), and data-deletion / unlearning machinery that lets a model forget specific records on demand. Privacy regulation (GDPR, CCPA, and the 2025–2026 sectoral rules) has made these defences mandatory rather than optional. This chapter develops the methodology with the depth a working ML engineer, model-risk officer, or privacy engineer needs.
Prerequisites & orientation
This chapter assumes the deep-learning material of Part VI, the AI safety material of Ch 01, the explainability material of Ch 05, and the fairness material of Ch 06. Familiarity with basic probability and the noisy-channel intuition (Part I, Ch 02 on Information Theory) is essential for the differential-privacy material in §3–4; familiarity with cryptographic primitives helps for §7 on secure computation but is not required. The chapter is written for ML engineers, applied scientists, privacy engineers, model-risk officers, and product managers who must ship models trained on personal or otherwise sensitive data.
Three threads run through the chapter. The first is the threat-model-first discipline: every privacy claim must be tied to an explicit attacker capability and access pattern, or it is meaningless. The second is the privacy-utility trade-off: every defence costs accuracy or latency or both, and the operational task is choosing the right point on that frontier. The third is the regulatory-and-governance dimension: privacy is the most-regulated property of modern ML systems, and the technical methodology has to be paired with documentation, auditability, and a deletion pathway that is contractually enforceable. The chapter develops each in turn.
Why Privacy Is Different from Fairness
Fairness (Ch 06) asks whether the model's predictions across populations satisfy a normative criterion. Privacy asks whether the model's existence leaks facts about the individuals whose data trained it. The two are complementary but the methodology is different: privacy has crisper formal definitions, more adversarial threat models, harder cryptographic primitives, and more direct regulatory teeth.
Threat-model thinking
Every privacy claim is meaningless without an explicit threat model: who is the attacker, what access do they have, what background knowledge do they bring, and what do they aim to learn? A model that is privacy-preserving against a black-box query attacker may be trivially attackable by a white-box weights-access attacker. A defence that protects against an attacker without auxiliary information may collapse against one with the right side-channel. The discipline of privacy in ML is the discipline of being precise about which attackers you are defending against — and being honest about which you are not.
Regulatory teeth
Privacy is uniquely regulated. GDPR (effective 2018) gave EU residents enforceable rights to know what data is held, to correct it, to delete it, and to receive an explanation of automated decisions. CCPA (California, 2020) and its successors gave US residents similar rights. The 2024–2026 expansion includes sectoral rules (HIPAA in healthcare, FERPA in education, the EU AI Act's data-governance requirements, the CFPB's adverse-action rules in finance). The fines have been substantial — Meta's €1.2B 2023 fine, Amazon's €746M, multiple sub-billion fines for major US firms — and the enforcement has become more technical, asking specifically what data flowed into training, what consent existed, and how deletion is implemented. Privacy in ML is not just a technical research area; it is a compliance precondition.
Individual harm, individual remedy
Where a fairness failure typically harms a group statistically, a privacy failure typically harms an identifiable individual concretely. A membership-inference attack that confirms a person was in a hospital's diabetes-prediction training set is a disclosure of that person's diabetes status. A training-data extraction attack that recovers a verbatim chunk of a customer's email from a deployed LLM is a disclosure of that email. Both are individually-attributable harms — and the legal liability follows the individual, not a population. The operational consequence: privacy claims must be defensible at the individual level, with documented mitigations and a clear contestation pathway.
Figure: the three-pillar stack — what the attacker learns, what we deploy to stop them, and what we contractually promise. A threat model (§2) selects a defence (§3–7), paired with a deletion / governance contract (§8–9); the deployment context determines which combination matters.
The privacy-utility trade-off
Every privacy defence has a cost. Differential privacy adds noise that hurts accuracy. Federated learning hurts convergence. Homomorphic encryption multiplies inference cost by orders of magnitude. The operational discipline is to be honest about the trade-off: a privacy claim with no measurable accuracy or latency cost is almost always a privacy claim that does not survive contact with a competent adversary. The mature programs explicitly publish their privacy-utility curves alongside their model cards.
The Threat Landscape
Five attack families dominate the literature on privacy in ML. Each has a different access pattern, a different concrete harm, and a different defence profile. Knowing which attack matters for a given deployment is the first step in any privacy program.
Membership inference
Membership inference (Shokri et al., 2017) asks the simplest possible question: was a given record \(x\) part of the model's training set? The attacker queries the model on \(x\) and on neighbouring inputs and uses the difference in confidence — trained-on records typically receive higher confidence than unseen ones — to infer membership. The attack is cheap (black-box queries only), reliable (modern attacks reach AUC of 0.7–0.95 against undefended models), and concretely harmful: confirming someone is in a "diabetes diagnostic model" training set discloses their diabetes status. Membership inference is the standard adversarial benchmark for differential-privacy claims; if your DP defence cannot beat a strong membership-inference attack, the formal guarantee is vacuous.
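A minimal sketch of the confidence-thresholding flavour of the attack, with illustrative function names and a threshold calibrated on data the attacker knows was not in training; this is a simplified stand-in for the shadow-model machinery of Shokri et al.:

```python
import numpy as np

def true_label_confidence(softmax_outputs, true_labels):
    """Score each record by the model's confidence in its true label.
    Undefended models are systematically over-confident on examples
    they have already seen, which is the signal the attack exploits."""
    return softmax_outputs[np.arange(len(true_labels)), true_labels]

def membership_attack(scores_candidates, scores_reference, threshold_quantile=0.9):
    """Flag a candidate as a training-set member if its score exceeds a
    threshold calibrated on a reference population known to be unseen."""
    threshold = np.quantile(scores_reference, threshold_quantile)
    return scores_candidates > threshold

# Toy usage: softmax outputs obtained from black-box queries to the target model.
rng = np.random.default_rng(0)
conf_candidates = rng.dirichlet(alpha=[5, 1, 1], size=100)   # suspiciously peaked
conf_reference = rng.dirichlet(alpha=[2, 1, 1], size=1000)   # flatter, unseen data
labels_c = np.zeros(100, dtype=int)
labels_r = np.zeros(1000, dtype=int)

is_member = membership_attack(
    true_label_confidence(conf_candidates, labels_c),
    true_label_confidence(conf_reference, labels_r),
)
print(f"flagged {is_member.mean():.0%} of candidates as training-set members")
```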
Attribute inference
Attribute inference goes further: given partial information about an individual, predict the value of an unobserved sensitive attribute. The classic Fredrikson et al. (2015) attack on a pharmacogenetics model recovered patient genetic information from black-box queries plus public demographic data. Attribute-inference attacks exploit the fact that models trained on correlated features can be queried to reveal those correlations to an attacker who has only some of the features. The defence is partly differential privacy (it bounds the increase in posterior knowledge from any individual record) and partly minimisation (don't train models with sensitive features unless the deployment requires them).
Model inversion
Model inversion attacks aim to reconstruct training examples from the trained weights or from query access. Fredrikson et al. demonstrated face-recognition models could be inverted to reveal recognisable faces of training-set individuals. Carlini, Liu et al. (2019) showed that language models memorise rare training sequences that can later be extracted from the trained model. The attack is most successful for high-capacity models trained on small or duplicated data; mitigations include differential privacy, deduplication, and limiting the precision of model outputs. Carlini's Extracting Training Data from Large Language Models (2021) made model inversion concrete for foundation models — see §10.
Training-data extraction from LLMs
The 2021–2026 work on training-data extraction showed that large language models can be prompted into emitting verbatim chunks of their training data — including PII, copyrighted text, and unique sequences. Carlini, Tramèr et al. (2021) extracted hundreds of verbatim sequences from GPT-2; subsequent work (Nasr et al., 2023; Carlini et al., 2023) extracted from production-scale models including GPT-3.5, ChatGPT, and PaLM. Extraction is exacerbated by data duplication in training corpora; deduplication is a partial mitigation, differential privacy is a stronger one, and rate-limiting plus output filtering is the deployment-time control.
Model stealing
Model stealing (Tramèr et al., 2016) attacks aim to clone a paid API into a local copy. The attacker queries the API repeatedly, treats the queries-and-answers as a training set, and trains a local surrogate. Model stealing has both privacy implications (the surrogate can be inverted to leak training data) and intellectual-property implications. Defences combine watermarking, query-rate limits, output perturbation, and (most aggressively) legal action against the attacker — the recent dispute over DeepSeek's alleged distillation of OpenAI's APIs exemplifies the legal frontier.
Side channels and the rest
Beyond the canonical five, privacy in ML faces side-channel attacks (timing, power, cache), poisoning attacks that engineer training-data leaks, and the rapidly-evolving threat from multi-modal attacks that combine different access patterns. The mature program models the attacker explicitly and uses defence-in-depth: differential privacy plus federated learning plus output filtering plus query-rate limits, each contributing a partial guarantee.
Differential Privacy: The Formal Guarantee
Differential privacy (DP), introduced by Dwork et al. (2006), is the only privacy framework in mainstream use that gives a precise mathematical guarantee about what an attacker can learn — regardless of their auxiliary information. Two decades on, it is the operational standard for high-stakes ML and the regulatory reference point for many privacy claims.
The definition
A randomised algorithm \(\mathcal{M}\) is \(\epsilon\)-differentially private if, for any two datasets \(D, D'\) differing in a single record and any output set \(S\): \(P(\mathcal{M}(D) \in S) \leq e^{\epsilon} \cdot P(\mathcal{M}(D') \in S)\). The intuition: the output of the algorithm should be statistically nearly indistinguishable whether or not any particular individual is in the dataset. \(\epsilon\) is the privacy budget: smaller is more private. The relaxed \((\epsilon, \delta)\)-DP variant allows the bound to fail with probability \(\delta\), which is operationally necessary for many practical mechanisms (Gaussian noise, in particular).
What \(\epsilon\) means in practice
\(\epsilon = 0\) is perfect privacy (the output is independent of any individual). \(\epsilon \to \infty\) is no privacy. Practical deployments use \(\epsilon\) values from well below 1 for simple aggregate statistics to about 8 (the looser end of acceptable for many DP-SGD deployments). The interpretation is roughly Bayesian: \(\epsilon = 1\) means the attacker's posterior odds about any individual's record can shift by at most a factor of \(e^{\epsilon} = e \approx 2.7\) relative to their prior odds — a meaningful but not enormous information gain. Higher \(\epsilon\) means a larger possible posterior shift; the practitioner must justify the choice against the expected utility benefit and the attacker's prior knowledge.
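As a worked micro-example of how a concrete \(\epsilon\) arises, here is classic randomised response (used purely for illustration, not a mechanism from the main text): each respondent reports their true bit with probability \(p\) and the flipped bit otherwise, which is \(\epsilon\)-DP with \(\epsilon = \ln\frac{p}{1-p}\).

```python
import numpy as np

def randomized_response(true_bits, p_truth=0.75, rng=None):
    """Each individual reports their true bit with probability p_truth,
    otherwise the flipped bit.  The mechanism is eps-DP with
    eps = ln(p_truth / (1 - p_truth)); p_truth = 0.75 gives eps = ln 3 ~ 1.1."""
    rng = rng or np.random.default_rng()
    flip = rng.random(len(true_bits)) > p_truth
    return np.where(flip, 1 - true_bits, true_bits)

def debias_mean(reported_bits, p_truth=0.75):
    """Unbiased estimate of the true proportion from the noisy reports."""
    return (reported_bits.mean() - (1 - p_truth)) / (2 * p_truth - 1)

rng = np.random.default_rng(1)
true_bits = (rng.random(100_000) < 0.30).astype(int)   # 30% of people have the attribute
reported = randomized_response(true_bits, p_truth=0.75, rng=rng)
print(f"epsilon = {np.log(0.75 / 0.25):.2f}")
print(f"raw noisy mean {reported.mean():.3f}, debiased estimate {debias_mean(reported):.3f}")
```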
Global vs local DP
Global DP (also called central DP) noises a function computed over the whole dataset by a trusted central party — the noise is added once, the budget is spent globally, and accuracy is high for a given privacy level. Local DP noises each individual's data before it leaves the user's device — no trust assumption is required, but the noise compounds and accuracy is much worse. Apple's iOS keyboard uses local DP because Apple does not want to be a trusted curator of users' typing data; Google's RAPPOR was an early local-DP system. Most DP-trained ML models use global DP because the accuracy hit from local DP is usually too large for production-quality models.
Composition: the key operational property
The most powerful operational property of DP is composition: if you run two \(\epsilon_1\)- and \(\epsilon_2\)-DP analyses on the same data, the combined release is \((\epsilon_1 + \epsilon_2)\)-DP. Tighter bounds (advanced composition, Rényi-DP, the moments accountant) allow more queries for the same nominal budget, but the principle is the same: the privacy budget is a finite resource that gets consumed by every release. A production DP system needs a privacy accountant (Mironov, 2017; Abadi et al., 2016) that tracks budget consumption across training, hyperparameter tuning, model updates, and downstream releases.
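A minimal sketch of budget bookkeeping under basic sequential composition; the class and method names are illustrative, and a production accountant would use Rényi-DP or the moments accountant for tighter bounds:

```python
class SequentialCompositionAccountant:
    """Tracks a privacy budget under basic composition: releases of
    (eps_1, delta_1), ..., (eps_k, delta_k) compose to (sum eps_i, sum delta_i).
    Refuses further releases once the budget is exhausted."""

    def __init__(self, eps_budget, delta_budget):
        self.eps_budget, self.delta_budget = eps_budget, delta_budget
        self.eps_spent, self.delta_spent = 0.0, 0.0

    def spend(self, eps, delta=0.0):
        if (self.eps_spent + eps > self.eps_budget
                or self.delta_spent + delta > self.delta_budget):
            raise RuntimeError("privacy budget exhausted: refuse the release or re-authorise")
        self.eps_spent += eps
        self.delta_spent += delta

    def remaining(self):
        return self.eps_budget - self.eps_spent, self.delta_budget - self.delta_spent

accountant = SequentialCompositionAccountant(eps_budget=3.0, delta_budget=1e-5)
accountant.spend(eps=1.0)          # first release
accountant.spend(eps=0.5)          # second release
print("remaining budget:", accountant.remaining())
```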
The privacy contract
The output of a DP analysis is a release plus a documented \((\epsilon, \delta)\) pair. The deployment-time discipline: publish the privacy parameters in the model card, document the accounting method, and make the budget exhaustion behaviour explicit (does the system stop releasing? throttle? require re-authorisation?). This is the privacy analogue of fairness's documented trade-off (Ch 06): a number on a model card is the basis of an external claim. The 2024–2026 maturity has been treating this contract as enforceable — privacy regulators increasingly ask to see the accountant's logs.
DP Mechanisms and the Noise Calculus
Differential privacy is achieved by adding calibrated noise to whatever quantity is being released. The amount of noise depends on the sensitivity of the function — how much the output can change from a one-record change in the input. The two foundational mechanisms are the Laplace mechanism for pure \(\epsilon\)-DP and the Gaussian mechanism for \((\epsilon, \delta)\)-DP; everything else builds on these.
The Laplace mechanism
For a real-valued function \(f\) with \(L_1\) sensitivity \(\Delta_1 f\) (the maximum change in \(f\) from a one-record change), releasing \(f(D) + \text{Lap}(\Delta_1 f / \epsilon)\) is \(\epsilon\)-DP. The Laplace distribution has heavier tails than the Gaussian, which is why pure \(\epsilon\)-DP is achievable. The mechanism is the workhorse for low-dimensional summary statistics: counts, means, histograms.
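A minimal numpy sketch of the mechanism applied to a counting query, whose \(L_1\) sensitivity is 1 because adding or removing one record changes the count by at most one:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value + Lap(sensitivity / epsilon), which is eps-DP for a
    function whose L1 sensitivity is `sensitivity`."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A count query has L1 sensitivity 1: one person joining or leaving the
# dataset changes the count by at most 1.
rng = np.random.default_rng(0)
true_count = 4213
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps, rng=rng)
    print(f"eps={eps:>4}: released count {noisy:.1f}")
```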
The Gaussian mechanism
For a function with \(L_2\) sensitivity \(\Delta_2 f\), releasing \(f(D) + \mathcal{N}(0, \sigma^2 I)\) with appropriate \(\sigma\) is \((\epsilon, \delta)\)-DP. The Gaussian mechanism is the foundation of DP-SGD because gradient updates are vectors with bounded \(L_2\) norm (after clipping); the analysis is cleaner and the accumulated noise from many steps composes more tightly under the moments accountant. The cost is the \(\delta\) relaxation: there is a small probability that the privacy guarantee fails for a given output.
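A sketch using the classical calibration \(\sigma = \Delta_2 f \sqrt{2\ln(1.25/\delta)}/\epsilon\), which is valid for \(\epsilon < 1\); tighter analytic calibrations exist, and production code should rely on a library accountant rather than this hand calculation:

```python
import numpy as np

def gaussian_mechanism(true_vector, l2_sensitivity, epsilon, delta, rng=None):
    """Release f(D) + N(0, sigma^2 I) with the classical calibration
    sigma = l2_sensitivity * sqrt(2 ln(1.25/delta)) / epsilon,
    which gives (epsilon, delta)-DP for epsilon < 1."""
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return true_vector + rng.normal(0.0, sigma, size=np.shape(true_vector))

mean_vector = np.array([0.21, -1.30, 0.07])      # e.g. a clipped, averaged statistic
released = gaussian_mechanism(mean_vector, l2_sensitivity=0.05, epsilon=0.5, delta=1e-5)
print(released)
```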
Sensitivity and clipping
Most ML functions do not have bounded sensitivity by default — a single record can change a gradient arbitrarily. The standard fix is clipping: bound the per-record contribution by enforcing \(||g_i||_2 \leq C\) for some clipping threshold \(C\). After clipping, the sensitivity is \(C\) and the noise calibration is straightforward. The trade-off: larger \(C\) preserves more gradient information but requires more noise to achieve the same privacy; smaller \(C\) hurts the signal. Tuning \(C\) is the central practical lever in DP-SGD — see §5.
Privacy accounting and the moments accountant
Naive composition of \(T\) Gaussian-mechanism queries gives \((T\epsilon, T\delta)\)-DP — too loose for any real ML training run. The moments accountant (Abadi et al., 2016) and its successor Rényi-DP accountant (Mironov, 2017; Wang et al., 2019) compute much tighter bounds by tracking moments of the privacy loss random variable. These give the privacy budgets reported by Opacus, TensorFlow Privacy, and the JAX-based DP libraries. The accountant is where the production privacy claim is made; getting it wrong (e.g., not accounting for hyperparameter search) silently inflates the real \(\epsilon\) by orders of magnitude.
Subsampling amplification
A key practical lever: random subsampling of training batches amplifies the privacy guarantee. If a record is included in a batch with probability \(q\), the per-step privacy cost is scaled down by roughly a factor of \(q\) (for small \(\epsilon\)). This is why DP-SGD subsamples its mini-batches with Poisson sampling; without amplification, the budget would be exhausted in the first epoch.
Beyond Laplace and Gaussian: PATE and the exponential mechanism
The exponential mechanism (McSherry & Talwar, 2007) handles non-numeric outputs (categorical choices, model selection) by sampling from a distribution weighted by a utility function. PATE (Papernot et al., 2017) trains an ensemble of teacher models on disjoint data shards and uses noisy aggregation of their predictions to train a student model — the privacy guarantee comes from the noisy voting, not from the model gradients. PATE is competitive with DP-SGD for some tasks and gives a different deployment profile (the student can be released without DP guarantees because it never sees the raw teacher data).
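A toy sketch of the noisy voting at the heart of PATE; the published method adds a data-dependent privacy analysis, and later variants add confident aggregation, neither of which is shown here:

```python
import numpy as np

def pate_noisy_vote(teacher_predictions, num_classes, noise_scale, rng=None):
    """Aggregate teacher votes for one query by adding Laplace noise to the
    per-class counts and returning the noisy argmax (the label handed to the
    student).  The privacy comes from the noisy voting, not from the teachers."""
    rng = rng or np.random.default_rng()
    votes = np.bincount(teacher_predictions, minlength=num_classes).astype(float)
    votes += rng.laplace(scale=noise_scale, size=num_classes)
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
# 250 teachers trained on disjoint shards vote on one unlabelled public example.
teacher_preds = rng.choice([0, 1, 2], size=250, p=[0.7, 0.2, 0.1])
student_label = pate_noisy_vote(teacher_preds, num_classes=3, noise_scale=20.0, rng=rng)
print("label handed to the student:", student_label)
```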
DP-SGD and Privacy-Preserving Training
DP-SGD (Abadi et al., 2016) is the workhorse of differentially-private deep learning. It modifies stochastic gradient descent by clipping per-example gradients and adding calibrated Gaussian noise before the parameter update. Combined with the moments accountant and subsampling amplification, it produces models with end-to-end DP guarantees that are state-of-the-art for most production tasks.
The algorithm
For each step: (1) sample a mini-batch with Poisson sampling at rate \(q\); (2) compute per-example gradients \(g_i\) for each example in the batch; (3) clip each gradient: \(\bar g_i = g_i / \max(1, ||g_i||_2 / C)\); (4) sum the clipped gradients and add Gaussian noise: \(\tilde g = (\sum_i \bar g_i + \mathcal{N}(0, \sigma^2 C^2 I)) / B\) where \(B\) is the expected batch size; (5) update parameters: \(\theta \leftarrow \theta - \eta \tilde g\). The privacy accountant tracks \((\epsilon, \delta)\) over the full training run.
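A toy numpy rendering of one step for a linear least-squares model, following the five steps above; real training would use Opacus or TensorFlow Privacy and their accountants rather than this sketch:

```python
import numpy as np

def dp_sgd_step(theta, X, y, clip_norm, noise_multiplier, lr, sample_rate, rng):
    """One DP-SGD step for least-squares regression: per-example clipping plus
    Gaussian noise with std = noise_multiplier * clip_norm, matching the
    N(0, sigma^2 C^2 I) term in the text with sigma = noise_multiplier."""
    # (1) Poisson-sample the mini-batch: each example is included independently.
    mask = rng.random(len(X)) < sample_rate
    Xb, yb = X[mask], y[mask]
    expected_batch = sample_rate * len(X)

    # (2) Per-example gradients of the squared loss for a linear model.
    residuals = Xb @ theta - yb                      # shape (B,)
    per_example_grads = residuals[:, None] * Xb      # shape (B, d)

    # (3) Clip each per-example gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)

    # (4) Sum, add Gaussian noise calibrated to the clipping norm, average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / expected_batch

    # (5) Gradient step; an accountant would record (sample_rate, noise_multiplier).
    return theta - lr * noisy_grad

rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(1000, 5)), np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)
theta = np.zeros(5)
for _ in range(500):
    theta = dp_sgd_step(theta, X, y, clip_norm=1.0, noise_multiplier=1.1,
                        lr=0.1, sample_rate=0.05, rng=rng)
print("learned weights:", np.round(theta, 2))
```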
Practical recipes
The 2020–2026 practical literature has converged on a set of recipes that close most of the early DP-SGD accuracy gap. (1) Large batches: DP-SGD favours batch sizes 4–16× larger than non-private training because the noise averages down with batch size. (2) Pretraining + DP fine-tuning: pretrain on public data, then fine-tune with DP on private data; the public pretraining absorbs most of the representation cost. (3) Group normalisation: replace batch normalisation (which leaks via the running statistics) with group or layer normalisation. (4) Adaptive clipping: tune the clipping threshold dynamically. (5) Larger models: counterintuitively, larger pretrained models often DP-fine-tune better than smaller ones at the same target \(\epsilon\) (Li et al., 2022). The 2025–2026 LLM-DP work has further closed the gap with techniques like LoRA-DP (DP fine-tuning of low-rank adapters only) and gradient compression schemes.
Production libraries
Opacus (PyTorch) and TensorFlow Privacy are the dominant libraries; both provide accountant-aware optimisers, per-example gradient computation, and clipping primitives. JAX-based JAX-Privacy is faster for very large models. The library code is one component; the operational discipline is auditing the accountant configuration, the random-number-generator seeding (DP can be broken by predictable randomness), and the gradient-leak channels (e.g. via gradient accumulation across micro-batches that breaks the Poisson sampling assumption).
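A rough wiring sketch against Opacus's documented PrivacyEngine interface; the argument names follow the Opacus documentation but should be checked against the installed version, and the model and data here are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(512, 20), torch.randint(0, 2, (512,)))
loader = DataLoader(data, batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=8.0,      # the budget to be published in the model card
    target_delta=1e-5,
    epochs=10,
    max_grad_norm=1.0,       # the clipping threshold C
)

criterion = nn.CrossEntropyLoss()
for epoch in range(10):
    for xb, yb in loader:
        if len(yb) == 0:     # Poisson sampling can yield an empty batch
            continue
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()

print("spent epsilon:", privacy_engine.accountant.get_epsilon(delta=1e-5))
```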
Federated DP and DP-FedAvg
When training is federated (§6), DP can be applied at the user level rather than the example level: the unit being protected is the user, not the individual gradient step. DP-FedAvg (McMahan et al., 2018) implements this by clipping per-user updates and adding noise at the central server. The privacy guarantee is per-user-DP, which is the more meaningful unit for many real deployments — Google's Gboard keyboard predictions are trained this way.
What DP-SGD does and does not protect
DP-SGD provides a per-record privacy guarantee against an attacker who has access to the trained model (white-box) and can query it (black-box). It does not protect against attacks on the training infrastructure (data leakage during training), the data-collection pipeline (if the data was unlawfully collected, DP doesn't help), or the deployment context (if inputs at inference time leak via logging, DP only protects training data). The mature programs combine DP with infrastructure security (Ch 06 of Part XIV on cybersecurity) and data-minimisation discipline.
Federated Learning and Decentralised Training
Federated learning (FL) trains a model across many devices without centralising the training data. Each device computes a local update on its own data and sends only the update — typically a gradient or weight delta — to a central server, which aggregates updates from many devices into a global model. FL is now the standard architecture for cross-device on-device ML and increasingly for cross-organisation ML in healthcare and finance.
FedAvg and the algorithmic foundation
FedAvg (McMahan et al., 2017) is the foundational algorithm: each round, the server selects a subset of clients, broadcasts the current model, each client trains for a few local epochs, and the server averages the resulting weights. The communication cost is the bottleneck — each round transfers a full model — and the heterogeneity of client data (non-IID) is the convergence challenge. The 2017–2026 literature has produced refinements: FedProx (proximal regularisation), FedOpt (server-side adaptive optimisers), and personalised FL methods that fine-tune a per-client head on top of the shared backbone.
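A toy sketch of one FedAvg round for a linear model, with illustrative function names; real deployments add client sampling, dropout handling, and communication compression:

```python
import numpy as np

def fedavg_round(global_weights, client_datasets, local_steps, lr, rng):
    """One FedAvg round for linear regression: broadcast the global weights,
    run a few local SGD steps per client, then average the client weights,
    weighted by each client's dataset size."""
    client_weights, client_sizes = [], []
    for X, y in client_datasets:
        w = global_weights.copy()
        for _ in range(local_steps):
            idx = rng.integers(len(X))                 # one-example SGD for brevity
            grad = (X[idx] @ w - y[idx]) * X[idx]
            w -= lr * grad
        client_weights.append(w)
        client_sizes.append(len(X))
    return np.average(client_weights, axis=0, weights=np.asarray(client_sizes, float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for n in (200, 50, 400):                               # heterogeneous client sizes
    X = rng.normal(size=(n, 3))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))

w = np.zeros(3)
for _ in range(30):
    w = fedavg_round(w, clients, local_steps=20, lr=0.05, rng=rng)
print("global model after 30 rounds:", np.round(w, 2))
```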
The privacy-by-architecture argument and its limits
FL is often advertised as privacy-preserving by architecture: raw data never leaves the device. This is true at the architectural level but not at the cryptographic-attacker level. Gradient updates leak information: the gradient of the loss with respect to the parameters depends on the training examples in well-understood ways. Gradient inversion attacks (Zhu et al., 2019; Geiping et al., 2020) recover training images and texts from intercepted gradients with surprising fidelity. The defensive lesson: FL is a useful component of a privacy program, but it is not a privacy guarantee on its own. It must be combined with secure aggregation, DP, or both.
Secure aggregation
Secure aggregation (Bonawitz et al., 2017) uses cryptographic protocols (additive secret-sharing across pairs of clients) so that the server only sees the aggregated update, not any individual client's update. Combined with FL, this means an honest-but-curious server cannot mount gradient-inversion attacks on individuals. Production systems (Google's Federated Analytics, Apple's private federated learning) use secure aggregation as the default. The cost: communication overhead grows quadratically with the number of clients in the simplest constructions; recent work has reduced this to near-linear at modest constant-factor overhead.
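A toy sketch of the pairwise-masking idea, not the full Bonawitz et al. protocol (which adds secret-shared key recovery to tolerate client dropouts); one RNG stands in here for all pairwise shared keys, whereas in the real protocol the server never learns the masks:

```python
import numpy as np

def secure_aggregate(client_updates, modulus=2**32, seed=0):
    """Every pair of clients (i, j) shares a random mask; client i adds it and
    client j subtracts it, so the masks cancel in the sum and the server only
    learns the aggregate.  Values are encoded in fixed point modulo `modulus`."""
    n, dim = len(client_updates), len(client_updates[0])
    scale = 2**16
    masked = [np.round(u * scale).astype(np.int64) % modulus for u in client_updates]
    rng = np.random.default_rng(seed)
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.integers(0, modulus, size=dim, dtype=np.int64)
            masked[i] = (masked[i] + mask) % modulus
            masked[j] = (masked[j] - mask) % modulus
    # The server sums the masked vectors; the pairwise masks cancel exactly.
    total = np.zeros(dim, dtype=np.int64)
    for m in masked:
        total = (total + m) % modulus
    total = np.where(total > modulus // 2, total - modulus, total)   # decode sign
    return total.astype(float) / scale

updates = [np.array([0.1, -0.2, 0.3]), np.array([0.0, 0.5, -0.1]), np.array([0.2, 0.2, 0.2])]
print("aggregate seen by server:", secure_aggregate(updates))
print("true sum:                ", np.round(sum(updates), 4))
```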
Cross-silo vs cross-device FL
Cross-device FL (millions of phones) and cross-silo FL (tens of hospitals) are different architectural regimes. Cross-device favours communication-efficient algorithms, robust client-dropout handling, and aggressive subsampling. Cross-silo favours per-organisation accounting, contract-based governance, and explicit data-sharing agreements. The privacy claims differ: cross-device naturally aligns with user-level DP; cross-silo needs explicit organisational privacy contracts and sometimes per-record DP.
FL in production: 2024–2026 maturity
Apple's iOS keyboard, Google's Gboard, Apple's image-classification on photos, healthcare-consortium models for medical imaging, and an increasing number of cross-bank fraud-detection models are now FL-trained in production. The 2024–2026 ecosystem (TensorFlow Federated, Flower, NVIDIA FLARE, the FedML platform) has matured to make this routine. The remaining frontier work is Byzantine robustness (defending against malicious clients), heterogeneous-architecture FL (clients with different model capacities), and FL for large foundation models, where the communication cost is the binding constraint.
Secure Multiparty Computation and Homomorphic Encryption
Differential privacy is statistical: it bounds what an attacker can infer. Cryptographic privacy is computational: it ensures the attacker cannot see the data at all. The two cryptographic primitives in production use are secure multiparty computation (SMPC) and homomorphic encryption (HE), with trusted execution environments (TEEs) as a hardware-assisted middle ground.
Secure multiparty computation
SMPC protocols let multiple parties compute a function over their joint inputs without revealing the inputs to each other. The classical primitives are Yao's garbled circuits (boolean computation) and additive secret-sharing (arithmetic computation). For ML, the dominant frameworks are CrypTen (Meta), MP-SPDZ, and SecretFlow. SMPC supports both private inference (the user holds inputs, the server holds the model, neither reveals their part) and private training (multiple parties hold disjoint training data and jointly train a model). The cost is communication-bound: SMPC training is typically 10–100× slower than plaintext training and requires high-bandwidth low-latency links between parties.
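A toy sketch of additive secret sharing, the arithmetic building block behind these frameworks; two parties learn the sum of their private inputs without either revealing its own (the values and the modulus are illustrative):

```python
import numpy as np

MOD = 2**61 - 1   # a large prime modulus for the arithmetic shares

def share(secret_int, n_parties, rng):
    """Split an integer into n additive shares that sum to it mod MOD;
    any n-1 shares together reveal nothing about the secret."""
    shares = rng.integers(0, MOD, size=n_parties - 1)
    last = (secret_int - shares.sum()) % MOD
    return list(shares) + [last]

def reconstruct(shares):
    return sum(shares) % MOD

rng = np.random.default_rng(0)
alice_salary, bob_salary = 83_000, 91_000           # each party's private input

# Each party shares its input; each party then adds its two shares locally.
a_shares = share(alice_salary, 2, rng)
b_shares = share(bob_salary, 2, rng)
party_sums = [(a_shares[i] + b_shares[i]) % MOD for i in range(2)]

# Reconstructing the shared *sum* reveals only the total, not either input.
print("joint total:", reconstruct(party_sums))       # 174000
```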
Homomorphic encryption
Homomorphic encryption lets the server compute on encrypted data and return an encrypted result, without ever decrypting. Fully homomorphic encryption (FHE) supports arbitrary circuits; somewhat homomorphic and levelled schemes support bounded-depth circuits with much better performance. The dominant ML library is Microsoft SEAL with the CKKS scheme for approximate arithmetic on real numbers. HE inference is slow: 1000–100,000× the plaintext cost depending on circuit depth and security parameters. The use cases are narrow but real: encrypted inference for genomics, encrypted credit scoring, encrypted fraud detection where the cost is acceptable.
Trusted execution environments
TEEs (Intel SGX, AMD SEV, ARM TrustZone, Apple's Secure Enclave, NVIDIA's H100 confidential computing) provide hardware-isolated execution: code runs in a region of memory that the operating system and hypervisor cannot inspect. The performance cost is small (typically 10–30%), and the developer experience is much closer to plaintext computation than SMPC or HE. The downside: trust in the hardware vendor, vulnerability to side-channel attacks (multiple high-profile SGX breaks in 2018–2024), and the requirement that all parties trust the manufacturer's attestation. For many production deployments, the performance-trust trade-off favours TEEs over SMPC/HE.
Crypto-vs-DP trade-offs
The two paradigms answer different questions. DP gives a statistical guarantee against an attacker who sees the model's output; cryptographic methods give a computational guarantee against an attacker who tries to see the inputs. They compose: production systems often run DP-trained models inside TEEs to combine the guarantees. The deployment-time question is which threat is binding: if the attacker has access to the trained weights and queries the API, DP is the right tool; if the attacker is the cloud operator hosting the model, crypto is the right tool; if both, combine.
Production status as of 2026
SMPC-based ML is in production at a handful of consortia (financial fraud detection, healthcare research, ad-attribution). HE-based ML inference is in production at narrow specialty deployments (encrypted credit decisioning, some genomic services). TEE-based ML is now widely deployed: NVIDIA's H100 and B100 confidential computing, Apple's Private Cloud Compute (announced 2024), and the major cloud providers' confidential-VM offerings make TEE-protected inference routine. The frontier is integrating these primitives into the standard ML stack so that practitioners can pick a privacy level by configuration rather than by re-engineering.
Machine Unlearning and the Right to Be Forgotten
GDPR Article 17 — the right to be forgotten — gives EU residents the right to demand deletion of their personal data. Applied to ML, it raises the question: if a person's data was used to train a model, must the model "forget" them when their data is deleted? Machine unlearning is the technical discipline of doing exactly that — removing a record's influence from a trained model without full retraining.
The problem
Naive deletion is easy: drop the record from the training database. But the trained model has already absorbed the record's influence into its weights. Re-training from scratch on the remaining data would work but is prohibitively expensive for large models — often weeks of compute. Machine unlearning aims for the same statistical end-state as full retraining at a small fraction of the cost.
Exact unlearning: SISA and sharded retraining
The most-deployed approach is SISA (Sharded, Isolated, Sliced, Aggregated; Bourtoule et al., 2021): partition the training data into shards, train one model per shard, aggregate predictions at inference. To unlearn a record, retrain only the shard that contained it. The cost is bounded by the shard size, not the full dataset. SISA gives exact unlearning (the post-deletion model is statistically identical to a model trained without the record from scratch); the cost is some accuracy hit from the ensemble vs the single-model architecture.
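A toy sketch of the SISA pattern with per-shard ridge regression standing in for the shard models; the class layout is illustrative, not Bourtoule et al.'s implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SisaEnsemble:
    """Toy SISA: shard the data, train one model per shard, aggregate by
    averaging predictions.  Unlearning a record retrains only its shard."""
    n_shards: int

    def fit(self, X, y):
        self.shard_ids = np.arange(len(X)) % self.n_shards   # record -> shard
        self.X, self.y = X, y
        self.models = [self._train_shard(s) for s in range(self.n_shards)]
        return self

    def _train_shard(self, shard):
        mask = self.shard_ids == shard
        Xs, ys = self.X[mask], self.y[mask]
        # Toy per-shard "model": ridge-regularised least squares.
        return np.linalg.solve(Xs.T @ Xs + 1e-3 * np.eye(Xs.shape[1]), Xs.T @ ys)

    def predict(self, X):
        return np.mean([X @ w for w in self.models], axis=0)

    def unlearn(self, record_index):
        """Exact unlearning: drop the record and retrain only its shard."""
        shard = self.shard_ids[record_index]
        keep = np.arange(len(self.X)) != record_index
        self.X, self.y, self.shard_ids = self.X[keep], self.y[keep], self.shard_ids[keep]
        self.models[shard] = self._train_shard(shard)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)
ens = SisaEnsemble(n_shards=10).fit(X, y)
ens.unlearn(record_index=42)            # deletion request: retrains 1 of 10 shards
print("prediction after unlearning:", ens.predict(X[:1]))
```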
Approximate unlearning: influence functions and fine-tuning
Influence functions (Koh & Liang, 2017) estimate how the model would change if a single training example were removed; they can be used to update the weights to approximately remove that influence. The technique is much cheaper than retraining but provides only an approximate guarantee — which is fine for many use cases but does not satisfy regulators looking for exact deletion. Fine-tuning-based unlearning methods (continue training on the dataset minus the deleted record, or with negative gradient on the deleted record) are similarly approximate.
Certified unlearning
The frontier is certified unlearning: provable guarantees that the unlearned model is statistically indistinguishable from one that never saw the deleted record. The 2021–2026 literature has produced certified unlearning algorithms for convex models (Guo et al., 2020), DP-trained models (where the DP guarantee directly implies a form of unlearning), and structured ensembles. For non-convex deep learning, exact certified unlearning remains computationally hard; the operational pattern combines SISA-style sharding with periodic full retraining on a regular cadence.
Operational deletion at scale
The contractual side of deletion: a regulated deployment must accept deletion requests, route them to the right systems, execute the deletion in the underlying data store, propagate the deletion to derived features and models, and provide an audit trail. The 2024–2026 maturity around this — sometimes called privacy engineering — has produced patterns for deletion pipelines that integrate with the MLOps stack (Ch 04 of Part XVI on monitoring; Ch 03 on model deployment). The right-to-be-forgotten is no longer a research problem; it is a production engineering discipline with documented architectures.
Unlearning for generative models
The hardest case: unlearning specific facts or memorised content from a foundation model. The 2023–2026 work has produced techniques for targeted forgetting of specific memorised sequences (Eldan & Russinovich, 2023; "Who's Harry Potter?"), copyrighted-content removal, and selective forgetting of demographic-specific content. The techniques work but are imperfect: a sufficiently determined attacker can sometimes restore the forgotten content via prompt engineering, and the unlearning often degrades adjacent capabilities. The frontier is unlearning that is robust to adversarial prompts and that preserves model quality.
Privacy Governance and Regulatory Context
Privacy in ML lives or dies by its governance. The technical mitigations of §3–8 produce the formal guarantees; the governance machinery turns them into enforceable promises. The 2018–2026 regulatory landscape has converged enough that mature programs share a common operational shape, even as the specific rules differ across jurisdictions.
The major regulatory regimes
GDPR (EU, effective 2018) establishes lawful-basis requirements (consent, contract, legitimate interest), data-subject rights (access, rectification, deletion, objection), and substantial fines (up to 4% of global revenue). For ML, the binding requirements are: a lawful basis for using personal data in training, the right to delete on request, and Article 22's right not to be subject to solely automated decisions for significant matters. CCPA / CPRA (California, 2020/2023) provides similar rights for California residents. Sectoral rules — HIPAA (US healthcare), GLBA / FCRA (US finance), FERPA (US education), PIPEDA (Canada), LGPD (Brazil), PDPA (Singapore) — overlay domain-specific requirements. The EU AI Act (effective 2025–2027 in stages) adds an AI-specific layer, including data-governance requirements for high-risk systems.
The DPIA and the AI risk assessment
For high-risk processing, GDPR requires a Data Protection Impact Assessment (DPIA): a documented analysis of the privacy risks, the mitigations, the residual risk, and the ongoing monitoring plan. The EU AI Act's high-risk-system rules require an analogous AI-specific risk assessment. The mature programs combine these into a single integrated assessment per deployment, with explicit documentation of the threat model (§2), the chosen defences (§3–7), the deletion mechanism (§8), and the privacy budget where applicable (§3–4).
Lawful basis and consent in ML training
The pre-existing regulatory frame mostly assumed data was used for the purpose for which it was collected. Foundation-model training has stressed this assumption: training data scraped from the public web for one purpose is now used for a different one. The 2023–2026 enforcement (Italy's brief ChatGPT block, the GDPR investigations of OpenAI and Google, the EU AI Act's training-data transparency rules, multiple US state-level AI privacy laws) has been narrowing the latitude for "we trained on public data". Mature programs now document training-data provenance, opt-out mechanisms, and content-source consent at the level of expected scrutiny.
Privacy-preserving documentation
The privacy-engineering analogue of model cards (Ch 06): documentation that captures data-flow diagrams, lawful-basis claims, retention policies, deletion mechanisms, DP budget if applicable, encryption at rest and in transit, and incident-response plans. The 2024–2026 industry pattern integrates this with the model card and the AIA into a single deployment-time disclosure document. NIST's AI Risk Management Framework, ISO 23894 (AI risk management), and the EU AI Act's documentation requirements all converge on similar structure.
Cross-border data flows
Privacy regulations restrict cross-border data flows: EU data cannot leave the EU without specific safeguards, and EU-US data flows have been litigated repeatedly (Schrems I, Schrems II, the EU-US Data Privacy Framework). For ML, the practical implication is regional model training and inference, federated architectures that keep data in-region, and model-only export (where the trained model is exported but the data is not). The 2024–2026 architecture patterns make this routine — most major cloud providers offer regional model-training and confidential-compute regions.
Incident response and breach notification
When privacy controls fail, the regulatory clock starts: GDPR requires notification within 72 hours of becoming aware of a breach. ML-specific incident response — a model is found to leak training data, an extraction attack succeeds, a deletion request fails to propagate — needs documented playbooks. The 2024–2026 maturity treats this as part of the broader AI-incident-response discipline (Ch 07 of Part XVI on responsible release covers the deployment side).
The Frontier and the Open Problems
Privacy in ML is one of the most rapidly-evolving subfields of AI safety. The 2023–2026 work has reshaped the threat landscape (training-data extraction from LLMs is now demonstrated at scale), the defensive toolkit (DP for foundation models is now competitive), and the regulatory environment (the EU AI Act and the wave of US state-level AI privacy laws). This section surveys the leading edge.
LLM training-data extraction at scale
Carlini et al.'s 2023 paper Scalable Extraction of Training Data from (Production) Language Models showed that aligned production LLMs — GPT-3.5, ChatGPT — could be coerced into emitting verbatim training data with a simple prompting trick. Subsequent work (Nasr et al., 2023; the Stanford "Foundation Model Transparency Index") has documented the breadth of memorised content and the efficacy of various extraction strategies. The defensive response combines deduplication, DP-SGD at fine-tuning time, output filtering, and rate-limiting; none of these is a silver bullet, and the arms race continues. The deployment-time discipline as of 2026 is treating training-data extraction as an active threat to be tested for, not a theoretical concern.
Synthetic data: the great hope and the great limit
One promising direction: train models on synthetic data generated by privacy-preserving methods (DP-trained generative models, simulator-generated data), avoiding the privacy concerns of real data entirely. The 2023–2026 work has produced increasingly capable DP-synthetic-data generators (Lin et al., 2023, on tabular data; Ghalebikesabi et al., 2023, on images). The limits are real: synthetic data is only as good as the privacy-preserving generator that produced it, and the utility-privacy trade-off is often steeper than for direct DP training. For some narrow use cases (tabular data with limited feature interactions), DP-synthetic data is now production-ready; for general-purpose foundation-model pretraining, it is not.
Privacy auditing
The complement to claiming privacy is auditing it. Privacy auditing (Jagielski et al., 2020; Steinke et al., 2023) empirically measures the actual privacy of a model by mounting strong attacks and comparing the results to the claimed \(\epsilon\). Surprisingly often, the empirical privacy is much better than the formal bound suggests — but not always, and auditing is the only way to be sure. The 2024–2026 frontier has produced auditing tools that work for production-scale models and that integrate with DP-SGD training pipelines. Audit-driven privacy is the analogue of the audit-driven fairness discipline of Ch 06.
Privacy for foundation models and agents
Foundation models trained on web-scale data raise privacy questions that the classical framework barely addresses: who is the data subject when the data is a public webpage? What is the privacy budget for a model that will be queried billions of times? How do you implement deletion when the data flowed through a 10-stage pipeline? Agentic systems (Ch 12 of Part XI on agents) raise additional questions: the agent's tool calls can leak information across security boundaries, the persistent memory can accumulate sensitive data, the action space includes privacy-relevant operations. The 2025–2026 frontier work on agent privacy is just beginning.
The limits of formal guarantees
The most pointed open question, paralleling §10 of Ch 06: are formal guarantees the right primary discipline, or are they a sometimes-useful supplement to deeper governance work? A model with \(\epsilon = 8\) DP and a clean deletion pipeline can still feel like a privacy violation if the deployment context is wrong; a model with no formal guarantee can feel privacy-respectful if the governance is right. The mature programs combine formal techniques with consent-and-context engineering, transparency about training-data provenance, and an explicit recognition that privacy is not just a number — it is a relationship between the system, the people whose data it uses, and the society in which it operates.
Further reading
Foundational papers and references for privacy in ML. Dwork et al. on the foundations of differential privacy; Abadi et al. on DP-SGD; Shokri et al. on membership inference; Carlini et al. on training-data extraction from language models; McMahan et al. on federated averaging; Bonawitz et al. on secure aggregation; Bourtoule et al. on SISA unlearning; Papernot et al. on PATE; the comprehensive textbook by Dwork & Roth; and the production libraries (Opacus, TensorFlow Privacy, Flower, CrypTen) form the right starting kit.
- Calibrating Noise to Sensitivity in Private Data Analysis. The foundational differential-privacy paper. Establishes the formal definition, the Laplace mechanism, and the sensitivity-based noise calibration. Required reading for any serious privacy work. The DP foundations.
- Deep Learning with Differential Privacy (DP-SGD). The foundational DP-SGD paper. Establishes the per-example clipping, Gaussian noise, and moments-accountant recipe that became the standard for privacy-preserving deep learning. Required reading. The DP-SGD reference.
- Membership Inference Attacks Against Machine Learning Models. The foundational membership-inference paper. Demonstrates the practical attack and establishes it as the standard adversarial benchmark for privacy claims. Required reading. The membership-inference reference.
- Extracting Training Data from Large Language Models. The paper that made training-data extraction concrete for foundation models. Demonstrates verbatim-memorisation attacks on GPT-2 and establishes the methodology. Required reading for anyone working with LLMs and privacy. The training-data-extraction reference.
- Communication-Efficient Learning of Deep Networks from Decentralized Data (FedAvg). The foundational federated-learning paper. Establishes the FedAvg algorithm that remains the workhorse of federated training. Required reading. The federated-learning reference.
- Practical Secure Aggregation for Privacy-Preserving Machine Learning. The foundational secure-aggregation paper. Establishes the cryptographic protocol that lets a server compute aggregated updates without seeing individual contributions, the foundation of production federated-learning privacy. Required reading. The secure-aggregation reference.
- Machine Unlearning (SISA). The foundational machine-unlearning paper. Establishes the SISA architecture for efficient exact unlearning at scale. Required reading for anyone implementing right-to-be-forgotten in production ML. The machine-unlearning reference.
- Scalable Private Learning with PATE. The foundational PATE paper. Provides the alternative to DP-SGD using teacher-ensemble noisy aggregation; particularly useful when the public unlabelled data is plentiful. Required reading for the broader DP toolkit. The PATE reference.
- Privacy-Preserving Deep Learning via Additively Homomorphic Encryption. A foundational reference for HE-based privacy-preserving ML. Sets the practical pattern combining HE with federated learning that has informed the production HE-ML libraries. Highly recommended for the cryptographic-privacy thread. The HE-ML reference.
- The Algorithmic Foundations of Differential Privacy. The comprehensive textbook on differential privacy. The reference for the formal foundations and the full mechanism catalogue. Required for serious work in differentially-private ML. The DP textbook.
- Auditing Differentially Private Machine Learning. The privacy-auditing line. Establishes how to empirically estimate the actual privacy leakage of a DP-trained model and compare it to the claimed bound. Required reading for the audit-driven privacy discipline. The privacy-auditing reference.
- Opacus, TensorFlow Privacy, Flower, CrypTen — Production Privacy Libraries. The four production-quality libraries. Opacus (PyTorch DP-SGD); TensorFlow Privacy (TF DP); Flower (federated learning); CrypTen (SMPC for ML). The 2024–2026 production work standardised on combinations of these. Highly recommended. The production toolkit.