Fairness, Bias & Equity: where models meet the people they affect.
Explainability (Ch 05) tells you why a model made a decision; fairness asks whether the decision pattern is acceptable to the people affected by it. Fairness is harder than explainability because it is partly a technical question (does the model treat groups equivalently on some statistical metric?), partly a normative one (which metric is the right one — and who gets to decide?), and partly a structural one (does the deployed system, end-to-end, reproduce or counteract the social inequities its training data encodes?). The technical literature has matured rapidly: there are well-established group-fairness definitions, individual-fairness formalisms, an impossibility theorem that proves several natural definitions cannot be simultaneously satisfied, and three families of mitigation methods (pre-, in-, and post-processing) with mature open-source libraries. The operational discipline pairs that toolkit with a disaggregated audit, a clear accountability story, and an honest reckoning with which fairness goal the deployment actually serves. This chapter develops the methodology with the depth a working ML practitioner, model-risk officer, or product owner needs.
Prerequisites & orientation
This chapter assumes the deep-learning material of Part VI, the AI safety material of Ch 01, and the explainability material of Ch 05. Familiarity with basic classification metrics (precision, recall, ROC, calibration) is essential; familiarity with causal reasoning (Ch 03 in Part XIII) helps for §4 but is not required. The chapter is written for ML engineers, applied scientists, model risk officers, product managers, and policy staff who need to ship models that affect people in regulated or socially consequential domains: hiring, lending, criminal justice, healthcare, education, content moderation, and increasingly any consumer-facing AI system.
Three threads run through the chapter. The first is the group-vs-individual distinction: most operational fairness work uses group-level statistics, but those statistics can hide individual-level harms, and a complete program needs both. The second is the impossibility result: several plausible fairness criteria are mutually exclusive, so deployment requires choosing — and documenting — which trade-off you are accepting and why. The third is the technique-vs-governance dimension: technical mitigations matter, but a fair-enough metric inside an unaccountable deployment is not fairness, and the mature programs combine both. The chapter develops each in turn.
Why Fairness Is Different from Explainability
Explainability (Ch 05) is descriptive: it answers "why did the model make this prediction?" Fairness is normative: it asks "is the pattern of predictions acceptable, and to whom?" The distinction matters because the methods, the stakeholders, and the limits of the technical work are all different. An explainable model can still be unfair; a fair model can still be inscrutable. Practitioners conflate the two at their peril — regulators increasingly do not.
Normative weight, not just statistical disparity
A model that approves loans for 80% of applicants in group A and 60% in group B exhibits a statistical disparity. Whether that disparity is unfair depends on questions outside the data: does the disparity reflect a genuine difference in repayment ability, a measurement artefact, a labelling decision, a sampling pattern in the training data, or a residue of historical discrimination encoded in the features the model uses? Different answers point to different remedies and different ethical stances. The technical literature can quantify the disparity precisely; it cannot, on its own, tell you whether the disparity is wrong. The discipline of fairness-aware ML is the discipline of being clear about which question you are answering.
Stakeholders multiply
Where explainability has at most a handful of typical audiences (developers, model-risk officers, individual end-users, regulators), fairness implicates the affected populations as a stakeholder class — often without their direct participation in the modelling process. The fairness literature has, since the mid-2010s, treated community input, legitimacy, and contestability as part of the methodology, not as soft optional extras. A fair-by-numbers model that affected communities reject is not a fair deployment in any operational sense.
Figure: the three-pillar stack — where bias enters (§2), what we measure (§3–4), and what we do (§6–8) — plus the deployment layer that determines which trade-offs matter. The arrows are causal: a fairness goal at deployment selects definitions in §3–4, which in turn determine which mitigations in §6–8 are appropriate, given the bias sources in §2.
What the technical methods can and cannot deliver
Technical fairness methods can: quantify group-level disparities, enforce specific statistical parities at training or post-processing time, surface affected sub-populations, document choices for auditors, and reduce many obvious failure modes. They cannot: settle which definition is right, repair upstream measurement problems, replace community input, or fix social inequities encoded in the labels themselves. A model that outputs "defendant likely to reoffend", trained on arrest data from differentially policed communities, will produce disparate predictions even after every algorithmic mitigation; the structural fix lies upstream of the model. Knowing where the technical methodology ends and the structural work begins is a core operational skill.
Sources of Bias: The Methodology Stack
Before any fairness metric makes sense, you need a clear-headed taxonomy of where bias enters the pipeline. The literature has converged on five major entry points: historical bias, measurement bias, sampling bias, label bias, and deployment-feedback bias. Each is mitigated differently; conflating them is the most common mistake in fairness audits.
Historical & structural bias
Historical bias is present in data even when the data accurately reflects the world: the world itself is unequal. A hiring model trained on a company's historical hires will learn that women are under-represented in engineering, even if that under-representation reflects past discrimination rather than ability. The data is accurate but the patterns are structurally biased. Suppressing the gender feature does not help: many features (resume keywords, school names, hobbies) are correlated with gender, so the model can reconstruct the protected attribute from proxy features. Historical bias is the hardest to mitigate because the fix often requires changing the deployment goal, not the model.
Measurement bias
Measurement bias arises when the features the model uses are imperfect proxies for the construct of interest, and the imperfection is correlated with group membership. Arrests are a noisy proxy for criminal activity, and that noise is heavily group-correlated. Customer-service complaint volume is a noisy proxy for service quality, and that noise is correlated with the demographics of who complains and how. Standardised test scores are noisy proxies for aptitude, with well-documented group differences in the noise. Measurement bias is partially fixable by improving the measurement (using validated outcomes rather than proxies), but the operational discipline is to be explicit about which proxy is in use.
Sampling & representation bias
Sampling bias arises when training data over- or under-represents groups. The classic case is the commercial face-analysis models trained on photo datasets dominated by lighter-skinned male faces, which produced dramatically worse error rates for darker-skinned women — Buolamwini and Gebru's Gender Shades made this concrete in 2018 and reshaped industry practice. Sampling bias is partially fixable by re-sampling and by collecting better data, but it requires a clear-eyed audit of who is in the training set and who isn't.
Label bias and aggregation
Label bias arises when the labels themselves encode discriminatory decisions. A "good employee" label assigned by a manager who has historically promoted men more than equally-qualified women carries that bias forward. Aggregation bias is a sub-case: pooling labels across populations that should be treated as separate distributions. Pooling clinical-trial outcomes across self-reported races when the underlying physiology actually differs across populations produces a model that's wrong for everyone, in different ways. Label bias is mitigated by examining the label-generation process, by using outcome-based rather than judgement-based labels, and sometimes by training group-specific sub-models.
Deployment-feedback bias
Deployment-feedback bias arises after the model ships. A predictive policing model that sends officers to neighborhoods previously flagged generates more arrests there, which become training data for the next iteration, which sends more officers — the model entrenches its own predictions. Recommender systems produce filter bubbles in the same way. The mitigation requires offline-online reconciliation, explicit exploration, and counterfactual logging — see Ch 04 of Part XVI on monitoring and Ch 06 of Part XVI on A/B testing for the operational machinery.
Group Fairness Definitions and the Impossibility Theorem
Most operational fairness work uses group fairness: comparing statistical properties of model output across protected groups. The literature has crystallised on a handful of core definitions, and a sharp impossibility theorem proves that several natural ones cannot be simultaneously satisfied except in degenerate cases. This means fairness is, irreducibly, a choice — and shipping a model in a regulated domain requires documenting which choice you made.
The four canonical group-fairness definitions
Let \(\hat Y\) be the model's prediction (1 for "positive", 0 for "negative"), \(Y\) the true outcome, and \(A\) the protected attribute (taking values \(a\) and \(b\) for two groups). The four most-cited definitions are:
Demographic parity (also called statistical parity): \(P(\hat Y = 1 \mid A = a) = P(\hat Y = 1 \mid A = b)\). The positive-prediction rate is the same across groups. Useful when the goal is equal-rate output regardless of differential base rates — e.g. equal-rate referrals for a screening program. Problematic when base rates legitimately differ.
Equalized odds: \(P(\hat Y = 1 \mid Y = y, A = a) = P(\hat Y = 1 \mid Y = y, A = b)\) for both \(y \in \{0, 1\}\). Equal true-positive and false-positive rates across groups. Useful when the goal is equal accuracy of the prediction for genuinely-positive and genuinely-negative individuals.
Equal opportunity: a relaxation of equalized odds — only the true-positive rate must be equal across groups. Useful when false negatives are costlier than false positives (e.g. missing a genuinely credit-worthy applicant) and you want equal access across groups for the positive outcome.
Calibration within groups (also called predictive parity): \(P(Y = 1 \mid \hat S = s, A = a) = P(Y = 1 \mid \hat S = s, A = b)\) where \(\hat S\) is the model's predicted probability. A predicted probability of 0.7 should mean the same empirical rate of positives across groups. Useful when the score is consumed downstream as a probability — e.g. risk scores in clinical decision support.
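To make the four definitions concrete, here is a minimal sketch of each criterion for a binary classifier in plain numpy. The function names and the two-group simplification are illustrative, not from any fairness library; equal opportunity is just the TPR half of the equalized-odds check.

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """|P(Yhat=1 | A=a) - P(Yhat=1 | A=b)|, assuming exactly two groups."""
    rates = [y_pred[a == g].mean() for g in np.unique(a)]
    return abs(rates[0] - rates[1])

def equalized_odds_gaps(y_true, y_pred, a):
    """TPR and FPR gaps across two groups (the TPR gap alone is equal opportunity)."""
    gaps = {}
    for y in (1, 0):  # y=1 gives the TPR gap, y=0 the FPR gap
        rates = [y_pred[(a == g) & (y_true == y)].mean() for g in np.unique(a)]
        gaps["tpr_gap" if y == 1 else "fpr_gap"] = abs(rates[0] - rates[1])
    return gaps

def calibration_by_group(y_true, scores, a, bins=10):
    """Empirical positive rate per score bin, per group (predictive parity)."""
    edges = np.linspace(0, 1, bins + 1)
    out = {}
    for g in np.unique(a):
        mask = a == g
        idx = np.clip(np.digitize(scores[mask], edges) - 1, 0, bins - 1)
        out[g] = [y_true[mask][idx == b].mean() if (idx == b).any() else np.nan
                  for b in range(bins)]
    return out
```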
The impossibility result
Chouldechova (2017) and Kleinberg et al. (2016) independently showed that when base rates differ across groups, calibration and equalized-odds (or equal opportunity) cannot both hold except in trivial cases (perfect prediction, or equal base rates). The proof is short and crisp; the implications are not. The COMPAS recidivism scoring controversy (ProPublica 2016) is the canonical real-world manifestation: COMPAS was approximately calibrated within groups, but its false-positive rate was substantially higher for Black defendants — and these two facts are mathematically forced to coexist whenever recidivism base rates differ across groups, given any non-perfect classifier. The impossibility result means deployment requires choosing a fairness criterion. There is no "fair model"; there are models fair-by-X.
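A toy calculation makes the theorem tangible. Assume a score that is perfectly calibrated in both groups (\(P(Y=1 \mid S=s) = s\)) but distributed differently because base rates differ; at any shared threshold the error rates are forced apart. The numbers below are invented for illustration:

```python
import numpy as np

scores = np.array([0.1, 0.7])          # the score takes two values
p_high = {"A": 2 / 3, "B": 1 / 6}      # P(S=0.7) per group (assumed)

for g, q in p_high.items():
    dist = np.array([1 - q, q])        # P(S=s) for s in scores
    base_rate = (dist * scores).sum()  # calibration implies P(Y=1) = E[S]
    # Threshold at 0.5: only S=0.7 is predicted positive.
    fpr = dist[1] * (1 - scores[1]) / (1 - base_rate)
    tpr = dist[1] * scores[1] / base_rate
    print(f"group {g}: base rate {base_rate:.2f}, TPR {tpr:.2f}, FPR {fpr:.2f}")
# group A: base rate 0.50, TPR 0.93, FPR 0.40
# group B: base rate 0.20, TPR 0.58, FPR 0.06
```

Both groups see a perfectly calibrated score, yet the false-positive rates differ by a factor of six — exactly the COMPAS pattern.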
The trade-off space
Practical decisions hinge on the consequence asymmetry. For lending: false-negatives (denying credit-worthy applicants) and false-positives (extending credit to risky applicants) have different costs to lender and borrower; equal-opportunity-style fairness centers the borrower harm. For risk-assessment in criminal justice: false-positives (predicting reoffense for someone who would not) carry liberty costs; equalized-odds-style fairness centers that cost. For medical screening: calibration matters because clinicians act on the probability. The deployment-time question is which harm matters most, to which stakeholder, and at what scale — and the answer constrains which fairness criterion is the right operational target.
Beyond binary outcomes
Most production work involves multi-class classification, ranking, or continuous scores. Demographic parity generalises to group-conditional output distributions; equalized odds generalises to group-conditional confusion matrices; calibration generalises to group-conditional reliability diagrams. The impossibility result generalises too: when group base rates differ, the same family of trade-offs forces the same family of choices. The 2024–2026 literature has pushed into multi-attribute and intersectional fairness, where the protected attribute has multiple dimensions (race × gender, age × disability) — see §10 for the frontier.
Individual and Counterfactual Fairness
Group-fairness metrics can be satisfied while individuals within a group are treated badly. Individual fairness (Dwork et al., 2012) and counterfactual fairness (Kusner et al., 2017) provide complementary frameworks for reasoning at the per-individual level. They are conceptually attractive but operationally demanding; mature programs use both alongside group metrics.
Individual fairness: similar individuals, similar treatment
Dwork's individual fairness formalism asks that any two individuals who are similar with respect to the task should receive similar predictions. Concretely: there exists a task-specific similarity metric \(d(x_1, x_2)\) on inputs and a similarity metric \(D\) on outputs, and the model \(f\) is fair if \(D(f(x_1), f(x_2)) \leq L \cdot d(x_1, x_2)\) — a Lipschitz condition. This is intuitively the closest formal analogue of "treat like cases alike". The hard problem is constructing the task-specific similarity metric \(d\): two applicants for the same job, identical except for race, are presumably similar; two applicants in different fields with the same name are not. Defining \(d\) requires substantive domain judgement and is itself a normative act.
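A sketch of how the Lipschitz condition can be checked empirically, assuming a vectorised model score f, a candidate task-specific metric d, and \(|f(x_1) - f(x_2)|\) as the output metric D. The sampled-pairs approach and the weighted-Euclidean example metric are illustrative stand-ins; choosing d is the normative step described above.

```python
import numpy as np

def lipschitz_violations(f, X, d, L=1.0, n_pairs=10_000, seed=0):
    """Sample input pairs and report the share violating D <= L * d."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    D = np.abs(f(X[i]) - f(X[j]))                  # output distances
    dist = np.array([d(X[p], X[q]) for p, q in zip(i, j)])
    return float(np.mean(D > L * dist + 1e-12))    # fraction of violations

# Illustrative metric: weighted Euclidean distance with assumed task weights.
weights = np.array([1.0, 0.5, 2.0])
d = lambda x1, x2: np.sqrt(((x1 - x2) ** 2 * weights).sum())
```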
Counterfactual fairness: causal reasoning
Kusner et al.'s counterfactual fairness uses the language of structural causal models: a prediction is counterfactually fair if it would be the same in a counterfactual world where the individual's protected attribute were different (and downstream effects of the attribute had also been changed accordingly). Formally, for a causal model with variables including \(A\) (protected attribute), \(X\) (features), and counterfactual variables \(X_{A \leftarrow a'}\), the prediction \(\hat Y\) is counterfactually fair if \(P(\hat Y_{A \leftarrow a} = y \mid X = x, A = a) = P(\hat Y_{A \leftarrow a'} = y \mid X = x, A = a)\) for all \(a, a', x, y\). This is the most rigorous fairness criterion available; it requires you to write down a causal model of how \(A\) influences \(X\), which is itself a substantive scientific commitment. See Pearl's causal-inference machinery for the underlying tools (Ch 03 of Part XIII).
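The definition is easiest to see in the abduction-action-prediction recipe. Below is a minimal sketch under an assumed toy linear SCM (\(X = \alpha A + U\) with latent noise \(U\)); the SCM, the coefficient, and the model are all invented for illustration, since in practice writing down the causal graph is the substantive commitment.

```python
import numpy as np

alpha = 1.5                      # assumed causal effect of A on the feature
f = lambda x: 1 / (1 + np.exp(-(0.8 * x - 0.5)))   # some fitted model

def counterfactual_prediction(x, a_obs, a_cf):
    u = x - alpha * a_obs        # abduction: recover the latent noise
    x_cf = alpha * a_cf + u      # action: intervene A <- a_cf, propagate
    return f(x_cf)               # prediction in the counterfactual world

x, a = 2.0, 1
gap = abs(f(x) - counterfactual_prediction(x, a_obs=a, a_cf=0))
print(f"counterfactual gap: {gap:.3f}")  # nonzero => not counterfactually fair
```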
Path-specific counterfactual fairness
Sometimes the protected attribute has both legitimate and illegitimate causal pathways to the outcome. Gender's causal effect on a hiring decision via direct discrimination is illegitimate; its effect via job role choice may or may not be, depending on social and policy commitments. Path-specific counterfactual fairness (Chiappa, 2019) lets the practitioner accept some causal paths and block others by intervening on specific edges in the causal graph. This is technically demanding and requires high-quality causal models, but it is the most expressive operational fairness framework available — and increasingly cited in regulatory guidance for high-stakes systems.
When to use which
Individual fairness with a well-justified similarity metric is best when there is a clear task-specific notion of "the same situation". Counterfactual fairness is best when you can write down a credible causal model and want to reason about discrimination explicitly. Both should be paired with group-fairness metrics for operational deployment: group metrics catch population-level disparities, individual/counterfactual frameworks catch the failures that group metrics hide. The practitioner's discipline is to be honest about the assumptions each method requires, and to document them in the model card.
Auditing Models for Disparate Impact
A fairness audit is the operational instrument that converts the abstractions of §3–4 into deployable evidence. The mature audit playbook has stabilised in the late 2020s around disaggregated metrics, slicing analysis, intersectional decomposition, and subgroup discovery. This section is the working practitioner's how-to.
Disaggregated metrics — the foundation
The simplest and most powerful audit step: compute every model performance metric (accuracy, precision, recall, AUC, calibration error, false-positive rate, false-negative rate) separately for each protected-attribute subgroup. A model with 92% overall accuracy and 95%/85% accuracy across two groups is hiding a 10-point gap that the aggregate metric obscures. Disaggregated metrics are the disclosure unit for model cards and audit reports; production fairness libraries (Fairlearn, AIF360, the What-If Tool) automate the computation and visualisation.
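A sketch of the disaggregated computation with Fairlearn's MetricFrame — the API shown is the library's real interface, while the metric selection and variable names (model, X_test, y_test, A_test) are illustrative:

```python
from sklearn.metrics import accuracy_score, recall_score
from fairlearn.metrics import MetricFrame, false_positive_rate, selection_rate

mf = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "recall": recall_score,            # true-positive rate
        "fpr": false_positive_rate,
        "selection_rate": selection_rate,  # P(Yhat = 1)
    },
    y_true=y_test,
    y_pred=model.predict(X_test),
    sensitive_features=A_test,             # e.g. a pandas Series of groups
)
print(mf.by_group)                              # one row per subgroup
print(mf.difference(method="between_groups"))   # worst-case gap per metric
```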
Slicing and the long tail of disparities
Beyond protected attributes, models can fail on slices defined by deployment context: country, language, time-of-day, device type, sensor type, model version. Slicing analysis evaluates the model on each slice; subgroup discovery automatically searches for slices on which performance is anomalous. The 2024–2026 production tooling has matured to make this routine: tools like SliceLine and SliceFinder, together with Fairlearn's MetricFrame, let practitioners scan thousands of slices automatically. The output is a long-tail report: 95% of slices fine, 5% materially worse — with the practitioner's job being to decide which of those 5% reflects fairness concerns versus benign distribution differences.
Intersectional decomposition
Buolamwini and Gebru's Gender Shades made vivid that single-axis disaggregation can miss the worst harms. Face-classification error rates for darker-skinned women were dramatically higher than for either darker-skinned men or lighter-skinned women separately — the intersectional cell was the failure mode. Modern audits compute metrics over intersectional cells (e.g. race × gender × age), and the production tooling supports it. The operational caveat: intersectional cells can be small enough that estimates are noisy, and the practitioner must distinguish genuine disparate impact from sampling noise — typically with confidence intervals from bootstrap or Bayesian methods.
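A sketch of intersectional disaggregation with bootstrap confidence intervals, assuming a pandas DataFrame with illustrative column names; the point of the interval is exactly the caveat above: small cells produce wide intervals.

```python
import numpy as np
import pandas as pd

def cell_accuracy_ci(df, by=("race", "gender"), n_boot=2000, seed=0):
    """Accuracy and 95% bootstrap CI per intersectional cell."""
    rng = np.random.default_rng(seed)
    rows = []
    for keys, cell in df.groupby(list(by)):
        correct = (cell.y_true == cell.y_pred).to_numpy()
        boots = [rng.choice(correct, size=len(correct), replace=True).mean()
                 for _ in range(n_boot)]
        lo, hi = np.percentile(boots, [2.5, 97.5])
        rows.append({**dict(zip(by, np.atleast_1d(keys))),
                     "n": len(cell), "acc": correct.mean(),
                     "ci_lo": lo, "ci_hi": hi})
    return pd.DataFrame(rows).sort_values("acc")
```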
The audit playbook
A production audit follows a tight checklist: (1) define protected attributes and the legal/ethical basis for them; (2) compute disaggregated metrics for every primary task metric; (3) run slicing and subgroup discovery to surface unexpected disparities; (4) compute intersectional metrics for the most-relevant cell decompositions; (5) compare against baseline (current production model, simple-model baseline, human-decision baseline if available); (6) compute confidence intervals; (7) document each finding in the model card with the disposition (mitigated, accepted-with-rationale, deferred to next iteration); (8) re-run the audit on every deployment cycle. The output of this discipline is not a "fair" or "unfair" verdict but a documented record of what was checked, what was found, and what was done about it.
Auditing without protected attributes
Many deployment contexts cannot collect or use protected-attribute labels at inference time (privacy, regulation). Several methods — proxy-based audits, demographically-balanced evaluation sets, and adversarial auditing with classifiers that try to predict the attribute from the features — let you audit anyway. The 2025–2026 work has produced practical guidelines: when the proxy is reliable enough, proxy-based audit is fine; when it isn't, demographically-balanced eval sets give bounds. Privacy-preserving audit techniques (local differential privacy, secure aggregation) let you compute disaggregated metrics without ever centralising the protected attribute.
Pre-processing Mitigations
Pre-processing mitigations modify the training data before the model sees it. They are appealing because they are model-agnostic and produce a debiased dataset that any downstream model inherits. The downside: they fight one of the strongest forces in ML — the model will recover any signal in the residual features that proxies for the protected attribute.
Reweighting and resampling
Reweighting assigns weights to training examples so that the weighted distribution achieves demographic parity (or another target) on the labels. Concretely: examples in under-represented (group, label) cells are upweighted, over-represented cells are downweighted. This is a single line of code (Kamiran & Calders, 2012) and works particularly well when the bias is sampling-based. Resampling achieves the same effect by replicating or downsampling examples; it interacts more cleanly with batch-size sensitivity in deep learning.
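A sketch of the reweighing computation in pandas, assuming label array y, group array a, and a downstream sklearn-style estimator; each (group, label) cell receives the weight that makes group and label statistically independent in the weighted data.

```python
import pandas as pd

df = pd.DataFrame({"a": a, "y": y})
p_a = df.a.value_counts(normalize=True)            # marginal P(A=a)
p_y = df.y.value_counts(normalize=True)            # marginal P(Y=y)
p_ay = df.value_counts(normalize=True)             # joint P(A=a, Y=y)

# Kamiran & Calders weight: expected-under-independence / observed.
weights = df.apply(lambda r: p_a[r.a] * p_y[r.y] / p_ay[(r.a, r.y)], axis=1)

model.fit(X, y, sample_weight=weights)             # any weight-aware estimator
```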
Fair representations
Fair representations (Zemel et al., 2013) learn an encoding \(z = E(x)\) that preserves task-relevant information while obfuscating the protected attribute. Zemel et al.'s original formulation trains the encoder with a multi-objective loss over learned prototypes: maximise predictive accuracy of the downstream task while equalising the groups' representation statistics. Later adversarial variants (Edwards & Storkey, 2016; Madras et al., 2018) replace the statistical term with an adversary trained to predict the protected attribute from \(z\); the line extends to conditional fair representations (Madras et al., 2018) and contrastive fair embeddings (2024–2026 literature). The output \(z\) can then be used by any downstream model. The chief limitation is that downstream models can still recover the protected attribute from combinations of features in \(z\) that the adversary (or statistical penalty) did not detect; the operational discipline is to test the downstream pipeline end-to-end, not just the encoder.
Suppression and disparate-impact remediation
Suppression simply drops the protected attribute from the feature set. As discussed in §2, this is generally insufficient — the model can reconstruct \(A\) from proxies. Disparate-impact remediation (Feldman et al., 2015) goes further: it transforms the residual features so that their distributions are matched across groups, removing not just the protected attribute but its proxy structure. The transformation can be tuned (a partial-remediation parameter) so that some predictive accuracy is preserved. The trade-off curve — fairness vs accuracy as a function of the remediation strength — is a useful disclosure for the model card.
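A sketch of quantile-based repair for a single numeric feature, with lam as the partial-remediation parameter. Feldman et al. map each group onto the groupwise median distribution; using the pooled distribution as the target, as below, is a simplification for illustration.

```python
import numpy as np

def repair_feature(x, a, lam=1.0):
    """lam=1.0 fully equalises group distributions; lam=0.0 is a no-op."""
    x = np.asarray(x, dtype=float)
    repaired = x.copy()
    for g in np.unique(a):
        mask = a == g
        # Rank each value within its group as a percentile in [0, 1] ...
        ranks = np.argsort(np.argsort(x[mask])) / max(mask.sum() - 1, 1)
        # ... then map that percentile onto the pooled target distribution.
        target = np.quantile(x, ranks)
        repaired[mask] = (1 - lam) * x[mask] + lam * target
    return repaired
```

Sweeping lam from 0 to 1 traces the fairness-vs-accuracy curve the text recommends disclosing in the model card.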
When pre-processing is the right choice
Pre-processing is best when: (a) you control the data pipeline and downstream models are diverse; (b) the bias is dominantly sampling- or measurement-based and reweighting/resampling has clear targets; (c) the audit budget allows for end-to-end re-evaluation after the data transformation. It is poorly-suited when: (a) labels are themselves biased (relabelling is more direct); (b) the deployment requires specific group-fairness guarantees that pre-processing can only approximate; (c) the downstream model is fixed and high-capacity, in which case in- or post-processing methods give tighter guarantees.
In-processing Mitigations
In-processing mitigations modify the model's training objective itself, adding fairness constraints alongside the standard loss. They give the tightest formal guarantees but are model-specific and can hurt convergence; the literature has matured to make them practical for both classical ML and deep learning.
Constrained optimisation
The classical formulation: minimise the standard loss subject to a fairness constraint such as \(|P(\hat Y = 1 \mid A = a) - P(\hat Y = 1 \mid A = b)| \leq \epsilon\). Agarwal et al. (2018) showed how to reduce this to a sequence of cost-sensitive classification problems via a Lagrangian saddle-point formulation, giving the basis for the Fairlearn library's ExponentiatedGradient reduction. The technique works for any classifier that supports cost-sensitive training, which is most of them. It produces a randomised classifier (a distribution over deterministic classifiers) that satisfies the constraint in expectation; in deployment, a single deterministic classifier chosen from the learned distribution (a point on the fairness-accuracy Pareto frontier) is typically used.
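A sketch of the reduction via Fairlearn — ExponentiatedGradient and EqualizedOdds are the library's real interface, while the base estimator and the epsilon value are illustrative choices:

```python
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),                 # any cost-sensitive learner
    constraints=EqualizedOdds(difference_bound=0.02),  # the epsilon
)
mitigator.fit(X_train, y_train, sensitive_features=A_train)
y_pred = mitigator.predict(X_test)   # satisfies the constraint in expectation
```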
Adversarial debiasing
The deep-learning analogue: train the predictor \(f(x)\) and an adversary \(g\) that tries to predict the protected attribute from \(f\)'s representations. The predictor's loss combines task accuracy with adversary loss-ascent (the predictor wants to fool the adversary). Zhang et al. (2018) is the foundational paper; the technique has been refined for transformer-style models in the 2022–2026 literature. Adversarial debiasing handles continuous and high-dimensional protected attributes naturally and integrates cleanly with deep learning, but it is sensitive to hyperparameters and the adversary's capacity — a too-weak adversary leaves residual bias, a too-strong adversary destroys the predictor.
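A minimal PyTorch sketch of the Zhang et al.-style setup, with the original paper's gradient-projection term omitted for brevity; the architectures, lam, the data loader, and the assumed binary protected attribute are all illustrative.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # adversary weight: too low leaves bias, too high hurts the task

for x, y, a in loader:  # y, a: float tensors of shape (batch, 1)
    logit = predictor(x)
    # 1) Adversary step: learn to predict A from the predictor's output.
    adv_loss = bce(adversary(logit.detach()), a)
    opt_a.zero_grad(); adv_loss.backward(); opt_a.step()
    # 2) Predictor step: do the task while fooling the adversary.
    pred_loss = bce(logit, y) - lam * bce(adversary(logit), a)
    opt_p.zero_grad(); pred_loss.backward(); opt_p.step()
```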
Fairness-aware regularisation
Simpler alternatives to constrained optimisation: add a regularisation term that penalises violation of the fairness criterion. This trades formal guarantees for ease of implementation and works in any modern training framework. The penalty term is typically a smoothed version of the discrete fairness criterion (e.g. a soft equalised-odds gap). Kamishima et al.'s prejudice remover (2012) is the canonical early reference; modern variants are available in the mainstream fairness toolkits and drop into any deep-learning training loop.
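A sketch of the pattern: a differentiable demographic-parity gap added to the task loss. The penalty form is an illustrative choice, and the same shape yields a soft equalised-odds gap by conditioning on the true label.

```python
import torch

def dp_penalty(logits, a):
    """Squared gap in mean predicted probability between two groups."""
    p = torch.sigmoid(logits).squeeze()
    return (p[a == 0].mean() - p[a == 1].mean()) ** 2

# task_loss, lam, logits, and a are assumed to come from the training loop.
loss = task_loss + lam * dp_penalty(logits, a)
```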
Multi-objective and Pareto-optimal training
For high-stakes deployments, the fairness-vs-accuracy trade-off is itself a deliverable: a Pareto frontier of models, each at a different trade-off point. Multi-objective optimisation techniques (NSGA-II, weighted-sum sweeps, evolutionary search) produce the frontier directly. The deployment decision is made by a stakeholder committee that picks a point on the frontier with a documented rationale. This is the dominant pattern in regulated deployments as of 2025–2026: the model card specifies the trade-off point and the alternatives that were available.
Post-processing Mitigations
Post-processing mitigations adjust the model's outputs without retraining. They are the cheapest to deploy (no retraining cost), most flexible (any base model), and most legally sensitive (they explicitly use group membership at inference time, which raises both helpful-discrimination and disparate-treatment questions).
Group-conditional thresholds
The simplest post-processing intervention: use different decision thresholds per group to achieve a target fairness criterion. Hardt et al. (2016) gave the foundational construction: given any base score, optimal equalised-odds enforcement requires picking, per group, a (possibly randomised) threshold such that true-positive and false-positive rates match across groups. The construction is provably optimal under the equalised-odds criterion. The legal sensitivity is real — explicit group-conditional thresholds are illegal in many jurisdictions for many use cases (e.g. US fair-lending law) — and the operational pattern in those domains is to use post-processing only on group-blind score adjustments that achieve equivalent statistical effect.
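A sketch using Fairlearn's ThresholdOptimizer, which implements the Hardt et al. construction; the API is the library's real one, while the wrapped model and data names are illustrative.

```python
from fairlearn.postprocessing import ThresholdOptimizer

post = ThresholdOptimizer(
    estimator=model,                 # an already-trained base model
    constraints="equalized_odds",
    prefit=True,                     # do not refit the base model
)
post.fit(X_val, y_val, sensitive_features=A_val)
y_pred = post.predict(X_test, sensitive_features=A_test, random_state=0)
```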
Calibration repair within groups
If the deployed score is consumed downstream as a probability, group-wise calibration matters. Post-processing calibration (isotonic regression or Platt scaling fit per group on validation data) restores calibration without retraining. The technique is compatible with the impossibility result: you can have group-wise calibration or equalised odds, not both, when base rates differ — see §3.
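A sketch of per-group calibration repair with scikit-learn's isotonic regression, fit on held-out validation data; all variable names are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Fit one calibrator per group on validation scores and labels.
calibrators = {}
for g in np.unique(A_val):
    mask = A_val == g
    calibrators[g] = IsotonicRegression(out_of_bounds="clip").fit(
        scores_val[mask], y_val[mask])

def calibrated(scores, a):
    """Apply each group's calibrator to that group's scores."""
    out = np.empty_like(scores, dtype=float)
    for g, iso in calibrators.items():
        out[a == g] = iso.predict(scores[a == g])
    return out
```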
Reject-option classification
Reject-option classification (Kamiran et al., 2012) identifies predictions in a confidence band around the decision threshold (where the model is uncertain) and flips them to favour the disadvantaged group. Outside that uncertainty band, predictions are unchanged. The technique implements equalised-odds-style fairness while limiting the accuracy cost to the uncertain region, making it a natural fit for human-in-the-loop deployments where uncertain cases were going to be reviewed anyway.
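A minimal sketch, assuming scores in [0, 1], a 0.5 decision threshold, and a band half-width theta; everything here is an illustrative simplification of the Kamiran et al. scheme.

```python
import numpy as np

def reject_option(scores, a, disadvantaged, theta=0.1):
    """Inside the band, favour the disadvantaged group; outside, leave as-is."""
    pred = (scores >= 0.5).astype(int)
    in_band = np.abs(scores - 0.5) <= theta
    pred[in_band & (a == disadvantaged)] = 1   # favourable label
    pred[in_band & (a != disadvantaged)] = 0   # unfavourable label
    return pred
```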
Counterfactual post-processing for individuals
A 2024–2026 development: post-processing methods that operate at the individual level rather than group-aggregated. Given a per-prediction explanation (Ch 05), if a counterfactual exists that would change the prediction in a way that better matches an individual-fairness criterion, the system can either flag for human review or apply the counterfactual directly. This connects fairness mitigation to the explainability stack and is the leading edge for high-stakes individual decisions.
The legal-and-operational caveat
Post-processing is the technique most subject to legal scrutiny because it is the most visible: explicit group-aware decision rules. In US law, the line between disparate impact remediation (allowed) and disparate treatment (often not) is jurisprudentially complex, and the model owner has to navigate it carefully. The 2025–2026 EU AI Act guidance treats post-processing as a documented mitigation that requires explicit justification; the same is true under the most aggressive US state-level laws (Colorado, NYC). The operational pattern: use post-processing where legally sound, document the legal basis, and combine with pre- or in-processing methods that achieve similar effects without explicit group-conditional rules where required.
Documentation, Governance, and Accountability
Technical fairness mitigations live or die by their documentation. A model that satisfies a fairness criterion but is undocumented and uncontestable is not a fair deployment in any operational sense. The 2018–2026 literature has crystallised a small canon of documentation artefacts and governance practices that distinguish responsible programs from bolted-on compliance.
Model cards and datasheets
Mitchell et al.'s Model Cards for Model Reporting (2019) introduced the standard structure: intended use, performance disaggregated by relevant subgroups, training data summary, ethical considerations, caveats. Gebru et al.'s Datasheets for Datasets (2018) provided the analogue for training data: provenance, collection process, recommended uses, known limitations. The 2024–2026 evolution has been toward machine-readable variants (so audits can be automated against the documentation), continuous-update model cards (regenerated on every model version), and integration into the MLOps stack (Ch 04 of Part XVI on monitoring; Ch 07 on responsible release). A mature program treats the model card as a living artefact, not a one-time compliance document.
Algorithmic impact assessments
For high-stakes deployments — hiring, lending, criminal justice, healthcare, education, public-sector eligibility — many jurisdictions now require an algorithmic impact assessment (AIA) before deployment. The Canadian government's AIA template, NYC Local Law 144, the EU AI Act's high-risk system requirements, and the Colorado AI Act have produced converging structures: identify affected populations, document the design choices and trade-offs, run the disaggregated audit (§5), document the mitigation choices, plan ongoing monitoring, define a contestation pathway. The AIA is the master document into which model cards and datasheets feed.
Contestability and recourse
An adverse decision the affected individual cannot contest is a fairness failure regardless of the underlying metric. Contestability requires: a clear explanation of the decision (Ch 05), a documented appeal pathway, a mechanism for the affected individual to provide additional information, and a feedback loop that updates the model when contests succeed. Algorithmic recourse (Karimi et al., 2020) provides actionable counterfactuals: not just "the loan was denied because of debt-to-income" but "if you reduced your debt by $X, the prediction would flip" — with the path being feasible (no protected-attribute counterfactuals, no impossible feature changes). Recourse is the action-oriented complement to explanation.
Independent auditing and red-teaming
Internal audits should be supplemented by external review. The maturing audit-firm ecosystem (BABL AI, ORCAA, the major consultancies' AI-audit practices) provides the third-party perspective that internal teams cannot. Algorithmic red-teaming — staffed by domain experts adversarial to the model — surfaces failure modes that quantitative audits miss. The 2024–2026 industry convergence is around tiered audit requirements: internal disaggregated metrics for all models, third-party audit for high-stakes models, public reporting of audit findings for the highest-stakes systems.
The accountability chain
Technical mitigations need an organisational home. Mature programs assign explicit accountability: a model owner, a model-risk function, an executive sponsor, and a documented escalation path for fairness incidents. The accountability chain is the organisational analogue of the technical audit playbook — both are required for a deployment to be defensible.
The Frontier and the Open Problems
Fairness research has matured rapidly but the open problems are large, and the methodology evolves as deployments encounter new failure modes. This section surveys the leading edge as of 2026 and the questions practitioners should be following.
Fairness for LLMs and generative models
The classic group-fairness machinery was developed for binary classification with a single protected attribute. LLMs and generative models break the framework: the output is high-dimensional text or image, the "groups" are not always pre-defined, and the harms can be subtle (stereotype reinforcement, representational harms, asymmetric refusal patterns). The 2023–2026 work has produced specialised benchmarks (BBQ, BOLD, StereoSet, RealToxicityPrompts) and methods (constitutional AI from Ch 02, RLHF with fairness-conditioned reward models, debiased prompting). Bender et al.'s Stochastic Parrots (2021) and the line of work it provoked have shifted the conversation toward the upstream question: which training data was used, who is represented, and who is not?
Dynamic and long-term fairness
Most fairness analysis is static: one model, one decision point. Real deployments are dynamic: predictions affect outcomes affect future training data affect future predictions. Dynamic fairness studies these feedback loops formally. Liu et al. (2018) showed that static fairness criteria can produce worse long-term outcomes than no intervention in some plausible models. The frontier work (2024–2026) develops decision-theoretic frameworks for fairness across time, including reinforcement-learning formulations and multi-period game-theoretic models. For practitioners, the operational lesson is to monitor outcomes over time, not just predictions at deployment.
Multi-stakeholder and intersectional fairness
Real systems serve many stakeholders with different and partly-conflicting fairness preferences (lenders vs borrowers, platforms vs users, employers vs applicants). Multi-stakeholder fairness formalises the trade-off and connects to mechanism design. Intersectional fairness moves beyond single protected attributes to the cells defined by multiple attributes; the audit machinery exists (§5) but the formal guarantees are weaker because intersectional cells become small. Both are active research fronts and deployment challenges.
Privacy-preserving fairness
Computing disaggregated metrics requires the protected attribute. Privacy law restricts what attributes can be collected. Privacy-preserving fairness reconciles the two: secure aggregation, local differential privacy, and federated audit techniques let you compute the metrics without centralising the attributes. Ch 07 (Privacy in ML) develops the underlying privacy machinery; the fairness application is one of the cleanest case studies.
The limits of metrics
The most pointed open question is structural: are technical metrics the right primary discipline, or are they a sometimes-useful supplement to deeper governance work? Critics including Selbst et al. (2019), Hoffmann (2019), and the Critical Algorithm Studies tradition argue that the metrics literature can obscure structural failure by reducing it to a number. The mature programs combine technical metrics with stakeholder engagement, qualitative review, ethnographic study of impact, and explicit consideration of whether the deployment should happen at all. The frontier of fairness practice is the integration of technical and structural critique into a single accountable program — and the recognition that, sometimes, the right answer is not to deploy.
Further reading
Foundational papers and references for fairness, bias, and equity in ML: Buolamwini & Gebru on disaggregated audit; Dwork et al. on individual fairness; Hardt et al. on equalised odds and post-processing; Chouldechova and Kleinberg et al. on the impossibility theorem; Kusner et al. on counterfactual fairness; Zemel et al. on fair representations; Mitchell et al. on model cards; Gebru et al. on datasheets; Bender et al. on the upstream questions for LLMs; and Barocas, Hardt, and Narayanan's textbook for the comprehensive treatment. Together with the production-quality fairness libraries, these form the right starting kit.
- Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (Buolamwini & Gebru, 2018). The audit that reshaped industry practice. Documents that commercial face-classification systems had error rates up to 34 percentage points higher for darker-skinned women than for lighter-skinned men. Foundational for intersectional auditing methodology. Required reading. The reference for disaggregated and intersectional auditing.
- Fairness Through Awareness (Dwork et al., 2012). The foundational individual-fairness paper. Establishes the Lipschitz formalism (similar individuals should receive similar treatment) and the deep methodological argument that this requires a task-specific similarity metric. Required reading for individual-fairness work. The individual-fairness reference.
- Equality of Opportunity in Supervised Learning (Hardt et al., 2016). The foundational equalised-odds paper. Establishes the criterion, the post-processing construction that achieves it optimally, and the trade-off with accuracy. Required reading for group-fairness work. The equalised-odds reference.
- Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments (Chouldechova, 2017). The impossibility paper. Proves that calibration and equalised odds cannot both hold when base rates differ across groups. The mathematical core of the impossibility result, with the COMPAS case as motivation. Required reading. The impossibility-theorem reference.
- Inherent Trade-Offs in the Fair Determination of Risk Scores (Kleinberg et al., 2016). The companion impossibility paper. Independent proof of the same result with cleaner formal framing in the language of risk scores. Required reading; reads well alongside Chouldechova. The other foundational impossibility paper.
- Counterfactual Fairness (Kusner et al., 2017). The foundational counterfactual-fairness paper. Connects fairness to structural causal models and provides the formal criterion. Required reading for any deployment that requires causal-fairness guarantees. The counterfactual-fairness reference.
- Learning Fair Representations (Zemel et al., 2013). The foundational fair-representations paper. The prototype-based encoding whose objectives anticipate the encoder-adversary architectures behind most modern pre-processing methods. Required reading for fair-representation work. The fair-representations reference.
- Model Cards for Model Reporting (Mitchell et al., 2019). The foundational documentation paper. Establishes the structure for model cards now used industry-wide; required for any responsible-deployment program. Required reading. The model-cards reference.
- Datasheets for Datasets (Gebru et al., 2018). The companion documentation paper for training data. Establishes the structure for datasheets: provenance, collection process, recommended uses. Required reading paired with model cards. The datasheets reference.
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Bender et al., 2021). The influential argument that the focus on technical mitigations should not obscure upstream questions: which training data, who is represented, what are the environmental and labour costs? Reshaped the conversation about fairness for large language models. Highly recommended. The structural-critique reference for LLM fairness.
- Fairness and Machine Learning: Limitations and Opportunities (Barocas, Hardt & Narayanan). The comprehensive textbook. Synthesises the technical literature with the legal and ethical context; the recommended overview text. Free online. Required for serious work in fairness-aware ML. The textbook.
- Fairlearn, AIF360, and the What-If Tool: production fairness libraries. Fairlearn (Microsoft) provides the cleanest API for constrained reductions and disaggregated metrics; AIF360 (IBM) is the most comprehensive suite of mitigation algorithms; the What-If Tool (Google) is the best interactive auditing UI. The 2024–2026 production work standardised on combinations of these. Highly recommended. The production toolkit.