Financial ML & Quantitative Methods: predicting markets, modelling risk.

Finance was one of the first industries to deploy machine learning at scale — the quant funds of the 1980s were running statistical models on price data before "data science" had a name. Modern financial ML spans alpha research (predicting which assets will outperform), risk modelling (estimating the distribution of possible losses), high-frequency trading (microsecond-latency decisions in electronic markets), and the operational machinery of fraud detection, credit decisioning, and compliance. Building on the conceptual foundations of Ch 03 (markets, time value, risk and return, the EMH, financial statements, behavioural deviations), this chapter develops the methodology — the major model families, the engineering disciplines that distinguish serious quant work from data mining, and the deployment realities of working in markets where every published edge erodes the moment it becomes known.

Prerequisites & orientation

This chapter builds directly on Ch 03 (Intro to Finance & Economics) — the conceptual material there is the prerequisite for everything below, and the chapter cross-references back to specific Ch 03 sections rather than redeveloping the concepts. On the ML side, the chapter assumes basic probability and statistics (Part I Ch 04–05), supervised learning (Part IV Ch 01–02), time-series methods (Part XIII Ch 01) for the alpha-research and risk-modelling sections, and reinforcement learning (Part IX) for the execution and high-frequency material. The anomaly-detection chapter (Part XIII Ch 02) is the foundation for the fraud-detection section.

Two threads run through the chapter. The first is signal-to-noise asymmetry: financial markets are competitive in the sense Ch 03 Section 5 develops, and any genuinely predictive signal is rapidly arbitraged away. The methodology of the field is largely about finding signal in extremely noisy data, validating it without overfitting, and deploying it before the alpha decays. The second is regime change: financial data is not IID — the relationships between variables shift over time, and a model that worked yesterday may produce systematic losses tomorrow. The chapter is organised so the foundational alpha-research methodology comes first, then risk modelling, then the operational applications (execution, fraud, credit, compliance), with the regulatory and ethical considerations woven in throughout.

01

Why Financial ML Is Distinctive

Ch 03 explained why finance is its own discipline; this chapter is about why ML in finance is its own discipline. The conceptual properties Ch 03 introduced — markets aggregate information rapidly (Section 5), signals decay as they become known, distributions are not stationary, and behaviour deviates from the rational baseline (Section 9) — translate into specific methodological constraints on the models, training procedures, evaluation protocols, and deployment patterns that quant practitioners use. This section maps each property to its ML implication; the rest of the chapter develops the resulting methodology.

From non-stationarity to constant retraining

Financial time series do not have stable statistical properties. The volatility of equities in 2008 looked nothing like 2017; the correlations between asset classes in 2020 looked nothing like 2024; the relationship between interest rates and stock returns shifts as the monetary-policy regimes Ch 03 Section 8 introduced change. A model trained through 2019 and deployed in March 2020 lost money for months because the conditional distribution it had learned no longer applied. The technical name is distribution shift; the practical name is regime change; the ML methodological consequence is that production financial models must be continuously retrained, monitored, and human-supervised — the model is never "done." Standard responses include rolling-window training that adapts to recent conditions, explicit latent-regime models, and out-of-sample drift monitors that catch the moment a strategy breaks.

From efficient markets to weak signals

Ch 03 Section 5's EMH framing has a sharp ML implication. Because most public information is already priced in, the residual signal that remains for ML to predict is extremely small. Serious quant funds report Sharpe ratios on individual signals in the 0.5–1.5 range — typical R² on daily returns is around 0.01, far below what's routine in image classification or language modelling. The methodology that follows from this is uncomfortable: financial ML is not about perfectly predicting anything, it is about extracting tiny consistent edges from extremely noisy data, and most of the chapter's machinery is about avoiding overfitting to noise that looks like signal.

From alpha decay to a continuous R&D pipeline

Once a profitable signal is deployed, others will eventually find it; once they trade on it, it goes away. Alpha decay is the structural reason financial ML can never reach the "trained-once, deployed-forever" pattern that suffices in most other ML domains — the 2026 quant industry has internal pipelines that produce dozens of new signals per year just to keep Sharpe constant. Beyond passive arbitrage, there are active adversaries: HFT firms extract value from slower participants (Section 6 develops this), fraud rings adapt to defenses (Section 8), and the methodology must handle both the passive-arbitrage problem and the active-adversarial one.

From regulation to constraint-aware models

Finance is one of the most heavily regulated industries on earth, in the institutional context Ch 03 Section 10 surveyed. Credit decisions must comply with fair-lending laws (no protected attributes, even indirectly, as Section 9 of this chapter develops). Trading systems must satisfy market-conduct rules. Risk models must satisfy regulatory capital frameworks (Basel III, Solvency II). Anti-money-laundering systems must produce auditable decision trails. The constraints shape the methodology — opaque deep learning is often disallowed for regulated decisions, and even where it is allowed it must be paired with explainability machinery that produces post-hoc rationales for each decision.

From loss aversion to risk-adjusted objectives

The loss function in finance is not symmetric — the pleasure of a 1% gain is smaller than the pain of a 1% loss, the loss aversion Ch 03 Section 9 documented. A model that produces 100 small wins and one large loss may be worse than one with the same expected return but smaller variance. Production financial ML expresses this through risk-adjusted metrics (Sharpe, Sortino, drawdown-based), utility-theoretic objectives, and portfolio construction that explicitly bounds tail losses. Section 5 of this chapter develops this in detail; the conceptual takeaway is that "minimise expected loss" is the wrong default in finance, and methods need to be adjusted accordingly.

Why Financial ML Is Hard

If markets were predictable in any meaningful way, the predictable component would be arbitraged away — the EMH-style argument Ch 03 Section 5 develops. The methodology of financial ML is largely about finding signal in the residual after the obvious arbitrages are gone — extremely weak, extremely noisy, time-varying, adversarial, regulated. Every methodological tool in the chapter is, at its core, a tool for resisting overfitting in this hard regime.

02

Alpha Research and Factor Models

Before machine learning came to finance, the field had a well-developed theory of how to predict asset returns — the factor-model tradition that extends the CAPM intuition of Ch 03 Section 3. A factor model decomposes asset returns into exposures to a small set of priced risk factors plus an idiosyncratic residual; the residual after the standard factors are accounted for is called alpha, and almost every quantitative-investment strategy is some attempt to predict alpha. The factor framework is the substrate on which ML-based alpha research operates, and understanding it is the precondition for understanding what the methodology is buying you.

The multi-factor extension of CAPM

CAPM (Ch 03 Section 3) gave the single-factor case: expected excess return is proportional to market beta. The empirical inadequacy of single-factor CAPM motivated multi-factor extensions:

Linear factor model
r_{i,t} = α_i + Σ_k β_{i,k} · F_{k,t} + ε_{i,t}
r_{i,t} is asset i's return in period t; F_{k,t} is factor k's return; β_{i,k} is asset i's loading on factor k; α_i is the expected excess return after accounting for factors; ε_{i,t} is idiosyncratic noise. Factors typically include market (CAPM β), size (SMB), value (HML), momentum (MOM), profitability, investment, and increasingly sector, region, and style factors. The Fama-French three-factor and five-factor models are the canonical academic specifications; commercial systems like Barra and Axioma use 50–100 factors.
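In code, the loadings and alpha for a single asset are just an OLS regression of its excess returns on the factor returns. A minimal numpy sketch; the simulated three-factor data and the true loadings are illustrative only, not taken from any real dataset:

```python
import numpy as np

def estimate_factor_model(asset_returns, factor_returns):
    """Estimate alpha and factor loadings for one asset via OLS.

    asset_returns: (T,) array of the asset's excess returns r_{i,t}
    factor_returns: (T, K) array of factor returns F_{k,t}
    Returns (alpha, betas, residuals).
    """
    T = len(asset_returns)
    X = np.column_stack([np.ones(T), factor_returns])   # first column is the intercept (alpha)
    coef, *_ = np.linalg.lstsq(X, asset_returns, rcond=None)
    alpha, betas = coef[0], coef[1:]
    residuals = asset_returns - X @ coef                 # idiosyncratic component eps_{i,t}
    return alpha, betas, residuals

# toy usage with simulated data: three factors, true loadings (1.0, 0.3, -0.2)
rng = np.random.default_rng(0)
F = rng.normal(0.0, 0.01, size=(500, 3))
r = 0.0002 + F @ np.array([1.0, 0.3, -0.2]) + rng.normal(0.0, 0.02, 500)
alpha, betas, eps = estimate_factor_model(r, F)
```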

The factor zoo and multiple testing

Fama and French's 1992 paper introduced size and value as factors that survived the cross-section after controlling for market beta — Ch 03 Section 5 mentioned this in the EMH-anomalies context. The follow-on literature catalogued hundreds of supposed factors, most of which did not survive serious replication. The Harvey-Liu-Zhu and Hou-Xue-Zhang papers showed that the multiple-testing-corrected significance threshold for a "real" factor is much higher than naive p-values suggest — a t-statistic of 3 is closer to the right bar than the textbook 2. The lesson for ML-based alpha research is the same but louder: tens of thousands of features tested, with the winners selected for further study, is a recipe for false discoveries; Section 4's backtest-overfitting machinery is the answer.

Many of the strongest factor signals come directly from the financial-statement data Ch 03 Section 7 introduced: high-quality earnings (low accruals), high profitability (return on equity), and value (low price-to-book) are all computed from the income statement and balance sheet. Production factor pipelines spend substantial engineering on extracting clean factor signals from filings, then test them against the multiple-testing-corrected bar.

Cross-sectional versus time-series

Two distinct framings dominate factor research. Cross-sectional: at each time t, rank assets by some characteristic (book-to-market, momentum) and form a long-short portfolio of the top-vs-bottom quantiles. The portfolio's average return is the factor's compensation. Time-series: predict the absolute return of an asset using its own (or related) past values. Cross-sectional research is the dominant mode in the equity-quant world; time-series research is dominant in macro, FX, and rates. The framing differs — cross-sectional ML focuses on ranking, time-series ML focuses on absolute prediction — but the underlying validation principles are shared.
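A minimal sketch of the cross-sectional construction, assuming a long-format pandas panel with hypothetical column names (a date column, an asset identifier, a characteristic column, and a forward-return column that has already been lagged correctly so there is no look-ahead):

```python
import pandas as pd

def long_short_decile_returns(panel: pd.DataFrame,
                              char_col: str = "momentum",
                              ret_col: str = "fwd_return") -> pd.Series:
    """Per-date return of a top-decile-minus-bottom-decile portfolio.

    panel: long-format DataFrame with columns ['date', 'asset', char_col, ret_col],
    where ret_col is the *next* period's return. Column names are assumptions.
    """
    def one_date(df: pd.DataFrame) -> float:
        deciles = pd.qcut(df[char_col], 10, labels=False, duplicates="drop")
        top = df.loc[deciles == deciles.max(), ret_col].mean()
        bottom = df.loc[deciles == deciles.min(), ret_col].mean()
        return top - bottom

    return panel.groupby("date").apply(one_date)
```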

Risk factors versus alpha factors

A subtle but important distinction. Risk factors earn returns as compensation for bearing systematic risk — value stocks have higher returns because they're riskier. Alpha factors are predictive of future returns beyond what risk-based explanations can account for. The line is contested; what looks like alpha to one researcher looks like an unrecognised risk factor to another. The practical implication: alpha decays faster than risk premia. Discovering a "true" alpha is rare; the steady industry of factor research is largely about identifying which apparent edges are alpha (worth deploying) versus risk premia (worth understanding but already priced in).

Factor models as the alpha-research substrate

Modern alpha research operates on top of factor models. The standard pipeline: regress the strategy's returns on a set of known factors; the residual is the strategy's "pure" alpha; this residual is what gets evaluated. A strategy with a Sharpe of 1.5 and a factor-residual Sharpe of 0.3 is mostly a factor exposure with a small residual edge; a strategy with a Sharpe of 1.0 and a residual Sharpe of 0.9 is mostly genuine alpha. The latter is much more valuable, even though it has the lower headline Sharpe.
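A sketch of the residual-alpha computation described above: regress the strategy's return series on the factor returns, keep the intercept plus the regression residual as the "pure alpha" stream, and compute its Sharpe. The array shapes and the 252-period annualisation are assumptions for daily data:

```python
import numpy as np

def residual_sharpe(strategy_returns, factor_returns, periods_per_year=252):
    """Annualised Sharpe of the strategy's factor-residual ("pure alpha") stream.

    strategy_returns: (T,) per-period strategy returns
    factor_returns:   (T, K) per-period returns of the known factors
    The residual stream keeps the intercept: it is alpha plus idiosyncratic noise.
    """
    T = len(strategy_returns)
    X = np.column_stack([np.ones(T), factor_returns])
    coef, *_ = np.linalg.lstsq(X, strategy_returns, rcond=None)
    pure_alpha = strategy_returns - factor_returns @ coef[1:]   # strip factor exposures only
    ann_mean = pure_alpha.mean() * periods_per_year
    ann_vol = pure_alpha.std(ddof=1) * np.sqrt(periods_per_year)
    return ann_mean / ann_vol
```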

03

Machine Learning for Alpha

The 2010s saw a wave of machine-learning methods applied to alpha research, with mixed results. The headline finding: ML beats classical linear factor models on out-of-sample predictive performance, but the gap is smaller than ML's reputation in other domains would suggest, and the engineering work of avoiding overfitting is dominant. This section covers the methods that have actually delivered in practice and the methodological discipline that distinguishes serious financial ML from data-mining.

Why tree-based methods dominate

The most consistently successful family of methods for alpha prediction is gradient-boosted decision trees — XGBoost, LightGBM, CatBoost. They handle the heterogeneous feature sets typical of financial ML (some features are continuous, some categorical, some sparse), they are robust to feature scaling and missing data, they admit fast inference, and they generalise reasonably well from a moderate amount of data. The Gu-Kelly-Xiu paper (2020) systematically compared methods for return prediction and found that boosted trees were competitive with the best deep learning approaches at a fraction of the engineering cost.

Production quant pipelines typically use boosted trees as the workhorse for alpha modelling, with deep learning reserved for specific subdomains (alternative data, microstructure, derivatives) where the data has structure that benefits from neural-network architectures.
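A minimal sketch of the boosted-tree workhorse, assuming a feature panel with hypothetical column names and a forward-return target. The hyperparameters shown are illustrative defaults rather than tuned values, and the single chronological split stands in for the full walk-forward protocol Section 4 develops:

```python
import lightgbm as lgb
import pandas as pd

def fit_alpha_model(panel: pd.DataFrame, feature_cols, target_col="fwd_return",
                    train_end="2019-12-31"):
    """Fit a boosted-tree alpha model on data up to train_end and score the rest.

    panel: one row per asset-date, with a 'date' column, feature columns, and a
    forward-return target (all column names here are assumptions).
    """
    train = panel[panel["date"] <= train_end]
    test = panel[panel["date"] > train_end]

    model = lgb.LGBMRegressor(
        n_estimators=500, learning_rate=0.02, num_leaves=31,
        subsample=0.8, colsample_bytree=0.8,      # mild regularisation against noise
    )
    model.fit(train[feature_cols], train[target_col])
    scores = pd.Series(model.predict(test[feature_cols]), index=test.index)
    return model, scores
```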

Neural-network factor models

Where deep learning helps most in alpha research is in modelling features whose relationship to returns is genuinely non-linear. Conditional autoencoders for factor extraction, LSTM-based time-series models for sequential dependencies in returns, and transformer-based models over multi-asset feature panels have all produced results in academic literature. The Gu-Kelly-Xiu line of work is representative — they demonstrate that deep models can beat classical methods on US equity returns, but the magnitude is modest (a Sharpe-ratio improvement of 0.2–0.5 in their tests).

The deeper problem with neural networks in alpha research is the same problem they have everywhere: overfitting to spurious patterns. Financial data has more spurious patterns per unit of data than almost any other domain because the noise is so large. A neural network trained without aggressive regularisation will find patterns that have no out-of-sample value. Successful production NN-based alpha models use heavy regularisation, ensembling, and aggressive cross-validation discipline (Section 4) to fight this.

Alternative data

The 2010s brought a wave of alternative data — non-traditional inputs to alpha research. Satellite imagery of parking lots and oil tanks; credit-card transaction panels; web-scraped product prices; sentiment from news and social media; supply-chain shipping data; corporate-jet flight data. Alternative data is expensive, often messy, and requires substantial engineering to integrate. The competitive edge it offers can be real but decays — once enough firms have access to a dataset, its predictive value drops.

The methodology for working with alternative data involves: aggressive cleaning and normalisation; feature engineering to translate raw signals into return-predictive features; careful evaluation that distinguishes the alternative-data signal from confounded standard signals; and explicit modelling of the data's coverage and reporting lag. Many published "alternative data" results have failed to replicate at scale because the signal extraction was not robust to the operational realities of production data.

LLM-based signals

The 2023–2026 wave of LLM applications in finance is producing a new class of signals. Earnings-call sentiment: an LLM reads the call transcript and produces sentiment, surprise, and forward-looking-statement features. News classification: LLMs categorise news events and estimate their financial significance. 10-K analysis: LLMs extract structured features (forward-looking risks, accounting changes, management discussion) from regulatory filings. Conference-call delta: comparing this quarter's call to last quarter's for unusual changes in language or tone.

The empirical evidence is encouraging — LLM-derived features have demonstrably positive incremental predictive power in several published studies — but the alpha appears to be decaying as more firms adopt the approach. The 2026 frontier is using LLMs not just to extract features but to perform multi-step reasoning over financial data: an LLM that reads the 10-K, the earnings call, the analyst notes, and produces a structured assessment of the company's prospects. Whether this scales to genuinely novel alpha generation or merely faster human-style analysis is an open question.

The uncomfortable truth about ML in alpha research

The honest 2026 perspective: ML has improved alpha research at the margin but has not transformed it. The biggest funds (Renaissance Technologies, Two Sigma, Citadel, DE Shaw) have used statistical learning methods for decades; the gains from modern deep learning over their earlier methods are real but incremental. The methodology that distinguishes successful quant from unsuccessful is less about the model class and more about the discipline of feature engineering, validation, and deployment. The chapter's next section on backtesting is the substance of that discipline.

04

Backtesting and the Overfitting Problem

If there is a single methodological topic that defines whether a quant practitioner is competent, it is backtesting. The naive approach — split the data, train on past, test on future — fails in several systematic ways that produce dramatically over-optimistic results. The literature on financial-ML evaluation is largely a literature of warnings and corrective protocols, and getting these right is the precondition for any defensible alpha claim.

The standard backtest pitfalls

Look-ahead bias: using information at time t that wasn't actually available at t. Examples: using a stock's full year of data including December when computing a January signal; using a company's restated earnings rather than the figures as originally reported; defining the historical universe from today's index membership, which presupposes knowledge of which companies later survived and were included. Each of these inflates measured performance dramatically.

Survivorship bias: backtesting only on companies that exist today rather than on the full historical universe including bankruptcies and delistings. A strategy that "works" on the surviving subset will systematically overestimate live performance because the failures have been removed.

Overfitting to the test set: even with a clean train-test split, repeated experimentation on the test set leaks information. After 20 strategies tested, one will look good by chance; after 1000, several will. The published Sharpe ratios in academic finance papers have been shown to be inflated by exactly this dynamic (the "p-hacking" problem in factor research).

Selection bias: papers that report findings systematically come from researchers who searched, found something positive, and wrote it up. The set of all findings — including the unreported negative results — has a much smaller effect size than the published distribution suggests. Lopez de Prado's "False Strategy Theorem" gives the formal version: with N strategies tested, the expected maximum Sharpe under the null is much higher than zero.

Walk-forward and time-aware cross-validation

[Figure: walk-forward analysis with purge and embargo. Three expanding folds, each with train, purge, test, and embargo windows, contrasted with a naive train/test split that has no purge.]
Walk-forward analysis with purge and embargo. Each fold expands the train window and tests on the next out-of-sample period. Purge removes training data immediately before the test set, because labels there often reflect outcomes that overlap with the test period (e.g. a 5-day return label uses data running into the test). Embargo waits a buffer after the test set before resuming training, so test-set noise does not leak into the next fold. The naive split (bottom) ignores both and systematically inflates measured performance — the single most common backtest error in junior quant work.

The right cross-validation methodology for financial data is walk-forward analysis: train on data through time t, test on the period from t to t+Δ, then advance the window and repeat. This respects the temporal structure (you never train on data after the test period) and produces multiple out-of-sample evaluation periods that can be averaged. Variants include expanding-window walk-forward (the train set grows over time) and rolling-window walk-forward (the train set has fixed length, so older data drops out as the window advances).

Within walk-forward, two refinements address subtle leakage. Purging removes training data near the test set boundary, because labels in financial data often reflect outcomes that overlap with the next training period (a 5-day return label at the end of training data uses information that runs into the test period). Embargoing waits a buffer period after the test set before resuming training, to prevent test-set noise from leaking into the next training window. Lopez de Prado's Advances in Financial Machine Learning (2018) develops these protocols in detail and is the standard reference.
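A minimal sketch of one reasonable implementation of expanding walk-forward splits with purge and embargo. The fold boundaries and the purge and embargo widths are illustrative; in practice the purge width should be sized to the label's look-forward horizon, which is omitted here:

```python
import numpy as np

def purged_walk_forward_splits(n_obs, n_folds=5, purge=5, embargo=5):
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward CV.

    Observations are assumed to be in time order, one per row. `purge` drops the
    last few training points before each test block (their labels may overlap the
    test period); `embargo` keeps a buffer just after each test block out of all
    later training sets, so test-period noise does not leak forward.
    """
    bounds = np.linspace(n_obs // (n_folds + 1), n_obs, n_folds + 1, dtype=int)
    embargoed = np.zeros(n_obs, dtype=bool)
    for i in range(n_folds):
        test_start, test_end = bounds[i], bounds[i + 1]
        train_mask = np.zeros(n_obs, dtype=bool)
        train_mask[: max(test_start - purge, 0)] = True    # everything before the purge zone
        train_mask &= ~embargoed                           # minus earlier embargo zones
        yield np.flatnonzero(train_mask), np.arange(test_start, test_end)
        embargoed[test_end: min(test_end + embargo, n_obs)] = True

# usage: fit on train_idx, score on test_idx, and average the out-of-sample statistic
# for train_idx, test_idx in purged_walk_forward_splits(len(panel)):
#     ...
```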

The Deflated Sharpe Ratio

Bailey and Lopez de Prado introduced the Deflated Sharpe Ratio (DSR), which corrects an observed Sharpe for the multiple-testing problem. The key insight: if you tested N strategies and report the best, the maximum-of-N Sharpe under the null hypothesis (no real alpha) is not zero — it grows roughly with √(2 log N). Subtracting this expected maximum from the observed Sharpe produces a bias-corrected metric that's interpretable on a single-test scale. DSR has become standard practice in serious quant research and is the right methodology for any "we found a strategy" claim.
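A sketch of the haircut just described: estimate the expected maximum Sharpe of N no-alpha strategies, using the expected-maximum-of-normals approximation from the Bailey and Lopez de Prado line of work, and subtract it from the best observed Sharpe. The full DSR additionally adjusts for track-record length, skewness, and kurtosis, which this sketch omits:

```python
import numpy as np
from scipy.stats import norm

def expected_max_null_sharpe(n_trials, sharpe_std):
    """Expected best Sharpe across n_trials strategies that have no real alpha.

    sharpe_std is the cross-trial standard deviation of the Sharpe estimates
    (pure estimation noise). Uses the expected-maximum-of-normals approximation.
    """
    gamma = 0.5772156649  # Euler-Mascheroni constant
    return sharpe_std * ((1 - gamma) * norm.ppf(1 - 1 / n_trials)
                         + gamma * norm.ppf(1 - 1 / (n_trials * np.e)))

def deflated_sharpe(best_observed_sharpe, n_trials, sharpe_std):
    """Haircut the best observed Sharpe by the selection-bias term."""
    return best_observed_sharpe - expected_max_null_sharpe(n_trials, sharpe_std)

# e.g. the best of 1,000 trials with Sharpe-estimate noise of 0.4:
# deflated_sharpe(1.2, 1000, 0.4) is roughly -0.1, so the "discovery" is not significant
```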

Transaction costs and capacity

An equally important consideration: a backtest that ignores transaction costs is a fiction. Real strategies pay bid-ask spreads, market impact, and fees, all of which scale with traded volume. A signal with a gross Sharpe of 2 might have a net Sharpe of 0.5 after costs, and the net number is the only one that matters. Production backtests model transaction costs explicitly using historical-spread data and impact models calibrated to the strategy's expected size.

Related: capacity. A signal that works at $10M of AUM may not work at $1B because the strategy's own trades move prices against it. Capacity estimation is part of any serious backtest and constrains what strategies can be deployed. Many published academic strategies have plenty of "alpha" at small scale but no capacity at production size.

05

Portfolio Construction and Risk Modelling

Predicting returns is half the problem; translating predictions into a portfolio is the other half. The portfolio-construction step takes alpha forecasts plus risk and cost estimates and produces optimal asset weights. The methodology builds directly on the diversification and risk-return foundations of Ch 03 Section 3, and modern ML for portfolio construction is increasingly dominated by methods that handle the interactions between alpha, risk, and cost simultaneously rather than treating them as separate steps.

Mean-variance optimisation

Markowitz's mean-variance optimisation (1952) operationalises the diversification intuition Ch 03 Section 3 introduced. Given expected returns μ, a covariance matrix Σ, and a risk-aversion parameter γ, the optimal portfolio weights w solve:

Markowitz mean-variance objective
max_w  μᵀw − (γ/2) wᵀΣw   s.t. constraints
The first term rewards expected return; the second penalises portfolio variance. γ controls the trade-off — higher γ produces lower-risk portfolios. Constraints typically include weight bounds (no shorting, position limits), liquidity constraints, and turnover limits. The closed-form solution (without constraints) is w* ∝ Σ⁻¹μ; with constraints it requires quadratic programming.

Mean-variance is theoretically clean but practically fragile. Tiny changes in μ produce large changes in optimal weights — the optimisation is ill-conditioned because Σ is typically ill-conditioned (highly correlated assets). The estimates of μ and Σ come from data and are noisy, which means the "optimal" portfolio is dominated by estimation error rather than true optimality. Practitioners spend most of their effort on robustifying the inputs, not on solving the optimisation.
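A sketch of the unconstrained closed form and of the fragility just described; the toy three-asset example with highly correlated assets is illustrative only:

```python
import numpy as np

def mean_variance_weights(mu, sigma, gamma=5.0):
    """Unconstrained Markowitz solution w* = (1/gamma) * Sigma^{-1} mu.

    mu: (N,) expected excess returns; sigma: (N, N) covariance; gamma: risk aversion.
    With long-only, position-limit, or turnover constraints this becomes a QP instead.
    """
    return np.linalg.solve(sigma, mu) / gamma

# fragility demo: three highly correlated assets, tiny perturbation of mu
vol = np.array([0.15, 0.16, 0.15])
corr = np.array([[1.0, 0.9, 0.85], [0.9, 1.0, 0.9], [0.85, 0.9, 1.0]])
sigma = corr * np.outer(vol, vol)
mu = np.array([0.05, 0.06, 0.055])
w_base = mean_variance_weights(mu, sigma)
w_bumped = mean_variance_weights(mu + np.array([0.005, -0.005, 0.0]), sigma)
# w_base and w_bumped typically differ substantially despite a 0.5% change in expected returns
```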

Black-Litterman and shrinkage

Two refinements substantially improve mean-variance in practice. Black-Litterman (1992) anchors the expected returns at an equilibrium implied by current market weights and lets the user impose views with explicit confidence levels. The resulting return forecasts are typically better-behaved than raw historical estimates, producing more stable optimal portfolios.

Covariance shrinkage (Ledoit-Wolf 2004) shrinks the sample covariance toward a structured target (constant correlation, factor model implied), reducing estimation noise. The shrinkage intensity is chosen analytically to minimise expected mean-square error. Shrinkage produces dramatically better out-of-sample portfolio performance than the raw sample covariance, especially when the number of assets approaches the number of observations.
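A minimal sketch using scikit-learn's LedoitWolf estimator, which chooses the shrinkage intensity analytically; the simulated return matrix is a stand-in for a real (T × N) panel of asset returns:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# returns: (T, N) matrix of asset returns; simulated here purely for illustration
returns = np.random.default_rng(2).normal(0.0, 0.01, size=(250, 50))

lw = LedoitWolf().fit(returns)
sigma_shrunk = lw.covariance_    # shrunk covariance, better conditioned than the sample one
intensity = lw.shrinkage_        # how far the estimate was pulled toward the structured target
```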

Risk parity and risk-based allocation

An alternative philosophy abandons return prediction entirely and allocates based on risk alone. Risk parity (Bridgewater's "All Weather" framework) sets weights such that each asset (or asset class) contributes equally to total portfolio risk. The math: weight asset i by 1/σ_i (with multi-asset corrections via the covariance matrix). Risk parity portfolios tend to be heavily allocated to bonds (low volatility) and lightly to stocks (high volatility), with leverage applied to the whole portfolio to reach the target risk level.
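A sketch of the naive inverse-volatility weighting and of the risk-contribution check used to verify parity. Full equal-risk-contribution weights require an iterative solve over the covariance matrix, which is omitted here:

```python
import numpy as np

def inverse_vol_weights(sigma):
    """Naive risk parity: weight each asset by 1/sigma_i, normalised to sum to one.

    Ignores correlations; full risk parity equalises each asset's *contribution*
    to portfolio risk, which needs an iterative solve over the covariance matrix.
    """
    inv_vol = 1.0 / np.sqrt(np.diag(sigma))
    return inv_vol / inv_vol.sum()

def risk_contributions(w, sigma):
    """Each asset's share of total portfolio variance (shares sum to one)."""
    total_var = w @ sigma @ w
    return w * (sigma @ w) / total_var
```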

The empirical case for risk parity is that estimating expected returns is hard and noisy; estimating risk is much more reliable; therefore optimising over risk produces more robust portfolios. The weakness is that risk parity is silent about which assets to overweight when one of them genuinely has higher expected return, and the leverage required to make it competitive is itself a source of risk.

Modern ML for portfolio construction

The 2020s have produced a wave of ML methods that do portfolio construction directly rather than as a post-hoc step. End-to-end portfolio learning (Uysal et al. 2024 and others) trains a neural network to map features to portfolio weights, with the loss being out-of-sample portfolio Sharpe. The advantage: the model learns to exploit interactions between alpha and risk that the two-stage μ-then-Σ pipeline cannot. The disadvantage: the model is harder to debug and the training signal is noisier.

Reinforcement learning for portfolio management treats the portfolio decision as a sequential MDP — state is the current portfolio plus market features, actions are weight changes, reward is risk-adjusted return. DRL portfolio managers (CPOR, EIIE, the various deep-RL-trading lines of work) have shown promise but have not yet displaced classical mean-variance plus shrinkage in production at top quant funds. The empirical pattern is similar to the alpha-research story: ML helps at the margin, but the methodological discipline matters more than the model class.

Risk modelling in detail

Beyond portfolio construction, financial firms run separate risk models for measurement, hedging, and regulatory purposes. Value at Risk (VaR) estimates the maximum loss at a confidence level; Expected Shortfall (ES, also called CVaR) estimates the expected loss conditional on being in the worst α% of outcomes. Both are computed from the portfolio's return distribution, typically estimated via parametric models (multivariate normal or t), historical simulation, or Monte Carlo over a factor-model simulation. The Basel III regulatory framework requires banks to compute and report ES; insurance regulation under Solvency II has parallel requirements.
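A minimal sketch of historical-simulation VaR and ES; the 1% level and the sign convention (losses reported as positive numbers) are choices for illustration, not requirements of the definitions:

```python
import numpy as np

def historical_var_es(returns, alpha=0.01):
    """Historical-simulation VaR and Expected Shortfall at level alpha.

    returns: (T,) array of portfolio returns (e.g. daily P&L as fractions).
    VaR is the loss threshold exceeded with probability alpha; ES is the average
    loss conditional on exceeding it. Both are returned as positive numbers.
    """
    losses = -np.asarray(returns)
    var = np.quantile(losses, 1 - alpha)
    es = losses[losses >= var].mean()
    return var, es
```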

06

High-Frequency Trading and Market Microstructure

At horizons of minutes to days, the methodology of the chapter is mostly about predicting future returns. At horizons of milliseconds to seconds, the methodology becomes something else entirely — a game of latency, market microstructure, and tactical execution. High-frequency trading is the umbrella for this regime, and it has its own distinctive engineering, statistical, and regulatory texture.

The order book as ML data

Ch 03 Section 4 introduced limit-order books as the mechanism by which prices are formed: a sorted list of bid and ask orders, with market orders walking the book to fill. For HFT, the order book is not just a price-formation mechanism — it is the input data. Its shape (volume at each price level), its dynamics (the rate at which orders arrive and cancel), and its imbalances (more volume on one side than the other) are the substrate that high-frequency-prediction models operate on.

Order-book features used in HFT models include: the bid-ask spread (Ch 03 Section 4), the imbalance between top-of-book bid and ask volume, the volume distributed across levels, the rate of order arrivals and cancellations, and the recent trade volume and direction. Feature engineering at sub-second horizons is a substantial discipline in itself, and the predictive power of these features for next-tick price changes is real but tiny — Sharpe ratios of 5–20 on individual signals at extremely small per-trade sizes, scaled by enormous trade volumes to produce profitable strategies.
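A sketch of a few standard top-of-book features; the feature names and the microprice variant shown are common choices rather than a fixed industry specification:

```python
def top_of_book_features(bid_px, bid_sz, ask_px, ask_sz):
    """A few standard level-1 order-book features (illustrative selection).

    bid_px, ask_px: best bid/ask prices; bid_sz, ask_sz: resting volume at those prices.
    """
    mid = (bid_px + ask_px) / 2.0
    spread = ask_px - bid_px
    imbalance = (bid_sz - ask_sz) / (bid_sz + ask_sz)   # in [-1, 1]; > 0 means bid-heavy
    microprice = (ask_px * bid_sz + bid_px * ask_sz) / (bid_sz + ask_sz)
    return {"mid": mid, "spread": spread,
            "imbalance": imbalance, "microprice_tilt": microprice - mid}
```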

Latency as alpha

At HF timescales, latency is alpha. Co-location at exchange data centres saves microseconds; FPGAs and custom ASIC trading systems extract microseconds more; microwave-tower links between Chicago and New York cut milliseconds off the fiber-optic route. The major HFT firms (Jump, DRW, Jane Street, Citadel Securities, Virtu, IMC, Hudson River Trading, Optiver) compete on latency at this level, and the engineering investment is enormous. The methodology of HFT prediction is distinctive precisely because the constraints are so tight — model inference must complete in microseconds, which limits architectural choices.

Adverse selection and the market-maker problem

HFT firms divide roughly into two camps. Market makers post bid and ask orders simultaneously, earning the spread when both fill. Their problem is adverse selection — when the market is about to move, informed traders take the side that will profit, leaving the market maker holding the wrong-way inventory. The market maker's models are fundamentally about predicting which incoming orders are informed (and to be avoided) and which are uninformed (and to be filled).

The other camp is directional HFT firms that take outright positions based on short-term price predictions. These firms cross the spread to enter positions and unwind them quickly. Their problem is not adverse selection but signal decay — the predictive signal at HF horizons is short-lived and the firm must trade before others detect and arbitrage the signal.

ML in HFT

The role of ML in HFT differs from longer-horizon alpha research. Models must be small (fit in microseconds-of-inference budget) and stable (HFT systems run continuously without retraining). Tree-based methods and small neural networks dominate. Order-flow prediction models classify incoming orders by the kind of follow-up activity they predict. Price-impact models estimate how much the firm's own trades will move prices. Execution models decide how to break up large orders into smaller pieces that minimise impact.

The 2026 frontier in HFT is increasingly RL-based — using reinforcement learning to optimise execution strategies and market-making policies in continuous online environments. The challenges are severe (training in production is risky, simulators of market microstructure are imperfect) but the upside is real.

Regulation and the social question

HFT is controversial. Defenders argue that HFT firms provide liquidity, narrow spreads, and improve price discovery. Critics argue that HFT extracts rents from slower participants, increases systemic risk through episodes like the 2010 Flash Crash, and provides services that non-HFT participants would have received for free in a less-fragmented market. The regulatory framework (Reg NMS in the US, MiFID II in Europe) explicitly addresses HFT through tick-size rules, locked-market handling, and various market-structure protections. The methodology of the chapter is silent on the policy question but the practitioner should be aware that the field operates under substantial scrutiny.

07

Execution Algorithms and Market Impact

A trading desk that decides to buy 1% of a company's daily volume cannot just submit a market order — the resulting price impact would be enormous. Instead, the order is broken up over time and executed via an execution algorithm that balances market-impact cost (trading too fast moves prices) against opportunity cost (trading too slow lets the price drift away). Execution is the often-overlooked side of trading where the right methodology can save tens of basis points per trade — substantial money at scale.

The Almgren-Chriss framework

Robert Almgren and Neil Chriss's 2000 paper provided the canonical theoretical framework for optimal execution. The setup: minimise the expected cost of executing a given order under a risk-aversion penalty for the variance of execution cost. The cost has two components — temporary impact (the immediate cost of submitting an order) and permanent impact (the lasting price change). The variance comes from waiting — while the order is being executed, the price drifts. The trade-off produces an optimal execution trajectory.

Almgren-Chriss optimal execution (simplified)
min_{x(t)}  𝔼[cost] + λ · Var[cost]
x(t) is the rate of executing the remaining order; the optimal trajectory under quadratic temporary-impact and constant volatility is exponentially-decaying execution rate. Higher λ (more risk-averse) produces faster execution; lower λ allows slower trading at the cost of higher variance. The framework is theoretically clean and serves as the baseline against which more sophisticated execution algorithms are measured.
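A sketch of the standard closed-form Almgren-Chriss trajectory under quadratic temporary impact and constant volatility; the parameter names (volatility, temporary-impact coefficient, risk aversion) follow the usual textbook presentation, and the discretisation into slices is illustrative:

```python
import numpy as np

def almgren_chriss_schedule(total_shares, T, n_slices, sigma, eta, lam):
    """Remaining-inventory path for the Almgren-Chriss optimal schedule.

    total_shares: order size X; T: horizon; n_slices: number of trading intervals;
    sigma: price volatility; eta: temporary-impact coefficient; lam: risk aversion.
    Standard closed form: x(t) = X * sinh(kappa * (T - t)) / sinh(kappa * T);
    higher lam gives a larger kappa and a faster, more front-loaded schedule.
    """
    kappa = np.sqrt(lam * sigma**2 / eta)
    t = np.linspace(0.0, T, n_slices + 1)
    remaining = total_shares * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)
    trades = -np.diff(remaining)          # shares executed in each interval
    return remaining, trades
```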

VWAP, TWAP, and percentage-of-volume

The standard production execution algorithms approximate Almgren-Chriss with simpler heuristics that handle real-market complications. VWAP (Volume-Weighted Average Price) executes the order in proportion to the historical volume profile — more during high-volume periods, less during low-volume. The execution price tracks the day's VWAP, which is the standard benchmark for institutional execution. TWAP (Time-Weighted Average Price) executes at a constant rate across the trading day, simpler than VWAP but ignoring the volume pattern. Percentage-of-volume (POV) execution stays at a fixed fraction of market volume, which adapts in real time to changing market conditions.

Each algorithm has its place. VWAP is appropriate when matching the day's average price is the goal (typical institutional benchmarking). TWAP is appropriate when minimising the timing-of-order signal to the market matters (less predictable than VWAP). POV is appropriate when the order is large relative to daily volume and adaptive sizing matters.
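Minimal sketches of the three schedules. The historical volume profile and the participation fraction are inputs the desk supplies; real implementations add order-placement logic, limit-price handling, and anti-gaming randomisation on top:

```python
import numpy as np

def vwap_schedule(total_shares, hist_volume_profile):
    """Split an order across intraday buckets in proportion to historical volume."""
    profile = np.asarray(hist_volume_profile, dtype=float)
    return total_shares * profile / profile.sum()

def twap_schedule(total_shares, n_buckets):
    """Constant-rate schedule: the same quantity in every bucket."""
    return np.full(n_buckets, total_shares / n_buckets)

def pov_fills(target_fraction, realised_volume):
    """Percentage-of-volume: trade a fixed fraction of whatever the market prints."""
    return target_fraction * np.asarray(realised_volume, dtype=float)
```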

Market-impact models

The empirical-finance literature has produced detailed models of market impact. The classical result (Almgren et al. 2005, Bouchaud et al. 2018) is that impact scales with the square root of order size: doubling the order raises the impact cost by a factor of roughly √2, not 2, so the effect is sublinear. This square-root law has been replicated across markets and asset classes and is the substrate of every serious execution algorithm. ML-based extensions refine the law with venue-specific, time-of-day-specific, and asset-specific parameters.

RL for execution

Execution is a natural RL problem — sequential decisions, observable state (the order book and remaining inventory), measurable reward (the executed price relative to a benchmark). The 2020s have produced several deep-RL approaches to optimal execution: PPO and DDPG agents that learn execution policies that beat Almgren-Chriss baselines on simulated and real data. The challenges are familiar: reward shaping (immediate reward vs. delayed reward from completed execution), simulator quality (training environments must match production microstructure), and robustness across regimes. Production deployments at top trading firms increasingly use RL-based execution, often combined with classical Almgren-Chriss as a fallback when the RL agent encounters unusual conditions.

The dark-pool question

A specific execution sub-problem: dark pools are venues where orders are hidden and matched anonymously. They reduce market-impact for large orders but introduce information-leakage risk (some dark pools have been accused of leaking flow information to favoured participants). Smart-order-routing systems decide for each piece of an order whether to send it to a lit exchange or a dark pool, optimising for execution cost and information control. The methodology is similar to multi-armed-bandit problems with each venue as an arm and the reward being execution quality.

08

Fraud Detection and Anti-Money-Laundering

Outside the trading floor, financial ML's largest production application is fraud detection — preventing payment-card fraud, account takeover, identity fraud, and money laundering. The methodology connects directly to anomaly detection (Part XIII Ch 02) and graph neural networks (Part XIII Ch 05), but the deployment context is distinctive: massive class imbalance, adversarial adaptation, regulatory constraints, and zero tolerance for false negatives on the highest-stakes cases.

Card fraud at scale

The largest single fraud-detection application is payment-card fraud — detecting fraudulent transactions in the few hundred milliseconds between authorisation request and approve/decline decision. The challenge is severe: tens of thousands of transactions per second, fraud rates of ~0.1% (extreme class imbalance), and a hard latency budget. The dominant production architecture is gradient-boosted trees (XGBoost / LightGBM) with hundreds of features per transaction — merchant category, geographic distance from prior transactions, hour of day, ratio of this transaction to the cardholder's typical spend, velocity features (transactions per hour, per day), authorisation channel.

The deployment pattern: train on labelled historical fraud (with the labels coming from chargebacks and customer-reported fraud); score every incoming transaction; reject above a threshold; route grey-zone transactions to step-up authentication (3-D Secure, push notifications). The thresholds are tuned to balance fraud capture (high recall) against customer friction (low false-positive rate), with the trade-off chosen by business judgment rather than technical optimum.

Class-imbalance methodology

The 0.1% positive rate creates technical complications. Standard cross-entropy loss is dominated by the negative class; the model can achieve 99.9% accuracy by predicting "not fraud" always. Solutions include: class-weighted loss (weight positives more heavily); oversampling the positive class via SMOTE or related methods; undersampling the negative class to a more balanced ratio; and focal loss (which down-weights well-classified examples). In production, calibrated probabilities matter — the threshold is chosen on a calibrated probability scale, so methods that distort probabilities (oversampling) require post-hoc recalibration. Modern card-fraud production pipelines typically use moderate undersampling for training efficiency plus probability calibration on a held-out balanced set.
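A sketch of the undersample-then-recalibrate pattern described above, using scikit-learn's gradient boosting and isotonic regression as stand-ins for a production XGBoost/LightGBM stack; the undersampling rate and split sizes are illustrative:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

def fit_fraud_model(X, y, neg_keep=0.05, seed=0):
    """Undersample negatives for tree fitting, then recalibrate probabilities.

    X: (n, d) array of transaction features; y: (n,) array, 1 = fraud, 0 = legitimate.
    neg_keep: fraction of legitimate transactions kept in the fitting sample.
    Calibration is fit on untouched held-out data so the output scores behave like
    probabilities when the decline threshold is chosen.
    """
    X_fit, X_cal, y_fit, y_cal = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    rng = np.random.default_rng(seed)
    keep = (y_fit == 1) | (rng.random(len(y_fit)) < neg_keep)  # all fraud, few legitimate
    clf = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05)
    clf.fit(X_fit[keep], y_fit[keep])

    iso = IsotonicRegression(out_of_bounds="clip")             # post-hoc recalibration
    iso.fit(clf.predict_proba(X_cal)[:, 1], y_cal)

    def score(X_new):
        return iso.predict(clf.predict_proba(X_new)[:, 1])
    return score
```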

Anti-money-laundering

Anti-money-laundering (AML) systems detect suspicious patterns of financial flows that indicate money laundering, terrorism financing, or sanctions evasion. The challenge differs from card fraud: the labels are extremely scarce (most suspicious-activity reports do not result in confirmed cases), the patterns are complex (laundering typically involves chains of accounts and entities), and the regulatory requirements are stringent (banks must investigate every flagged case and file Suspicious Activity Reports with regulators).

The traditional approach is rules-based — fixed rules trigger on patterns like "structured deposits below the reporting threshold" or "rapid transfers across multiple jurisdictions." Rules-based systems are interpretable and auditable but produce overwhelming false-positive rates (typical rates exceed 95%). The modern approach combines rules with ML — supervised models trained on cleared cases versus confirmed-positive cases, plus graph-based methods that detect networks of related accounts.

Graph methods for fraud and AML

Graph-based fraud detection is a particularly successful application of GNNs (Part XIII Ch 05). The idea: represent the financial network as a graph (nodes = accounts, edges = transactions, with attributes for amounts and timing), and use GNN-based methods to identify suspicious subgraph patterns. Money-laundering schemes typically involve specific topologies — circles, layered transfers, smurf networks — that graph methods can detect more naturally than per-transaction features. Production deployments at major banks combine traditional features with GNN-based subgraph features for AML scoring.

Adversarial adaptation

Fraudsters are not passive; they adapt to defenses. New attack vectors emerge quickly (AI-generated synthetic identities, deepfake voice for phone-based fraud, account-takeover via SIM-swapping). The fraud-detection methodology must continuously evolve, retrained frequently as new patterns emerge. Production teams maintain explicit feedback loops where investigated cases (confirmed fraud, cleared transactions) feed back into the next training cycle. The 2024 generation increasingly uses generative-AI-augmented synthetic-fraud patterns to stress-test detection systems before the real attackers arrive.

The regulatory backdrop

Fraud and AML systems operate under heavy regulation. AML systems are subject to the Bank Secrecy Act (US), the EU AML directives, and FATF recommendations globally. False negatives can produce massive fines; false positives create customer friction. The frameworks demand auditable decision logic, retention of investigation records, and regular regulatory examinations. The methodology of the chapter — interpretable models, explicit threshold tuning, careful handling of class imbalance — is shaped by these constraints as much as by the technical problem.

09

Credit Risk and Default Modelling

Credit-risk modelling — predicting which borrowers will default — is one of the oldest applications of statistical learning, dating back to the credit scorecards of the 1950s. The modern field combines classical scorecard methodology with modern ML, under heavy regulatory and explainability constraints. Few areas of ML have such a sharp gap between what is technically possible and what is allowed.

The PD/LGD/EAD decomposition

Credit-risk regulation (Basel III for banks) decomposes the expected loss on a loan into three components. Probability of Default (PD): the probability the borrower will default within a year. Loss Given Default (LGD): the fraction of the loan that will be lost if default occurs (after recoveries from collateral). Exposure at Default (EAD): the outstanding balance at the time of default. Expected Loss = PD · LGD · EAD. Each component is modelled separately, with PD getting the most attention because it varies most with borrower characteristics.

Scorecards and logistic regression

The traditional credit-scoring methodology is the scorecard — a logistic regression on a small number of carefully-binned features, producing a "score" that maps to default probability. Scorecards are simple, interpretable, easy to audit, and have been the production standard since the FICO score was introduced in 1989. They can be hand-implemented as a few-page table that scores each application, which is appealing for regulatory examination.

The classical Weight-of-Evidence transformation is the standard scorecard preprocessing — each continuous feature is binned, and each bin's contribution to the score is its log-odds-ratio for default. The result is a model that is both predictive and interpretable, with each feature's contribution to the final score explicitly visible. Modern scorecard tools (the Python optbinning library, SAS Credit Scoring) automate the binning and weight-of-evidence transformation while preserving the resulting model's interpretability.
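A minimal sketch of the Weight-of-Evidence computation for one continuous feature using quantile bins; production scorecard tools choose bins more carefully (monotonicity, minimum counts) and smooth bins with zero defaults, which this sketch omits:

```python
import numpy as np
import pandas as pd

def weight_of_evidence(feature, default_flag, n_bins=5):
    """Weight-of-Evidence per quantile bin of one continuous feature.

    feature: (n,) array of values; default_flag: (n,) array, 1 = defaulted, 0 = repaid.
    WoE_b = ln( share of non-defaulters in bin b / share of defaulters in bin b ).
    Bins with zero defaults need smoothing in practice (omitted here).
    """
    df = pd.DataFrame({"bin": pd.qcut(feature, n_bins, duplicates="drop"),
                       "bad": default_flag})
    grouped = df.groupby("bin", observed=True)["bad"].agg(["count", "sum"])
    bad = grouped["sum"]
    good = grouped["count"] - bad
    return np.log((good / good.sum()) / (bad / bad.sum()))
```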

ML for credit risk

Tree-based methods consistently outperform scorecards on out-of-sample default prediction by 2–5 AUC points — meaningful in a domain where every basis point of improvement scales to millions of dollars at portfolio scale. The 2018–2024 literature has documented these gains across consumer-credit, small-business-credit, and corporate-credit portfolios. Most commercial credit-scoring vendors now offer ML-based products alongside their classical scorecard offerings.

The deployment of ML for regulated credit decisions is constrained, however. Adverse-action notices (US Reg B, EU GDPR) require lenders to provide a meaningful explanation when a loan is denied. Black-box deep models cannot easily produce these explanations; SHAP and similar attribution methods can but require careful interpretation. Many lenders use ML for portfolio-level risk modelling and pricing while keeping the per-application decision system on a more interpretable scorecard plus explicit overrides.

Fair lending and bias

Credit-decision models are subject to fair lending constraints — they cannot use protected attributes (race, gender, religion) directly, and they must not produce disparate impact (substantially different rates of credit denial across protected groups even with neutral inputs). The methodology of fair-lending compliance includes: removing proxies for protected attributes (zip code is a strong race proxy in the US, for instance); testing for disparate impact statistically; and increasingly, incorporating fairness constraints directly into the training objective.
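A sketch of one simple disparate-impact screen: compare approval rates across groups and flag ratios below the four-fifths threshold, a heuristic borrowed from US employment-law practice and commonly used as a first pass before formal statistical testing:

```python
import numpy as np

def adverse_impact_ratios(approved, group):
    """Approval-rate ratio of each group relative to the highest-approval group.

    approved: (n,) array of 0/1 approval decisions; group: (n,) array of group labels.
    A ratio below ~0.8 (the "four-fifths rule") is a common first screen for
    disparate impact, to be followed by proper statistical testing.
    """
    approved = np.asarray(approved)
    group = np.asarray(group)
    rates = {g: approved[group == g].mean() for g in np.unique(group)}
    reference = max(rates, key=rates.get)                 # highest-approval group
    return {g: rates[g] / rates[reference] for g in rates}
```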

The 2024–2026 generation of fair-lending ML has produced explicit methods for training credit models with fairness constraints — adversarial debiasing, post-processing recalibration, fair-representation learning. None of these eliminate the underlying tension (predictive accuracy vs. fairness across groups), but they provide tools for navigating it. The regulatory framework is itself evolving — the CFPB and similar bodies have begun publishing guidance on ML-based credit decisions that should be tracked closely.

Survival analysis for credit

The PD framework predicts default within a fixed horizon (typically one year). For portfolio risk and pricing, the timing of default also matters — a loan that defaults in month 1 is much costlier than one that defaults in month 11. Survival analysis (Part XIII Ch 06) provides the natural framework: model the hazard rate of default as a function of borrower features and time, with credit-specific extensions for prepayment (which competes with default) and recovery dynamics. The Cox-regression and DeepSurv methods of Ch 06 transfer directly; the deployment is increasingly common at sophisticated lenders.

10

Applications and Frontier

Beyond the major applications covered in Sections 2–9, financial ML appears in dozens of specialised applications — derivatives pricing, ESG scoring, regulatory compliance, insurance underwriting, FX trading, crypto. This final section surveys the application landscape and the frontier where modern AI is reshaping the field.

Derivatives pricing and Greeks

Classical derivatives pricing (Black-Scholes for options, intensity models for credit derivatives) has analytical or near-analytical solutions for vanilla products. Exotic derivatives — barrier options, autocallables, structured products — require Monte Carlo simulation. ML enters through differential machine learning (Huge and Savine 2020), which replaces expensive Monte Carlo with neural networks that learn to approximate prices and Greeks directly. The methodology is mature enough that several major banks now use NN-based pricers in production for their most exotic books.

RL for trading

Reinforcement learning has long held promise for trading — the framework matches the problem structure (sequential decisions, observable state, monetary reward). The reality has been more mixed. The challenges are severe: noisy rewards, non-stationary environments, simulator-to-real gap, exploration without losing real money. The 2024 generation of RL trading has shown promise on specific subproblems (execution optimisation, market making, portfolio rebalancing) but has not yet displaced classical methods at the largest funds. The frontier is offline RL methods that learn from logged trading data without exploration, plus the various sim-to-real techniques developed for robotics adapted to trading environments.

LLMs in finance

The 2023–2026 wave of LLM applications in finance is rapidly expanding. Earnings-call analysis, 10-K parsing, news classification are now routine. Research-assistant LLMs for sell-side analysts produce drafts, summaries, and comparative analyses. Compliance LLMs review communications for regulatory red flags. Customer-service LLMs handle most retail-banking interactions. Bloomberg's BloombergGPT (2023), the various FinanceGPT efforts, and the in-house finance-tuned LLMs at major banks are all at production scale. The transformation is substantial but mostly augmenting rather than replacing existing workflows.

Crypto and decentralised finance

The crypto and DeFi ecosystem has its own ML methodology. On-chain analysis uses graph methods over the public blockchain to identify wallet behaviour, AML flags, and trading patterns. DEX market making on Uniswap-style automated market makers requires its own methodology distinct from order-book HFT. Smart contract risk modelling tries to assess the security of Solidity code. The field is technically interesting but the regulatory environment is unsettled, and serious quant deployment in crypto has been more cautious than the headlines suggest.

ESG and climate-risk modelling

ESG (Environmental, Social, Governance) scoring is increasingly integrated into investment decisions, and ML methods are central to producing the underlying scores. Climate-risk modelling estimates the financial impact of physical-risk (coastal flooding, agricultural disruption) and transition-risk (stranded carbon assets) scenarios. The data is sparse, the models are necessarily speculative, and the regulatory framework (SFDR in Europe, SEC climate disclosure rules in the US) is still solidifying. ML's role is partly the same as in any new domain — extracting structured features from unstructured disclosures — and partly distinctive — building scenario simulators that combine climate models with financial impact models.

Frontier methods

Several frontiers are particularly active in 2026. Causal inference for finance: the doubly-robust and orthogonal-ML estimators of Part XIII Ch 04 are increasingly applied to estimating the causal impact of corporate decisions, policy changes, and trading strategies. Foundation models for time series: pretrained models like TimeGPT, Chronos, and the various 2024 financial-time-series foundation models offer general-purpose forecasting that competes with bespoke models. Federated learning for finance: cross-institutional fraud detection, AML, and credit-risk modelling without sharing customer data. Quantum ML for portfolio optimisation: the optimisation problems of portfolio construction are the natural target of quantum-annealing and quantum-classical hybrid methods, though the practical impact remains modest.

What this chapter does not cover

Several adjacent areas are out of scope. The conceptual finance and economics foundations are covered in Ch 03 (markets, time value, risk-return, EMH, asset classes, financial statements, behavioural deviations, intermediation) and assumed here rather than redeveloped. Beyond Ch 03's coverage, several specialised methodologies are also out of scope. Stochastic calculus, partial-differential-equation solvers for derivatives, and classical option-pricing theory (Black-Scholes, term-structure models) are the subject of substantial quantitative-finance textbooks rather than ML chapters. Insurance modelling and actuarial science share methodological territory with credit and risk modelling but have their own traditions. The deeper behavioural-finance literature beyond Ch 03 Section 9 — market-psychology research, trader behaviour, the various crisis-specific behavioural studies — is outside the chapter's technical scope. And the substantial regulatory literature on ML in finance — fair-lending, model-risk-management (SR 11-7 in the US), explainability requirements under GDPR and the EU AI Act — is essential operational knowledge but conventionally treated through the legal/compliance lens rather than the ML lens.

Further reading

Foundational papers and textbooks for financial ML and quantitative methods. The Lopez de Prado book, the Gu-Kelly-Xiu paper, the Almgren-Chriss execution paper, and a fair-lending reference together make the right starting kit for practitioners.