Scientific Machine Learning, where data and physical law meet.

Science has always been a partnership between observation and theory. For four hundred years the partnership ran predominantly through differential equations: Newton's mechanics, Maxwell's electrodynamics, Einstein's general relativity, and the partial-differential-equation models that describe fluid flow, climate, materials, and biology. Computational science of the late twentieth century built on this foundation, with finite elements, finite volumes, spectral methods, and Monte Carlo simulation collectively producing most of what we know about complex physical systems. Scientific machine learning (SciML) is the discipline that fuses this physics-based machinery with modern data-driven methods: surrogate models that replace expensive simulators with fast neural approximations, physics-informed networks that bake conservation laws into ML losses, neural operators that learn solution maps for entire families of PDEs, differentiable simulators that combine gradient-based optimisation with mechanistic models, and symbolic-regression systems that recover governing equations from data. This chapter develops the methods, the engineering disciplines they require, and the deployment realities of doing computational science with the modern ML toolkit. It is the methodological foundation on top of which the rest of Part XV's domain-specific chapters rest.

Prerequisites & orientation

This chapter assumes the working machinery of modern deep learning (Part VI), the optimisation methods of Part I Ch 03, and basic numerical analysis (Part I Ch 02 on calculus and PDEs at the level of an undergraduate engineering course). The Bayesian deep-learning material of Part XIII Ch 07 is useful for the uncertainty quantification of Section 7. No background in computational science is assumed; the chapter introduces the relevant concepts (finite elements, spectral methods, time integration, mesh-based methods) as they arise. The applied chapters that follow this one — Ch 04 AI for Protein Science, Ch 06 AI for Biology, Ch 08 AI for Drug Discovery, Ch 10 AI for Climate, Ch 12 AI for Physics, Ch 14 AI for Materials, Ch 16 AI for Astronomy — all draw on this chapter's methodology.

Two threads run through the chapter. The first is the physics-data trade-off: pure physics models are interpretable and generalise well but are expensive and require known equations; pure data-driven models are fast and flexible but extrapolate badly and can violate conservation laws. SciML methods sit on a spectrum between these poles, with each method making different choices about how much physics to bake in. The second thread is generalisation: scientific applications routinely require predictions in regimes (parameters, geometries, boundary conditions) outside the training distribution, and the field has developed substantial machinery — symmetries, equivariance, neural operators, hybrid models — for making this work. Both threads recur in every section that follows.

01

Why Scientific ML Is Distinctive

Most ML practitioners learn the discipline on tasks where the loss is well-defined, the data is plentiful, and the only physical constraint is whatever was implicitly captured by the training distribution. Scientific applications break each of these assumptions: the underlying problem is often a partial differential equation that has stood for two centuries, the data is expensive to generate, and predictions outside the training distribution have to obey conservation of energy, mass, momentum, and charge. The methodology of scientific ML is the response to these constraints.

The PDE substrate

The objects of study in computational science are typically partial differential equations — equations that relate a quantity (temperature, pressure, electromagnetic field, wavefunction) to its derivatives in space and time. The Navier-Stokes equations describe fluid flow; Maxwell's equations describe electromagnetic phenomena; the Schrödinger equation describes quantum systems; the heat equation describes diffusion. Most of computational physics, fluid dynamics, structural analysis, climate modelling, and quantum chemistry reduces to numerical solution of PDEs over complex geometries with appropriate boundary conditions. Scientific ML methods are best understood as either replacing these solvers (surrogates), augmenting them (hybrid models), or discovering the equations themselves from data.

Physical priors are non-negotiable

A classifier that occasionally violates a probability axiom is a nuisance; a fluid simulator that occasionally violates conservation of mass is broken. Physical laws — conservation of energy, mass, momentum, charge; the second law of thermodynamics; the principle of least action; gauge invariance; relativistic causality — are not soft preferences. Models that violate them produce predictions that are not just inaccurate but physically impossible (negative pressures, faster-than-light propagation, mass appearing from nothing). The methodology of scientific ML pays substantial attention to building physical priors directly into model architectures, training losses, and inference procedures, with Section 8's symmetries-and-equivariance material developing the most-studied tools.

Data is expensive

In most ML domains, more data is the default answer to most problems. In scientific ML, acquiring more data may mean running a $10-million supercomputer simulation for a week, assembling a $100-million experimental facility for a measurement campaign, or waiting for a once-per-decade satellite mission. The methodology of the field accommodates this with data-efficient methods that build in physical knowledge to compensate for limited data, transfer learning across related problems, active learning that targets the most informative simulations or experiments, and, increasingly, foundation models trained on heterogeneous scientific data and fine-tuned to specific applications.

Extrapolation is the goal

In most ML tasks, the implicit goal is interpolation within the training distribution. In scientific ML, the goal is routinely extrapolation: train the model on simulations at low Reynolds numbers, deploy it at high; train on small molecules, deploy on large; train on past climate, deploy on future. Standard ML practice (split data, validate on a held-out portion, deploy and monitor) is insufficient, because the deployment distribution differs from training in ways that the held-out validation set cannot capture. The methodology requires explicit attention to physical reasoning: are the symmetries and conservation laws preserved? Does the surrogate behave correctly in known asymptotic limits? Do the equations the model has implicitly learned reduce to the right behaviour at boundary cases?

Interpretability and trust

Scientific ML predictions feed into engineering design (will this bridge stand?), policy (what is the climate-sensitivity range we should plan for?), and discovery (what new physics is implied by this measurement?). The standards for trustworthiness are correspondingly high. The methodology of the field emphasises explicit uncertainty quantification (Section 9), validation against analytical solutions and known benchmarks, and increasingly the use of interpretable methods (symbolic regression, sparse identification) that produce human-readable equations rather than opaque neural networks.

The Two Worlds Meet

Computational science has been an established discipline for sixty years, with mature numerical methods (finite elements, spectral methods, Monte Carlo) that produce reliable predictions when their assumptions hold. Modern ML brings flexibility and speed but risks producing physically impossible answers. Scientific ML is the methodology of getting both — the speed of ML, the reliability of physics — without trading off either too much. The rest of the chapter develops the major techniques that achieve this balance.

[Figure] The spectrum of scientific ML methods, from pure physics (left) to pure data (right):

- Classical solvers — finite elements, spectral methods, finite volumes, Monte Carlo: trusted, slow.
- PINNs — a network with the PDE residual in the loss: strong for inverse problems.
- Hybrid models — physics for the known parts, ML for the rest: closure and residual learning.
- Neural operators — learn the solution map end-to-end: FNO, DeepONet.
- Pure ML surrogates — a black-box net on simulator runs: fast, brittle.

Speed and flexibility increase left-to-right; physical guarantees and extrapolation reliability decrease. Production deployments often combine multiple approaches rather than committing to a single point. Each method makes different choices about how much physics to bake into the architecture vs. learn from data; the rest of the chapter develops each in turn.
02

Data-Driven Discovery of Equations

The deepest application of ML to science is the discovery of governing equations directly from data. Rather than fitting a black-box model that predicts y from x, equation-discovery methods produce a human-readable expression — a differential equation, an algebraic identity, a Hamiltonian — that captures the underlying mechanism. The methodology has substantial pre-ML history (the work of Schmidt and Lipson on symbolic regression, the genetic-programming literature) that modern approaches build on.

Sparse identification (SINDy)

The single most-influential modern method is sparse identification of nonlinear dynamics (SINDy, Brunton, Proctor & Kutz 2016). The idea is to assume the dynamics live in a sparse combination of candidate basis functions (polynomials, trigonometric functions, derivatives), construct a large library of these candidates, and use sparse regression (LASSO, sequential thresholded least squares) to identify which combination best fits observed time-derivative data. The approach has been applied to fluid dynamics (recovering the Lorenz equations from time-series data), reaction-network identification, and increasingly biological-systems modelling. Subsequent extensions handle control inputs (SINDYc), PDEs (PDE-FIND), and noisy or stochastic dynamics.
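The sequential-thresholded-least-squares core of SINDy fits in a few lines. The sketch below is plain NumPy, not the pysindy API; the system (a damped linear oscillator), the polynomial library, and the threshold are our illustrative choices:

```python
import numpy as np

# Sketch of SINDy's sequential thresholded least squares. We identify
# dx/dt = -0.1x + 2y, dy/dt = -2x - 0.1y from trajectory data alone.

def rhs(s):
    x, y = s
    return np.array([-0.1 * x + 2.0 * y, -2.0 * x - 0.1 * y])

# Generate a trajectory with 4th-order Runge-Kutta.
dt, n = 0.01, 2000
traj = np.empty((n, 2))
traj[0] = [2.0, 0.0]
for i in range(n - 1):
    s = traj[i]
    k1 = rhs(s); k2 = rhs(s + dt/2*k1); k3 = rhs(s + dt/2*k2); k4 = rhs(s + dt*k3)
    traj[i+1] = s + dt/6*(k1 + 2*k2 + 2*k3 + k4)

# Time derivatives via centred finite differences.
dX = (traj[2:] - traj[:-2]) / (2 * dt)
X = traj[1:-1]

# Candidate library: [1, x, y, x^2, xy, y^2].
x, y = X[:, 0], X[:, 1]
Theta = np.column_stack([np.ones_like(x), x, y, x*x, x*y, y*y])

def stls(Theta, dX, threshold=0.05, iters=10):
    Xi, _, _, _ = np.linalg.lstsq(Theta, dX, rcond=None)
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):       # refit each equation on the
            big = ~small[:, k]             # surviving candidate terms only
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

Xi = stls(Theta, dX)
print(np.round(Xi.T, 3))   # rows: dx/dt, dy/dt; cols: 1, x, y, x^2, xy, y^2
```

With clean data the spurious library terms are thresholded to exactly zero and the surviving coefficients match the generating equations to finite-difference accuracy.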

Symbolic regression

More general than SINDy, symbolic regression searches the space of mathematical expressions directly for the formula best fitting given data. Classical approaches use genetic programming (Eureqa, the original Schmidt-Lipson system from 2009); modern approaches combine neural networks with symbolic search (DSR — Deep Symbolic Regression, AI Feynman, the various 2020–2024 successors). The methodology is computationally expensive (the search space is combinatorially large) but produces uniquely interpretable results: a closed-form equation that scientists can analyse, integrate, and connect to existing theory. The 2020 AI Feynman paper rediscovered Feynman's textbook physics equations from synthetic data; the 2024 wave of FunSearch-style methods extends this to genuine discovery.

Hidden variables and causal structure

Equation discovery from observation alone faces a fundamental obstacle: many systems have hidden variables not directly measured. The classical example is the harmonic oscillator — observing position alone, no first-order equation in the observed variable can reproduce the dynamics; observing position and velocity together, they reveal Hamilton's equations. Modern methods (the various neural-state-space models, latent-ODE approaches, the SINDy-AE variants) use autoencoders or related representations to discover hidden variables jointly with the dynamics, with substantial empirical success on systems where the underlying state space is moderate-dimensional.

The data-from-simulation pattern

Where experimental data is scarce, simulation-generated data is the workaround. A high-fidelity simulator (CFD code, molecular dynamics package, climate model) generates training data; ML methods discover surrogates or equations that capture the simulator's behaviour. The methodology has the same flavour as student-teacher distillation in mainstream ML — the simulator is the teacher, the discovered model is the student — and the practical wins are similar: a discovered surrogate can run thousands of times faster than the original simulator while preserving accuracy in trained regimes.

The validation problem

Discovered equations require validation that goes beyond predictive accuracy on held-out data. Standard checks include: dimensional analysis (do the units match?), asymptotic-limit behaviour (does the equation reduce correctly in known limits?), conservation properties (do invariants of the underlying system remain invariant in the discovered dynamics?), and connection to existing theory (does the form of the equation match what physical reasoning would predict?). Production deployments of equation-discovery methods invest substantially in this validation layer, and the failure mode (a numerically-accurate but physically-meaningless equation) is well-documented in the literature.

03

Surrogate Models and Emulators

The most-deployed application of ML in science is surrogate modelling — replacing an expensive simulator or experiment with a fast, learned approximation. The methodology has decades of pre-deep-learning history (Gaussian processes from the 1960s onward, polynomial chaos expansions, response-surface methods), and modern neural-network approaches sit on top of this foundation rather than replacing it.

Gaussian processes

The classical surrogate-modelling tool is the Gaussian process (GP). A GP places a prior over functions (parameterised by a kernel function), conditions on observed data, and produces a posterior predictive distribution at any new query point. The advantages are substantial: principled uncertainty quantification, strong performance in low-data regimes (tens to thousands of training points), natural handling of measurement noise, and a rich theoretical foundation. The disadvantages are equally important: standard GPs scale O(N³) in training set size, kernel design requires expert input, and the method does not extend naturally to very high-dimensional inputs. GPs remain dominant for low-data, high-stakes scientific surrogate problems (engineering design, expensive-experiment Bayesian optimisation), and the GPyTorch and Stan ecosystems have substantially improved their accessibility.
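The GP posterior computation described above is short enough to write out. A minimal sketch with a fixed RBF kernel; the hyperparameters are illustrative and untuned, and the Cholesky factorisation is the O(N³) step the text mentions:

```python
import numpy as np

# GP regression on noisy samples of sin(x), fixed RBF kernel.

def rbf(A, B, lengthscale=0.5, variance=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2*np.pi, 30)                 # 30 "expensive" observations
y = np.sin(X) + 0.05 * rng.standard_normal(30)

noise = 0.05 ** 2
K = rbf(X, X) + noise * np.eye(len(X))
L = np.linalg.cholesky(K)                       # the O(N^3) step
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

Xs = np.linspace(0, 2*np.pi, 100)               # query points
Ks = rbf(Xs, X)
mean = Ks @ alpha                               # posterior mean
v = np.linalg.solve(L, Ks.T)
var = np.diag(rbf(Xs, Xs)) - np.sum(v**2, axis=0)   # posterior variance

print(float(np.max(np.abs(mean - np.sin(Xs)))))
```

The posterior variance is what makes GPs useful for active learning: it is large exactly where the training data is sparse.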

Neural network surrogates

For higher-data, higher-dimensional problems, neural surrogates have become the workhorse. A standard feed-forward network maps simulator inputs (geometry, boundary conditions, parameters) to outputs (predicted fields, summary statistics, time series). Training data comes from a sweep of simulator runs; the network learns to interpolate. The architectures range from simple multilayer perceptrons through CNNs (when the input is spatial), graph neural networks (when the input has irregular connectivity), and transformers (for sequence-like scientific data). The 2020s wave of foundation models — pretrained on diverse scientific data, fine-tuned to specific surrogate problems — extends this further.

Active learning and optimal experimental design

Surrogate modelling has a natural feedback loop: the model is uncertain in some regions of input space, so the next simulation should target those regions, which reduces uncertainty further, which informs the next simulation. The methodology of active learning formalises this loop. Bayesian optimisation (using GP surrogates) and the various information-theoretic acquisition functions (expected improvement, probability of improvement, entropy search) are the standard tools. Production deployments — engineering design optimisation, scientific-experiment planning, drug-discovery hit identification — routinely use active learning to substantially reduce the number of expensive simulations or experiments required.
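The feedback loop can be sketched with the simplest acquisition function, uncertainty sampling: query wherever the GP posterior variance is highest. The function, kernel, and budget below are illustrative; production tools use value-aware acquisitions such as expected improvement:

```python
import numpy as np

# Active-learning loop: greedily query the point of highest GP posterior
# variance. f stands in for an expensive simulator.

def rbf(A, B, ls=0.4):
    return np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ls**2)

def posterior_var(Xtrain, Xquery, noise=1e-6):
    K = rbf(Xtrain, Xtrain) + noise * np.eye(len(Xtrain))
    Ks = rbf(Xquery, Xtrain)
    return 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))

f = lambda x: np.sin(3 * x)                 # the "expensive simulator"
grid = np.linspace(0, 3, 200)
X = np.array([0.1, 1.5, 2.9])               # initial design
y = f(X)                                    # run the simulator there
before = posterior_var(X, grid).max()

for _ in range(8):                          # acquisition loop
    var = posterior_var(X, grid)
    x_next = grid[np.argmax(var)]           # most informative next run
    X, y = np.append(X, x_next), np.append(y, f(x_next))

after = posterior_var(X, grid).max()
print(before, after)
```

For pure uncertainty sampling the variance depends only on the query locations, not on the observed values y; acquisitions like expected improvement use y as well, which is what turns the loop into Bayesian optimisation.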

Multi-fidelity methods

Many scientific problems have access to multiple fidelity levels of simulator: a fast-but-inaccurate version (1D model, coarse grid, simplified physics) and a slow-but-accurate version (3D, fine grid, full physics). Multi-fidelity methods learn the relationship between fidelities and use cheap-fidelity data to inform expensive-fidelity predictions. The classical tool is co-kriging (a multi-output GP that captures the cross-fidelity correlation); modern variants use neural networks (multi-fidelity DeepONets, the various 2020s deep multi-fidelity methods). The empirical wins can be dramatic — orders of magnitude reduction in expensive-simulator calls — when the relationship between fidelities is smooth.

The validation discipline

Surrogate models inherit the validation problem of equation discovery: predictive accuracy on held-out simulation data is necessary but not sufficient. The surrogate may be accurate in trained regimes but extrapolate badly outside, may match summary statistics but get spatial structure wrong, may be calibrated on average but miss tail behaviour. The methodology of careful surrogate validation — testing in known asymptotic limits, comparing against analytical solutions where they exist, checking conservation properties, validating against independent experimental data — distinguishes successful production deployments from research demonstrations that fail under deployment.

04

Physics-Informed Neural Networks

The most-studied scientific-ML method since 2017 is the physics-informed neural network (PINN). The idea is simple: train a neural network to represent the solution of a PDE by penalising violation of the PDE itself in the loss function, alongside boundary and initial conditions. The 2019 Raissi-Perdikaris-Karniadakis paper popularised the approach, and the literature has exploded since.

The basic PINN

For a PDE F(u, ∂u/∂t, ∂u/∂x, …) = 0 with boundary conditions B(u) = 0, the PINN trains a neural network uθ(x, t) to minimise the sum of: (1) data loss on observed measurements, (2) PDE residual loss evaluated at sampled collocation points, and (3) boundary-condition loss. Automatic differentiation (the same machinery that makes deep learning work) computes the derivatives needed for the residual term. The result is a mesh-free PDE solver that can handle inverse problems (where coefficients are unknown), data-assimilation (where measurements at sparse points constrain the solution), and high-dimensional problems (where classical mesh-based methods become intractable).
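The three-term loss can be shown in miniature. A real PINN uses a neural network and autodiff; to stay dependency-free, the sketch below uses a one-parameter ansatz uθ(x) = θ·sin(πx) that satisfies the boundary conditions u(0) = u(1) = 0 by construction (a hard constraint), and writes the derivative analytically. The PDE is u″ + π²sin(πx) = 0, whose solution is u = sin(πx), i.e. θ = 1:

```python
import numpy as np

# PINN training loop in miniature: minimise the mean squared PDE residual
# at collocation points by gradient descent on the single parameter theta.

x = np.linspace(0, 1, 64)[1:-1]            # interior collocation points
source = np.pi**2 * np.sin(np.pi * x)

theta, lr = 0.0, 1e-3
for _ in range(500):
    u_xx = -theta * np.pi**2 * np.sin(np.pi * x)   # d2u/dx2 for the ansatz
    residual = u_xx + source                        # PDE residual
    loss = np.mean(residual**2)
    grad = np.mean(2 * residual * (-np.pi**2 * np.sin(np.pi * x)))
    theta -= lr * grad

print(theta, loss)
```

In a full PINN, `theta` is the network's weight vector, the derivative `u_xx` comes from automatic differentiation, and data and boundary losses are added to the residual term; the structure of the loop is otherwise the same.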

Where PINNs work and where they don't

PINNs have empirically succeeded on smooth problems with moderate Reynolds numbers, well-behaved boundary conditions, and modest geometric complexity. They have struggled on stiff problems (where multiple time scales coexist), turbulent flows (where small-scale structure is essential), and problems with sharp discontinuities (shocks, contact surfaces, phase boundaries). The 2022–2024 literature documents many failure modes — vanishing gradient pathologies, spectral bias toward low-frequency content, optimisation instabilities — and the methodology has matured substantially in response.

Improvements and variants

Several extensions address the basic PINN's limitations. Variational PINNs (vPINNs) use the variational form of the PDE rather than the strong form, with theoretical advantages for elliptic problems. Hard-constraint PINNs bake boundary conditions directly into the network architecture rather than enforcing them via loss, eliminating one source of optimisation difficulty. Causal PINNs respect the causal structure of time-evolution problems, training the early-time solution before the later-time solution. Domain-decomposition methods (XPINNs, cPINNs) split the spatial domain into subdomains with separate networks, improving scalability. The 2023–2026 literature continues to extend the toolkit, and the empirical state-of-the-art for PINN-tractable problems has improved substantially.

Inverse problems and data assimilation

The killer application of PINNs is inverse problems: given measurements of a system, infer the underlying parameters, source terms, or boundary conditions of the governing PDE. Classical methods (adjoint-based optimisation, ensemble Kalman filters) work but are computationally demanding; PINNs handle inverse problems essentially the same way they handle forward problems — by adding the unknown parameters as trainable variables alongside the network weights. Production applications include subsurface imaging (inferring rock properties from seismic data), medical imaging (inferring tissue properties from MRI data), and climate data assimilation.

The methodological role

Despite the substantial literature, PINNs are not a replacement for classical numerical methods on problems the classical methods solve well. They are a complement — useful for inverse problems, for problems where classical mesh generation is hard, for high-dimensional problems where mesh-based methods are intractable, and for hybrid use cases where the network represents an unknown subgrid model alongside a classical solver. Section 7 develops the hybrid pattern; the conceptual point is that PINNs sit alongside, not above, the classical computational-science toolkit.

05

Neural Operators

PINNs learn the solution to one specific instance of a PDE — fixed boundary conditions, fixed parameters, fixed forcing. Neural operators learn the entire solution map: given any boundary condition or parameter, produce the corresponding solution. The methodology is substantially more ambitious and substantially more powerful, and it has been the most-active scientific-ML research direction since 2020.

The DeepONet construction

The first widely-deployed neural operator was DeepONet (Lu, Jin & Karniadakis 2019). It uses a "branch-and-trunk" architecture: a branch network encodes the input function (boundary conditions, initial state, forcing term) at a fixed set of sensor locations; a trunk network produces a basis at any query point; the output is the inner product of branch and trunk activations. The construction is theoretically justified by a universal approximation theorem for operators (Chen & Chen 1995, extended by the DeepONet authors), which guarantees that the architecture can approximate any continuous operator with sufficient capacity.
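The branch-and-trunk construction at the shape level, with randomly initialised (untrained) weights; the layer sizes are illustrative, and the output bias that the published architecture includes is omitted:

```python
import numpy as np

# DeepONet forward pass: branch encodes the input function at m sensors,
# trunk encodes the query coordinate, output is their inner product.

rng = np.random.default_rng(0)
m, p, hidden = 50, 32, 64        # sensors, basis size, hidden width

def mlp_params(sizes):
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

branch = mlp_params([m, hidden, p])     # encodes the input function
trunk = mlp_params([1, hidden, p])      # produces a basis at any query point

sensors = np.linspace(0, 1, m)
fvals = np.sin(2 * np.pi * sensors)     # input function at sensor locations
xq = np.linspace(0, 1, 200)[:, None]    # arbitrary query points

b_out = mlp(branch, fvals[None, :])     # (1, p)
t_out = mlp(trunk, xq)                  # (200, p)
u = t_out @ b_out.T                     # G(f)(x): inner product of features

print(u.shape)
```

Note that the trunk is evaluated at arbitrary query coordinates, so the learned operator can be sampled anywhere in the domain regardless of where the training outputs were observed.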

Fourier neural operators

The methodologically most-influential alternative is the Fourier neural operator (FNO, Li et al. 2020). FNOs apply learned linear operators in Fourier space — the input is FFT'd, multiplied by learned weights at each frequency, and inverse-FFT'd — interleaved with pointwise nonlinearities. The architecture has the remarkable property of discretization invariance: an FNO trained on one resolution can evaluate at any other resolution (subject to numerical limits). Empirical performance on canonical PDE benchmarks (Burgers, Darcy flow, Navier-Stokes) has been state-of-the-art, and the architecture has been extended in many directions (Spherical FNOs for climate applications, Geo-FNOs for irregular geometries, the various 2024 successors).
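A single 1D Fourier layer makes the discretisation-invariance property concrete: the learned weights live in frequency space, so the same layer can be evaluated at any resolution. Single channel, random weights, no nonlinearity or residual path (a full FNO interleaves these layers with pointwise transforms):

```python
import numpy as np

# One FNO-style spectral layer: FFT, multiply the lowest `modes`
# frequencies by learned complex weights, inverse FFT. The "forward"
# norm makes the Fourier coefficients resolution-independent.

rng = np.random.default_rng(1)
modes = 8
W = rng.standard_normal(modes) + 1j * rng.standard_normal(modes)

def fourier_layer(u):
    n = len(u)
    uh = np.fft.rfft(u, norm="forward")
    out = np.zeros_like(uh)
    out[:modes] = uh[:modes] * W            # learned multiplier on low modes
    return np.fft.irfft(out, n=n, norm="forward")

f = lambda x: np.sin(2*np.pi*x) + 0.5*np.cos(6*np.pi*x)
x_lo = np.linspace(0, 1, 64, endpoint=False)
x_hi = np.linspace(0, 1, 256, endpoint=False)
y_lo, y_hi = fourier_layer(f(x_lo)), fourier_layer(f(x_hi))

# Same weights, two resolutions: values agree on the shared grid points.
print(np.max(np.abs(y_lo - y_hi[::4])))
```

Because the input here is band-limited within the retained modes, the 64-point and 256-point evaluations agree to machine precision on the shared grid; for general inputs the agreement holds up to the truncation the mode cutoff imposes.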

Graph neural operators

For problems on irregular geometries — typical of engineering applications and biology — graph neural operators (GNOs) and the related message-passing neural operators are the natural fit. The methodology connects directly to the GNN material of Part XIII Ch 05, applied to the specific problem of learning PDE solution maps. Production deployments at major engineering software vendors (Ansys, Siemens, COMSOL) increasingly include neural-operator surrogate-modelling capabilities, and the methodology has begun to displace classical reduced-order models for many applications.

Transformer-based operators

The 2023–2026 wave of neural-operator research increasingly uses transformer architectures, drawing on the same scaling intuitions that drove the language-model wave. Models like the OFormer, GNOT, and the various 2024 transformer-operator variants combine attention-based long-range coupling with operator-learning structure, with empirical wins on problems where long-range interactions matter (turbulence, electromagnetic propagation, gravitational dynamics). The frontier is rapidly evolving, and the architectural picture in 2028 will likely look different from 2026.

The training-data problem

Neural operators require training data — pairs of (input function, solution) generated by running the underlying PDE solver. The methodology is therefore not a complete replacement for the solver: it requires substantial classical-simulation effort to generate training data, after which the neural operator is fast at inference but bound to the parameter regime of its training data. The deployment pattern is best understood as amortising the cost of many solver runs: spend the simulation effort once to train the operator, then evaluate it cheaply across many design iterations or scenarios. For engineering optimisation, design exploration, and uncertainty quantification — where many similar PDE solves are needed — the amortisation pays off; for one-off problems, it does not.

06

Differentiable Simulation

A different approach to combining ML and physics is differentiable simulation: write the simulator itself in an autodiff framework (JAX, PyTorch, custom DSLs), which makes the whole simulator a differentiable function. The result is gradient-based optimisation through the simulator — for design, control, parameter estimation, and for training ML components jointly with mechanistic models.

The autodiff revolution in scientific computing

The deep-learning era's killer infrastructure was reverse-mode automatic differentiation, embedded in frameworks like PyTorch and JAX. The 2020–2026 wave has applied the same infrastructure to scientific simulation: classical numerical methods (finite elements, finite volumes, spectral methods, particle simulators) reimplemented in autodiff frameworks become differentiable end-to-end. Gradients of any output quantity with respect to any input parameter become available, and gradient-based optimisation methods can drive design and control problems that previously required expensive black-box optimisation.
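What "differentiable end-to-end" means mechanically can be seen by writing the reverse pass through a time-stepping loop by hand, for a one-parameter decay model (in practice JAX or PyTorch generates this pass automatically):

```python
import numpy as np

# Reverse-mode differentiation through an explicit-Euler simulation of
# dx/dt = -k x: forward pass stores the states, backward pass propagates
# the adjoint and accumulates dL/dk, where L is the final state.

def simulate(k, x0=1.0, dt=0.01, steps=100):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 - dt * k))   # forward pass, states stored
    return xs

def grad_k(k, xs, dt=0.01):
    a, g = 1.0, 0.0                          # adjoint of final state; dL/dk
    for x_prev in reversed(xs[:-1]):
        g += a * (-dt * x_prev)              # local derivative of a step wrt k
        a *= (1.0 - dt * k)                  # propagate adjoint backwards
    return g

k = 2.0
xs = simulate(k)
g = grad_k(k, xs)

# Analytic check: x_N = x0 (1 - dt k)^N, so dx_N/dk = -N dt x0 (1 - dt k)^(N-1)
g_exact = -100 * 0.01 * (1 - 0.01 * k) ** 99
print(g, g_exact)
```

The stored state list is also why reverse-mode memory scales with simulation length, the limitation discussed at the end of this section.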

Differentiable physics frameworks

Several frameworks have established the methodology. JAX (Bradbury et al. 2018) provides composable transforms (grad, jit, vmap, pmap) over array computations and has become the dominant SciML platform. DiffTaichi (Hu et al. 2020) provides differentiable physics primitives for graphics and robotics applications. Brax, MuJoCo MJX, and the various differentiable rigid-body simulators provide GPU-accelerated differentiable simulation for robotics. PhiFlow targets fluid dynamics; Modulus (NVIDIA) provides a unified PINN-and-FNO platform; Firedrake and Dolfin-Adjoint target finite-element analysis with adjoint-based gradients.

Inverse design

The killer application is inverse design: specify the desired output behaviour and use gradient descent to find the design parameters that produce it. Photonic design (engineering metamaterials with prescribed optical responses), materials design (engineering composites with prescribed mechanical properties), aerodynamic shape optimisation, and increasingly molecular and protein design all use differentiable simulation. The methodology connects naturally to the generative-design tradition (Autodesk Generative Design and similar tools) but with substantially better scaling because gradients replace black-box search.
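Inverse design in one dimension, under toy assumptions: a frictionless analytic projectile model stands in for the simulator, and gradient descent finds the launch angle whose simulated range hits a target. All quantities are illustrative:

```python
import math

# Gradient descent through a (differentiable) simulator: choose the launch
# angle theta so that the simulated range matches a 30 m target.

v, g_acc, target = 20.0, 9.81, 30.0     # launch speed, gravity, target range

def simulate_range(theta):
    return v**2 * math.sin(2 * theta) / g_acc

def loss_and_grad(theta):
    r = simulate_range(theta)
    dr_dtheta = 2 * v**2 * math.cos(2 * theta) / g_acc
    err = r - target
    return err**2, 2 * err * dr_dtheta  # chain rule through the simulator

theta, lr = 0.2, 1e-4
for _ in range(2000):
    loss, grad = loss_and_grad(theta)
    theta -= lr * grad

print(theta, simulate_range(theta))
```

The derivative is written by hand here because the simulator is one line; in a real differentiable-simulation workflow the same gradient flows through thousands of solver steps via autodiff, which is the whole point of the methodology.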

End-to-end learning with classical-physics priors

Differentiable simulation enables a novel ML pattern: train a neural network jointly with a classical simulator, with gradients flowing through both. The network might represent an unknown closure model, a learned material law, or a learned subgrid parameterisation; the simulator handles the well-understood physics; both are optimised together against observed data. The methodology has produced empirical wins in fluid dynamics (learned subgrid closures for LES), climate modelling (ML-augmented atmospheric models), and material modelling (learned constitutive relations), and is the technical core of the hybrid models that Section 7 develops.

The implementation reality

Differentiable simulation looks straightforward in theory and is hard in practice. The simulator code has to be rewritten in the autodiff framework, which can be substantial engineering effort for legacy codebases. Numerical stability issues that classical solvers handle implicitly become explicit problems for autodiff (gradients through numerical schemes can blow up where the function values do not). Memory consumption for reverse-mode autodiff scales with simulation length, which limits long-time simulation. The 2024–2026 wave of "differentiable scientific computing" infrastructure (the JAX ecosystem, the PyTorch scientific-computing extensions) has substantially matured the engineering, but production deployments still require expert care.

07

Hybrid Physics-ML Models

The most-pragmatic methodology in scientific ML is the hybrid model — combine classical physics (for the parts of the system that are well-understood) with ML (for the parts that are not). The methodology has many names — grey-box modelling, residual learning, closure modelling, augmented dynamics — but the underlying idea is consistent: do not throw away decades of physical understanding when adopting modern ML.

The closure-modelling tradition

Computational science has long faced the closure problem: equations describing a system involve quantities that the simulator cannot compute directly. Turbulence modelling is the canonical example: the Reynolds-averaged Navier-Stokes (RANS) equations involve a "Reynolds stress" term that requires a separate model. For seventy years, closure models have been hand-crafted (Smagorinsky, k-epsilon, k-omega) with substantial empirical tuning. ML offers an alternative: learn the closure from high-fidelity simulation data (DNS — direct numerical simulation) and embed it in the lower-fidelity solver. The methodology has produced measurable improvements in turbulence modelling, climate-model subgrid parameterisations, and chemical-kinetics closures.

Residual learning

A specific hybrid pattern is residual learning: the classical physics model produces a baseline prediction, and a neural network learns the residual between the physics prediction and the observed data. The methodology has the desirable property that when the physics model is correct (or nearly so), the network learns to output near-zero residuals; the model degrades gracefully to the classical baseline. The pattern is used extensively in weather forecasting (the GraphCast and Pangu-Weather models can be understood partially in this way, with the physics being the standard atmospheric tendency), in pharmaceuticals (where the physics is a quantum mechanics model and the network learns corrections), and in materials science (where the physics is DFT and the network learns higher-order corrections).
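Residual learning in miniature, with a made-up system, a deliberately incomplete physics baseline, and a least-squares polynomial standing in for the neural network:

```python
import numpy as np

# Residual learning: the physics baseline explains most of the signal,
# and a small learned model fits what remains.

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)
truth = np.sin(x) + 0.1 * x**2            # the "observed" system
obs = truth + 0.01 * rng.standard_normal(x.size)

physics = np.sin(x)                        # known-physics baseline
residual = obs - physics                   # what physics leaves unexplained

coeffs = np.polyfit(x, residual, 3)        # learn the residual (degree-3 poly)
hybrid = physics + np.polyval(coeffs, x)

err_physics = np.max(np.abs(physics - truth))
err_hybrid = np.max(np.abs(hybrid - truth))
print(err_physics, err_hybrid)
```

Note the graceful-degradation property from the text: if the physics baseline were exact, the fitted residual coefficients would be near zero and the hybrid would collapse to the classical model.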

Universal differential equations

A specific framing worth flagging is the universal differential equation (UDE) of Rackauckas et al. 2020. The idea is to write the system as an ODE or PDE where some terms are physics (known) and some terms are universal approximators (learned). The composite is differentiable end-to-end (via differentiable simulation, Section 6), and gradient-based training on observed data produces optimal weights for the learned terms. The methodology has been applied to everything from disease modelling through climate to materials, and the SciML.jl Julia ecosystem has substantially packaged the framework for production use.
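The UDE idea in its simplest collocation form (full UDE training differentiates through the ODE solver, as in Section 6): the model is dx/dt = −x + m(x) with the linear term known, and the missing term — here a one-parameter cubic, our illustrative choice — is recovered from derivative estimates:

```python
import numpy as np

# UDE-flavoured recovery of a missing term: the true dynamics are
# dx/dt = -x + 0.3 x^3, with "-x" treated as known physics and the
# cubic term to be learned from data.

def true_rhs(x):
    return -x + 0.3 * x**3

dt, n = 0.01, 1000
xs = np.empty(n)
xs[0] = 1.5
for i in range(n - 1):                     # RK4 on the true dynamics
    k1 = true_rhs(xs[i]); k2 = true_rhs(xs[i] + dt/2*k1)
    k3 = true_rhs(xs[i] + dt/2*k2); k4 = true_rhs(xs[i] + dt*k3)
    xs[i+1] = xs[i] + dt/6*(k1 + 2*k2 + 2*k3 + k4)

dxdt = (xs[2:] - xs[:-2]) / (2 * dt)       # derivative data (centred diff)
x = xs[1:-1]
missing = dxdt - (-x)                      # subtract the known physics term

# "Learn" the missing term with a one-parameter model m(x) = c x^3.
c = np.sum(missing * x**3) / np.sum(x**6)
print(c)
```

In the full framework the learned term is a neural network rather than a single coefficient, and its weights are trained by backpropagating through the integrator; the division of labour between known and learned terms is the same.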

Physical regularisation

Beyond explicit hybrid architectures, ML models can be physically regularised — soft constraints in the training loss that penalise violations of conservation laws, symmetries, or known asymptotic behaviour. The methodology connects to PINNs (which can be understood as physically-regularised network training) but generalises to broader settings. The empirical pattern is that adding physical regularisation usually improves both in-distribution and out-of-distribution performance, with the cost being slower training and additional hyperparameter tuning for the regularisation weights.
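A minimal sketch of the pattern: learn one-step linear dynamics from noisy observations of a rotation (which conserves the norm of the state), with a soft penalty pushing the learned map toward norm preservation. The penalty weight and noise level are illustrative:

```python
import numpy as np

# Physically-regularised regression: fit x_{t+1} = A x_t with a soft
# penalty on ||A^T A - I||, since the true dynamics conserve ||x||.

rng = np.random.default_rng(0)
R = np.array([[np.cos(0.3), -np.sin(0.3)],
              [np.sin(0.3),  np.cos(0.3)]])      # true, norm-conserving map
X = rng.standard_normal((2, 300))
Y = R @ X + 0.1 * rng.standard_normal((2, 300))  # noisy next states

def fit(lam, steps=5000, lr=1e-3):
    A = np.eye(2)
    for _ in range(steps):
        grad = 2 * (A @ X - Y) @ X.T / X.shape[1]        # data term
        grad += lam * 4 * A @ (A.T @ A - np.eye(2))      # conservation term
        A -= lr * grad
    return A

drift = lambda A: np.linalg.norm(A.T @ A - np.eye(2))
A_plain, A_reg = fit(0.0), fit(5.0)
print(drift(A_plain), drift(A_reg))
```

The regularised fit violates norm conservation substantially less than the plain least-squares fit at essentially no cost in data fit, which is the empirical pattern the paragraph describes; the extra hyperparameter is the penalty weight.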

The interpretability win

A subtle benefit of hybrid models is interpretability. When a pure black-box ML model produces an answer, the path from input to output is opaque; when a hybrid model produces an answer, the physics explains most of it and the ML correction is small and inspectable. The result is a model that scientists trust more, regulators evaluate more easily, and engineers can debug when deployment surfaces edge cases. The methodology of effective hybrid modelling pays substantial attention to which subset of the system to model with physics and which to leave to ML, and the resulting designs typically allocate the well-understood parts to physics and the residual mysteries to learning.

08

Symmetries, Equivariance, and Conservation

Physical systems are characterised by their symmetries — translational, rotational, gauge, time-reversal — and these symmetries imply conservation laws by Noether's theorem. ML methods that respect these symmetries are demonstrably more data-efficient and more reliable than those that do not. The methodology of equivariant neural networks is among the most-mathematically-rigorous corners of modern ML.

Group equivariance

A function f is equivariant under a group G if applying a group element to the input commutes with applying it to the output: f(g·x) = g·f(x). For physical applications, the relevant groups are typically the Euclidean group E(3) (translations and rotations), the special orthogonal group SO(3) (rotations), and various subgroups appropriate to specific symmetries (cubic for crystal lattices, gauge groups for particle physics). Equivariant networks restrict the architecture to operations that respect the group structure, with the result that learned models transform correctly under group elements they have never seen during training — substantial generalisation benefits in low-data regimes.

Convolution as equivariance

The modern use of equivariance in ML traces back to the observation that convolution is equivariant under translation: convolving a translated image gives the original convolution, translated. This is a large part of why CNNs work for image classification: small translations of an object don't change its class. The 2017–2024 wave of equivariance research generalises this to richer groups: SE(3)-equivariant networks for 3D molecular and materials problems, spherical CNNs for spherical-domain problems (weather, astronomy), gauge-equivariant networks for lattice gauge theory.
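The translation-equivariance identity f(g·x) = g·f(x) can be checked numerically in a few lines, using circular (periodic) convolution so the identity is exact:

```python
import numpy as np

rng = np.random.default_rng(3)

x = rng.standard_normal(32)          # 1D "image"
k = rng.standard_normal(5)           # convolution kernel

def circular_conv(signal, kernel):
    # out[i] = sum_j kernel[j] * signal[(i - j) mod n]
    n = signal.size
    out = np.zeros(n)
    for i in range(n):
        for j, kj in enumerate(kernel):
            out[i] += kj * signal[(i - j) % n]
    return out

shift = 7
lhs = circular_conv(np.roll(x, shift), k)   # f(g . x): convolve the shifted signal
rhs = np.roll(circular_conv(x, k), shift)   # g . f(x): shift the convolved signal
assert np.allclose(lhs, rhs)
```

The equivariant architectures above generalise exactly this commuting-diagram property from translations to rotations and richer group actions.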

SE(3)-equivariant networks for molecules

The most-impactful applications have been in molecular and materials modelling. Methods like Tensor Field Networks, SE(3)-Transformers, NequIP, MACE, and Allegro use SE(3)-equivariant architectures to predict molecular properties (energy, forces, stress) with substantially better data efficiency than non-equivariant baselines. The methodology has displaced earlier non-equivariant methods (SchNet, PhysNet) for most production molecular-property-prediction applications, and is the methodological core of the modern ML interatomic potential ecosystem (Ch 14 of Part XV develops this further).

Hamiltonian and Lagrangian neural networks

For dynamical systems with conserved quantities (energy, angular momentum, the various canonical invariants of mechanics), Hamiltonian neural networks (Greydanus, Dzamba & Yosinski 2019) and Lagrangian neural networks (Cranmer et al. 2020) embed the conservation structure directly into the network architecture. Rather than learning the dynamics directly, these methods learn a Hamiltonian or Lagrangian function and derive the dynamics from it via Hamilton's or Euler-Lagrange equations. The result is exact energy conservation (modulo numerical-integration error) by construction, with substantial empirical wins on long-time simulation problems where energy conservation is essential.
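The core move, deriving dynamics from a learned scalar via Hamilton's equations dq/dt = ∂H/∂p, dp/dt = -∂H/∂q, can be sketched with an analytic stand-in Hamiltonian whose gradients are taken numerically (playing the role the learned network and autodiff would play in the real method):

```python
import numpy as np

# Stand-in for the learned Hamiltonian: a unit-mass harmonic oscillator.
def H(q, p):
    return 0.5 * p**2 + 0.5 * q**2

def grads(q, p, eps=1e-6):
    # Central-difference gradients of H (autodiff in a real implementation).
    dHdq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
    dHdp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
    return dHdq, dHdp

# Symplectic (leapfrog-style) integration of the derived dynamics.
q, p, dt = 1.0, 0.0, 0.01
e0 = H(q, p)
for _ in range(10_000):                 # 100 time units
    dHdq, _ = grads(q, p)
    p -= 0.5 * dt * dHdq                # half kick:  dp/dt = -dH/dq
    _, dHdp = grads(q, p)
    q += dt * dHdp                      # drift:      dq/dt =  dH/dp
    dHdq, _ = grads(q, p)
    p -= 0.5 * dt * dHdq                # half kick

# Energy drift stays bounded over long horizons, the signature benefit.
assert abs(H(q, p) - e0) < 1e-3
```

A network that learned dq/dt and dp/dt directly has no such guarantee; deriving the dynamics from a single scalar is what buys the conservation structure.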

The methodological tax

Equivariant networks are not free. The architectures are more complex than their non-equivariant counterparts, the implementations require expert care, and the training is often slower per step. The empirical case is that for low-data, high-stakes scientific problems the data efficiency gains substantially outweigh the engineering cost; for high-data problems with well-mixed training distributions the gains diminish. The methodology of effective deployment requires honest assessment of which regime applies, and the literature has begun to formalise this through scaling-law analyses of equivariant vs. non-equivariant architectures.

09

Uncertainty, Validation, and Benchmarks

Scientific ML predictions feed into engineering decisions, scientific discoveries, and policy choices where uncertainty matters. The methodology of rigorous uncertainty quantification, careful validation, and benchmark-driven empirical assessment is what separates production-grade scientific ML from research demonstrations.

Epistemic and aleatoric uncertainty

The standard decomposition of uncertainty has two components. Aleatoric uncertainty reflects irreducible randomness in the system — measurement noise, intrinsic stochasticity, the unmodelled chaos that no amount of data can eliminate. Epistemic uncertainty reflects model-and-data uncertainty — what we don't know that more data could reveal. The two have very different operational implications: high aleatoric uncertainty means more measurements won't help, whereas high epistemic uncertainty means they will. Production deployments need to distinguish them, and the methodology connects to the Bayesian-deep-learning material of Part XIII Ch 07.
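For an ensemble whose members each predict a mean and an aleatoric variance, the standard decomposition reads Var_total = E[σᵢ²] (aleatoric) + Var[μᵢ] (epistemic). A worked micro-example with made-up illustrative numbers:

```python
import numpy as np

mu = np.array([1.9, 2.1, 2.0, 2.2, 1.8])            # per-member predicted means
sigma2 = np.array([0.30, 0.28, 0.33, 0.27, 0.32])   # per-member predicted variances

aleatoric = sigma2.mean()     # mean of the members' noise estimates
epistemic = mu.var()          # disagreement between the members
total = aleatoric + epistemic

assert np.isclose(aleatoric, 0.30)
assert np.isclose(epistemic, 0.02)
assert np.isclose(total, 0.32)
```

The operational reading is direct: here most of the predictive variance is aleatoric, so collecting more training data would barely shrink the interval.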

Methods for UQ in scientific ML

Several methods are in widespread use. Bayesian neural networks (variational inference, Hamiltonian Monte Carlo, the various Laplace approximations) provide principled uncertainty estimates but at substantial computational cost. Deep ensembles (training multiple networks with different random seeds) provide cheap, well-calibrated uncertainty estimates and have become the default in production. Conformal prediction provides distribution-free prediction intervals with guaranteed coverage and has become increasingly popular for scientific applications. Gaussian processes remain the gold standard for low-data regimes where the principled-uncertainty case dominates.
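The ensemble idea can be sketched with lightweight stand-ins: polynomial fits on bootstrap resamples in place of retrained networks, with the ensemble spread serving as the epistemic-uncertainty estimate (the setup is illustrative throughout):

```python
import numpy as np

rng = np.random.default_rng(4)

# Training data from a noisy 1D function on [-1, 1].
t_train = rng.uniform(-1.0, 1.0, 40)
y_train = np.sin(3 * t_train) + 0.1 * rng.standard_normal(t_train.size)

t_query = np.array([0.0, 2.5])        # in-distribution vs. far extrapolation
preds = []
for _ in range(20):                   # "ensemble members"
    idx = rng.integers(0, t_train.size, t_train.size)   # bootstrap resample
    coeffs = np.polyfit(t_train[idx], y_train[idx], deg=5)
    preds.append(np.polyval(coeffs, t_query))
preds = np.array(preds)

mean = preds.mean(axis=0)             # ensemble prediction
epistemic = preds.std(axis=0)         # spread = model uncertainty

# The ensemble disagrees wildly where it must extrapolate.
assert epistemic[1] > 10 * epistemic[0]
```

This is the property that makes ensembles the production default: the spread inflates exactly where the model is least trustworthy, at negligible implementation cost.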

Out-of-distribution detection

A specific UQ challenge for scientific ML is out-of-distribution detection: knowing when the model is being asked to predict outside its training regime. The methodology connects to the broader OOD-detection literature (Part VI Ch 09 on robustness) but with scientific-specific tools. Asymptotic-limit checks (does the prediction match analytical solutions in known limits?), conservation-law residuals (does the prediction satisfy the underlying physics?), and extrapolation diagnostics (how far is the query point from the training data in physically-meaningful coordinates?) are the standard tools, and production deployments routinely combine them.
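The extrapolation diagnostic can be implemented very simply. One common choice, sketched here, is the Mahalanobis distance from the query point to the training distribution in the chosen input coordinates (the data, threshold, and query points are illustrative; the threshold would in practice be tuned on held-out in-distribution data):

```python
import numpy as np

rng = np.random.default_rng(5)

# Training inputs in 3 physically meaningful coordinates, unequal scales.
X_train = rng.standard_normal((500, 3)) * np.array([1.0, 0.5, 2.0])

mu = X_train.mean(axis=0)
cov = np.cov(X_train, rowvar=False)
cov_inv = np.linalg.inv(cov)

def mahalanobis(x):
    # Scale-aware distance from x to the training distribution.
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

threshold = 4.0                        # illustrative; tune on held-out data
in_dist = np.array([0.5, -0.2, 1.0])
far_out = np.array([6.0, 3.0, -9.0])

assert mahalanobis(in_dist) < threshold < mahalanobis(far_out)
```

In deployment this check runs alongside the conservation-residual and asymptotic-limit checks above, with out-of-regime queries routed back to the reference simulator.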

Benchmarks and reproducibility

Several public benchmarks anchor empirical assessment in the field. PDEBench (Takamoto et al. 2022) provides standardised PDE problems for neural-operator evaluation. The MD17 dataset and its successors (rMD17, ANI-1x) provide standard molecular-dynamics benchmarks for ML interatomic potentials. WeatherBench (Rasp et al. 2020, expanded subsequently) provides standard atmospheric-forecasting benchmarks. Lab in the Loop and the various scientific-discovery benchmarks evaluate end-to-end SciML workflows. The methodology of careful benchmark-driven research has substantially matured the field, and 2024–2026 publications routinely include results across multiple standardised benchmarks.

The reproducibility problem

Scientific ML inherits the reproducibility challenges of mainstream ML and adds its own. Simulator versions (which numerical-physics package, which version, which solver settings) shape the training data. Training-data generation is often expensive enough that not every paper releases it. Hyperparameter sensitivity is high in scientific settings, with substantial differences between reasonable choices. The 2023–2026 wave of scientific-ML-reproducibility infrastructure (the SciML.jl ecosystem's reproducibility tooling, the various benchmark-with-released-data efforts) has helped, but the field's reproducibility track record remains uneven.

10

Software, Workflows, and the Frontier

The previous sections developed the methods of scientific ML; this final section turns to the software ecosystems that implement them, the workflows of practical deployment, and the open frontiers that will shape the field's next several years.

The software ecosystem

Scientific ML in 2026 is a multi-language, multi-framework ecosystem. JAX (Google) is the dominant SciML platform for gradient-based methods, with composable transforms and excellent autodiff support. PyTorch (Meta) dominates neural-operator research and the molecular-dynamics ML literature, supported by ecosystem libraries (e3nn for equivariance, PyTorch Geometric for GNNs, NequIP and MACE for molecular potentials). NVIDIA Modulus provides a unified production platform for PINNs, neural operators, and differentiable simulation. SciML.jl (Julia) provides the most-comprehensive differentiable-equations ecosystem, with strengths in stiff systems and universal differential equations. Firedrake and Dolfin-Adjoint serve the finite-element-with-adjoints community. The plurality is real and reflects different communities' different trade-offs.

Practical deployment workflows

Production scientific-ML deployments tend to follow a common pattern. (1) Start with a high-fidelity physics-based simulator as the gold standard. (2) Use it to generate training data over a parameter regime of interest. (3) Train an ML surrogate (neural operator, PINN, or hybrid model). (4) Validate the surrogate against held-out simulator runs and against physical reasoning (asymptotic limits, conservation properties). (5) Deploy the surrogate for fast iteration (design optimisation, uncertainty quantification, real-time control), with the original simulator available as a fallback for verification. (6) Monitor the deployment for distribution shift; retrain or extend the training data when the fast surrogate is being asked to operate outside its trained regime.
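The six-step loop can be sketched end to end with toy stand-ins — a cheap analytic "simulator", a polynomial "surrogate", and a hard-coded trust region; every name here is illustrative, standing in for a real solver, a real ML model, and a real distribution-shift monitor:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulator(x):                       # (1) high-fidelity gold standard
    return np.exp(-x) * np.sin(4 * x)

x_train = rng.uniform(0.0, 2.0, 100)    # (2) sample the regime of interest
y_train = simulator(x_train)

coeffs = np.polyfit(x_train, y_train, deg=11)   # (3) train the surrogate
def surrogate(x):
    return np.polyval(coeffs, x)

x_val = rng.uniform(0.0, 2.0, 50)       # (4) validate on held-out runs
val_err = np.abs(surrogate(x_val) - simulator(x_val)).max()
assert val_err < 1e-2

def predict(x, lo=0.0, hi=2.0):         # (5)/(6) deploy with a guardrail:
    if lo <= x <= hi:                   # fast surrogate inside the trained
        return surrogate(x)             # regime; simulator fallback (and, in
    return simulator(x)                 # production, a retraining trigger)
                                        # outside it

assert abs(predict(1.0) - simulator(1.0)) < 1e-2
```

Real deployments replace the interval check with the OOD diagnostics of Section 9, but the control flow — surrogate inside the validated regime, simulator outside it — is the same.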

Foundation models for science

A specific frontier worth flagging: scientific foundation models — large pretrained models on heterogeneous scientific data, fine-tuned to specific applications. Examples include weather and climate foundation models (Aurora from Microsoft, the various 2024–2026 successors), molecular foundation models (the NequIP-derived "universal" potentials, the MACE-OFF series), biological-sequence foundation models (Evo, the various successors), and increasingly cross-domain models that span multiple scientific disciplines. The methodology connects to the broader foundation-model wave but with specific scientific constraints (physical priors, data efficiency, validation rigour) shaping the architectures.

AI-driven scientific discovery

The most-ambitious application of scientific ML is closing the loop on autonomous scientific discovery: AI systems that propose experiments, analyse results, refine hypotheses, and iterate. Early demonstrations exist — DeepMind's AlphaFold for structural biology, Microsoft's AI4Science for materials, the various autonomous-laboratory deployments at materials-research labs — but the methodology is genuinely early. The 2024–2026 wave of LLM-augmented scientific reasoning (the various scientific-research-assistant LLMs, the OpenAI and Anthropic deployments at major research institutions) suggests the closed loop is an active frontier rather than a solved problem.

What this chapter does not cover

Several adjacent areas are out of scope. The substantial probabilistic-programming literature (Stan, Pyro, NumPyro, Turing.jl) intersects scientific ML for Bayesian-inference applications but is conventionally treated through statistical-inference rather than ML lenses. Symbolic computation and computer algebra (SymPy, Mathematica) provides essential infrastructure for scientific work but is its own discipline. The interface between scientific ML and traditional statistical learning theory (the various uniform-convergence results, generalisation-bound analyses) is an active research area but mostly outside the chapter's practical scope. Finally, the deeper philosophical questions — when does an ML-discovered equation count as a "law of nature," what is the proper epistemic status of a black-box predictor in scientific work, how does AI-assisted discovery change the sociology of scientific practice — are essential context that the chapter touches only briefly. The methodology developed here is the practical machinery of doing computational science with modern ML; the broader questions are taken up in the domain-specific chapters that follow.

Further reading

Foundational papers and references for scientific machine learning: Karniadakis et al.'s physics-informed machine learning review, the original PINN paper, the FNO paper, and Brunton and Kutz's Data-Driven Science and Engineering textbook together form the right starting kit; the field is rapidly evolving as of 2026.