Physics & AI, where deep learning meets the most-precise empirical science humans have built.
Physics is the framework that underwrites the rest of science, and it has been an unusually-early and substantial adopter of machine learning. Particle-physics experiments at the LHC have used boosted decision trees for event selection since the early 2000s and have moved through deep neural networks, graph neural networks, and transformers as each generation has matured. Lattice QCD calculations now routinely use ML for parameter sampling, autocorrelation reduction, and observable estimation. Plasma-physics fusion-control experiments at JET, DIII-D, and TCV have used reinforcement learning to manage tokamak plasma shape and stability. Neural quantum states have become a standard tool for variational ground-state calculations in many-body physics. Symbolic regression has been used to rediscover physics equations from data and is increasingly deployed for hypothesis generation. This chapter develops both the working physics vocabulary an AI reader needs (Sections 2–9: classical mechanics, electromagnetism, thermodynamics & statistical mechanics, special and general relativity, quantum mechanics, QFT and the Standard Model, modern particle physics) and the AI methodology that has reshaped the field (Sections 10–19). Section 10 is the bridge that orients an ML practitioner to the physics-AI landscape.
Prerequisites & orientation
This chapter assumes mathematical maturity at the level of multivariable calculus, linear algebra, and basic differential equations. The vocabulary half (Sections 2–9) is at undergraduate-introductory level in physics; readers without prior physics coursework can skim the technical details and focus on the conceptual structure (the symmetries, the conservation laws, the characteristic scales). The methodology half (Sections 11–19) assumes the working machinery of modern deep learning (Part VI on transformers and CNNs), the graph-neural-network material of Part XIII Ch 05 (essential for particle physics in §11), the diffusion-model material of Part X (essential for §12 detector simulation), the equivariance methodology of Ch 01 Section 8 (the substrate for §18 foundation models and recurring throughout), and the reinforcement-learning material of Part VII (essential for §14 plasma control). The Fourier-neural-operator and physics-informed-network material of Ch 01 (Scientific ML) is the immediate methodological substrate for §16.
Three threads run through the chapter. The first is the precision-physics standard: physics is the science where some quantities are known to ten decimal places, and AI methods that engage with physics are routinely held to standards of empirical rigour that other AI domains rarely meet. The methodology has had to develop substantial machinery around uncertainty quantification, systematic-error analysis, and reproducibility. The second is the simulation-and-analysis loop: physics produces enormous quantities of simulated data (Monte Carlo for particle physics, lattice sweeps for QCD, Markov chain Monte Carlo for many-body systems) alongside experimental data, and AI methods are deployed at every stage of the simulate-analyse pipeline. The third is the symmetry-driven architecture philosophy: by Noether's theorem, every continuous symmetry corresponds to a conserved quantity, and physics has been the most-active testbed for equivariant neural networks that respect those symmetries by construction. Section 10 is the bridge that frames these themes; they appear in passing throughout.
Why Physics, and Why Physics-AI
Physics is the framework that underwrites the rest of science. Its concepts and methods — energy, momentum, fields, symmetry, quantum amplitudes, statistical ensembles — are the substrate from which chemistry, materials science, biology, and the engineering disciplines are built. Modern physics is also one of the most demanding empirical sciences, with experimental measurements that agree with theoretical predictions to twelve decimal places (the electron magnetic moment) and instruments that detect spacetime distortions of one part in 10²¹ (LIGO). For an AI reader, physics matters because it is the source of the symmetry principles (group equivariance) that increasingly structure modern deep learning, the source of the conservation laws that constrain physical-system dynamics that ML emulators must respect, and a prolific application domain in its own right — particle-physics ML at the LHC, lattice QCD, neural quantum states, plasma control, physics-informed neural networks. This chapter develops both the working physics vocabulary an AI reader needs (Sections 2–9) and the AI methodology that has reshaped the field (Sections 10–19). Section 10 frames what makes physics-AI methodologically distinctive from an ML perspective; this section maps the physics itself.
The unification view
The most useful framing of physics for an AI reader is the unification view: contemporary physics rests on two foundational frameworks, and most of the rest of the discipline is the systematic application of these to particular regimes. Quantum field theory (QFT) — the relativistic synthesis of quantum mechanics and special relativity — describes the strong, weak, and electromagnetic forces, and its predictions for particle physics are the most-precisely tested in human history. General relativity (GR) — the geometric theory of gravity — describes spacetime and cosmology, and its predictions for gravitational waves, black-hole mergers, and the expansion history of the universe have been spectacularly confirmed since 2015. The unfinished project is to unify these two frameworks; the empirical questions that remain open (dark matter, dark energy, neutrino masses, the matter-antimatter asymmetry, the strong-CP problem, gravitational quantisation) are the major frontier of fundamental physics.
The scale-and-symmetry framing
Physics organises its phenomena by characteristic length and energy scales, with symmetries (Lorentz invariance, gauge symmetry, general covariance) constraining the form of theories at every scale. Different regimes use different theories: classical mechanics (~m, ~J), electromagnetism (~m down to ~Å), thermodynamics and statistical mechanics (~10²³ particles, equilibrium), quantum mechanics (~Å to ~nm, atomic), QFT and particle physics (~fm, ~GeV-TeV), general relativity and cosmology (~Mpc and beyond). Bridging across scales — using ML methods to connect ab-initio quantum theory with macroscopic phenomenology — is one of the most-active methodological frontiers, developed concretely in the AI sections that follow.
Why this is one chapter, not two
The vocabulary and the methods are tightly intertwined. Lorentz-equivariant networks (Section 11) only make sense once Lorentz invariance and four-momentum conservation are understood (Section 5). Neural quantum states (Section 15) only make sense once the many-body Schrödinger equation is understood (Section 7). Lattice QCD ML methods (Section 13) only make sense once gauge theory and the strong interaction are understood (Section 8). Physics-informed neural networks (Section 16) only make sense once partial differential equations and conservation laws are understood (Sections 2–4). Reading just the AI half without the vocabulary leaves an AI practitioner unable to evaluate methodological choices; reading just the vocabulary leaves a physicist unaware of how the field is being reshaped. The 19-section structure is therefore deliberate: §2–9 develop the vocabulary, §10 bridges to the methodology, §11–19 develop the methods.
Classical Mechanics
Classical mechanics is the framework Newton built in the 17th century to describe how objects move under forces. It is the oldest branch of physics, the most-thoroughly-validated, and the conceptual substrate for almost everything that follows. For an AI reader, it provides three things: the basic vocabulary of motion (position, velocity, acceleration, force, energy, momentum), the Lagrangian and Hamiltonian formulations that recur throughout modern physics and machine learning, and the empirical reality of deterministic chaos that shapes what predictions are even possible.
Newton's laws
The foundational laws, formulated by Isaac Newton in the Principia (1687):
First law (inertia): an object at rest stays at rest, and an object in uniform motion stays in uniform motion, unless acted on by a net external force. The first law defines the concept of an inertial frame (one in which objects without external forces move uniformly); inertial frames are the natural setting for the rest of mechanics. Second law: F = ma — the net force on an object equals its mass times its acceleration, or equivalently, force equals the rate of change of momentum (F = dp/dt). The second law connects forces to motion and is the equation of motion for classical mechanics. Third law: for every action there is an equal and opposite reaction. The third law ensures momentum conservation in any closed system.
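To make the second law concrete as an equation of motion, here is a minimal numerical sketch — all parameters (mass, drag coefficient, launch velocity) are invented for illustration: a projectile with linear drag, integrated by stepping F = ma with small explicit time steps.

```python
import numpy as np

# F = ma integrated with small explicit time steps: a projectile with
# linear drag. All parameter values are arbitrary illustrative choices.
m, g, k = 1.0, 9.81, 0.1          # mass [kg], gravity [m/s^2], drag [kg/s]
r = np.array([0.0, 0.0])          # position [m]
v = np.array([20.0, 20.0])        # velocity [m/s]
dt = 1e-3

while r[1] >= 0.0:                        # integrate until the projectile lands
    F = np.array([0.0, -m * g]) - k * v   # net force: gravity plus drag
    v = v + (F / m) * dt                  # second law: dv/dt = F/m
    r = r + v * dt

print(f"range ~ {r[0]:.1f} m")    # about 63 m with these parameters
```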
Energy, work, and momentum
The conserved quantities of classical mechanics are central. Kinetic energy is the energy of motion: KE = ½mv² for a particle of mass m moving at speed v. Potential energy is energy stored in a configuration: gravitational potential energy PE = mgh for an object at height h, elastic potential energy PE = ½kx² for a spring with spring constant k stretched by displacement x, etc. The work-energy theorem states that the work done on an object equals its change in kinetic energy. Conservation of energy: in a closed system without dissipation, total energy (kinetic plus potential) is conserved. Conservation of momentum: in a closed system without external forces, total momentum (mass times velocity, summed over all particles) is conserved. Conservation of angular momentum: in a closed system without external torques, total angular momentum is conserved. By Noether's theorem, these three conservation laws follow from time, space, and rotation symmetries respectively.
Lagrangian mechanics
Newton's F = ma is the most-direct formulation, but it is not the most-elegant. The Lagrangian formulation (Joseph-Louis Lagrange, 1788) reformulates classical mechanics in terms of a single scalar function — the Lagrangian L = KE − PE — and the principle that the actual motion of a system is the one that extremises the action S = ∫ L dt integrated over the trajectory. The resulting Euler-Lagrange equations d(∂L/∂q̇)/dt − ∂L/∂q = 0 reduce to F = ma for the simple cases but generalise much more cleanly to systems with constraints, generalised coordinates, and field theories. The principle of least action is one of the most-powerful unifying ideas in physics, recurring throughout quantum mechanics (Feynman's path integral), field theory, and general relativity.
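Worked through for the simplest non-trivial case — a mass on a spring, with L = ½mq̇² − ½kq² — the Euler-Lagrange machinery recovers Newton's law:

```latex
% Euler-Lagrange equation worked through for the 1D harmonic oscillator.
\begin{aligned}
L &= \tfrac{1}{2}m\dot{q}^{2} - \tfrac{1}{2}kq^{2} \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q}
  &= \frac{d}{dt}\bigl(m\dot{q}\bigr) - (-kq) = m\ddot{q} + kq = 0 \\
\Rightarrow\; m\ddot{q} &= -kq
  \qquad \text{(Hooke's law, i.e. } F = ma \text{ for this system)}
\end{aligned}
```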
Hamiltonian mechanics
The Hamiltonian formulation (William Rowan Hamilton, 1833) takes a different route. The Hamiltonian H = KE + PE represents total energy expressed as a function of generalised coordinates q and conjugate momenta p. The equations of motion become dq/dt = ∂H/∂p, dp/dt = −∂H/∂q — a pair of first-order equations with substantial structural advantages. Hamiltonian flow preserves volume in phase space (Liouville's theorem); canonical transformations preserve the Hamiltonian structure; symplectic geometry is the mathematical framework that makes all of this rigorous. The Hamiltonian formulation is the bridge from classical to quantum mechanics (Section 7) — quantum mechanics is essentially classical Hamiltonian mechanics with operators replacing classical observables — and recurs in machine-learning theory through Hamiltonian Monte Carlo, energy-based models, and Hamiltonian neural networks.
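A minimal numerical sketch of why the symplectic structure matters in practice (units, step size, and step count chosen purely for illustration): an explicit Euler integrator steadily pumps energy into a harmonic oscillator, while a symplectic variant keeps the energy bounded — the property that Hamiltonian Monte Carlo relies on.

```python
# Harmonic oscillator H = p^2/(2m) + k q^2/2 in units m = k = 1, comparing
# explicit Euler (energy drifts) with symplectic Euler (energy bounded).
m = k = 1.0
dt, steps = 0.1, 1000

def energy(q, p):
    return p**2 / (2 * m) + k * q**2 / 2

# Explicit Euler: q and p both updated from the same old state.
q, p = 1.0, 0.0
for _ in range(steps):
    q, p = q + dt * p / m, p - dt * k * q
e_euler = energy(q, p)

# Symplectic Euler (a minimal leapfrog variant): update p first, then q.
q, p = 1.0, 0.0
for _ in range(steps):
    p = p - dt * k * q
    q = q + dt * p / m
e_symp = energy(q, p)

print(f"initial E = 0.5, explicit Euler E = {e_euler:.3g}, "
      f"symplectic E = {e_symp:.3g}")
# Euler's energy grows by roughly (1 + dt^2) per step; the symplectic
# result stays within a few per cent of 0.5 indefinitely.
```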
Many-body and chaos
The simplest classical-mechanics problems (one particle in a fixed potential, two particles interacting) admit closed-form solutions. The three-body problem — three gravitating masses — does not, except for special configurations. For systems with many interacting particles, the dynamics is generically chaotic: solutions exist and are deterministic, but tiny changes in initial conditions produce exponentially-growing differences in long-time behaviour. Chaos was first systematically studied by Henri Poincaré (1890s) on the three-body problem; Edward Lorenz (1963) gave the modern formulation in the context of atmospheric dynamics (the Lorenz attractor). Chaotic systems have positive Lyapunov exponents (small perturbations grow exponentially), strange attractors (fractal geometric structures in phase space), and fundamentally limited long-term predictability. The atmosphere's ~14-day weather predictability limit (Ch 06 §11) is a chaos consequence; AI weather forecasting works precisely within this chaotic-but-statistically-predictable regime.
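A short sketch of sensitive dependence on initial conditions, using the Lorenz-1963 system with its standard parameters (the crude fixed-step integrator is for illustration only):

```python
import numpy as np

# Two Lorenz-1963 trajectories started 1e-9 apart: the separation grows
# roughly exponentially (positive Lyapunov exponent) until it saturates
# at the size of the attractor.
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-9, 0.0, 0.0])
for step in range(1, 3001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 500 == 0:
        print(f"t = {step * 0.01:5.1f}  separation = {np.linalg.norm(a - b):.3e}")
# The separation climbs from 1e-9 toward the attractor's diameter (~O(10)),
# after which the two trajectories are effectively unrelated.
```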
Continuum mechanics
When systems contain too many particles to track individually, continuum mechanics takes over: instead of tracking particles, we track fields (density, velocity, temperature, stress) defined at every point in space. Fluid dynamics uses the Navier-Stokes equations to describe how fluids flow under pressure gradients, viscous forces, and external influences. Elasticity describes how solids deform under stress. Plasma physics describes ionised gases under combined electromagnetic and fluid dynamics. The methodology is conceptually layered on top of classical particle mechanics — the field equations can be derived from Newton's laws by averaging over many particles — but the working theory is field-based. Modern computational fluid dynamics (CFD), the climate models of Ch 06, and most of plasma physics are continuum-mechanics applications.
Electromagnetism and Optics
Electromagnetism is the second great classical theory, unifying electricity and magnetism into a single framework. The Maxwell equations are arguably the most-important set of equations in physics — they describe all electric, magnetic, and optical phenomena, predict the existence of electromagnetic waves, and were the historical springboard for special relativity.
Charges, fields, and forces
The basic objects of electromagnetism are electric charges (positive or negative, conserved, quantised in units of the elementary charge e ≈ 1.6 × 10⁻¹⁹ C) and currents (flowing charges, measured in amperes A = C/s). The basic dynamical objects are fields: the electric field E at every point in space tells us the force per unit charge a stationary charge would feel there; the magnetic field B tells us the force per unit charge per unit velocity a moving charge would feel. Fields are real physical entities — they carry energy, momentum, and angular momentum, even in the absence of charges. The total force on a charge is the Lorentz force: F = q(E + v × B).
Maxwell's equations
The Maxwell equations (synthesised by James Clerk Maxwell in 1865, building on Gauss, Faraday, and Ampère) are the four equations governing electromagnetic fields:
Gauss's law: ∇·E = ρ/ε₀ — the divergence of the electric field is proportional to the charge density. No magnetic monopoles: ∇·B = 0 — magnetic field lines never end (no isolated north or south poles exist). Faraday's law: ∇×E = −∂B/∂t — a changing magnetic field produces a circulating electric field (the principle behind electrical generators and transformers). Ampère-Maxwell law: ∇×B = μ₀J + μ₀ε₀ ∂E/∂t — currents and changing electric fields produce circulating magnetic fields. Together, these four equations completely describe classical electromagnetic phenomena.
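One way to see the wave solutions discussed next: in vacuum (ρ = 0, J = 0), taking the curl of Faraday's law and substituting the Ampère-Maxwell law yields a wave equation whose propagation speed is 1/√(μ₀ε₀):

```latex
% Vacuum wave equation from Maxwell's equations (rho = 0, J = 0).
\begin{aligned}
\nabla \times (\nabla \times \mathbf{E})
  &= -\frac{\partial}{\partial t}(\nabla \times \mathbf{B})
   = -\mu_0 \varepsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} \\
\nabla \times (\nabla \times \mathbf{E})
  &= \nabla(\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E}
   = -\nabla^2 \mathbf{E}
   \qquad (\nabla \cdot \mathbf{E} = 0 \text{ in vacuum}) \\
\Rightarrow\; \nabla^2 \mathbf{E}
  &= \mu_0 \varepsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2},
  \qquad c = \frac{1}{\sqrt{\mu_0 \varepsilon_0}}
\end{aligned}
```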
Electromagnetic waves and light
Maxwell's substantial conceptual contribution was recognising that the equations admit wave solutions: changing electric fields produce changing magnetic fields produce changing electric fields, and the resulting electromagnetic wave propagates through space at speed c = 1/√(μ₀ε₀) ≈ 3 × 10⁸ m/s. Maxwell calculated this speed from electromagnetic measurements alone and recognised it as the speed of light, concluding (correctly) that light is an electromagnetic wave. The full electromagnetic spectrum runs from radio waves (long wavelength, low frequency) through microwaves, infrared, visible light, ultraviolet, X-rays, and gamma rays (short wavelength, high frequency). Modern technology — wireless communications, radar, X-ray imaging, fibre-optic networks, photovoltaics — is essentially applied electromagnetism.
Optics
The behaviour of light at scales much larger than its wavelength is described by geometric optics: light travels in straight lines, reflects off mirrors at predictable angles, refracts when crossing material boundaries (Snell's law), and forms images through lenses according to well-understood geometric rules. At scales comparable to the wavelength, wave optics takes over: diffraction (light bending around obstacles), interference (light waves combining constructively or destructively), and polarisation (the orientation of the electric-field vector) all become observable. Modern optical instruments — microscopes, telescopes, spectrometers — combine geometric and wave optics. The diffraction limit (the smallest features that can be optically resolved) is approximately the wavelength of light, which is why electron microscopy (using electron waves with much shorter wavelengths than visible light) achieves higher resolution than light microscopy.
Materials and electromagnetism
How materials respond to electromagnetic fields determines their electrical and optical properties. Conductors (metals) have free electrons that respond to applied fields, producing currents and screening the interior from external fields. Insulators (dielectrics) have bound electrons that polarise in applied fields without conducting. Semiconductors (silicon, germanium) sit between, with conductivity that can be dramatically modified by impurities (doping) — the basis of all modern electronics. Magnetic materials have permanent or induced magnetic moments that respond to magnetic fields. The full story of how materials behave electromagnetically requires quantum mechanics (Section 7) for the electronic structure, but the macroscopic phenomenology — Ohm's law, capacitance, inductance, magnetisation — is classical.
The gauge structure
A specific structural feature of electromagnetism worth flagging: the fields E and B can be derived from a scalar potential φ and a vector potential A through E = −∇φ − ∂A/∂t and B = ∇×A. The potentials are not unique — adding the gradient of any function to A and subtracting the time derivative of the same function from φ leaves E and B unchanged. This gauge freedom is one of the deepest concepts in modern physics; it generalises to the gauge symmetries of the Standard Model (Section 8) and underlies the entire framework of quantum field theory. From the AI perspective, gauge symmetries motivate equivariant network architectures (Ch 01 §8) and recur throughout physics-informed neural networks.
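The invariance is a two-line calculation, using ∇×∇χ = 0 for any smooth function χ:

```latex
% Gauge transformation by an arbitrary smooth function chi(x, t):
\begin{aligned}
\mathbf{A} &\to \mathbf{A} + \nabla \chi, \qquad
\varphi \to \varphi - \frac{\partial \chi}{\partial t} \\
\mathbf{B} &\to \nabla \times (\mathbf{A} + \nabla \chi)
  = \nabla \times \mathbf{A} = \mathbf{B}
  \qquad (\nabla \times \nabla \chi = 0) \\
\mathbf{E} &\to -\nabla\varphi + \nabla\frac{\partial \chi}{\partial t}
  - \frac{\partial \mathbf{A}}{\partial t}
  - \nabla\frac{\partial \chi}{\partial t}
  = \mathbf{E}
\end{aligned}
```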
Thermodynamics and Statistical Mechanics
Thermodynamics describes how heat and energy flow through macroscopic systems; statistical mechanics derives thermodynamics from the microscopic behaviour of huge numbers of particles. Together they constitute the deepest connection between microscopic dynamics and macroscopic behaviour, and the conceptual machinery — partition functions, free energies, entropy, phase transitions — recurs throughout modern physics, chemistry, biology, and machine learning.
The laws of thermodynamics
Four fundamental laws structure the entire discipline. The zeroth law: if two systems are each in thermal equilibrium with a third, they are in thermal equilibrium with each other. This defines temperature as a useful concept. The first law: energy is conserved — heat added to a system either raises its internal energy or does work on the environment (dU = δQ − δW). The first law is the energy-conservation law of Section 2 applied to thermal systems. The second law: the entropy of an isolated system never decreases. Entropy is a measure of disorder or, more precisely, the number of microscopic configurations consistent with a given macroscopic state. The second law is the deepest of the four — it gives time its arrow, makes perpetual motion impossible, and underlies why heat flows from hot to cold rather than the other way. The third law: as temperature approaches absolute zero, entropy approaches zero (or, more precisely, a constant minimum value).
Entropy and the Boltzmann formula
The most-important quantity in thermodynamics-and-statistical-mechanics is entropy. Macroscopically, entropy was originally defined by Clausius as dS = δQ/T for reversible processes. Microscopically, Ludwig Boltzmann gave the deeper formula: S = k_B ln Ω, where Ω is the number of microstates consistent with a given macrostate and k_B ≈ 1.38 × 10⁻²³ J/K is the Boltzmann constant. The formula connects microscopic counting (how many ways can the molecules be arranged?) to macroscopic thermodynamics (how does the system respond to heat?). Modern entropy generalises in many directions: information entropy (Shannon), entanglement entropy (quantum), various non-equilibrium entropies. From an AI perspective, the Boltzmann formula is the foundational result that makes statistical-mechanics-and-machine-learning analogies productive.
Partition functions and ensembles
The central calculational tool is the partition function: Z = Σ_s exp(−E_s/k_BT), summing over all possible states s weighted by the Boltzmann factor. The probability of finding the system in state s is P(s) = exp(−E_s/k_BT) / Z. Once Z is known, all thermodynamic quantities follow: free energy F = −k_BT ln Z, average energy ⟨E⟩ = −∂ ln Z / ∂β, entropy, heat capacity, etc. Ensembles formalise different types of contact between system and environment: the microcanonical ensemble (isolated, fixed energy), the canonical ensemble (in contact with a heat bath at fixed temperature), the grand canonical ensemble (with both energy and particle exchange). Each has its own partition function and free energy. The methodology is foundational to chemistry (Ch 02 §6), physics, materials science, and most of biology (where free-energy minimisation drives protein folding, ligand binding, and many other processes).
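A minimal numerical sketch of this machinery for a two-level system (the energy gap and the temperatures are arbitrary illustrative values):

```python
import numpy as np

# Canonical ensemble for a two-level system with gap Delta: partition
# function, Boltzmann probabilities, free energy, and mean energy,
# computed directly from the formulas in the text.
k_B = 1.380649e-23            # Boltzmann constant [J/K]
Delta = 1e-21                 # energy gap [J] (arbitrary illustrative value)
E = np.array([0.0, Delta])    # the two energy levels

for T in (10.0, 100.0, 1000.0):
    beta = 1.0 / (k_B * T)
    Z = np.sum(np.exp(-beta * E))        # partition function
    P = np.exp(-beta * E) / Z            # Boltzmann probabilities
    F = -k_B * T * np.log(Z)             # Helmholtz free energy
    E_mean = np.sum(P * E)               # <E> = sum_s P(s) E_s
    print(f"T = {T:6.0f} K   P(excited) = {P[1]:.3f}   "
          f"<E>/Delta = {E_mean / Delta:.3f}")
# At low T the system sits in the ground state; at high T the two
# levels approach equal occupation and <E> approaches Delta/2.
```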
Free energy and chemical potential
The free energy generalises energy in the presence of entropy. The Helmholtz free energy F = U − TS is the appropriate quantity for systems at constant temperature and volume; the Gibbs free energy G = U − TS + PV is the right quantity at constant temperature and pressure (most chemical and biological situations). The principle of free-energy minimisation drives almost every spontaneous process at constant temperature: a system in contact with a heat bath at temperature T evolves toward minimum free energy, balancing energy minimisation (favoured at low T) against entropy maximisation (favoured at high T). The chemical potential μ generalises this further: μ_i = ∂G/∂N_i is the free-energy cost of adding one particle of species i. Chemical equilibrium occurs when the chemical potentials of products and reactants balance.
Phase transitions and critical phenomena
One of the most-distinctive features of statistical mechanics is the phase transition: at certain temperatures (the boiling point of water, the Curie point of a magnet, the superconducting transition temperature), bulk properties change discontinuously. The classical examples are first-order transitions (water-to-ice, water-to-steam) where a latent heat is exchanged, and second-order transitions where derivatives of free energy diverge but the free energy itself is continuous. Critical phenomena at second-order transitions exhibit universal behaviour — different physical systems share the same critical exponents — captured by the renormalisation group (Kenneth Wilson, 1971). The methodology has had enormous influence beyond physics: phase transitions and critical behaviour have been identified in neural networks (memorisation-vs-generalisation transitions), machine learning training dynamics, and various other AI-relevant settings.
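The canonical toy example is the 2D Ising model, whose exact critical temperature is T_c = 2/ln(1+√2) ≈ 2.269 in units J = k_B = 1. A minimal Metropolis sketch, illustrative rather than production-grade (a 16×16 lattice has strong finite-size effects near T_c):

```python
import numpy as np

# Minimal 2D Ising model with Metropolis sampling, sketching the
# ferromagnetic transition near T_c ~ 2.269 (units J = k_B = 1).
rng = np.random.default_rng(0)
L = 16

def magnetisation(T, sweeps=1000):
    s = np.ones((L, L), dtype=int)            # start fully magnetised
    for _ in range(sweeps):
        for _ in range(L * L):                # one sweep = L*L proposals
            i, j = rng.integers(L, size=2)
            nb = (s[(i + 1) % L, j] + s[(i - 1) % L, j] +
                  s[i, (j + 1) % L] + s[i, (j - 1) % L])
            dE = 2 * s[i, j] * nb             # energy cost of flipping spin
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i, j] = -s[i, j]
    return abs(s.mean())

for T in (1.5, 2.269, 3.5):
    print(f"T = {T:5.3f}   |m| ~ {magnetisation(T):.2f}")
# |m| is near 1 well below T_c and near 0 well above it; at T_c the
# value is intermediate and fluctuates strongly (finite-size effects).
```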
Connection to information theory
Thermodynamic entropy and Shannon's information entropy are formally identical: both measure the logarithm of the number of microstates (or the average number of bits needed to specify a state). The connection is not just formal — it underlies Maxwell's demon arguments, Landauer's principle (erasing one bit of information dissipates at least k_BT ln 2 of energy), and the modern field of information thermodynamics. From an AI perspective, the connection means that the rich machinery of statistical mechanics — partition functions, free energies, mean-field approximations, variational methods — transfers directly to information-theoretic problems and to machine learning theory.
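Landauer's bound is a one-line calculation at room temperature:

```latex
% Landauer bound for erasing one bit at T = 300 K:
E_{\min} = k_B T \ln 2
  = (1.381 \times 10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(0.693)
  \approx 2.9 \times 10^{-21}\,\mathrm{J} \text{ per erased bit.}
```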
Special Relativity
Special relativity, formulated by Einstein in 1905, replaces Newton's framework when speeds approach the speed of light. The theory is built on two postulates and produces consequences that are deeply counterintuitive but experimentally verified to extraordinary precision: time dilation, length contraction, the relativity of simultaneity, and mass-energy equivalence.
The two postulates
Einstein's foundational postulates: (1) the principle of relativity — the laws of physics are the same in all inertial reference frames; (2) the constancy of the speed of light — the speed of light in vacuum is the same in all inertial frames, independent of the motion of source or observer. The first postulate generalises a Galilean intuition (mechanics looks the same in all uniformly-moving frames). The second is the radical move: it requires abandoning the Newtonian notion of absolute time and absolute simultaneity. Together, the two postulates imply the Lorentz transformations for how space and time coordinates transform between inertial frames.
The Lorentz transformations
For two frames moving relative to each other at velocity v along the x-axis, the coordinates transform as:
t' = γ(t − vx/c²)
x' = γ(x − vt)
y' = y
z' = z
where γ = 1/√(1 − v²/c²) is the Lorentz factor.
For small v ≪ c, γ ≈ 1 and the transformations reduce to Galilean (Newtonian) transformations. For v approaching c, γ diverges, and the differences become pronounced. The key consequences: time dilation (moving clocks run slow by a factor γ), length contraction (moving objects appear shorter along the direction of motion by a factor γ), and relative simultaneity (events simultaneous in one frame are not simultaneous in another). All three are experimentally verified to extraordinary precision through atomic clocks, particle accelerators, and GPS timing (which would not work without relativistic corrections).
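The textbook numerical example is the cosmic-ray muon: its proper lifetime is ~2.2 μs, far too short to reach the ground from ~15 km altitude without time dilation. A short sketch:

```python
import math

# Lorentz factor and the standard muon example: at v = 0.999c, time
# dilation stretches the muon's lab-frame lifetime and decay length
# enough to reach the ground from ~15 km up.
c = 2.998e8                   # speed of light [m/s]
tau = 2.2e-6                  # muon proper lifetime [s]

def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

v = 0.999 * c
g = gamma(v)
print(f"gamma        = {g:.1f}")                      # ~22.4
print(f"lab lifetime = {g * tau * 1e6:.1f} us")       # ~49 us
print(f"decay length = {g * tau * v / 1e3:.1f} km")   # ~15 km
# Without dilation the decay length would be only tau * v ~ 0.66 km.
```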
Spacetime and Minkowski geometry
Hermann Minkowski (1908) recognised that special relativity is most-elegantly expressed by treating time and space as a unified four-dimensional spacetime. Events are points in spacetime with coordinates (ct, x, y, z), and the geometric structure is the Minkowski metric with signature (+,−,−,−) (or equivalently (−,+,+,+) depending on convention). The spacetime interval ds² = c²dt² − dx² − dy² − dz² is invariant under Lorentz transformations; this invariant generalises the Euclidean distance to spacetime. Light cones separate spacetime into past, future, and elsewhere — the boundaries are the worldlines of light signals, which travel at exactly c. Causality is encoded in the light-cone structure: only events within the past light cone can have caused a given event, and only events in the future light cone can be caused by it.
Energy, momentum, and E = mc²
The most-famous consequence of special relativity is mass-energy equivalence: E = mc². More precisely, the relativistic energy-momentum relation is E² = (pc)² + (mc²)², where m is the rest mass (the mass measured in a frame where the particle is at rest). For a particle at rest, p = 0, giving E = mc². For a massless particle (like a photon), m = 0, giving E = pc. The relativistic generalisations of energy and momentum become a four-vector (E/c, p) that transforms cleanly under Lorentz transformations. Mass-energy equivalence has substantial empirical consequences: nuclear reactions release energy by converting small amounts of mass to kinetic energy (the basis of nuclear power and weapons); particle-antiparticle annihilation converts all the mass to photon energy; pair production creates particle-antiparticle pairs from sufficiently energetic photons.
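The energy-momentum relation is also the workhorse of collider analysis: summing the four-vectors of decay products and taking the invariant mass recovers the rest mass of the decayed parent. A minimal sketch in natural units (c = 1, energies in GeV); the two back-to-back "photons" are invented numbers chosen to give a Higgs-like 125 GeV:

```python
import numpy as np

# Invariant mass from four-momenta via E^2 = (pc)^2 + (mc^2)^2, in
# natural units (c = 1). The photon four-vectors are made-up numbers.
def invariant_mass(p4s):
    """p4s: list of four-vectors [E, px, py, pz]; returns sqrt(E^2 - |p|^2)."""
    total = np.sum(p4s, axis=0)
    m2 = total[0] ** 2 - np.sum(total[1:] ** 2)
    return np.sqrt(max(m2, 0.0))

# Two back-to-back photons (massless, so E = |p| for each):
photon1 = np.array([62.5,  62.5, 0.0, 0.0])
photon2 = np.array([62.5, -62.5, 0.0, 0.0])
print(f"m_inv = {invariant_mass([photon1, photon2]):.1f} GeV")   # 125.0
```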
Why special relativity matters for AI
For most AI applications, special relativity is in the background — it matters for particle-physics applications (§11) where particles move at near-light speeds, for cosmological applications, and for GPS-related systems. But the conceptual structure recurs in machine learning theory: causal structure in the light-cone sense organises causal-inference methods (Part XIII Ch 04), Lorentz-equivariant networks appear in particle-physics ML, and the Minkowski metric appears in some attention mechanisms designed for handling temporal and spatial relationships simultaneously. The deeper conceptual lesson — that the right framework can resolve apparent paradoxes that resist incremental refinements of the wrong framework — recurs throughout science and AI methodology.
General Relativity
General relativity, Einstein's 1915 theory of gravity, generalises special relativity to include gravitational interactions. The theory is conceptually radical — gravity is not a force but the curvature of spacetime — and quantitatively spectacular, predicting effects (gravitational lensing, gravitational waves, black holes, the expansion of the universe) that have been verified in succession over a century.
The equivalence principle
Einstein's foundational insight was the equivalence principle: gravitational mass and inertial mass are exactly equal, which means that local effects of a gravitational field are indistinguishable from those of acceleration. An observer in a closed elevator cannot tell whether the elevator is at rest in a gravitational field or accelerating uniformly in deep space. This radical observation implies that gravity is not really a force in the Newtonian sense — it is a property of spacetime itself.
Curved spacetime
Einstein's mature theory replaces Newton's gravitational force with curved spacetime: massive objects warp the geometry of spacetime around them, and free-falling objects follow geodesics (straightest possible paths) through the curved geometry. The curvature is described by the metric tensor g_μν, generalising the Minkowski metric of flat spacetime. The dynamics is encoded in the Einstein field equations: G_μν = (8πG/c⁴) T_μν, relating the curvature of spacetime (the Einstein tensor G_μν, computed from g_μν and its derivatives) to the energy-momentum content of matter (the stress-energy tensor T_μν). The equations are highly non-linear; closed-form solutions exist only for systems with substantial symmetry.
Schwarzschild and black holes
The first non-trivial solution, found by Karl Schwarzschild in 1916, describes the spacetime around a non-rotating spherical mass. The Schwarzschild solution reduces to Newton's gravity at large distances but predicts qualitatively new phenomena at small distances: the existence of an event horizon (a surface inside which nothing can escape, even light) at the Schwarzschild radius r_s = 2GM/c². For stellar-mass objects, r_s is small (about 3 km for the Sun); for galactic-centre supermassive black holes, r_s is comparable to planetary-orbit scales. The 2019 Event Horizon Telescope image of M87's central black hole, and the 2022 image of Sagittarius A* at the centre of the Milky Way, are direct confirmations of the Schwarzschild geometry. Rotating black holes are described by the more-complex Kerr solution (Roy Kerr, 1963).
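The radius itself is a one-line computation (constants rounded):

```python
# Schwarzschild radius r_s = 2GM/c^2 for the Sun and for Sagittarius A*.
G = 6.674e-11        # gravitational constant [m^3 kg^-1 s^-2]
c = 2.998e8          # speed of light [m/s]
M_sun = 1.989e30     # solar mass [kg]

for name, M in [("Sun", M_sun), ("Sgr A*", 4.3e6 * M_sun)]:
    r_s = 2 * G * M / c**2
    print(f"{name:7s} r_s = {r_s:.3e} m")
# Sun: ~2.95 km. Sgr A* (~4.3 million solar masses): ~1.3e10 m,
# roughly a fifth of Mercury's orbital radius.
```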
Cosmology and the expanding universe
General relativity applied to the universe as a whole produces cosmological models. The Friedmann-Lemaître-Robertson-Walker (FLRW) metric describes a homogeneous, isotropic universe; the Einstein equations applied to it give the Friedmann equations governing how the cosmic scale factor a(t) evolves with time. The empirical case for cosmic expansion has accumulated since Hubble's 1929 redshift-distance observations. The 1998 discovery of cosmic acceleration (using Type Ia supernovae as standard candles) implied the existence of dark energy — about 70% of the universe's energy budget. Combined with cold dark matter (about 25%), ordinary matter is only ~5% of the cosmic content. The ΛCDM cosmological model integrates these ingredients into the working framework of modern cosmology, with major observational tests (the cosmic microwave background, baryon acoustic oscillations, large-scale structure, supernova distances) all consistent.
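A minimal sketch of the Friedmann machinery for flat ΛCDM (round parameter values chosen for illustration): integrating dt = da/(aH(a)) from a ≈ 0 to a = 1 gives the age of the universe.

```python
import numpy as np

# Age of a flat LambdaCDM universe from the Friedmann equation
# H(a) = H0 * sqrt(Om / a^3 + OL). Parameters are round illustrative values.
H0 = 70.0 * 1000 / 3.086e22      # 70 km/s/Mpc converted to 1/s
Om, OL = 0.3, 0.7                # matter and dark-energy fractions

a = np.linspace(1e-6, 1.0, 200_000)
H = H0 * np.sqrt(Om / a**3 + OL)                     # Friedmann equation
f = 1.0 / (a * H)                                    # dt/da
age = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(a))    # trapezoid rule
print(f"age ~ {age / 3.156e16:.1f} Gyr")             # ~13.5 Gyr
```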
Gravitational waves
General relativity predicts gravitational waves: ripples in spacetime that propagate at the speed of light, produced when massive objects accelerate asymmetrically. The first direct detection (LIGO, 2015) measured the merger of two black holes ~1.3 billion light-years away, with the spacetime distortion at Earth being smaller than the diameter of a proton. Subsequent detections of black-hole and neutron-star mergers have made gravitational-wave astronomy a routine observational tool. AI methods process the noisy LIGO/Virgo/KAGRA data streams to detect signals, classify event types, and infer source parameters — Ch 12 will develop the methodology.
The frontier and ML applications
General relativity remains incomplete in important ways: it is not unified with quantum mechanics, the singularities at black-hole interiors and the Big Bang remain unresolved, and the dark-matter and dark-energy components are empirically established but theoretically unexplained. AI methods touch general relativity in several ways: N-body simulations of cosmological structure use ML emulators to bridge resolution scales; strong gravitational lensing produces complex image data analysed by ML methods; gravitational-wave detection deploys CNNs and transformers to flag signals in noisy detector streams; black-hole imaging (Event Horizon Telescope) uses ML methods for sparse-aperture image reconstruction. Most of this material lives in Ch 12 (AI for Astronomy & Astrophysics); this section establishes the GR vocabulary that makes those methods intelligible.
Quantum Mechanics
Quantum mechanics is the framework for physics at atomic and subatomic scales. It is conceptually different from classical physics in fundamental ways: predictions are inherently probabilistic, observable quantities are operators rather than numbers, and measurement disturbs the system being measured. The framework has been verified to extraordinary precision and underlies essentially all of chemistry (Ch 02), all of materials science, and most of modern electronics.
The wavefunction and Schrödinger's equation
The state of a quantum system is described by a wavefunction ψ — a complex-valued function on configuration space whose modulus squared |ψ|² gives the probability density of finding the system in a given configuration upon measurement. The wavefunction evolves in time according to the Schrödinger equation: iℏ ∂ψ/∂t = Ĥψ, where Ĥ is the Hamiltonian operator (generalising the classical Hamiltonian of Section 2) and ℏ ≈ 1.05 × 10⁻³⁴ J·s is the reduced Planck constant. The Schrödinger equation is linear and deterministic — given an initial wavefunction, future wavefunctions are uniquely determined.
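A minimal numerical sketch of the time-independent problem Ĥψ = Eψ, in units ℏ = m = ω = 1 so the exact harmonic-oscillator spectrum is E_n = n + ½: discretise the Hamiltonian on a grid and diagonalise.

```python
import numpy as np

# Low-lying spectrum of the 1D harmonic oscillator by diagonalising a
# finite-difference Hamiltonian H = -(1/2) d^2/dx^2 + x^2/2
# (units hbar = m = omega = 1; exact eigenvalues are 0.5, 1.5, 2.5, ...).
N, x_max = 1000, 10.0
x = np.linspace(-x_max, x_max, N)
dx = x[1] - x[0]

# Kinetic term: standard 3-point Laplacian; potential on the diagonal.
main = 1.0 / dx**2 + 0.5 * x**2
off = -0.5 / dx**2 * np.ones(N - 1)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

E = np.linalg.eigvalsh(H)
print(f"lowest eigenvalues: {E[:3].round(4)}")   # ~[0.5, 1.5, 2.5]
```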
Operators, observables, and eigenstates
Physical observables — energy, position, momentum, angular momentum — correspond to Hermitian operators acting on the wavefunction. The possible values of a measurement of an observable are the eigenvalues of the corresponding operator; immediately after measurement, the system is in the corresponding eigenstate. The position and momentum operators do not commute ([x̂, p̂] = iℏ), which is the mathematical statement of Heisenberg's uncertainty principle: Δx · Δp ≥ ℏ/2. Position and momentum cannot both be sharply defined simultaneously. Energy eigenstates are particularly important: a system in an energy eigenstate has definite energy and oscillates in time only by a global phase. The eigenstates of the Hamiltonian form a complete basis, and arbitrary wavefunctions can be expanded as superpositions of energy eigenstates.
The hydrogen atom and atomic structure
The simplest quantum-mechanical bound state is the hydrogen atom: an electron in the Coulomb potential of a proton. Solving the Schrödinger equation produces discrete energy levels E_n = −13.6 eV / n² (the Rydberg formula) and characteristic orbital wavefunctions labelled by quantum numbers (n principal, ℓ angular, m magnetic, s spin). The orbital structure (1s, 2s, 2p, 3s, 3p, 3d, ...) generalises to multi-electron atoms with corrections from electron-electron interactions, giving the periodic table its structure (Ch 02 §2). Atomic spectra — the discrete frequencies of light absorbed and emitted by atoms — are direct measurements of the energy-level structure, and the success of quantum mechanics in reproducing them was the original empirical case for the theory.
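The level structure is directly computable; the Balmer series (transitions down to n = 2) gives the visible hydrogen lines:

```python
# Hydrogen levels E_n = -13.6 eV / n^2 and the Balmer-series wavelengths.
h_c = 1239.84          # h*c in eV*nm, so lambda[nm] = 1239.84 / E[eV]

def E(n):
    return -13.6 / n**2

for n in (3, 4, 5, 6):
    dE = E(n) - E(2)                       # photon energy for n -> 2 [eV]
    print(f"n = {n} -> 2: {h_c / dE:.0f} nm")
# 656, 486, 434, 410 nm: the red through violet Balmer lines.
```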
Spin and identical particles
Spin is an intrinsic angular momentum of quantum particles, with no classical analogue. Electrons, protons, and neutrons have spin-½ (taking values +ℏ/2 or −ℏ/2 along any chosen axis); photons have spin 1; the various other particles have their own spin values. Identical particles in quantum mechanics come in two classes: fermions (half-integer spin: electrons, protons, neutrons, quarks) obey the Pauli exclusion principle — no two identical fermions can occupy the same quantum state — which is why electrons fill atomic orbitals in characteristic patterns and matter doesn't all collapse into the lowest-energy state; bosons (integer spin: photons, mesons, the Higgs boson) can occupy the same state in unlimited numbers, which is why photons can produce coherent laser beams and certain materials at low temperature undergo Bose-Einstein condensation.
Entanglement and measurement
The most-distinctive feature of quantum mechanics is entanglement: two or more quantum systems can be in a joint state that cannot be expressed as a product of individual states. A measurement on one entangled particle instantaneously determines the corresponding measurement outcome on the others, regardless of distance (though the correlations cannot be used to transmit information faster than light). The phenomenon was famously discussed in the EPR paper (Einstein, Podolsky, Rosen, 1935) and made empirically testable through Bell's theorem (1964): correlations predicted by quantum mechanics violate inequalities that any local-hidden-variables theory must satisfy. Experimental tests since the 1980s have confirmed the quantum predictions to high precision (the 2022 Nobel Prize in Physics recognised Aspect, Clauser, and Zeilinger for the foundational experiments). Entanglement is a real physical phenomenon and is the basis of quantum computing, quantum cryptography, and quantum sensing.
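The CHSH form of Bell's theorem makes this quantitative: any local-hidden-variables theory bounds the CHSH combination by 2, while quantum mechanics reaches 2√2 ≈ 2.83. A short numpy sketch for the Bell state, with measurement angles chosen to maximise the violation:

```python
import numpy as np

# CHSH correlations for the Bell state (|00> + |11>)/sqrt(2), with
# measurements A(theta) = cos(theta) Z + sin(theta) X on each side.
# Quantum mechanics gives E(a, b) = cos(a - b) and a maximal CHSH
# value of 2*sqrt(2), violating the local-hidden-variable bound of 2.
Z = np.array([[1, 0], [0, -1]], dtype=float)
X = np.array([[0, 1], [1, 0]], dtype=float)
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)          # Bell state |Phi+>

def corr(a, b):
    A = np.cos(a) * Z + np.sin(a) * X
    B = np.cos(b) * Z + np.sin(b) * X
    return psi @ np.kron(A, B) @ psi               # <psi| A (x) B |psi>

a, a2, b, b2 = 0.0, np.pi / 2, np.pi / 4, -np.pi / 4
S = corr(a, b) + corr(a, b2) + corr(a2, b) - corr(a2, b2)
print(f"CHSH S = {S:.3f}")                         # 2.828 > 2
```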
The measurement problem
A specific philosophical-and-empirical issue worth flagging: the Schrödinger equation describes deterministic, unitary evolution of the wavefunction; measurement appears to produce non-deterministic, non-unitary "collapse" of the wavefunction to an eigenstate of the measured observable. The relationship between unitary evolution and measurement collapse is the measurement problem, and the various interpretations of quantum mechanics (Copenhagen, many-worlds, decoherence, Bohmian, the various others) differ in how they resolve it. For most practical purposes, the standard "shut up and calculate" approach is fine: apply the Schrödinger equation between measurements, apply the Born rule (probabilities equal modulus-squared of amplitudes) at measurements, and extract empirical predictions. The deeper questions remain open and have substantial implications for foundations of probability and information theory.
Why quantum mechanics matters for AI
Several connections matter. Quantum chemistry (Ch 02 §8) is the application of quantum mechanics to molecular systems; AI methods that engage with chemistry routinely use quantum-chemistry calculations as training data or evaluation targets. Quantum machine learning is a substantial sub-discipline exploring whether quantum computers can train models, perform inference, or sample distributions faster than classical computers; the field is technically nascent but has substantial theoretical results. Quantum-inspired classical methods use ideas from quantum mechanics (tensor networks, variational methods) to improve classical ML algorithms. Materials and condensed-matter AI (Ch 09–10) increasingly use ML to predict properties of quantum-mechanical systems where direct calculation is intractable. Most of this material lives in §11 and beyond; this section establishes the QM vocabulary that makes the methods intelligible.
Quantum Field Theory and the Standard Model
Quantum field theory unifies quantum mechanics and special relativity, replacing the wavefunction-of-particles framework with quantum fields filling all of spacetime. The Standard Model of particle physics is the culmination of this synthesis — a remarkable, experimentally-verified theory of the elementary particles and their interactions, with one famous omission (gravity) and several known empirical gaps.
From quantum mechanics to quantum field theory
Non-relativistic quantum mechanics treats particles as fundamental and fields (potentials) as background. Quantum field theory (QFT) inverts this: fields are fundamental and exist throughout spacetime; particles are localised quantised excitations of those fields. An electron is an excitation of the electron field; a photon is an excitation of the electromagnetic field; the Higgs boson is an excitation of the Higgs field. The framework is mathematically subtle (infinities arise at intermediate steps and require renormalisation) but produces predictions of extraordinary precision — the QED prediction for the electron's anomalous magnetic moment agrees with measurement to better than ten parts per billion.
Feynman diagrams
Calculations in QFT are typically performed using Feynman diagrams — graphical representations of perturbative contributions to scattering amplitudes. Lines represent propagating particles; vertices represent interactions. Each diagram corresponds to a specific mathematical expression, and the total amplitude for a process is the sum over all topologically-distinct diagrams. The methodology gives intuitive interpretations: an electron scattering off another electron exchanges a virtual photon; an electron and a positron annihilating produce a virtual photon that becomes a new particle-antiparticle pair; etc. Feynman diagrams are not literal pictures of what happens — they are computational shortcuts — but the visual language has become inseparable from particle physics.
Gauge theories and the Standard Model
The most-important conceptual structure in modern particle physics is gauge symmetry (Yang-Mills theory, 1954). Three of the four known fundamental interactions are gauge theories:
Quantum Electrodynamics (QED): the gauge theory of electromagnetism, with U(1) gauge symmetry; the gauge boson is the photon. Quantum Chromodynamics (QCD): the gauge theory of the strong nuclear force, with SU(3) gauge symmetry (the "colour" group); the gauge bosons are the eight gluons. Electroweak theory: the unified gauge theory of electromagnetism and the weak nuclear force, with SU(2)×U(1) gauge symmetry; the gauge bosons are the photon and the three weak bosons (W⁺, W⁻, Z). The fourth interaction, gravity, is described by general relativity (Section 6) and resists straightforward gauge-theory formulation. The combined gauge group of the Standard Model is SU(3)×SU(2)×U(1), with spontaneous symmetry breaking via the Higgs mechanism producing the masses we observe.
The Higgs mechanism
A specific puzzle in the Standard Model is that gauge symmetry naively requires gauge bosons to be massless, but the W and Z bosons are observed to be very massive (about 80 and 91 GeV/c², roughly 86 and 97 times the proton mass). The Higgs mechanism (Brout, Englert, Higgs, Guralnik, Hagen, Kibble, 1964) resolves this: a scalar field — the Higgs field — pervades all of space with a non-zero vacuum expectation value, breaking the electroweak symmetry and giving mass to the W, Z, and (through a different mechanism) the various fermions. The Higgs field has its own associated particle, the Higgs boson, predicted to exist with specific properties. The 2012 discovery of the Higgs boson at the LHC (mass ~125 GeV/c²) completed the experimental verification of the Standard Model, and Englert and Higgs received the 2013 Nobel Prize.
The particle content
The Standard Model contains a specific list of fundamental particles. Quarks (six flavours: up, down, charm, strange, top, bottom) interact via the strong force, electromagnetic force, and weak force; they bind into hadrons like protons and neutrons. Leptons (six flavours: electron, muon, tau, plus three neutrinos) interact via the electromagnetic force (charged leptons) and the weak force (all leptons). Gauge bosons: photon, W⁺, W⁻, Z, eight gluons. The Higgs boson. Each fermion has a corresponding antiparticle. The Standard Model's predictions for cross-sections, decay rates, and various other observables have been verified at colliders to extraordinary precision.
What the Standard Model doesn't do
Despite its empirical success, the Standard Model is known to be incomplete. Gravity is not part of the Standard Model and cannot be added in a known consistent way (quantum gravity remains an open problem). Dark matter (~25% of the universe's energy content) is not described by any Standard Model particle. Dark energy (~70%) likewise. Neutrino masses (now known to be non-zero) require an extension of the Standard Model. The hierarchy problem: why is the Higgs mass so much smaller than the Planck scale? CP violation and the matter-antimatter asymmetry of the universe: why is there matter at all rather than nothing? These are the major puzzles motivating Beyond the Standard Model (BSM) physics, with proposed extensions including supersymmetry, extra dimensions, axions, and various others. None has been confirmed experimentally as of 2026.
Particle Physics: LHC, Beyond, and Open Questions
Experimental particle physics is the empirical engine of modern fundamental physics. The Large Hadron Collider, the various neutrino-physics experiments, and a substantial collection of cosmological observations together constitute the empirical substrate against which the Standard Model and its proposed extensions are tested.
The Large Hadron Collider
The Large Hadron Collider (LHC) at CERN is the world's highest-energy particle accelerator: a 27 km ring tunnel beneath the France-Switzerland border, accelerating protons (or heavy ions) to ~7 TeV per beam (~14 TeV centre-of-mass energy). The four major experiments — ATLAS, CMS, LHCb, and ALICE — surround interaction points where the proton beams collide ~40 million times per second. The detectors record the resulting particle showers in ~100 million data channels, generating petabytes of raw data per year. The LHC's empirical highlights since 2009 include the 2012 Higgs discovery (its central scientific motivation), precision measurements of Standard Model parameters, searches for supersymmetry and other BSM physics (no clear positive results as of 2026), and exotic-hadron discoveries (tetraquarks, pentaquarks) that test our understanding of QCD bound states.
Detector physics and data
Modern particle detectors are sophisticated multilayer instruments. Tracking detectors (silicon strip and pixel detectors near the interaction point) measure charged-particle trajectories with micron precision. Calorimeters measure particle energies by absorbing and integrating their energy depositions. Muon systems identify muons (which penetrate further than other particles). Trigger systems filter the raw collision rate down to manageable rates for offline analysis (only ~1 in 10⁵ events is recorded). The detector data is high-dimensional (millions of readout channels), sparse (most channels are empty for any given event), and complex (each event is a scattering process described by quantum field theory). AI methods process this data through every stage: trigger selection, particle identification, energy calibration, vertex finding, and physics analysis. ML methods (CNNs for image-like detector responses, GNNs for tracking, transformers for sequence modelling) are increasingly central to the experimental pipeline.
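A minimal sketch of the graph-building step that precedes any GNN tracking model (the hit coordinates below are random stand-ins; real pipelines build edges from detector geometry and physics-motivated selections):

```python
import numpy as np

# Building a graph from detector hits for GNN-style tracking, as described
# above: hits become nodes, and edges connect each hit to its k nearest
# neighbours in space.
rng = np.random.default_rng(42)
hits = rng.uniform(-1.0, 1.0, size=(200, 3))     # 200 hits, (x, y, z)
k = 4

# Pairwise distances, then k nearest neighbours per node (excluding self).
d = np.linalg.norm(hits[:, None, :] - hits[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)
nbrs = np.argsort(d, axis=1)[:, :k]

# Edge list in the (source, target) format GNN libraries typically expect.
src = np.repeat(np.arange(len(hits)), k)
dst = nbrs.ravel()
edges = np.stack([src, dst])
print(f"nodes: {len(hits)}, edges: {edges.shape[1]}")   # 200 nodes, 800 edges
```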
Neutrino physics
Neutrinos are nearly-massless, electrically-neutral leptons that interact only via the weak force, making them notoriously difficult to detect. The neutrino sector has produced some of the most-important discoveries in modern particle physics. Neutrino oscillations — the discovery (Super-Kamiokande, SNO, ~1998–2001) that neutrinos change flavour as they propagate, which requires non-zero neutrino masses — earned the 2015 Nobel Prize. The solar neutrino problem (the deficit of detected solar neutrinos compared to theoretical predictions, established by Davis 1968) was resolved by neutrino oscillations. IceCube, an ice-Cherenkov detector at the South Pole, observes high-energy astrophysical neutrinos and has begun mapping neutrino sources beyond the Sun. The neutrino mass hierarchy, the question of whether neutrinos are their own antiparticles (Majorana vs Dirac), and the magnitude of CP violation in the lepton sector are major open empirical questions.
Dark matter and dark energy
Cosmological observations require dark matter (gravitationally interacting but invisible, ~25% of the universe's energy budget) and dark energy (causing cosmic expansion to accelerate, ~70%). The empirical case is robust — multiple independent observations (galaxy rotation curves, gravitational lensing, the cosmic microwave background, baryon acoustic oscillations, the bullet cluster) require dark matter, and supernova distance measurements plus the CMB require dark energy — but the underlying physics is unknown. Direct-detection experiments (LUX-ZEPLIN, XENONnT, the various others) search for dark-matter particles passing through Earth-based detectors; collider experiments search for dark-matter production. As of 2026, no direct detection has succeeded. The cosmological constant is the simplest explanation for dark energy but raises substantial puzzles about its observed magnitude. The dark sector is the biggest known empirical gap in fundamental physics.
The frontier and future colliders
Several major projects shape the next-decade agenda. The High-Luminosity LHC (HL-LHC, scheduled to begin in ~2030) will increase the LHC's data rate by an order of magnitude, enabling much more-precise Standard-Model measurements and improved BSM searches. Future Circular Collider (FCC, ~100 km ring at CERN) is a proposed next-generation machine targeting ~100 TeV proton-proton collisions. International Linear Collider (ILC) and Compact Linear Collider (CLIC) are proposed e⁺e⁻ machines for precision Higgs and electroweak studies. Beyond colliders, neutrino-physics experiments (DUNE, Hyper-Kamiokande), dark-matter experiments, gravitational-wave observatories (LISA), and cosmological surveys (LSST/Rubin, Euclid) will collectively address the open questions. The empirical landscape of fundamental physics over the next decade will be substantially shaped by these efforts.
AI in particle physics
Particle physics has been an early and substantial adopter of machine learning. Boosted decision trees (BDTs) have been the workhorse classification method for event selection since ~2000. Deep neural networks took over many of these tasks from ~2015. Graph neural networks (GNNs) are increasingly central for particle tracking, jet physics, and event reconstruction (the natural representation: detector hits as nodes, candidate tracks/showers as edges). Transformers have been adapted for jet tagging and event-level analysis. Generative models (GANs, diffusion models, normalising flows) accelerate detector simulation, which is a major computational bottleneck for collider analyses. Anomaly detection using ML methods is a major BSM-search frontier — the idea is to find unusual events without specifying what we're looking for. §11–19 develop this methodology in detail.
From Physics to ML: An Orientation
The previous nine sections established the physics vocabulary: classical mechanics, electromagnetism, thermodynamics and statistical mechanics, special and general relativity, quantum mechanics, quantum field theory and the Standard Model, and the empirical landscape of contemporary particle physics. This section is the bridge to the methodology that follows. AI for physics is methodologically distinctive in several ways — the field has the most precise quantitative ground truth of any application area, the symmetry structure of physical theories aligns naturally with equivariant deep learning, the computational bottlenecks (particle physics simulation, lattice QCD, many-body quantum mechanics, plasma control) are exactly where ML methods earn their keep, and physicists have a culture of disciplined empirical evaluation. This section orients the ML practitioner; Sections 11–19 develop the methods within that frame.
The precision-physics standard
Physics is the science where some quantities are known to ten decimal places. The fine-structure constant, the electron's anomalous magnetic moment, the cosmic microwave background's blackbody spectrum, the Rydberg constant: many of physics's most-important empirical numbers are pinned down with extraordinary precision. The flip side is that physics-grade results are routinely held to standards of empirical rigour that other AI domains rarely meet — systematic-error analysis, calibrated uncertainty quantification, reproducibility across multiple independent collaborations, and convincing demonstrations that results aren't artefacts of the methodology. AI methods deployed in physics have had to develop substantial machinery around these standards: conformal prediction for calibrated uncertainty, domain adaptation with explicit systematic-error budgets, and closure tests that train on simulated data with known answers and verify that the methodology recovers them.
The simulation-and-analysis loop
Physics produces enormous quantities of simulated data alongside experimental data. Monte Carlo simulations for particle physics generate billions of synthetic collision events used for detector calibration, training, and analysis. Lattice QCD sweeps produce gauge-field configurations on space-time lattices for non-perturbative QCD calculations. Markov chain Monte Carlo produces samples from quantum many-body systems for variational calculations. N-body simulations for cosmology track billions of particles across cosmic time. The simulated data is the substrate of much AI training; the experimental data is the substrate of much AI evaluation. The methodology has matured substantially around the simulate-train-deploy-evaluate-improve loop, and the specific challenges (simulation-experiment domain shift, the cost of simulating to high precision, the reuse of legacy simulation infrastructure) shape what AI methods are deployed where.
Symmetry as architectural prior
The deepest organising principle in physics is the connection between symmetries and conservation laws (Noether's theorem). For AI, this provides an unusually clean architectural prior: build symmetries into the network architecture and the resulting model is guaranteed to respect the corresponding conservation laws by construction. The methodology has been most thoroughly developed in particle physics, where Lorentz-equivariant networks respect spacetime symmetries; in molecular and materials physics (Ch 02 §8 and Ch 09–10), where SE(3)-equivariant networks respect rotational and translational symmetries; and in lattice gauge theory, where gauge-equivariant networks respect the local gauge symmetries of the underlying theory. Ch 01 §8 develops the equivariant-architecture machinery in detail; the principle recurs throughout the chapter.
The discovery-vs-prediction spectrum
AI methods in physics span a spectrum from prediction within known frameworks (using ML to accelerate Monte Carlo, replace parameterisations, fit functional forms in known equations) to discovery of new physics (using ML to find anomalies in collider data, infer functional forms via symbolic regression, propose new theoretical frameworks). Most production deployments live near the prediction end of the spectrum — the empirical wins there are clearer and the methodology more mature. Discovery-end applications are higher-stakes, less mature, and require substantially-more careful evaluation methodology because the empirical case for "we found new physics" is intrinsically harder to make than the case for "we accelerated a known calculation." The sections that follow return to this spectrum, and to the methodological maturity at each end, repeatedly.
The data substrate
Physics has produced some of the largest, most-curated scientific datasets ever assembled. CERN's LHC produces ~100 PB/year of raw data, of which ~50 PB is recorded after triggering. The Open Data portals at major experiments (CMS Open Data, ATLAS Open Data) make subsets freely available for analysis and ML benchmarking. Lattice QCD ensembles generated by USQCD, CLS, and other collaborations are increasingly shared. Cosmological simulation outputs (IllustrisTNG, EAGLE, and others) run to the petabyte range. CMB data from Planck, with the Simons Observatory and CMB-S4 to come, adds comparable volumes. Each substrate has its own ML applications, and each is the basis of a substantial research community.
Empirical realities and what AI cannot do
A specific tension in AI for physics is between the methodology's empirical successes (faster simulation, better classifiers, more-flexible parameterisations) and its conceptual limitations (no replacement for theoretical understanding, limited extrapolation guarantees, often-opaque internal representations). Physics culture values understanding as much as prediction, and AI methods that produce excellent predictions without understanding sometimes meet community resistance even when the empirical case is strong. The methodology has gradually accommodated this — interpretability work, symbolic regression that produces human-readable equations, hybrid physics-ML approaches that respect known structure — but the underlying tension between predictive accuracy and mechanistic insight remains, and shapes which methods get adopted where.
The ingredients, then: precision standards inherited from a discipline that measures things to ten decimal places; simulation-and-analysis pipelines that produce vast labelled training data; symmetry-driven architectural priors; and a culture that values mechanistic understanding alongside predictive accuracy. The methodology in this chapter is shaped by these constraints; the headline architectures (transformers, graph nets, equivariant networks, diffusion models) are familiar from other chapters, but the surrounding evaluation and engineering practice differs substantially.
AI for Particle Physics at the LHC
Particle physics has been an early and substantial AI adopter. Boosted decision trees have been the workhorse classification method since the early 2000s; deep neural networks took over many of those tasks from ~2015; graph neural networks and transformers are increasingly central as of 2026. This section develops the methodology across the LHC analysis pipeline.
The trigger and online selection
The LHC delivers ~40 million bunch crossings per second per interaction point, each containing dozens of overlapping proton-proton collisions. Recording all of them would require petabytes per second of bandwidth and is impossible — only ~1 in 10⁵ events can be saved. The trigger system performs this selection in real time, deciding which events to record based on rapid pattern recognition. Modern triggers are multi-stage: a hardware Level-1 trigger (FPGA-based, ~microsecond latency) makes a coarse cut based on simple summary information; a software High-Level Trigger (HLT, ~100 ms latency) performs more sophisticated reconstruction and selection. ML methods are increasingly central to both stages. FPGA-deployable neural networks (compressed via quantisation and pruning) run within Level-1 latency budgets and substantially improve selection efficiency for rare-event signatures. CNN-based jet taggers in the HLT identify hadronic decay products faster and more accurately than rule-based methods. The 2026 generation of triggers at ATLAS and CMS deploys ML throughout the pipeline.
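As a rough illustration of the compression step, the sketch below magnitude-prunes a deliberately tiny PyTorch MLP, the kind of operation applied (together with quantisation and firmware co-design, via tools such as hls4ml) before FPGA deployment. All sizes and features are hypothetical; real Level-1 models are co-designed with the firmware.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Trigger-style classifier sketch: a tiny MLP over a handful of summary
# features (e.g. energy sums, multiplicities), pruned by weight magnitude
# as a stand-in for the compression applied before FPGA deployment.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),  # logit for "keep this event"
)

# Prune 80% of weights by magnitude, layer by layer, then make the masks permanent.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")

x = torch.randn(4, 16)             # a batch of 4 candidate events (toy features)
keep_logits = model(x).squeeze(-1)
print(torch.sigmoid(keep_logits))  # per-event keep probabilities
```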
Tracking with graph neural networks
Particle tracking — reconstructing the trajectories of charged particles from sparse hits in silicon-strip and pixel detectors — is the most-time-consuming step in offline reconstruction. The traditional methodology (Kalman-filter-based combinatorial track finding) scales poorly with detector occupancy, becoming computationally prohibitive at HL-LHC luminosity. Graph Neural Networks (GNNs) have become the dominant modern approach: detector hits are nodes, candidate connections are edges, and a GNN is trained to classify which edges belong to genuine tracks. Exa.TrkX (a HEP-specific tracking GNN, multi-collaboration project ~2020) was an early demonstration; subsequent generations integrate edge classification with track-parameter regression. The methodology produces track-finding pipelines that scale near-linearly rather than combinatorially with hit density, enabling practical reconstruction at HL-LHC and beyond.
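A minimal sketch of the edge-classification core, in plain PyTorch with toy data: hits become node features, candidate hit pairs become edges, and an MLP scores each edge. Real pipelines add learned graph construction and iterative message passing; this shows only the skeleton.

```python
import torch
import torch.nn as nn

# GNN-style tracking sketch: hits are nodes with (x, y, z)-like features,
# candidate connections are edges, and an MLP scores each edge as
# "same track or not". Sizes are illustrative.
class EdgeScorer(nn.Module):
    def __init__(self, node_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(node_dim, hidden), nn.ReLU())
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, hits: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # hits: (n_hits, node_dim); edges: (n_edges, 2) index pairs into hits.
        h = self.node_mlp(hits)
        pair = torch.cat([h[edges[:, 0]], h[edges[:, 1]]], dim=-1)
        return self.edge_mlp(pair).squeeze(-1)  # one logit per candidate edge

hits = torch.randn(100, 3)                    # toy hit positions
edges = torch.randint(0, 100, (400, 2))       # toy candidate connections
labels = torch.randint(0, 2, (400,)).float()  # 1 = edge belongs to a true track
model = EdgeScorer()
loss = nn.functional.binary_cross_entropy_with_logits(model(hits, edges), labels)
loss.backward()  # train as a standard binary edge classifier
```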
Jet physics and tagging
Most LHC events produce jets — collimated sprays of hadrons that result from the fragmentation of high-energy quarks and gluons. The jet's substructure carries information about its progenitor: a jet from a top quark looks different from a jet from a Higgs boson, and both look different from a generic QCD jet. Jet tagging is the ML problem of classifying jets by their progenitor type. The methodology has gone through several generations: jet-image CNNs (treating the jet as a 2D image in η-φ space) were the early dominant approach; particle-cloud architectures (treating the jet as a permutation-invariant set of particles, processed via Deep Sets or Particle Transformer) replaced them as more-rigorous treatments of the underlying physics; Lorentz-equivariant networks (LorentzNet, PELICAN) build the relevant Lorentz symmetries into the architecture. The empirical performance has improved substantially across generations, with current methods reaching ~95% accuracy on top-quark jets vs ~75% for simpler architectures.
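The sketch below shows the permutation-invariant core of the particle-cloud approach: a minimal Deep Sets tagger over padded particle lists. The feature layout and sizes are illustrative assumptions; Particle Transformer and LorentzNet refine this core with attention and Lorentz-invariant structure.

```python
import torch
import torch.nn as nn

# Deep Sets jet-tagger sketch: per-particle features are encoded independently,
# sum-pooled (a permutation-invariant operation), then classified.
class DeepSetsTagger(nn.Module):
    def __init__(self, feat_dim: int = 4, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, particles: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # particles: (batch, n_particles, feat_dim), e.g. (pT, eta, phi, E)
        # mask: (batch, n_particles), 1 for real particles, 0 for padding
        h = self.phi(particles) * mask.unsqueeze(-1)
        return self.rho(h.sum(dim=1))  # sum-pooling makes the output order-independent

jets = torch.randn(8, 30, 4)              # 8 jets, up to 30 particles each
mask = (torch.rand(8, 30) > 0.3).float()  # variable-length jets via padding mask
logits = DeepSetsTagger()(jets, mask)     # (8, 2): e.g. top-quark vs QCD scores
```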
Anomaly detection and BSM searches
A specific high-value application is model-agnostic anomaly detection: searching for new physics without specifying what it looks like. The methodology trains a model on Standard-Model-only events (or events selected as "background-like" by traditional methods) and flags events that the model finds anomalous. Autoencoder-based methods reconstruct events through a bottleneck and flag those with high reconstruction error. Density-estimation methods (normalising flows, score-based models) explicitly model the SM event distribution and flag low-likelihood events. Classification-without-labels (CWoLa) trains a classifier to distinguish two mixtures of signal and background, recovering signal sensitivity even without labelled signal events. The methodology is the dominant approach for BSM searches without specific hypotheses; it has been exercised in the LHC Olympics community challenges and deployed in published ATLAS and CMS analyses since 2021.
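A minimal sketch of the autoencoder variant, under toy assumptions about the event features: train on background-like events, score by reconstruction error, and cut at a quantile.

```python
import torch
import torch.nn as nn

# Autoencoder anomaly-detection sketch: train on background-like events,
# then flag events whose reconstruction error is large. Feature count,
# bottleneck size, and threshold are illustrative.
n_features = 20
autoencoder = nn.Sequential(
    nn.Linear(n_features, 8), nn.ReLU(),
    nn.Linear(8, 3),                     # bottleneck: compresses SM-like structure
    nn.ReLU(), nn.Linear(3, 8),
    nn.ReLU(), nn.Linear(8, n_features),
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

background = torch.randn(5000, n_features)   # stand-in for SM-dominated training events
for _ in range(200):
    recon = autoencoder(background)
    loss = nn.functional.mse_loss(recon, background)
    opt.zero_grad(); loss.backward(); opt.step()

# At analysis time: per-event reconstruction error is the anomaly score.
events = torch.randn(100, n_features)
with torch.no_grad():
    scores = ((autoencoder(events) - events) ** 2).mean(dim=1)
threshold = scores.quantile(0.99)            # e.g. keep the most anomalous 1%
anomalous = scores > threshold
```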
Calibration and reweighting
A large class of physics applications involves calibration: adjusting simulated event distributions to better match observed data, or computing systematic-uncertainty corrections. ML methods have substantially improved the methodology. Boosted decision trees have been used for energy calibration of jets and electrons since the early 2000s. Neural-network reweighting (Andreassen et al., the various subsequent methods) uses a classifier-based approach to re-weight one distribution to match another, with applications across detector calibration, theoretical-systematics evaluation, and the simulation-to-data correction step that almost every analysis requires. The methodology is now standard in the LHC analysis pipeline.
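The classifier-based trick is compact enough to sketch directly: train a classifier to separate simulation from data, and convert its output probability p into a per-event weight p/(1-p), which (for a well-calibrated classifier) approximates the likelihood ratio that reweights one distribution to the other. The toy shift below stands in for a real simulation-data discrepancy.

```python
import torch
import torch.nn as nn

# Classifier-based reweighting sketch (the likelihood-ratio trick behind
# OmniFold-style methods). An optimal classifier outputs p(data|x), so
# w(x) = p / (1 - p) reweights simulation toward data.
clf = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)

sim = torch.randn(10000, 5)            # simulated events (label 0)
data = torch.randn(10000, 5) + 0.2     # observed events (label 1), toy shift
x = torch.cat([sim, data])
y = torch.cat([torch.zeros(10000), torch.ones(10000)])

for _ in range(300):
    logits = clf(x).squeeze(-1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    p = torch.sigmoid(clf(sim).squeeze(-1))
    weights = p / (1 - p)              # per-event weights for the simulation
print(weights.mean())                  # ~1 when the two samples have equal size
```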
Operational deployment
Production deployment of ML at the LHC has its own substantial challenges. Reproducibility: physics results must be reproducible by independent collaborations, which means ML pipelines must be fully documented, version-controlled, and shareable across institutions. Long-term stability: LHC analyses span years; ML models must be evaluated and updated as detector conditions change. Systematic-error treatment: ML predictions inherit biases from training data, and these must be quantified as systematic uncertainties on the final physics results. The methodology has matured around these constraints, and the best-practice frameworks (the various ATLAS and CMS internal "ML for physics analysis" working groups) have produced substantial documentation that propagates through the field.
Detector Simulation and Generative Models
Detector simulation is the largest single computing cost for LHC experiments — ATLAS spends roughly half its computing budget on Monte Carlo. ML-based generative methods have become a major frontier for accelerating this simulation, with implications across particle physics, nuclear physics, and the various other simulation-heavy subfields.
The GEANT4 baseline
The traditional methodology is GEANT4: a particle-by-particle Monte Carlo simulation that tracks each particle through the detector geometry, modelling its interactions with detector materials according to physics-based cross-sections. GEANT4 produces extremely-accurate simulated events but at substantial compute cost — a single LHC event takes seconds to minutes of CPU time, and producing the billions of events needed for analysis requires millions of CPU-hours. The methodology has been the gold standard for decades; the empirical accuracy is essential for analyses that depend on subtle detector effects.
Generative-model alternatives
The hope of ML-based fast simulation is to replace GEANT4 with neural-network surrogates that are orders of magnitude faster while maintaining sufficient accuracy. The methodology has gone through several generations. Generative Adversarial Networks (GANs) were the earliest substantial direction (2017–2020): train a generator to produce detector responses (calorimeter showers, tracking-detector hits) that a discriminator cannot distinguish from GEANT4 outputs. The empirical performance was promising but suffered from GAN-typical issues — mode collapse, training instability, and difficulty calibrating uncertainty. Variational Autoencoders (VAEs) provided more-principled probabilistic generation but typically produced over-smooth outputs.
Diffusion models and normalising flows
The 2022–2025 generation has substantially improved on the GAN/VAE methodology. Score-based and diffusion models (the same diffusion methodology that produces images from text prompts and protein structures from sequences, applied to detector responses) have produced high-fidelity calorimeter-shower generation with more-stable training than GANs. Normalising flows provide exact density estimation and are particularly useful for likelihood-based inference workflows. The CaloChallenge (a community benchmark for fast calorimeter simulation, ongoing since 2022) has anchored quantitative comparisons across methods, and the 2024 generation of submissions includes diffusion-based methods (CaloDiffusion, CaloScore), flow-based methods (CaloFlow, CaloMan), and various hybrids. Production deployment at ATLAS (the AtlFast3 framework integrating ML methods alongside parameterised simulation) and CMS is increasingly central to the experimental infrastructure.
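A minimal sketch of the training objective these calorimeter diffusion models share; toy shower shapes, a standard DDPM-style noise schedule, and a plain MLP denoiser are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Denoising-diffusion training sketch: corrupt a shower with noise at a random
# level, train a network to predict the added noise.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)        # standard DDPM schedule

denoiser = nn.Sequential(nn.Linear(256 + 1, 512), nn.ReLU(),
                         nn.Linear(512, 256))          # predicts the added noise
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

showers = torch.rand(64, 256)                          # toy flattened calorimeter images
for _ in range(100):
    t = torch.randint(0, T, (showers.shape[0],))
    eps = torch.randn_like(showers)
    a = alphas_bar[t].unsqueeze(-1)
    noisy = a.sqrt() * showers + (1 - a).sqrt() * eps  # forward (noising) process
    t_feat = (t.float() / T).unsqueeze(-1)             # crude timestep conditioning
    pred = denoiser(torch.cat([noisy, t_feat], dim=-1))
    loss = nn.functional.mse_loss(pred, eps)           # denoising score-matching loss
    opt.zero_grad(); loss.backward(); opt.step()
# Sampling runs the learned denoiser in reverse, from pure noise to a shower.
```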
Beyond calorimeters
Detector simulation extends beyond calorimeters. Tracking-detector simulation generates hits from charged-particle interactions with silicon strips and pixels; ML methods accelerate this with comparable methodology to calorimeters. Time-of-flight detectors, muon systems, and the various trigger systems each have their own simulation needs. The methodology is increasingly multi-detector and multi-physics — generating an entire event including all detector subsystems — rather than focused on individual subdetectors. The 2024–2026 wave of "end-to-end" detector simulators is producing the next generation of fast-simulation tools.
The accuracy-vs-speed tradeoff
A persistent challenge is the tradeoff between simulation accuracy and computational speed. GEANT4 is slow but accurate; the simplest ML surrogates are fast but lose subtle features (non-Gaussian tails, fine spatial structure, rare-event configurations) that matter for downstream analyses. The methodology has gradually moved toward conditional simulation — using fast surrogates for the bulk of events, with full GEANT4 for events where high accuracy matters. Refined-fast-simulation hierarchies (where progressively-more-accurate surrogates are used for progressively-more-important events) are increasingly central to production workflows. The empirical case is that ML-based fast simulation can substitute for GEANT4 for the majority of analysis purposes while reserving full simulation for the cases where it matters; the methodology of when to switch and how to combine is its own substantive engineering problem.
Beyond LHC: nuclear and astrophysics
The methodology developed for LHC detector simulation has been adapted for other domains. Nuclear-physics simulation at facilities like Jefferson Lab and the upcoming Electron-Ion Collider uses similar ML-fast-simulation methodology. Astrophysical detector simulation for IceCube neutrino telescopes, Cherenkov-Telescope-Array γ-ray observatories, and direct-detection dark-matter experiments increasingly uses ML surrogates. Cosmological N-body simulation uses ML methods to bridge resolution scales — ML emulators reproducing high-resolution outputs from low-resolution simulations have become a standard tool for cosmology pipelines. The cross-domain methodology has substantial overlap, and tools developed for one application increasingly transfer to others.
Lattice QCD and Computational Hadron Physics
Lattice QCD is the non-perturbative computational framework for the strong nuclear force, and one of the most-compute-intensive scientific calculations humans regularly perform. ML methods have become increasingly central to the methodology, addressing the autocorrelation and sampling-efficiency problems that limit traditional approaches.
The lattice gauge framework
Lattice QCD discretises spacetime onto a four-dimensional grid (typically ~64⁴ to 128⁴ lattice sites), places quark fields at sites and gauge fields on the links between sites, and computes physical observables via Markov chain Monte Carlo sampling of the gauge-field configuration space. The methodology was introduced by Wilson in 1974 and has been continuously refined; it is the only known way to compute QCD observables non-perturbatively, and its results agree with experimental measurements to typically a few percent. The empirical successes include hadron-mass spectra (computed lattice QCD masses agree with PDG values), kaon and B-meson decay constants (essential for testing the Standard Model's flavour sector), and proton spin-and-mass decompositions (revealing how the proton's properties emerge from quark and gluon dynamics).
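The underlying data structure is easy to show for the simplest gauge group, U(1), where links are phases and the plaquette (the elementary closed loop from which the Wilson gauge action is built) reduces to sums of angles. The sketch below is illustrative, with random links rather than a sampled ensemble; for SU(3), links become 3x3 unitary matrices and sums of angles become matrix products.

```python
import numpy as np

# U(1) lattice-gauge sketch: link variables are phases U_mu(x) = exp(i * theta),
# and the plaquette is the product of the four links around an elementary square.
L, dims = 8, 4
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=(dims, L, L, L, L))  # theta[mu, x, y, z, t]

def plaquette(theta, mu, nu):
    """Average Re P_{mu,nu} = Re[U_mu(x) U_nu(x+mu) U_mu(x+nu)^* U_nu(x)^*]."""
    # For U(1), products of phases are sums of angles; shifts are periodic rolls.
    angle = (theta[mu]
             + np.roll(theta[nu], -1, axis=mu)
             - np.roll(theta[mu], -1, axis=nu)
             - theta[nu])
    return np.cos(angle).mean()

# The Wilson gauge action is built from exactly these plaquette traces.
print(plaquette(theta, mu=0, nu=1))  # ~0 for random (hot) links, ->1 near the continuum
```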
The autocorrelation problem
The dominant computational challenge of lattice QCD is autocorrelation: successive Monte Carlo samples are highly correlated, and producing statistically-independent configurations requires many MCMC steps between sampled configurations. The problem becomes severe at fine lattice spacings (where the topological structure becomes increasingly fixed during simulation, an issue called topological freezing). Traditional improvements include cluster algorithms (where applicable), various hybrid Monte Carlo methods, and parallel-tempering schemes. ML methods have begun to substantially improve on these.
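The diagnostic quantity here is the integrated autocorrelation time; a minimal FFT-based estimator, checked on an AR(1) chain whose autocorrelation time is known analytically, might look like this.

```python
import numpy as np

# Integrated autocorrelation time tau_int: how many MCMC steps separate
# effectively independent samples (statistical errors inflate by ~sqrt(2 * tau_int)).
# The window follows the usual self-consistent truncation (stop once w > c * tau).
def integrated_autocorr_time(chain: np.ndarray, c: float = 5.0) -> float:
    x = chain - chain.mean()
    n = len(x)
    f = np.fft.rfft(x, n=2 * n)                  # FFT-based autocovariance
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf /= acf[0]                                # normalise so acf[0] = 1
    tau = 1.0
    for w in range(1, n):
        tau = 1.0 + 2.0 * np.sum(acf[1:w + 1])
        if w >= c * tau:
            break
    return max(tau, 1.0)

# Check on an AR(1) chain, where tau_int = (1 + rho) / (1 - rho) = 19 for rho = 0.9.
rng = np.random.default_rng(2)
chain = np.zeros(100_000)
for i in range(1, len(chain)):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()
print(integrated_autocorr_time(chain))   # should come out near 19
```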
Normalising flows and exact-likelihood sampling
The most-impactful recent ML development is the use of normalising flows for lattice-QCD configuration generation. The methodology trains an invertible neural network (a flow) to map a simple base distribution to the target gauge-field distribution; once trained, the flow generates statistically-independent samples directly, bypassing the autocorrelation problem entirely. Albergo, Kanwar & Shanahan 2019 demonstrated the methodology for 2D φ⁴ theory; subsequent work extended to U(1) and SU(N) gauge theories at increasingly large lattice sizes. The 2024–2026 generation is producing flows for 4D QCD-relevant theories, with substantial speedups on small-to-moderate lattices. The methodology is computationally intensive at training time but produces independent samples at inference time — a tradeoff that's worthwhile when many samples are needed.
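The building block of such flows is easy to sketch for a scalar theory: an affine coupling layer that transforms half the field conditioned on the other half, keeping the map invertible with an exactly computable Jacobian. The single-layer setup and sizes below are illustrative; lattice-gauge flows replace this with gauge-equivariant couplings.

```python
import torch
import torch.nn as nn

# Affine coupling layer, the core of flow-based samplers for scalar
# (phi^4-style) lattice theories.
class AffineCoupling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        half = dim // 2
        self.net = nn.Sequential(nn.Linear(half, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * half))

    def forward(self, x: torch.Tensor):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                  # bounded log-scale for stability
        y2 = x2 * torch.exp(s) + t         # transform one half given the other
        log_det = s.sum(dim=-1)            # exact log|det J| of the map
        return torch.cat([x1, y2], dim=-1), log_det

# Training minimises the reverse KL to exp(-S[phi]) using base samples z ~ N(0, 1):
# loss = E[ log q(phi) + S(phi) ], with log q = base log-density minus log_det.
flow = AffineCoupling(dim=64)              # one layer of a deeper stack in practice
z = torch.randn(32, 64)
phi, log_det = flow(z)
```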
Neural-network-improved actions
An alternative ML application is improved lattice actions: using neural networks to construct lattice discretisations of the QCD action with smaller systematic errors. The methodology connects to the broader physics-improved-by-ML literature: train a network to add corrections to a baseline action such that the corrected action better reproduces continuum-limit physics at coarse lattice spacings. The empirical case is that ML-improved actions can substantially reduce the lattice-spacing dependence of computed observables, allowing the same physics to be extracted from coarser (and therefore cheaper) lattices.
Observable estimation and disconnected diagrams
Many lattice-QCD observables involve disconnected diagrams — quark-loop contributions that couple to the rest of the process only through gluon exchange. These are notoriously noisy because they require many random source vectors to estimate. ML methods (the various 2022–2025 papers on noise reduction by neural networks) train networks to predict the disconnected-diagram contributions from cheaper input observables, substantially reducing the variance for fixed compute. The methodology is increasingly deployed in production lattice-QCD analyses for B-meson and other heavy-quark calculations.
Connection to the Standard Model
Lattice QCD is one of the empirical anchors for the Standard Model. Quark masses, strong coupling at the Z pole, and the CKM matrix elements (which govern flavour-physics phenomenology) all come from lattice-QCD calculations combined with experimental measurements. ML-improved lattice methods directly feed these analyses, with implications for flavour-physics anomalies (the various B-meson discrepancies that have generated discussion since the mid-2010s) and Standard-Model parameter precision. The methodology is increasingly central to the LHC physics programme, where flavour physics provides an indirect window on Beyond-the-Standard-Model physics.
Open challenges
Several open problems shape the methodology. Sign problem: lattice calculations at finite chemical potential (relevant for neutron-star physics and the QCD phase diagram) suffer from a sign problem that makes traditional MCMC fail; ML-based approaches (contour-deformation methods, learned variational ansätze) are an active research direction. Real-time dynamics: lattice calculations are typically Euclidean (imaginary time), and reconstructing real-time dynamics is notoriously difficult; ML methods are exploring new approaches. Multi-baryon physics (deuteron, helium nuclei) is computationally prohibitive at present and is a major target for next-generation methods. The methodology is rapidly evolving, and the 2026–2030 timescale is likely to see substantial advances on each of these fronts.
Plasma Physics and Fusion Control
Plasma physics — the physics of ionised gases, dominated by electromagnetic interactions — is a substantial subfield of physics with major engineering applications, particularly fusion energy. The 2022 Nature paper from DeepMind on tokamak control was a watershed moment: reinforcement learning successfully managed the plasma in a real fusion device. The methodology has substantial implications for fusion-energy research and beyond.
Tokamaks and the fusion challenge
Practical fusion power requires confining hot (~150 million K) plasma in stable configurations long enough for fusion reactions to occur. Tokamaks use magnetic fields in a torus geometry to confine the plasma; the field configuration must be carefully tuned to stabilise various plasma instabilities. ITER (under construction in France, first plasma scheduled for ~2034) is the largest tokamak in development, designed to produce ~500 MW of fusion power over pulses of several hundred seconds — a substantial step toward fusion-power demonstration. Tokamak control involves managing dozens of magnetic-coil currents to maintain plasma shape, position, current, and stability against various instability modes. The traditional methodology uses hand-tuned PID controllers; ML methods have begun to substantially improve on this.
Reinforcement learning for plasma control
The watershed result is Degrave et al. 2022 (DeepMind & EPFL, Nature): a reinforcement-learning agent trained in a high-fidelity tokamak simulator and deployed on the TCV (Variable Configuration Tokamak) at EPFL, where it autonomously controlled the plasma through various target configurations including diverted, snowflake, and droplet shapes. The methodology used an actor-critic agent trained with maximum a posteriori policy optimisation (MPO); training ran in simulation over a period of days, and the trained policy was then deployed on the real tokamak directly, without on-device fine-tuning. The result demonstrated that RL methods could control real plasmas at production-relevant timescales — a substantial advance over the traditional hand-tuned controllers. Subsequent work has extended the methodology to DIII-D at General Atomics, JET at Culham, and other major machines.
Disruption prediction and avoidance
Plasma disruptions — abrupt loss of plasma confinement that can damage the reactor wall — are the most-feared failure mode in tokamak operation. ITER will have hard limits on the number of disruptions that can be tolerated over the lifetime of the device. ML methods are increasingly central to disruption prediction: training a model to forecast disruptions seconds to minutes in advance from real-time plasma diagnostics, enabling operators (or automated systems) to take avoidance actions. The methodology has been demonstrated on JET and DIII-D with substantially better lead time than traditional threshold-based methods. Disruption avoidance is the natural extension: rather than just predicting disruptions, train the controller to actively steer away from dangerous regimes. The 2024–2026 generation of methods integrates prediction, avoidance, and control into unified frameworks.
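A minimal sketch of the shape such predictors take; channel counts, horizons, and the alarm threshold are all hypothetical, and production systems are tuned against machine-specific false-alarm budgets.

```python
import torch
import torch.nn as nn

# Disruption-predictor sketch: a recurrent network over multi-channel
# time-series diagnostics (density, current, radiated power, ...), producing a
# disruption probability at every timestep so an alarm can fire with lead time.
class DisruptionPredictor(nn.Module):
    def __init__(self, n_channels: int = 12, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, diagnostics: torch.Tensor) -> torch.Tensor:
        # diagnostics: (batch, time, n_channels) -> (batch, time) risk trace
        h, _ = self.rnn(diagnostics)
        return torch.sigmoid(self.head(h)).squeeze(-1)

model = DisruptionPredictor()
shot = torch.randn(1, 500, 12)          # one toy shot: 500 timesteps, 12 diagnostics
risk = model(shot)
alarm_threshold = 0.8                   # tuned against the false-alarm budget in practice
alarm_times = (risk[0] > alarm_threshold).nonzero()
```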
Scenario design and optimisation
Beyond real-time control, ML methods are deployed for scenario design: choosing the time-dependent control trajectories (heating power, plasma current, magnetic field) that produce desired plasma states. Traditional methods use slow optimisation through expensive plasma simulators; ML methods accelerate this through learned simulators (faster than first-principles plasma codes) and surrogate-based optimisation. The methodology produces scenarios that meet performance targets while satisfying engineering constraints (heat flux limits, mechanical stress, etc.). For ITER-class reactors, where each experimental run is enormously expensive, the ability to pre-optimise scenarios in simulation has substantial economic value.
Beyond tokamaks: stellarators and inertial confinement
The methodology extends beyond tokamaks. Stellarators (helically-twisted devices that don't require a plasma current, with the Wendelstein 7-X at IPP being the major modern example) have their own optimisation problems amenable to ML methods. Inertial-confinement fusion (ICF, exemplified by the 2022 NIF achievement of ignition) uses laser-driven implosion of fuel pellets; ML methods optimise laser-pulse shapes, target designs, and post-shot data analysis. Z-pinch and spherical-tokamak approaches each have their own ML applications. The combined methodology is producing the next generation of fusion-research tools, and the operational deployment is substantial across the major fusion programmes.
Plasma physics beyond fusion
Plasma physics has applications well beyond fusion. Astrophysical plasmas (the solar corona, accretion disks, the interstellar medium) require similar ML methodology for analysis and modelling. Industrial plasmas for semiconductor manufacturing, surface treatment, and arc welding have substantial optimisation problems. Space-weather plasmas (the magnetosphere, the solar wind) have implications for satellite operations and ground-infrastructure protection. The methodology of ML-for-plasma is increasingly cross-domain, with tools developed for fusion transferring to other plasma-physics applications.
Neural Quantum States and Many-Body Physics
Many-body quantum systems — interacting electrons in solids, ultracold atoms in optical lattices, spin systems on lattices — are the subject of condensed-matter physics and a substantial fraction of modern computational physics. Neural quantum states (NQS) have become a major variational tool for these systems, complementing traditional methods like exact diagonalisation, density-matrix renormalisation group (DMRG), and quantum Monte Carlo.
The many-body problem
The fundamental challenge of quantum many-body physics is the curse of dimensionality: the Hilbert space of N quantum particles grows exponentially in N, making direct calculation infeasible beyond ~20–30 particles even on supercomputers. Traditional methods address this through various approximations — mean-field theory (assuming weakly-interacting particles), density-functional theory (Ch 02 §8, replacing many-body wavefunctions with density functionals), tensor-network methods (DMRG and successors, exploiting low-entanglement structure), and Monte Carlo methods (sampling from the wavefunction probability distribution). Each has its regime of validity; combined, they address most of the practical problems in condensed-matter physics, but substantial gaps remain — particularly for strongly-correlated electron systems with frustrated interactions.
The neural-network ansatz
An ansatz (German for "approach" or "starting point"; pl. ansätze) is a parameterised functional form proposed as a candidate solution to a problem, with the parameters tuned to fit data or satisfy physical constraints. The term has been standard physics vocabulary for over a century — Hartree-Fock and BCS superconductivity each rest on famous ansätze for the relevant wavefunction — and it predates neural networks by decades. What's new with neural quantum states is the choice of functional form: instead of a hand-designed expression with a small number of physically-motivated parameters, the wavefunction is represented as a neural network with potentially millions of parameters.
Neural quantum states (NQS) represent the many-body wavefunction this way: ψ(s) = NN_θ(s), where s is a many-body configuration and NN_θ is a neural network with parameters θ. The methodology was introduced by Carleo & Troyer 2017 (Science), who used Restricted Boltzmann Machines to represent ground states of various spin systems and demonstrated competitive results with traditional methods. The key insight: a sufficiently-expressive neural network can represent quantum states that resist traditional ansätze, including states with substantial entanglement that DMRG handles poorly in 2D and 3D. Subsequent work has extended NQS to fermionic systems (where the wavefunction must be antisymmetric under particle exchange), continuous-space systems, and increasingly large systems where traditional methods become intractable.
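The original RBM ansatz is compact enough to write out; the sketch below evaluates the log-amplitude of a spin configuration with random (untrained) parameters, using a real-valued RBM for simplicity where practice often uses complex parameters.

```python
import numpy as np

# RBM wavefunction sketch over spins s in {-1, +1}^N:
# psi(s) = exp(sum_i a_i s_i) * prod_j 2 cosh(b_j + sum_i W_ji s_i).
N, M = 10, 20                             # visible spins, hidden units
rng = np.random.default_rng(3)
a = 0.01 * rng.standard_normal(N)         # visible biases
b = 0.01 * rng.standard_normal(M)         # hidden biases
W = 0.01 * rng.standard_normal((M, N))    # couplings

def log_psi(s: np.ndarray) -> float:
    """Log-amplitude of one configuration (real-valued RBM for simplicity)."""
    theta = b + W @ s
    return a @ s + np.sum(np.log(2.0 * np.cosh(theta)))

s = rng.choice([-1.0, 1.0], size=N)
print(log_psi(s))   # amplitudes, not probabilities: |psi(s)|^2 gives sampling weights
```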
Variational Monte Carlo with NQS
The standard methodology is variational Monte Carlo (VMC) with a neural-network ansatz. The energy expectation value is computed by sampling configurations from the wavefunction probability density and averaging the local energy; the network parameters are optimised to minimise the energy. The methodology has been extended substantially since 2017. FermiNet (Pfau et al. 2020, DeepMind) introduced antisymmetric neural-network ansätze for electronic-structure problems, achieving chemical-accuracy results on small molecules. PauliNet (Hermann et al. 2020) used a related approach with explicit antisymmetrisation. Subsequent transformer-based ansätze (the Psiformer and its relatives) extend to larger systems. The methodology is increasingly competitive with state-of-the-art coupled-cluster and quantum-Monte-Carlo methods on systems of moderate size.
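A minimal VMC loop can be sketched end to end for the 1D transverse-field Ising model, using a toy RBM-style log-amplitude, Metropolis sampling from |ψ|², and the standard local-energy estimator; all hyperparameters are illustrative and no parameter optimisation is shown.

```python
import numpy as np

# VMC energy estimator for the 1D transverse-field Ising model,
# H = -J sum s_i s_{i+1} - h sum X_i. The off-diagonal term contributes
# h * psi(flip_i s) / psi(s) to the local energy.
N, J, h = 10, 1.0, 1.0
rng = np.random.default_rng(4)
W = 0.01 * rng.standard_normal((20, N))
b = 0.01 * rng.standard_normal(20)

def log_psi(s):                                   # toy RBM-style log-amplitude
    return np.sum(np.log(2 * np.cosh(b + W @ s)))

def local_energy(s):
    e = -J * np.sum(s * np.roll(s, -1))           # diagonal (ZZ) part, periodic chain
    for i in range(N):                            # off-diagonal (X) part via spin flips
        s_flip = s.copy(); s_flip[i] *= -1
        e -= h * np.exp(log_psi(s_flip) - log_psi(s))
    return e

# Metropolis sampling from |psi|^2, then average the local energy.
s = rng.choice([-1.0, 1.0], size=N)
energies = []
for step in range(20_000):
    i = rng.integers(N)
    s_new = s.copy(); s_new[i] *= -1
    if rng.random() < np.exp(2 * (log_psi(s_new) - log_psi(s))):
        s = s_new
    if step > 2_000 and step % 10 == 0:           # thermalise, then thin
        energies.append(local_energy(s))
print(np.mean(energies))                          # variational energy estimate
```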
Architecture choices
Several architectural choices recur across NQS methods. Restricted Boltzmann Machines (RBMs) were the original architecture and remain useful for spin systems with simple symmetries. Convolutional neural networks exploit translational symmetry on lattices. Recurrent and autoregressive networks handle long-range correlations and permit exact autoregressive sampling. Group-equivariant networks respect the symmetries of the underlying Hamiltonian (translation, rotation, particle exchange). Graph neural networks handle molecular geometry. The 2024–2026 generation increasingly uses transformer-style architectures with explicit symmetry handling. Each architectural choice has tradeoffs in expressiveness, computational cost, and ease of optimisation.
Applications and benchmarks
NQS methodology has been applied across many physics subdomains. Spin systems: the original application, with benchmarks on the Heisenberg and Hubbard models and frustrated systems. Electronic structure: ab-initio calculations on small molecules and increasingly on larger systems. Lattice gauge theory: NQS methods provide alternative approaches to lattice-QCD problems where traditional MCMC has difficulties. Cold-atom systems: ultracold atoms in optical lattices realise model Hamiltonians (Hubbard, Heisenberg) experimentally, providing direct comparison with NQS predictions. The community-standard benchmarks are increasingly mature, and the empirical case for NQS as a competitive method (rather than a research curiosity) has substantially solidified.
The NetKet ecosystem
A specific software ecosystem worth knowing: NetKet (originally Carleo, Choo, et al. ~2018) is the dominant open-source library for neural-quantum-state methods, with extensive functionality for variational Monte Carlo, spectrum calculations, dynamics simulation, and the various specialised needs of NQS research. The library has substantially professionalised the field — methods are routinely benchmarked using NetKet, papers cite NetKet implementations, and reproducibility is much better than in many ML-for-physics subfields. The ecosystem represents the kind of community infrastructure that distinguishes mature ML-for-science applications from research-only domains.
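A ground-state calculation in the NetKet 3 style looks roughly like the sketch below. The names follow the library's tutorials, but exact APIs are version-dependent, so treat this as a pattern rather than a pinned recipe.

```python
# NetKet-style ground-state sketch (NetKet 3 tutorial pattern; APIs may differ
# across versions, so consult the current documentation before running).
import netket as nk

g = nk.graph.Hypercube(length=16, n_dim=1, pbc=True)   # 1D chain of 16 spins
hi = nk.hilbert.Spin(s=1 / 2, N=g.n_nodes)
ham = nk.operator.Ising(hilbert=hi, graph=g, h=1.0)    # transverse-field Ising

model = nk.models.RBM(alpha=1)                         # RBM ansatz, M = alpha * N
sampler = nk.sampler.MetropolisLocal(hi)
vstate = nk.vqs.MCState(sampler, model, n_samples=1024)

optimizer = nk.optimizer.Sgd(learning_rate=0.05)
driver = nk.driver.VMC(ham, optimizer, variational_state=vstate,
                       preconditioner=nk.optimizer.SR(diag_shift=0.1))
driver.run(n_iter=300)                                 # variational optimisation
print(vstate.expect(ham))                              # energy estimate with error bars
```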
Open frontiers
Several methodological frontiers shape the field. Excited-state calculations: most NQS methodology is for ground states; calculating excited states is harder but increasingly tractable. Real-time dynamics: simulating how a quantum many-body system evolves in time is computationally demanding; NQS-based methods are being extended in this direction. Symmetry-broken phases: ground states with spontaneously-broken symmetries (superconductors, magnets) require careful handling of the broken symmetry. Finite-temperature methods: extending NQS to thermal states is technically distinct from ground-state methodology. The frontier is substantial, and the 2026–2030 timescale is likely to see continued advance.
Physics-Informed Neural Networks and PDE Solving
Physics-informed neural networks (PINNs) and the broader neural-operator framework provide ML-based approaches to solving partial differential equations — the backbone of essentially all of physics. The methodology has matured substantially since 2019 and has produced practical tools across fluid dynamics, electromagnetism, quantum mechanics, and general relativity.
The PINN framework
Physics-informed neural networks (Raissi, Perdikaris & Karniadakis 2019, JCP) represent the solution u(x,t) of a PDE as a neural network u_θ(x,t) and train the network by minimising a loss function combining a data-fitting term (predictions match measurements at sample points) with a physics-residual term (the PDE is satisfied at collocation points throughout the domain). The PDE residual is computed via automatic differentiation: differentiate the network with respect to its inputs to compute spatial and temporal derivatives, plug them into the PDE, and penalise non-zero residuals. The methodology naturally handles forward problems (solve the PDE given equations and boundary conditions), inverse problems (infer parameters from data), and various combinations.
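A minimal PINN for the 1D heat equation u_t = u_xx shows the mechanics (network, autograd derivatives, residual plus initial-condition loss). The initial condition and all hyperparameters are illustrative assumptions, and boundary terms are omitted for brevity.

```python
import torch
import torch.nn as nn

# PINN sketch for u_t = u_xx on x in [0, 1]: the network is u_theta(x, t),
# and autograd supplies the derivatives appearing in the PDE residual.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def grad(out, inp):
    return torch.autograd.grad(out, inp, grad_outputs=torch.ones_like(out),
                               create_graph=True)[0]

for _ in range(2000):
    x = torch.rand(256, 1, requires_grad=True)   # collocation points in space
    t = torch.rand(256, 1, requires_grad=True)   # and time
    u = net(torch.cat([x, t], dim=-1))
    u_t = grad(u, t)
    u_xx = grad(grad(u, x), x)
    residual = u_t - u_xx                        # the PDE, penalised at collocation points
    x0 = torch.rand(256, 1)
    u0 = net(torch.cat([x0, torch.zeros_like(x0)], dim=-1))
    ic = torch.sin(torch.pi * x0)                # assumed initial condition u(x, 0)
    loss = (residual ** 2).mean() + ((u0 - ic) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```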
Strengths and weaknesses
PINNs have specific strengths over traditional PDE solvers. Mesh-free: no grid generation required, the network produces a continuous solution everywhere. Easy multi-physics: combining multiple PDEs in a coupled system is straightforward — just add residual terms to the loss. Inverse-problem natural: parameter inference is just adding physical-parameter-discovery terms to the optimisation. High-dimensional: PINNs handle problems where traditional grid methods scale prohibitively (high-dimensional Schrödinger equations, Black-Scholes in many dimensions). The weaknesses are equally specific. Training stability: the multi-objective optimisation can be hard, with the data and physics losses potentially conflicting. Compute cost: evaluating PDE residuals requires automatic differentiation through the network, which is expensive for deep networks and high-order PDEs. Stiff problems: PINNs struggle with stiff ODEs/PDEs, multi-scale problems, and singular boundary layers. The methodology continues to mature with substantial work on each of these challenges.
Neural operators
A related but architecturally-different framework is neural operator learning: rather than learning a single PDE solution, learn a map from input fields (initial conditions, boundary conditions, source terms, material properties) to solution fields. Fourier Neural Operators (FNOs, Li et al. 2020) compute kernel-integral operators in spectral space; DeepONets (Lu et al. 2021) factorise the operator into a "branch" network (encoding the input function) and a "trunk" network (encoding the evaluation point). Once trained, neural operators produce solutions for new instances of the parameter family in milliseconds — orders of magnitude faster than traditional solvers. The methodology has been particularly successful for problems where many similar PDE instances need to be solved (climate emulation as in Ch 06 §14, materials property prediction, optimisation under PDE constraints).
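The distinctive FNO ingredient, the spectral convolution, fits in a few lines; the sketch below is one 1D Fourier layer with illustrative sizes, omitting the pointwise convolutions, activations, and lifting/projection layers of the full architecture.

```python
import torch
import torch.nn as nn

# Spectral-convolution sketch (the FNO core): FFT the input field, apply a
# learned complex linear map to the lowest modes, zero the rest, and invert.
class SpectralConv1d(nn.Module):
    def __init__(self, channels: int, n_modes: int):
        super().__init__()
        self.n_modes = n_modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, n_modes, dtype=torch.cfloat))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, n_grid), a real field sampled on a uniform grid
        u_hat = torch.fft.rfft(u, dim=-1)
        out_hat = torch.zeros_like(u_hat)
        out_hat[..., :self.n_modes] = torch.einsum(
            "bim,iom->bom", u_hat[..., :self.n_modes], self.weight)
        return torch.fft.irfft(out_hat, n=u.shape[-1], dim=-1)

layer = SpectralConv1d(channels=8, n_modes=12)
field = torch.randn(4, 8, 128)   # e.g. 4 input conditions on a 128-point grid
out = layer(field)               # same shape, filtered through learned modes
```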
Hybrid physics-ML methods
The frontier increasingly mixes physics and ML. Differentiable simulators implement traditional PDE solvers in differentiable frameworks (JAX, PyTorch), enabling gradient-based optimisation through the simulation. Hybrid solvers use ML to learn corrections to traditional solver outputs, addressing systematic errors while retaining the traditional solver's stability. Neural-network closures use ML to parameterise unresolved processes within traditional solvers (the climate-model parameterisation methodology of Ch 06 §15 is the canonical example). The methodology represents a synthesis of physics-grounded structure with ML flexibility, and is increasingly the practical answer to the physics-vs-data tension.
Specific application domains
PINNs and neural operators are deployed across many physics applications. Fluid dynamics: turbulent-flow simulation, blood-flow modelling, weather and climate (Ch 06). Electromagnetism: antenna design, photonic-device optimisation, electromagnetic-scattering calculations. Quantum mechanics: solving the Schrödinger equation in low-dimensional systems, computing scattering cross-sections. General relativity: black-hole mergers, cosmological perturbation theory. Plasma physics: tokamak equilibrium calculations (complementing the RL methodology of Section 5). Each domain has its own specialisations, but the underlying machinery is shared.
Benchmark infrastructure for PDE-solving ML
A specific community-infrastructure development worth knowing: PDEBench, PDEArena, and various other benchmark suites have provided standardised evaluation for PDE-solving ML methods since ~2022. The empirical question of "which method works best for which problem" has substantially matured, with consensus emerging that Fourier Neural Operators dominate for periodic problems, DeepONets for problems requiring varying input function spaces, and PINNs for problems with sparse data and known physics. The methodology continues to evolve, but the empirical landscape is much better-organised than five years ago.
Symbolic Regression and Physics Discovery
Symbolic regression — discovering closed-form mathematical equations from data — is the most-direct AI approach to physics discovery. The methodology has roots in genetic-programming work from the 1990s and has been substantially advanced by recent neural-symbolic methods. For physics, the methodology aspires to do what scientists have always done: extract concise mathematical descriptions from empirical observations.
The classical methodology
The classical approach is genetic programming: represent equations as expression trees, evolve populations of trees through mutation and recombination, select for fit-to-data plus parsimony, and repeat until convergence. The methodology was developed in the 1990s (Koza and others) and produces interpretable equations but suffers from substantial scalability issues — the search space of expressions grows combinatorially with equation complexity. Eureqa (Schmidt & Lipson 2009) was an early influential implementation; PySR (Cranmer 2023) is the modern open-source standard with substantial empirical refinements. PySR has been used in dozens of physics-discovery applications and is the practical entry point for symbolic regression in physics.
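Usage follows PySR's documented scikit-learn-style interface; the sketch below recovers a planted inverse-square law from toy data. Operator lists and settings are illustrative, and exact options are version-dependent.

```python
# PySR sketch: evolve expression trees over toy data with a hidden
# inverse-square law. Settings here are illustrative, not tuned.
import numpy as np
from pysr import PySRRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0.5, 2.0, size=(500, 2))
y = X[:, 0] / X[:, 1] ** 2              # hidden law: an inverse-square form

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["square", "sqrt"],
    maxsize=15,                         # parsimony: cap expression complexity
)
model.fit(X, y)                         # genetic search over expression trees
print(model.get_best())                 # best equation on the accuracy/complexity front
```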
AI Feynman and the rediscovery test
AI Feynman (Udrescu & Tegmark 2020, Science Advances) was a major step in modernising the methodology. The system uses physics-inspired heuristics — dimensional analysis (constraining exponents in fundamental equations), symmetry analysis (identifying invariant transformations), separability (factoring multi-variable functions), and simple-equation tests — to guide the search. The empirical demonstration: the system rediscovered essentially all the equations in the Feynman Lectures on Physics from numerical data alone, with ~100% success on classical physics and substantial success on quantum mechanics. The methodology established the credibility of modern symbolic regression as a serious physics-discovery tool.
Neural-symbolic hybrids
The 2022–2026 generation has substantially advanced the methodology through neural-symbolic hybrids: combining neural networks (which fit data flexibly) with symbolic-regression backends (which produce interpretable equations). Equation-learner networks train with symbolic-friendly activation functions and read closed-form expressions off the trained network. Sequence-model approaches (deep symbolic regression and its transformer-based successors) predict expression trees directly. The 2024 generation includes large-language-model approaches that use pretrained LLMs to suggest candidate expressions, ranked by data fit. The methodology has become substantially more scalable than pure genetic programming.
What symbolic regression discovers (and doesn't)
The empirical track record is instructive. Symbolic regression reliably rediscovers known equations from synthetic data in the appropriate functional form. It produces novel functional forms that fit data and that physicists then interpret physically — sometimes with insight, sometimes not. It is less successful at discovering genuinely new physics: the methodology can identify patterns in data, but interpreting those patterns as new fundamental laws requires human judgment that current ML methods cannot replicate. The honest framing is that symbolic regression is a hypothesis-generation tool: it produces candidate equations that human physicists evaluate for physical reasonableness. The combined human-AI workflow has been productive across multiple domains.
Applications across physics
Symbolic regression has been applied across many physics subdomains. Astrophysics: discovering empirical scaling relations in galaxy properties. Materials physics: finding closed-form approximations for complex many-body energies. Fluid dynamics: extracting closure relations for turbulence parameterisations. Particle physics: identifying functional forms for physics observables that depend on multiple variables. Plasma physics: characterising dependencies in complex plasma scaling laws. The methodology is increasingly part of the standard toolkit, with PySR widely adopted across these subfields.
The interpretability dimension
A specific virtue of symbolic regression in physics is interpretability: the output is a closed-form equation that physicists can read, manipulate, derive properties of, and integrate with established theory. This contrasts with neural-network solutions that are accurate but opaque. The interpretability dimension matters substantially in physics, where the goal is often understanding rather than prediction — a closed-form equation that can be related to first principles is more valuable than an opaque function that fits the data slightly better. The methodology represents an attempt to build AI tools that fit the cultural standards of a discipline that values mechanistic insight alongside predictive accuracy.
Open frontiers
Several open problems shape the methodology. Discovering equations with non-obvious functional structure: most current methods explore expression trees over standard mathematical operations, which limits what can be discovered. Multi-equation systems: discovering coupled systems of equations rather than single equations is substantially harder. Combining symbolic regression with first-principles theory: using known physics constraints to guide the search rather than searching blindly is an active research direction. Hypothesis evaluation: deciding whether a discovered equation represents real physics versus data artefacts requires methodology beyond data fit. The frontier is substantial, and the field is gradually maturing toward more-rigorous evaluation standards.
Foundation Models and Equivariant Architectures
The 2024–2026 wave of foundation-model and equivariant-network research has substantially shaped AI for physics. This section surveys the architectural principles that recur across the chapter and the foundation-model efforts that aspire to general-purpose physics AI.
Geometric deep learning
The unifying framework for symmetry-aware architectures is geometric deep learning (Bronstein, Bruna, Cohen, Veličković 2021): the systematic approach to building neural networks that respect specified symmetry groups by construction. The framework includes translation-equivariant CNNs (the original deep-learning success on grids), permutation-equivariant networks for sets and graphs, rotation-equivariant networks for 3D structures, gauge-equivariant networks for fields on manifolds, and various other specialisations. Each respects a specific symmetry by construction, which both improves data efficiency and provides extrapolation guarantees that arbitrary networks cannot match. The methodology is the architectural backbone for much modern AI for physics.
Lorentz-equivariant networks for particle physics
The most-developed equivariant architecture for particle physics is Lorentz-equivariant networks. The Lorentz group network (LGN, Bogatskiy et al. 2020) was an early construction; LorentzNet (Gong et al. 2022) was the influential demonstration: a graph neural network with Lorentz-invariant message passing, achieving substantial improvements over non-equivariant alternatives on jet-tagging benchmarks. PELICAN (Bogatskiy et al. 2022) extended the methodology with permutation- and Lorentz-equivariant attention, and the various 2024–2026 successors continue the architectural development. The empirical case is that Lorentz equivariance produces meaningful generalisation improvements on collider-physics tasks, particularly when training data is limited.
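The invariant quantities these architectures are organised around are Minkowski inner products of four-momenta; the sketch below computes the pairwise invariants that a Lorentz-invariant tagger can consume directly.

```python
import torch

# Minkowski inner products of particle four-momenta: invariant under boosts
# and rotations. Feeding pairwise invariants (rather than raw components) is
# the simplest way to bake the symmetry in.
eta = torch.diag(torch.tensor([1.0, -1.0, -1.0, -1.0]))   # metric, signature (+,-,-,-)

def minkowski_dot(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # p, q: (..., n, 4) four-momenta (E, px, py, pz) -> (..., n, n) invariants
    return torch.einsum("...ia,ab,...jb->...ij", p, eta, q)

particles = torch.randn(2, 10, 4)                   # 2 toy jets, 10 particles each
invariants = minkowski_dot(particles, particles)
masses_sq = invariants.diagonal(dim1=-2, dim2=-1)   # p.p = m^2 on the diagonal
# A network consuming only `invariants` is Lorentz-invariant by construction;
# LorentzNet and PELICAN interleave such invariants with equivariant updates.
```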
Gauge-equivariant networks for lattice physics
Lattice gauge theory has its own specific symmetries — local gauge transformations on the lattice — that constrain the relevant networks. Gauge-equivariant neural networks (the lattice gauge-equivariant CNNs of Favoni et al. 2022, and the gauge-equivariant flow constructions of Kanwar, Shanahan, and collaborators) build the lattice gauge symmetries into the architecture. The methodology has been applied to normalising-flow methods for lattice configuration generation, ML-improved lattice actions, and direct learning of physical observables. The empirical case is similar to the Lorentz-equivariant case: the architectures produce meaningful improvements on the benchmarks that matter, with theoretical guarantees about respecting the underlying physics.
Multi-modal physics foundation models
Several 2024–2026 efforts aim at physics foundation models: large pretrained networks that can be fine-tuned for many downstream physics tasks. The methodology is less mature than for language or vision (and considerably less mature than for protein structure), but several directions are notable. Language models trained or fine-tuned on the physics literature target problem-solving and reasoning. Universal force fields for materials science (M3GNet, MACE, the various 2024 successors covered in Ch 09–10) function as physics foundation models in the materials domain. Multi-scale physics foundation models (Aurora-style architectures generalised to multiple physics domains) are an emerging direction. Whether the foundation-model paradigm proves out for physics as it has for language remains an open empirical question.
Tensor networks and ML
A specific architectural connection worth flagging: tensor networks — the decomposition of high-dimensional tensors into networks of low-dimensional tensors — have substantial overlap with neural-network architectures. The connection has been most thoroughly explored in many-body physics (Section 6), where tensor networks like matrix-product states (MPS) and projected entangled-pair states (PEPS) are the workhorse traditional methods. The 2022–2025 wave of work has substantially clarified the connections: certain neural-network ansätze are equivalent to tensor networks of various types, and tensor-network insights (entanglement entropy, area laws, contractibility) inform what kinds of physics problems specific neural-network architectures can efficiently represent. The methodology is increasingly cross-pollinated.
The architectural-prior philosophy
The deeper lesson of equivariant networks for AI broadly is that strong architectural priors can substantially improve data efficiency and generalisation when the priors are correct. Physics provides unusually clean priors (symmetries are deeply established, conservation laws are exact), making the empirical case for equivariant architectures unusually strong. Other domains have softer priors that may or may not transfer (chemistry's similarity-symmetry, biology's evolutionary structure, language's compositionality), and the methodology of identifying productive priors is itself a substantial research area. AI for physics has been the most-systematic testbed for the architectural-prior philosophy, with implications across the broader machine-learning landscape.
The Frontier and the Operational Questions
AI for physics has matured substantially over the past five years. This final section surveys the frontier — the open methodological problems, the active research directions, and the operational questions that will shape the field over the next several years.
Discovery vs prediction, restated
The discovery-vs-prediction spectrum (Section 1) shapes the methodological frontier. Prediction-end methodology is mature: AI-accelerated detector simulation, ML-improved lattice QCD, NQS for many-body ground states, and PINNs for PDE solving are all in active production use, with empirical track records that compete favourably with traditional methods. Discovery-end methodology is less mature: anomaly detection at the LHC has not yet found new physics; symbolic regression has not produced a fundamentally-new physics equation; foundation models have not made conceptual breakthroughs comparable to AlphaFold's. The 2026–2030 timescale is likely to see continued maturation at both ends, but the discovery-end successes are intrinsically harder to engineer and harder to evaluate. Whether AI methods will produce qualitatively-new physics insights, or will remain primarily a methodology accelerator within human-led discovery, is the open question.
Uncertainty quantification at physics-grade precision
The precision-physics standard (Section 1) has shaped methodology around uncertainty quantification. Standard ML uncertainty methods (ensembles, dropout-based Bayesian inference, conformal prediction) often don't produce calibrated uncertainty at the precision physics expects. Physics-specific uncertainty methods are an active research area: closure tests with simulated data, systematic-error budgets that propagate through the ML pipeline, and bootstrap-based methods that capture both statistical and systematic contributions. The methodology is gradually maturing, but uncertainty-quantification rigour remains a substantial gap between AI-for-physics methods and the demands of precision-physics applications.
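Of the standard tools, split conformal prediction is simple enough to sketch: hold out a calibration set, take a quantile of its residuals, and widen predictions by that amount, which gives finite-sample coverage under an exchangeability assumption. The toy linear predictor below stands in for any trained regressor.

```python
import numpy as np

# Split conformal prediction sketch: calibration residuals set the interval
# half-width, giving finite-sample coverage guarantees assuming the calibration
# and test points are exchangeable.
def conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    scores = np.abs(y_cal - predict(X_cal))               # calibration residuals
    n = len(scores)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
    center = predict(X_test)
    return center - q, center + q                         # ~90% coverage for alpha=0.1

# Toy model: a fixed linear predictor standing in for any trained regressor.
rng = np.random.default_rng(6)
predict = lambda X: 2.0 * X[:, 0]
X_cal = rng.normal(size=(500, 1))
y_cal = 2.0 * X_cal[:, 0] + rng.normal(0, 0.3, 500)
X_test = rng.normal(size=(5, 1))
lo, hi = conformal_interval(predict, X_cal, y_cal, X_test)
```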
BSM physics and out-of-distribution behaviour
A specific frontier question is how AI methods behave in out-of-distribution regimes — particularly in BSM-physics searches where, by definition, the relevant signal is not in the training data. The empirical evidence is mixed. Anomaly-detection methods can find unusual events in collider data but distinguishing genuine BSM signals from systematic effects is hard. Foundation models trained on Standard-Model physics may or may not extrapolate reliably to BSM regimes; existing tests suggest mixed results. The methodology of training on the known and reliably extrapolating to the unknown is an open problem, and one with substantial implications for whether AI methods can contribute meaningfully to discoveries beyond the Standard Model.
Quantum computing and AI
The intersection of quantum computing and AI is increasingly active. Quantum machine learning aspires to use quantum computers for ML tasks (training, inference, sampling). Classical-AI for quantum systems uses ML to help simulate, control, and benchmark quantum hardware (Section 6's NQS methodology is part of this). Hybrid classical-quantum algorithms integrate ML into variational quantum algorithms (VQE, QAOA) for various applications. The 2026 state of quantum hardware (~1,000 noisy qubits) limits what's currently feasible, but the field is moving rapidly, and the methodology is increasingly integrated with mainstream AI for physics.
Open data and reproducibility
A specific operational question shaping the field is open data and reproducibility. Particle physics has substantial open-data initiatives (CMS Open Data, ATLAS Open Data) that enable academic ML researchers to engage with collider physics without internal-collaboration access. Lattice QCD has shared ensembles that enable independent ML methodology development. Plasma-physics data is increasingly shared. Each open-data initiative has substantially democratised the field and accelerated methodology development. The 2026–2030 timescale is likely to see continued expansion, with implications for how ML for physics develops as a community discipline rather than a series of internal-collaboration efforts.
Workforce and culture
A non-technical frontier issue is workforce and culture. The integration of ML into physics has required substantial cultural adaptation: traditional physics training does not include modern ML; traditional ML training does not include physics. The major collaborations have invested in cross-training, with substantial dedicated ML positions and educational programmes. The cultural integration is producing a generation of physicists who are also competent ML practitioners, and ML practitioners who understand physics. The methodology of how the cross-training scales — whether the future generation of physicists will routinely engage with ML, or whether ML will remain a specialist discipline within physics — is an open organisational question that will shape the field for decades.
What this chapter does not cover
Several adjacent areas are out of scope. The substantial AI-for-cosmology and AI-for-astronomy methodology (gravitational-wave detection, galaxy classification, photometric-redshift estimation, exoplanet detection) is properly the domain of Ch 11–12 and is touched only briefly here. Materials physics applications — AI for predicting material properties, generative crystal design, ML interatomic potentials — are in Ch 09–10. Quantum-chemistry applications — DFT corrections, molecular property prediction — are in Ch 02 and Ch 03. The substantial high-energy theory literature on AI for string theory, holography, and the various other foundational-physics topics is largely skipped. The substantial applied-physics literature (acoustics, optics, condensed-matter device physics) is barely touched. The chapter focuses on the methodological core of AI for fundamental physics; the broader landscape of AI in physics is genuinely vast.
Further reading
A combined library spanning physics fundamentals (Feynman Lectures, Goldstein, Jackson, Carroll GR, Griffiths QM, Peskin & Schroeder QFT, the Particle Data Group review) and the AI methodology that has reshaped the field (Carleo & Troyer NQS, Raissi PINNs, Degrave tokamak RL, Udrescu & Tegmark AI Feynman, Bronstein et al. on geometric deep learning, the various lattice/CaloGAN/equivariance papers). Read the textbooks for the conceptual structure developed in Sections 2–9, and the methods papers for the architectures developed in Sections 11–19.
-
The Feynman Lectures on PhysicsThe most-celebrated introductory physics textbook ever written. Volume I covers mechanics, radiation, and heat; Volume II covers electromagnetism and matter; Volume III covers quantum mechanics. Feynman's clarity and conceptual depth remain unmatched. The full text is freely available online via the Feynman Lectures website. The right starting reference for any AI reader engaging seriously with physics. The reference introductory physics text.
-
Classical MechanicsThe standard graduate-level classical mechanics textbook. Comprehensive coverage of Newtonian, Lagrangian, and Hamiltonian formulations; central forces; rigid-body motion; small oscillations; classical field theory; chaos. Mathematically rigorous and the substrate for understanding both quantum mechanics and modern field theory. The reference graduate classical-mechanics textbook.
-
Classical ElectrodynamicsThe standard graduate-level electromagnetism textbook, in print since 1962. Comprehensive coverage of Maxwell's equations, electromagnetic waves, radiation, special relativity from the EM perspective, and a substantial dose of mathematical machinery. Mathematically demanding but the canonical reference. The reference graduate electromagnetism textbook.
-
Statistical MechanicsA standard graduate-level statistical mechanics textbook. Comprehensive coverage of the foundations, ensemble theory, ideal gases, phase transitions, and critical phenomena. Substantially complemented by Pathria's and Reichl's similar-level texts. The right reading for understanding the partition-function-and-free-energy machinery that recurs throughout machine learning. The reference graduate statistical-mechanics textbook.
-
Spacetime and Geometry: An Introduction to General RelativityA modern, well-written introduction to general relativity, covering special relativity, manifolds, curvature, the Einstein equations, the Schwarzschild and Kerr solutions, cosmology, and gravitational waves. Substantially more accessible than Wald or Misner-Thorne-Wheeler, yet covers essentially the same material. The right starting reference for general relativity. The reference modern general-relativity textbook.
-
Introduction to Quantum MechanicsThe standard advanced-undergraduate quantum mechanics textbook. Comprehensive coverage of the Schrödinger equation, the hydrogen atom, angular momentum, identical particles, time-independent and time-dependent perturbation theory, and scattering. Substantially more accessible than the graduate-level Sakurai or Cohen-Tannoudji while covering essentially the same material. The right starting reference for an AI reader. The reference advanced-undergraduate QM textbook.
-
An Introduction to Quantum Field TheoryThe dominant graduate-level QFT textbook. Comprehensive coverage of QED, QCD, the Standard Model, gauge theories, renormalisation, and the various computational techniques. Substantially demanding but the canonical reference for serious engagement with particle theory. The right reading for an AI reader who wants to understand the gauge-theory machinery underlying the Standard Model. The reference graduate QFT textbook.
-
Particle Data Group: Review of Particle PhysicsThe canonical empirical reference for particle physics. The PDG's Review of Particle Physics is updated every two years with the latest experimental measurements, theoretical predictions, and methodological developments. The accompanying Booklet (a small printed pamphlet) contains the most-used numerical values, and the full Review is freely available online. Essentially every particle-physics paper cites the PDG for parameter values. The reference for empirical particle physics.
-
Solving the quantum many-body problem with artificial neural networks (Carleo & Troyer, 2017)
The foundational paper on neural quantum states. Demonstrates that a restricted Boltzmann machine can represent ground states of prototypical spin systems competitively with traditional variational methods; the substrate of the entire NQS methodology that has matured substantially since 2017. The natural starting reference for the NQS material of Section 15; a minimal sketch of the ansatz follows. The reference for neural quantum states.
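A minimal sketch of the ansatz (illustrative, not the authors' code; real parameters here where the paper uses complex ones): the RBM's hidden units are summed out analytically, leaving a closed-form log-amplitude for each spin configuration.

```python
import numpy as np

def rbm_log_psi(sigma, a, b, W):
    """Log-amplitude of an RBM wavefunction for spins sigma in {-1, +1}^N.

    ln psi(sigma) = sum_i a_i sigma_i + sum_j ln(2 cosh(b_j + sum_i W_ij sigma_i)),
    the hidden units having been traced out analytically.
    a: (N,) visible biases; b: (M,) hidden biases; W: (N, M) couplings.
    """
    theta = b + sigma @ W                        # effective fields on hidden units
    return a @ sigma + np.sum(np.log(2 * np.cosh(theta)))

# Toy usage: random small parameters for a 10-spin chain, 20 hidden units
rng = np.random.default_rng(0)
N, M = 10, 20
a = rng.normal(size=N) * 0.01
b = rng.normal(size=M) * 0.01
W = rng.normal(size=(N, M)) * 0.01
sigma = rng.choice([-1.0, 1.0], size=N)
print(rbm_log_psi(sigma, a, b, W))
```

In the variational method proper, these parameters are optimised to minimise the energy expectation estimated by Monte Carlo sampling over configurations.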
-
Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations (Raissi, Perdikaris & Karniadakis, 2019)
The foundational PINNs paper. Establishes the methodology of training neural networks to satisfy PDEs as soft constraints in the loss function alongside data-fitting objectives; the substrate of subsequent physics-informed work across virtually every physics-and-engineering domain. The natural reading for the PINN material of Section 16; a minimal training-loop sketch follows. The reference for physics-informed neural networks.
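A minimal training-loop sketch (illustrative, not the paper's code): penalise the residual of Burgers' equation, one of the paper's benchmark PDEs, at random collocation points. Boundary-condition and data-fitting terms, which a real PINN needs, are omitted for brevity.

```python
import torch

# u(t, x) parameterised by a small MLP; inputs are (t, x) pairs.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
nu = 0.01 / torch.pi  # viscosity used in the paper's Burgers example

def pde_residual(tx):
    """Residual of u_t + u * u_x - nu * u_xx at collocation points tx (B, 2)."""
    tx = tx.requires_grad_(True)
    u = net(tx)
    du = torch.autograd.grad(u, tx, torch.ones_like(u), create_graph=True)[0]
    u_t, u_x = du[:, :1], du[:, 1:]
    u_xx = torch.autograd.grad(u_x, tx, torch.ones_like(u_x), create_graph=True)[0][:, 1:]
    return u_t + u * u_x - nu * u_xx

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    tx = torch.rand(256, 2)                   # random collocation points in [0,1]^2
    loss = pde_residual(tx).pow(2).mean()     # + boundary/initial/data terms in practice
    opt.zero_grad(); loss.backward(); opt.step()
```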
-
Magnetic control of tokamak plasmas through deep reinforcement learning (Degrave et al., Nature 2022)
The watershed RL-for-fusion paper. Demonstrates that a deep RL agent trained in simulation can autonomously control a real tokamak plasma at TCV (EPFL), achieving target shapes including diverted, snowflake, and droplet configurations. The natural reading for the plasma-control material of Section 14, and a substantial demonstration that RL methods can handle operational physics-control problems. The reference for AI-for-fusion control.
-
AI Feynman: A Physics-Inspired Method for Symbolic Regression (Udrescu & Tegmark, 2020)
The major modern symbolic-regression paper, using physics-inspired heuristics (dimensional analysis, separability, symmetries) to discover closed-form equations from data. Demonstrates rediscovery of essentially all the equations in the Feynman Lectures from numerical data alone. The natural reading for the symbolic-regression material of Section 17 and a key reference for AI-driven scientific discovery. The reference for symbolic regression in physics.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges (Bronstein, Bruna, Cohen & Veličković, 2021)
The major synthesis of geometric deep learning, covering equivariant networks, gauge-equivariant CNNs, and the unified framework that connects them, with substantial coverage of the mathematical machinery (group theory, representation theory, differential geometry) underlying modern equivariant ML. The natural reading for the foundation-model and equivariant-architecture material of Section 18; a toy invariance check follows. The reference for the geometric/equivariant ML framework.
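A toy check of the framework's central idea (my illustration, not the book's code): symmetric pooling over a set makes the output invariant to input permutations, the simplest of the symmetries the book treats.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 8))

def f(points):
    """DeepSets-style layer: per-point map, then permutation-invariant sum pool."""
    return np.tanh(points @ W).sum(axis=0)    # points: (n, 3) -> (8,)

x = rng.normal(size=(5, 3))
perm = rng.permutation(5)
assert np.allclose(f(x), f(x[perm]))          # f(Px) = f(x) for any permutation P
```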
-
Flow-based generative models for Markov chain Monte Carlo in lattice field theory (Albergo, Kanwar & Shanahan, 2019)
The foundational paper on normalising flows for lattice QCD. Demonstrates that a flow-based generator can produce statistically independent field configurations for 2D φ⁴ scalar theory, bypassing the autocorrelation problem that limits traditional MCMC; the substrate of subsequent work extending the methodology to U(1), SU(N), and increasingly QCD-relevant theories. The natural reading for the lattice-QCD material of Section 13; a minimal sampling-loop sketch follows. The reference for flow-based lattice methods.
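A minimal sketch of the sampling loop (illustrative; the "flow" here is a plain Gaussian stand-in with known log-density, where the paper trains a real-NVP-style model to match exp(-S)/Z): proposals are drawn independently from the generator and corrected with a Metropolis-Hastings accept/reject step, so correlations enter only through rejections rather than random-walk dynamics.

```python
import numpy as np

rng = np.random.default_rng(2)

def S(phi):
    """Toy single-site quartic action standing in for a lattice action."""
    return 0.5 * phi**2 + 0.1 * phi**4

def sample_flow():
    """Stand-in generator: a Gaussian with exactly known log-density log q."""
    phi = rng.normal()
    logq = -0.5 * phi**2 - 0.5 * np.log(2 * np.pi)
    return phi, logq

phi, logq = sample_flow()
chain = []
for _ in range(10000):
    phi_new, logq_new = sample_flow()
    # Independence MH: accept with min(1, [e^{-S(phi')} q(phi)] / [e^{-S(phi)} q(phi')])
    log_alpha = (-S(phi_new) + logq) - (-S(phi) + logq_new)
    if np.log(rng.uniform()) < log_alpha:
        phi, logq = phi_new, logq_new
    chain.append(phi)
print("mean phi^2 along the chain:", np.mean(np.square(chain)))
```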
-
Deep neural networks for accurate predictions of crystal stability
A representative early paper on neural-network surrogates for materials physics, demonstrating that deep networks can predict formation energies of crystalline solids with accuracy approaching DFT calculations at orders-of-magnitude lower compute cost. Cited here as a representative example of the surrogate-modelling methodology that recurs throughout AI for physics. The reference for neural-network surrogates in materials physics.
-
Lorentz Group Equivariant Neural Network for Particle Physics (LorentzNet)
The LorentzNet paper. Establishes the methodology of building Lorentz-equivariant graph neural networks for particle-physics applications, demonstrating substantial empirical improvements over non-equivariant baselines on standard jet-tagging benchmarks; the substrate of subsequent equivariant work in particle-physics ML. The natural reading for the equivariant-architecture material of Section 18 and the particle-physics applications of Section 11; the invariant-feature idea is sketched below. The reference for Lorentz-equivariant networks.
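An illustrative kernel of such taggers (my sketch, not the paper's code): edge features built from Minkowski inner products, which are unchanged by any Lorentz transformation, so a network consuming only them is invariant by construction.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])        # Minkowski metric, signature (+,-,-,-)

def minkowski_dots(p):
    """All pairwise invariants <p_i, p_j> = E_i E_j - p_i . p_j for four-momenta p (n, 4)."""
    return p @ eta @ p.T

rng = np.random.default_rng(3)
p = rng.normal(size=(6, 4))                   # six toy particle four-momenta

# Sanity check: invariants are unchanged under a boost along z with rapidity 0.7
y = 0.7
boost = np.array([[np.cosh(y), 0, 0, np.sinh(y)],
                  [0,          1, 0, 0],
                  [0,          0, 1, 0],
                  [np.sinh(y), 0, 0, np.cosh(y)]])
assert np.allclose(minkowski_dots(p), minkowski_dots(p @ boost.T))
```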
-
Ab Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks (FermiNet; Pfau et al., 2020)
The FermiNet paper. Introduces antisymmetric neural-network ansätze for electronic-structure calculations, achieving chemical-accuracy results on small molecules without grid or basis-set discretisation; a major step in NQS methodology applied to fermionic many-body systems. The natural reading for the NQS material of Section 15 in the electronic-structure context; the antisymmetry mechanism is sketched below. The reference for fermionic neural quantum states.
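The antisymmetry mechanism in one toy example (my sketch; FermiNet's orbitals depend on all electron positions jointly, whereas these are single-electron maps for brevity): a determinant of per-electron orbitals flips sign when two electrons are exchanged, which is exactly the fermionic exchange rule.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(3, 4))                   # maps 3D positions to 4 toy "orbitals"

def psi(r):
    """Toy antisymmetric wavefunction for 4 electrons at positions r (4, 3)."""
    return np.linalg.det(np.tanh(r @ W))      # (4, 4) orbital matrix -> scalar amplitude

r = rng.normal(size=(4, 3))
r_swapped = r[[1, 0, 2, 3]]                   # exchange electrons 0 and 1
assert np.isclose(psi(r), -psi(r_swapped))    # determinant flips sign under exchange
```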
-
Anomaly Detection at the LHC: An Overview
A modern community review of anomaly-detection methods for new-physics searches at the LHC, covering autoencoder-based methods, density-estimation methods, classification-without-labels, and related approaches. Cited here as a representative entry point for the BSM-search methodology of Section 11; a minimal autoencoder-score sketch follows. The reference review of LHC anomaly detection.
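A minimal sketch of the autoencoder strategy the review covers (illustrative; feature dimensions and data are stand-ins): train on data assumed to be background-dominated, then flag events the bottleneck reconstructs poorly as anomaly candidates.

```python
import torch

# Bottleneck autoencoder over per-event feature vectors (16 -> 4 -> 16).
ae = torch.nn.Sequential(
    torch.nn.Linear(16, 4), torch.nn.ReLU(),
    torch.nn.Linear(4, 16),
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

background = torch.randn(4096, 16)            # stand-in for background-dominated events
for _ in range(200):
    loss = (ae(background) - background).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

events = torch.randn(8, 16)                   # events to score
scores = (ae(events) - events).pow(2).mean(dim=1)   # reconstruction error per event
print(scores)                                 # large score = anomaly candidate
```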
-
CaloChallenge: Fast Calorimeter Simulation
The community benchmark for ML-based fast calorimeter simulation, with standardised datasets, evaluation metrics, and method submissions across multiple iterations. The empirical landscape of GAN, VAE, normalising-flow, and diffusion-based methods is largely organised through the challenge. The natural starting reference for the detector-simulation material of Section 12. The reference benchmark for ML-based fast detector simulation.
-
PySR: Fast & Parallelized Symbolic Regression in Python/Julia (Cranmer, 2023)
The PySR paper and library. The dominant modern open-source symbolic-regression tool, with substantial empirical refinements to the genetic-programming methodology and integration with neural-network-based hybrid approaches; widely used across physics-discovery applications. The natural reading for the symbolic-regression material of Section 17 and the practical entry point for the methodology, as in the usage sketch below. The reference open-source symbolic-regression tool.
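A minimal usage sketch (argument names follow PySR's documented API at the time of writing; check the current docs, and note that PySR installs its Julia backend on first use). The toy task recovers y = 2.5 * x0^2 - x1 from sampled data.

```python
import numpy as np
from pysr import PySRRegressor

# Synthetic data for a known ground-truth equation.
rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 2))
y = 2.5 * X[:, 0] ** 2 - X[:, 1]

model = PySRRegressor(
    niterations=40,                    # genetic-programming search budget
    binary_operators=["+", "-", "*"],  # building blocks for candidate equations
    unary_operators=["cos", "exp"],
)
model.fit(X, y)
print(model.sympy())                   # best discovered symbolic expression
```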