Climate, Earth Systems & AI: from the energy budget to GraphCast and the virtual Earth.
Earth's climate is the largest, most-coupled physical system humans routinely study. The atmosphere, oceans, land surface, cryosphere, and biosphere exchange energy, water, and carbon on time scales from seconds to hundreds of millions of years. Climate science has been the most-active AI-for-Science application area of the past three years: the 2022–2024 wave of AI weather-forecasting models (FourCastNet, Pangu-Weather, GraphCast, Aurora, GenCast) reached a watershed comparable to AlphaFold's, and the methodology has extended to climate emulation, subgrid-scale parameterisation, remote-sensing analysis, extreme-event attribution, and regional downscaling. This chapter develops both the working climate-science vocabulary an AI reader needs (Sections 2–9 — the atmosphere, oceans, carbon cycle, greenhouse effect, climate sensitivity, GCMs, paleoclimate, observational systems) and the AI methodology that has substantially reshaped the field since 2022 (Sections 10–19 — distinctive properties of climate AI, weather forecasting, architectures, ensembles, climate emulators, ML parameterisations, remote sensing, extremes and attribution, downscaling, and the frontier). The single chapter combines what the field treats as inseparable: the climate physics that frames the problems and the AI methods that increasingly drive how forecasts are produced and projections are extracted.
Prerequisites & orientation
This chapter is both a domain primer and an AI-methods chapter. The first half (Sections 1–9) assumes high-school physics — Newton's laws, the ideal gas law, basic thermodynamics — and high-school chemistry. The second half (Sections 10–19) assumes the working machinery of modern deep learning (Part VI on transformers and CNNs), the graph-neural-network material of Part XIII Ch 05 (essential for Section 12), the diffusion-model material of Part X (essential for Section 13's ensemble forecasting), the foundation-model material of Part X (the substrate for several methods throughout), and the Fourier-neural-operator and physics-informed-network material of Ch 01 (Scientific ML). Readers with a climate-science background can skim Sections 2–9; readers with strong ML but no climate background should take their time with the first half before engaging with the second.
Three threads run through the chapter. The first is the energy-budget view: climate is fundamentally about how energy flows from the Sun through the Earth system and back to space, how it gets temporarily stored along the way (oceans, ice, atmospheric water vapour), and how human activity has perturbed the storage-and-flow balance through greenhouse-gas emissions. The second is the scale-and-coupling problem: climate phenomena span scales from millimetre-scale cloud-droplet microphysics through global-scale ocean circulation patterns that take a millennium to overturn, with non-linear coupling across scales. The third is the physics-vs-data tension that shapes the AI methodology: pure data-driven methods can produce excellent in-distribution forecasts but may fail under unprecedented conditions (the very situations climate change creates), while physics-informed methods sacrifice some accuracy for principled extrapolation behaviour. Section 19 returns to this tension; it appears in passing throughout the AI half.
Why Climate, and Why Climate-AI
Earth's climate is the largest, most-coupled physical system humans routinely study. The Earth system spans nine orders of magnitude in length and twelve in time, the data substrate is petabyte-scale and growing daily, and the empirical questions matter — for weather forecasting that protects lives, for climate-change attribution that informs policy, for the basic scientific understanding of the planet we live on. This chapter develops both the working climate-science vocabulary an AI reader needs (Sections 2–9) and the AI methodology that has substantially reshaped the field since 2022 (Sections 10–19). Section 10 frames what makes climate-AI methodologically distinctive from an ML perspective; this section maps the climate itself.
The energy-budget view
The most useful framing of climate for an AI reader is the energy-budget view. Earth receives ~340 W/m² (averaged over the surface) of incoming shortwave solar radiation. About 30% is reflected by clouds, ice, and bright surfaces (the albedo); the other 70% is absorbed by the atmosphere and surface, warming them. The warmed Earth emits longwave (infrared) radiation back to space, and over long enough averages the incoming and outgoing fluxes balance. Greenhouse gases absorb specific bands of the outgoing infrared, re-emit some of it back toward the surface, and keep Earth's surface temperature ~33 K warmer than it would otherwise be. Human emissions of CO₂, methane, and other greenhouse gases have shifted this balance — the net anthropogenic radiative forcing is currently ~2.7 W/m² relative to pre-industrial — and the system is gradually warming to a new equilibrium. Section 5 develops this in detail; the point here is that essentially every quantitative claim in climate science can be traced back to the energy budget.
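As a quick orientation, the zero-dimensional energy balance sketched above can be checked in a few lines of Python; the values are the round numbers quoted in this paragraph, not a calibrated model.

```python
# Zero-dimensional energy-balance sketch: effective emission temperature
# implied by the global-mean solar input and albedo quoted above.
SIGMA = 5.67e-8        # Stefan-Boltzmann constant, W m^-2 K^-4
S_AVG = 340.0          # incoming shortwave averaged over the sphere, W m^-2
ALBEDO = 0.30          # planetary albedo (fraction reflected)

absorbed = (1 - ALBEDO) * S_AVG          # ~238 W m^-2 absorbed
T_eff = (absorbed / SIGMA) ** 0.25       # blackbody temperature that balances it

print(f"absorbed flux  : {absorbed:.0f} W/m^2")
print(f"effective T    : {T_eff:.0f} K (~255 K; observed surface ~288 K)")
print(f"greenhouse gap : {288 - T_eff:.0f} K")
```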
The scale-and-coupling problem
Climate phenomena span enormous scale ranges. Cloud microphysics (millimetre-scale droplet formation) influences global radiative balance. Mesoscale convection (tens of kilometres) organises into hurricanes and thunderstorms that move heat poleward. Synoptic-scale weather systems (~1,000 km) are what daily forecasts predict. Planetary-scale circulations (Hadley cells, jet streams, ocean gyres) define climate zones. Thermohaline ocean circulation takes ~1,000 years to overturn the deep ocean. Glacial-interglacial cycles last ~100,000 years. The dynamics couple non-linearly across scales: small-scale cloud processes feed back on global temperature; deep-ocean heat uptake controls the rate of warming response; ice-sheet dynamics playing out over centuries determine sea-level rise on the timescales that matter for human infrastructure. Modern climate models resolve some scales explicitly and parameterise others; AI methods are increasingly central to both the parameterisations (Section 15) and the cross-scale emulation traditional methods cannot do at production speed (Section 14).
What modern climate science looks like
Climate science is the discipline that pushes computational science furthest. Modern general circulation models (Section 7) are among the largest physics codes ever written. The observational network (Section 9) generates multiple terabytes per day of satellite data, atmospheric profiles, and ocean measurements. The working climate scientist of 2025 spends most of their time at a computer, working with model output, observational data, and increasingly the AI methods that process both. The interface where climate physics and computation meet is the natural home of the AI methods this chapter develops in its second half.
Why now
Several factors have made climate AI suddenly central. Petabyte-scale reanalysis datasets (especially ERA5) have provided a canonical training substrate that did not exist a decade ago. The 2022–2023 wave of AI weather-forecasting models demonstrated that deep learning could match or exceed fifty years of NWP development at a fraction of the compute cost. Climate emulators have begun to make tractable policy-relevant analyses that traditional GCMs are too expensive to produce. Remote-sensing AI processes data volumes that overwhelm traditional retrieval methods. The combination has produced a field that is transitioning from "AI methods being explored" to "AI methods deployed in production at major operational centres" — and the transition has happened in roughly three years rather than the decade it took for protein-AI.
How this chapter is organised
Sections 2–9 develop the working climate-science vocabulary: the atmosphere (Section 2), the oceans (Section 3), the carbon cycle (Section 4), the greenhouse effect (Section 5), climate sensitivity and feedbacks (Section 6), General Circulation Models (Section 7), paleoclimate (Section 8), and the observational network (Section 9). Section 10 turns to the AI methodology proper, framing what makes climate-AI distinctive from a machine-learning perspective. Sections 11–19 develop the methods: AI weather forecasting (11), architectures (12), ensembles and probabilistic AI (13), climate emulators (14), ML parameterisations (15), remote sensing (16), extremes and attribution (17), downscaling (18), and the frontier (19).
Climate science differs from many physical sciences in three ways that shape how AI methods engage with it: the system is genuinely coupled across scales (no clean separation between weather and climate), the empirical record is long and detailed (which is what makes attribution and validation possible), and the policy stakes are high (which raises the bar for AI methods to be trusted in production). The vocabulary developed in Sections 2–9 is the prerequisite for engaging with the AI methods of Sections 10–19 — methodology that has begun to reshape what computational climate science can do.
The Atmosphere: Composition, Structure, and Circulation
The atmosphere is the thin, well-mixed gaseous envelope that surrounds Earth. It carries weather, holds the greenhouse gases that warm the planet, and is where most of what we call "climate" actually happens. Understanding its composition, layered vertical structure, and large-scale circulation is the prerequisite for understanding everything else in this chapter.
Composition
By dry-air volume the atmosphere is 78% nitrogen (N₂), 21% oxygen (O₂), 0.93% argon (Ar), and 0.042% (~420 ppm in 2025) carbon dioxide (CO₂), with smaller amounts of various trace gases. Water vapour (H₂O) is the wildcard — its concentration varies from near-zero in cold dry polar air to ~4% in warm humid tropical air, making it the most variable and most-radiatively-important component. The trace greenhouse gases worth knowing by name: methane (CH₄, ~1.9 ppm, ~30× the per-molecule warming impact of CO₂ over 100 years), nitrous oxide (N₂O, ~0.33 ppm, ~270× CO₂), ozone (O₃, variable, important in the stratosphere for UV shielding), and the various halocarbons (CFCs and successors, ppt-level abundances but per-molecule warming hundreds to thousands of times that of CO₂). Aerosols (suspended particulates — dust, sulfate, sea salt, soot, biomass-burning particles) are not gases but profoundly affect both radiation and cloud formation.
Vertical structure
The atmosphere has a layered vertical structure defined by temperature gradients. The troposphere (0–~12 km, lower in polar regions, higher in tropics) is where weather happens; temperature decreases with altitude at roughly 6.5 K/km, and vertical mixing is vigorous. The boundary at the top, the tropopause, marks where temperature stops decreasing. The stratosphere (~12–50 km) has temperature increasing with altitude because ozone there absorbs UV radiation; this stable thermal structure suppresses vertical mixing and keeps the stratosphere effectively decoupled from the troposphere on short timescales. The mesosphere (~50–85 km) cools again with altitude. The thermosphere (~85+ km) heats again as solar UV ionises low-density gas. Most weather happens in the troposphere, and most of the climate and AI methods in this chapter operate there; the stratosphere matters for ozone chemistry, polar vortex dynamics, and certain long-lived greenhouse gases.
The general circulation
The atmosphere circulates because the tropics receive more solar energy than the poles, and the system tries to redistribute heat. The primary engine is the Hadley cell: warm air rises near the equator, flows poleward at altitude, sinks at ~30° latitude (producing the global belt of subtropical deserts), and returns equatorward at the surface as the trade winds. Two more circulation cells in each hemisphere — the Ferrel cell (mid-latitudes, eddy-driven) and polar cell — complete the meridional pattern. Earth's rotation deflects this flow via the Coriolis force, producing the prevailing wind patterns: easterlies in the tropics, westerlies in mid-latitudes, easterlies near the poles. Concentrated bands of fast winds near the tropopause — the subtropical jet and polar jet — steer mid-latitude weather systems and shape day-to-day weather variability.
Weather vs climate
An important conceptual distinction: weather is the state of the atmosphere at a given time and place; climate is the statistics of weather over decades. Weather is governed by the chaotic non-linear dynamics of the Navier-Stokes equations with thermodynamics and is fundamentally unpredictable beyond ~2 weeks (Lorenz's foundational result on deterministic chaos). Climate is statistical and is predictable on decadal-to-centennial timescales because the slowly-changing boundary conditions (greenhouse gas concentrations, solar forcing, ocean heat content) determine the statistics even when individual weather realisations are unpredictable. The distinction matters substantially for AI methods: weather forecasting (Section 11) is the dominant short-term application; climate projection is a different problem with different methodology.
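Lorenz's deterministic-chaos result is easy to reproduce in miniature with his 1963 three-variable toy convection system (not a weather model); a crude Euler integration is enough to show two near-identical initial states diverging completely within a few tens of model time units.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz (1963) system (crude but sufficient here)."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-9, 0.0, 0.0])   # perturbed by one part in a billion

for step in range(5000):              # 50 model time units
    a, b = lorenz63_step(a), lorenz63_step(b)
    if step % 1000 == 0:
        print(f"t={step * 0.01:5.1f}  separation={np.linalg.norm(a - b):.3e}")
```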
Key atmospheric phenomena
A working AI reader should recognise a handful of named phenomena: El Niño/La Niña (ENSO; year-to-year tropical Pacific oscillation that drives global weather variability — Section 3 develops it as a coupled ocean-atmosphere phenomenon), the Madden-Julian Oscillation (MJO; intraseasonal tropical convection patterns), the North Atlantic Oscillation (NAO; pressure-difference index controlling European winter weather), the Quasi-Biennial Oscillation (QBO; stratospheric wind reversal with ~28-month period), and the various monsoon systems (Indian, West African, North American). These named patterns are the substrate of much seasonal forecasting and many AI-based climate-prediction methods.
The Oceans: Circulation, Heat, and Biogeochemistry
The oceans cover 71% of Earth's surface, hold ~97% of the planet's water, and have ~1,000 times the heat capacity of the atmosphere. They store more than 90% of the heat humans have added to the climate system, take up roughly a quarter of human CO₂ emissions, and circulate on timescales from days (surface currents) to a millennium (deep thermohaline overturning). Climate is impossible to understand without them.
Basins and basic structure
The world ocean is conventionally divided into the Pacific (largest, ~50% of total area), Atlantic, Indian, Arctic, and Southern oceans. The vertical structure mirrors the atmosphere's: a well-mixed surface layer (typically 50–200 m, deeper in winter, shallower in summer), a transition zone (the thermocline, where temperature drops sharply with depth), and a cold deep layer (~2–4°C, occupying most of the ocean's volume). Salinity adds another dimension — surface salinity varies by region (saltier in subtropics where evaporation exceeds precipitation, fresher near major river outflows and at high latitudes), and density depends jointly on temperature and salinity through a non-linear equation of state.
Surface circulation: the gyres
Wind-driven surface circulation organises into five major subtropical gyres (North and South Atlantic, North and South Pacific, Indian) that rotate clockwise in the Northern Hemisphere and counter-clockwise in the Southern. Each gyre has a fast western boundary current — the Gulf Stream (North Atlantic), Kuroshio (North Pacific), Brazil Current, Agulhas Current, and East Australian Current — that carries warm tropical water poleward at speeds of 1–2 m/s. The slower equatorward eastern boundary currents return cooler water along the western coasts of continents (the California Current, Canary Current, Humboldt/Peru Current). These western boundary currents are major heat-transport conveyors and substantially shape regional climate (the Gulf Stream's heat is much of why Western Europe has milder winters than the equivalent latitude in eastern Canada).
Thermohaline circulation
The deep ocean circulates through density-driven thermohaline circulation, sometimes called the "global conveyor belt." Cold, salty water sinks at high latitudes — primarily in the Labrador and Greenland Seas (forming North Atlantic Deep Water) and around Antarctica (forming Antarctic Bottom Water). The sunken water spreads through the deep ocean basins, slowly returns to the surface through diffuse upwelling primarily in the Pacific and Indian Oceans, and the surface return flow closes the loop. The full circuit takes ~1,000 years. Thermohaline circulation transports substantial heat poleward (complementing wind-driven heat transport), and its potential weakening under climate change — the Atlantic Meridional Overturning Circulation (AMOC) is the most-studied component — is one of the major climate-tipping-point concerns. AI methods for predicting AMOC behaviour from observational data are an active research area.
ENSO: the canonical coupled phenomenon
The El Niño Southern Oscillation (ENSO) is the single most-important year-to-year climate variability mode and the canonical example of coupled ocean-atmosphere dynamics. In the neutral state (intensified during La Niña), trade winds blow east-to-west across the tropical Pacific, piling warm water in the western Pacific (around Indonesia) and allowing cool subsurface water to upwell off the coast of Peru. Every 3–7 years the trade winds weaken, the warm pool sloshes back eastward, and the equatorial Pacific develops a basin-wide warm anomaly — that's El Niño. The pattern has profound global impacts on rainfall, temperature, and tropical cyclone activity; it is a large part of why 2023's record global temperatures came when they did. ENSO is well-monitored, well-modelled, and the substrate for substantial AI-based seasonal-forecasting work (Ham et al. 2019 demonstrated that CNN-based methods could predict ENSO at lead times traditional methods cannot — an early AI-for-climate watershed).
The ocean's role in the carbon cycle
The oceans take up roughly a quarter of human CO₂ emissions through two mechanisms. The solubility pump: cold water dissolves CO₂ better than warm water, so high-latitude regions (where deep water forms) absorb CO₂ from the atmosphere, and the dissolved CO₂ travels with the deep water. The biological pump: phytoplankton at the surface fix CO₂ through photosynthesis, some of the resulting organic carbon sinks before being respired, and the deep-sequestered carbon stays out of contact with the atmosphere for centuries. Both pumps have substantial spatial structure that AI methods help characterise from satellite ocean-colour observations and Argo profiling data. The ocean's CO₂ uptake also produces ocean acidification — dissolved CO₂ forms carbonic acid, lowering surface pH by ~0.1 since pre-industrial times — which is its own substantial environmental concern.
Marine heatwaves and the modern frontier
The 2020s have seen unprecedented marine heatwaves — extended periods of anomalously warm sea-surface temperature — that have substantial ecological and climate consequences. The 2023–2024 North Atlantic anomaly was particularly extreme, with sea-surface temperatures persistently 2–3°C above climatology for months. AI methods for detecting, attributing, and predicting marine heatwaves are an active 2024–2026 research area, with implications for fisheries management, coral-reef monitoring, and short-term climate prediction.
The Carbon Cycle
Carbon moves between Earth's atmosphere, oceans, land surface, and rocks on timescales from days to hundreds of millions of years. The fast cycle (atmosphere–biosphere–ocean) controls year-to-year CO₂ variability and the response to fossil-fuel emissions. The slow cycle (geological weathering and burial) controls atmospheric CO₂ on million-year timescales. Understanding both is essential for understanding the climate system's response to perturbations.
The reservoirs
Carbon is unevenly distributed across reservoirs. Sedimentary rocks (carbonates, organic matter buried in sediments) hold ~75 million petagrams of carbon — by far the largest reservoir, but mostly out of contact with the surface system on human timescales. Deep ocean dissolved inorganic carbon: ~37,000 PgC. Surface ocean: ~1,000 PgC. Soils and detritus: ~2,400 PgC. Atmosphere: ~870 PgC (in 2025; ~590 PgC pre-industrial). Living biomass: ~550 PgC. Fossil-fuel reserves: ~1,000 PgC of recoverable carbon, of which ~470 PgC has been burned and added to the atmosphere-ocean-biosphere system since 1750. The atmospheric reservoir is small, which is why even the relatively small annual flux of human emissions (~10 PgC/year) shifts atmospheric CO₂ measurably from year to year.
The fast cycle
The fast cycle exchanges carbon among atmosphere, ocean surface, and the terrestrial biosphere on timescales of days to centuries. Photosynthesis by land plants and marine phytoplankton fixes ~120 PgC/year from atmosphere into biomass; respiration and decomposition return roughly the same amount, with seasonal asymmetry producing the famous Mauna Loa CO₂ "saw-tooth" pattern (atmospheric CO₂ drops in Northern Hemisphere summer when forests are growing, rises in winter when respiration dominates). The atmosphere-ocean exchange is similarly large (~80 PgC/year in each direction). Net annual fluxes are much smaller than the gross fluxes; small imbalances in the gross exchange — like the human contribution — accumulate measurably over time.
Anthropogenic perturbation
Humans currently emit ~10 PgC/year from fossil-fuel combustion and cement production plus ~1 PgC/year from land-use change (deforestation net of regrowth). Of this combined ~11 PgC/year, roughly half stays in the atmosphere (raising atmospheric CO₂ by ~2.5 ppm/year as of 2025), about a quarter is taken up by the oceans (mainly through the solubility pump), and about a quarter is taken up by the land biosphere (CO₂ fertilisation, regrowth in regions that have stopped clearing forest, expanding boreal vegetation as the climate warms). The fraction staying in the atmosphere — the airborne fraction, currently ~45% — is a critical climate-prediction parameter, and its likely future evolution is one of the larger uncertainties in long-term climate projections.
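The airborne fraction quoted above follows from simple bookkeeping; a minimal sketch, assuming the standard conversion of roughly 2.12 PgC per ppm of atmospheric CO₂.

```python
# Rough consistency check of the airborne fraction quoted above.
PPM_TO_PGC = 2.12      # ~2.12 PgC of carbon per ppm of atmospheric CO2

emissions = 10.0 + 1.0                            # PgC/yr: fossil + cement, plus land-use change
co2_growth_ppm = 2.5                              # ppm/yr observed atmospheric growth
atmospheric_gain = co2_growth_ppm * PPM_TO_PGC    # ~5.3 PgC/yr stays in the air

airborne_fraction = atmospheric_gain / emissions
print(f"airborne fraction ~ {airborne_fraction:.0%}")   # ~48%, consistent with the ~45-50% quoted
```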
Carbon-cycle feedbacks
The carbon cycle is not a passive recipient of human emissions; it responds dynamically to climate change in ways that can amplify or dampen warming. Ocean uptake tends to weaken as the surface ocean warms (warm water holds less CO₂) and as the carbon-carrying capacity of surface water saturates. Permafrost thaw can release substantial methane and CO₂ from previously-frozen organic matter — estimates of vulnerable carbon range from 600 to 1,400 PgC, comparable to the entire current atmospheric reservoir. Tropical forest dieback under sustained drought could shift the Amazon from a carbon sink to a source. Vegetation expansion in newly-warm regions could partially offset these losses. The net sign and magnitude of carbon-climate feedback remains a substantial uncertainty in climate projections, and AI methods for analysing flux-tower data, satellite vegetation observations, and isotopic constraints on carbon-cycle behaviour are an active research area.
The slow cycle
On geological timescales, atmospheric CO₂ is regulated by the balance between silicate weathering (chemical breakdown of silicate rocks consumes CO₂ and produces dissolved bicarbonate, which eventually deposits as marine carbonates) and volcanic outgassing (CO₂ released from Earth's interior). The weathering response to temperature — warmer means faster weathering means more CO₂ drawdown — provides a million-year-timescale negative feedback that has kept Earth's climate within habitable bounds for billions of years (the Walker feedback). The slow cycle is irrelevant on human timescales — silicate weathering adjusts to perturbations over ~100,000 years, far slower than the fossil-fuel transient — but it sets the deep-time backdrop against which paleoclimate (Section 8) is interpreted.
The Greenhouse Effect and Radiative Forcing
The greenhouse effect is the central piece of physics in climate science. Once it's understood, essentially every quantitative claim about anthropogenic climate change follows. The mechanism is not complicated; the consequences are.
The basic mechanism
Earth receives solar radiation in the visible and near-infrared (peak around 0.5 µm wavelength). The planet's effective temperature, set by the requirement that incoming and outgoing energy balance, would be ~255 K (−18°C) if Earth were a bare blackbody radiating to space. Actual surface temperature is ~288 K (15°C). The 33 K difference is the greenhouse effect, and its cause is that Earth radiates outward in the longwave infrared (peak ~10 µm), where certain gases — CO₂, water vapour, methane, ozone, the various halocarbons — absorb specific bands strongly. The absorbed energy is re-emitted in all directions, including back toward the surface, raising the surface temperature above what a transparent atmosphere would allow. The simplest quantitative models treat the atmosphere as one or more grey absorbing layers, transparent to incoming shortwave but partially opaque to outgoing longwave.
Why the specific gases matter
Different gases absorb at different wavelengths. Water vapour is the dominant greenhouse gas in absolute terms — it accounts for ~50% of the natural greenhouse effect — but its concentration is set by temperature (the Clausius-Clapeyron relation gives ~7%/K increase in saturation vapour pressure), so it acts as an amplifying feedback rather than a primary forcing. CO₂ absorbs in the 13–17 µm band, near the peak of Earth's outgoing infrared, and is well-mixed (its lifetime in the atmosphere is centuries to millennia, long enough to homogenise across the globe). Methane absorbs in the 7–8 µm band and has a per-molecule warming impact ~30× that of CO₂ over a 100-year horizon (~80× over 20 years), but its atmospheric lifetime is only ~12 years. The various halocarbons absorb in narrow bands within the atmospheric "window" (8–13 µm) where there are otherwise no strong absorbers, giving them per-molecule warming impacts thousands of times that of CO₂.
Radiative forcing
The standard quantitative measure of a perturbation to the climate system is radiative forcing: the change in net top-of-atmosphere radiative flux (in W/m²) attributable to the perturbation, holding everything else fixed. CO₂ forcing follows a logarithmic dependence on concentration: each doubling adds ~3.7 W/m² of forcing, regardless of starting point. Pre-industrial CO₂ was ~280 ppm; current ~420 ppm; the resulting CO₂-only forcing is ~2.1 W/m². Adding the other greenhouse gases brings total well-mixed-greenhouse-gas forcing to ~3.5 W/m². Aerosols (sulfate, organic carbon, dust) provide a partially-offsetting cooling effect of ~−1 W/m² with substantial uncertainty (the largest single uncertainty in the modern radiative-forcing budget). Net anthropogenic forcing as of 2024 is ~2.7 ± 0.7 W/m², for context against the natural ~340 W/m² incoming solar radiation.
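The logarithmic dependence is commonly written as the simplified fit ΔF ≈ 5.35 ln(C/C₀) W/m² (Myhre et al. 1998), which reproduces the numbers in this paragraph.

```python
import math

def co2_forcing(c_ppm, c0_ppm=280.0, alpha=5.35):
    """Simplified logarithmic CO2 forcing in W/m^2 (Myhre et al. 1998 fit)."""
    return alpha * math.log(c_ppm / c0_ppm)

print(f"280 -> 420 ppm : {co2_forcing(420):.2f} W/m^2")   # ~2.2, close to the ~2.1 quoted above
print(f"per doubling   : {co2_forcing(560):.2f} W/m^2")   # ~3.7 W/m^2 regardless of starting point
```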
The Stefan-Boltzmann constraint and equilibrium response
If radiative forcing increases by ΔF, the surface temperature must rise by an amount ΔT that brings the outgoing radiation back into balance. To zero-th order, the Stefan-Boltzmann law (outgoing radiation ∝ T⁴) gives a no-feedback temperature response of ΔT ≈ ΔF / (4σT³) ≈ 1.2 K per doubling of CO₂. The actual response — the equilibrium climate sensitivity (ECS) — is larger because feedbacks (Section 6) amplify the bare response. The current best estimate is ECS ≈ 3.0 K with a range of ~2.5–4 K, and the value matters substantially for long-term policy because it determines how much warming each ton of cumulative CO₂ emissions produces.
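A minimal sketch of the no-feedback estimate and of the feedback amplification developed in Section 6; the feedback fraction used here is purely illustrative, not a measured value.

```python
SIGMA = 5.67e-8                    # Stefan-Boltzmann constant, W m^-2 K^-4
T_EFF = 255.0                      # effective emission temperature, K
F_2XCO2 = 3.7                      # forcing per CO2 doubling, W/m^2

lam_planck = 4 * SIGMA * T_EFF**3          # ~3.8 W m^-2 K^-1 linearised Planck response
dT_no_feedback = F_2XCO2 / lam_planck      # ~1 K per doubling (the text quotes ~1.2 K)

f = 0.65                                   # illustrative net feedback fraction (assumption)
ecs = dT_no_feedback / (1 - f)             # amplified equilibrium response, ~3 K
print(f"no-feedback: {dT_no_feedback:.2f} K   with feedbacks: {ecs:.1f} K")
```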
The greenhouse-effect history
The basic physics of the greenhouse effect was worked out by Joseph Fourier (1827, who first identified the mechanism), John Tyndall (1859, who measured the infrared absorption of CO₂ and water vapour), Svante Arrhenius (1896, who quantitatively predicted that doubling CO₂ would warm Earth by ~5°C — close to the modern range), and Guy Callendar (1938, who first connected fossil-fuel emissions to observed warming). The science was substantially settled by the 1970s; the political response has lagged the science by 30+ years and remains contested in some jurisdictions despite the underlying physics being unambiguous.
Climate Sensitivity and Feedbacks
Climate sensitivity — how much surface temperature rises per unit of radiative forcing — is the single most-important quantity in climate science from a policy perspective. Its value is set by the network of feedbacks that amplify or dampen the bare radiative response.
The major positive feedbacks
Water-vapour feedback is the largest amplifier. Warmer atmosphere holds more water vapour (Clausius-Clapeyron, ~7%/K), water vapour is a strong greenhouse gas, so the additional water vapour adds further warming. The water-vapour feedback roughly doubles the no-feedback temperature response. Ice-albedo feedback is the second major positive: warming melts snow and sea ice, exposing darker surfaces underneath (ocean, land), which absorb more solar radiation, which causes further warming. The effect is concentrated at high latitudes (which is why the Arctic is warming ~3× faster than the global average — Arctic amplification) and contributes ~0.3–0.4 K to total ECS. Cloud feedbacks are more complex — clouds both reflect incoming sunlight (cooling) and trap outgoing infrared (warming) — but the consensus view from CMIP6 models is that the net cloud feedback is positive, contributing ~0.3–0.7 K to ECS, with substantial inter-model spread.
The major negative feedbacks
The dominant negative feedback is the Planck response itself: hotter Earth radiates more efficiently (the T⁴ in Stefan-Boltzmann), pulling temperatures back toward equilibrium. This is the feedback that gives the no-feedback ~1.2 K response per CO₂ doubling. Lapse-rate feedback partly offsets water-vapour feedback: in moist tropical regions warming is amplified aloft (the moist-adiabatic lapse rate decreases as the air warms), so the upper troposphere warms faster than the surface and radiates more efficiently to space. The net effect of lapse-rate plus water-vapour is positive but smaller than water-vapour alone.
Equilibrium vs transient sensitivity
Two distinct sensitivities matter for policy. Equilibrium climate sensitivity (ECS) is the steady-state warming after CO₂ is doubled and the system has fully equilibrated — including slow ocean heat uptake taking centuries. Transient climate response (TCR) is the warming at the time CO₂ has doubled in a scenario where CO₂ rises at 1%/year, capturing the system's faster response. ECS is typically 1.5–2× TCR because deep ocean heat uptake delays the equilibrium response. The IPCC AR6 (2021) likely range is ECS = 2.5–4 K (best estimate 3.0 K) and TCR = 1.4–2.2 K (best estimate 1.8 K). Ranges have narrowed substantially since AR5 (2013), partly due to better paleoclimate constraints (Section 8) and partly due to improved emergent-constraint methodology.
Why ECS uncertainty matters
The 2.5–4 K likely range for ECS translates directly into very different policy implications. At ECS = 2.5 K, achieving the Paris Agreement's 2°C target is substantially easier than at ECS = 4 K — roughly 60% more cumulative emissions could be tolerated. The narrowing of the ECS range over the past decade is one of the genuine successes of 21st-century climate science. AI methods for sharpening climate-sensitivity estimates from observational data, paleoclimate reconstructions, and model emergent constraints are an active research area, with implications that propagate directly to the carbon-budget calculations underpinning international climate agreements.
The tipping-point question
Beyond the gradual feedbacks above, climate science is increasingly concerned with potential tipping points: thresholds beyond which large-scale, possibly-irreversible changes are triggered. Candidate tipping elements include: collapse of the Atlantic Meridional Overturning Circulation (Section 3), Greenland and West Antarctic ice sheet disintegration, dieback of the Amazon rainforest, permafrost methane release, and Arctic sea-ice loss (already substantially underway). Identifying tipping-point thresholds and early-warning signals from observational data is a major frontier; AI methods for detecting threshold crossings in noisy climate time series are an active 2024–2026 research direction.
General Circulation Models (GCMs)
General Circulation Models are the computational backbone of climate science. They integrate the equations of fluid dynamics, thermodynamics, and radiative transfer over a discretised global grid, simulating climate response to natural and anthropogenic forcings on timescales from days (weather forecasting) through centuries (climate projection). Modern GCMs are among the largest physics codes ever written, and they are the substrate against which AI methods are now being benchmarked, accelerated, and increasingly replaced for specific tasks.
The primitive equations
At the core of every GCM are the primitive equations: the Navier-Stokes equations on a rotating sphere, simplified with the hydrostatic (and usually shallow-atmosphere) approximations, expressing conservation of momentum, mass (continuity), and energy (the thermodynamic equation), plus the ideal gas law as an equation of state. They describe how a fluid (the atmosphere or ocean) evolves under pressure gradients, gravity, the Coriolis force, friction, and heating. The equations are non-linear and chaotic; analytical solutions exist only for special cases. Numerical integration on a grid is the only general approach.
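In schematic form (hydrostatic, pressure-coordinate version; notational conventions vary between texts), with v the horizontal velocity, f the Coriolis parameter, Φ the geopotential, ω the vertical pressure velocity, Q the diabatic heating, and F friction:

```latex
% Primitive equations, hydrostatic pressure-coordinate form (schematic)
\begin{aligned}
\frac{D\mathbf{v}}{Dt} + f\,\hat{\mathbf{k}}\times\mathbf{v} &= -\nabla_p \Phi + \mathbf{F}
  && \text{(horizontal momentum)} \\
\frac{\partial \Phi}{\partial p} &= -\frac{RT}{p}
  && \text{(hydrostatic balance)} \\
\nabla_p\!\cdot\mathbf{v} + \frac{\partial \omega}{\partial p} &= 0
  && \text{(continuity)} \\
\frac{DT}{Dt} - \frac{RT}{c_p\,p}\,\omega &= \frac{Q}{c_p}
  && \text{(thermodynamic energy)}
\end{aligned}
```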
Discretisation and the grid
Modern atmospheric GCMs typically run on horizontal grids of 25–100 km resolution with 40–80 vertical levels. Spectral methods (representing fields as series expansions in spherical harmonics) were dominant for decades and remain in use at ECMWF and elsewhere; finite-volume methods on cubed-sphere or icosahedral grids have become more common at NOAA, DOE, and several other centres. Time integration uses semi-implicit methods to handle the fastest waves (gravity waves) without unfeasibly small timesteps. A full climate-projection campaign integrates thousands of years of model time across its experiments and consumes tens of millions of CPU-hours on the largest supercomputers.
Parameterisations
Most of the physics that matters for climate happens at scales smaller than the grid can resolve: cloud formation, convection, turbulence, atmospheric chemistry, radiation, surface fluxes, sea ice, vegetation. These are handled by parameterisations: simplified models that take grid-cell-mean state variables as input and return tendencies for the resolved quantities. Cloud parameterisations are particularly difficult and are the dominant source of inter-model spread in climate sensitivity. The "tuning" of parameterisations to match observations and to keep the radiative balance roughly correct is part of model development; this tuning is also a substantial source of methodological controversy. Modern AI methods are increasingly used to improve parameterisations — neural-network-based subgrid closures (Rasp et al. 2018, the various 2024–2026 successors) trained on high-resolution simulations or observations are an active frontier.
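A minimal sketch of the column-wise neural-network closure idea in the spirit of Rasp et al. 2018; the layer sizes and the input/output variables here are hypothetical placeholders, and a real scheme would be trained on high-resolution simulation output and coupled back into the host model's time step.

```python
import torch
import torch.nn as nn

N_LEVELS = 30               # vertical levels in the host-model column (hypothetical)
N_IN = 2 * N_LEVELS + 2     # e.g. temperature + humidity profiles, surface pressure, insolation
N_OUT = 2 * N_LEVELS        # heating and moistening tendencies returned to the GCM

# Column-wise MLP: one grid column in, subgrid tendencies out.
subgrid_closure = nn.Sequential(
    nn.Linear(N_IN, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_OUT),
)

def parameterised_tendencies(column_state: torch.Tensor) -> torch.Tensor:
    """Stand-in for a conventional convection scheme: map grid-cell-mean state
    to subgrid heating/moistening tendencies (units and scaling omitted)."""
    return subgrid_closure(column_state)

batch = torch.randn(1024, N_IN)             # 1024 columns drawn from a training set
tendencies = parameterised_tendencies(batch)
print(tendencies.shape)                      # torch.Size([1024, 60])
```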
Coupling: the Earth System Model
An atmospheric GCM alone is incomplete; modern climate science uses coupled Earth System Models (ESMs) that integrate atmospheric, oceanic, sea-ice, land-surface, vegetation, and biogeochemical components. The components exchange energy, water, and tracers (carbon, dust, biogeochemical species) through coupled flux calculations. The ocean component is itself a 3D GCM running the primitive equations for seawater. The land component handles soil moisture, vegetation, surface energy budgets, and increasingly carbon stocks. The sea-ice component handles ice formation, melt, and motion. Coupling adds substantial complexity — the components have different time steps, different grids, and different stability constraints — and is a major engineering challenge.
The CMIP ensemble
The Coupled Model Intercomparison Project (CMIP) coordinates the major modelling centres' climate runs into standardised archives. CMIP6 (2018–2024) ran ~50 models from ~30 centres through a common protocol of historical, future-scenario, and idealised experiments. CMIP outputs are the basis of nearly every IPCC-cited climate projection, and the ensembles provide the empirical handle on inter-model uncertainty that single-model results cannot. CMIP7 (~2025–2029) is in progress. AI-based emulators trained on CMIP ensembles have become a major application area, producing fast surrogates that reproduce CMIP-style outputs at orders-of-magnitude lower compute cost than running the full GCMs.
What GCMs can and cannot do
GCMs do well at reproducing the large-scale circulation, the seasonal cycle, the response to large-scale forcings (volcanic eruptions, CO₂ doubling), and the broad geographic patterns of climate change. They do less well at regional-scale precipitation patterns, extreme weather statistics, and any phenomena that depend critically on small-scale processes (tropical convection, mesoscale ocean eddies, sea-ice marginal ice zone dynamics). The next generation of storm-resolving (km-scale) models — running at ~1 km horizontal resolution, requiring exascale computing — promises to fix some of these limitations but is still in early operational deployment as of 2026. AI methods that downscale, bias-correct, or emulate GCMs are increasingly central to the practical use of GCM output.
Paleoclimate: Proxies and Past Climates
Earth's past is the only natural laboratory for testing how the climate system responds to large perturbations. Paleoclimate reconstructions extend the climate record from the ~150 years of direct instrumental data back through millennia, millions, and billions of years using indirect proxies preserved in ice, sediments, rocks, and biological archives. The methodology is essential for constraining climate sensitivity, validating climate models, and contextualising current change.
The proxy methodology
Direct measurements of past climate are not available; we rely on proxies — physical, chemical, or biological signals preserved in geological materials whose properties depend on past climate conditions. Stable-isotope ratios are among the most-important: the ratio of ¹⁸O to ¹⁶O in foraminifera shells, ice cores, and speleothems depends on the temperature at which the calcite or ice formed and on global ice volume (temperature-dependent isotopic fractionation convolved with the ice-volume signal, which complicates the interpretation); the ratio of ¹³C to ¹²C tracks carbon-cycle dynamics. Tree rings provide annually-resolved records of temperature, moisture, and (through their isotopic chemistry) atmospheric composition for the past several millennia. Ice cores preserve trapped air bubbles that record actual ancient atmospheric CO₂ and methane concentrations; the longest cores (Antarctic Vostok and EPICA Dome C) extend back ~800,000 years. Marine sediment cores extend much further — over 100 million years for some sites — through proxies on the planktonic and benthic foraminifera that fall to the ocean floor. Speleothems (cave formations) provide precisely-dated continental records. The methodology of converting proxy measurements into climate quantities is itself a substantial discipline with its own statistical machinery.
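Proxy measurements are conventionally reported in delta notation, the per-mil (parts-per-thousand) deviation of an isotope ratio from a reference standard:

```latex
% Delta notation for stable-isotope ratios, reported in per mil
\delta^{18}\mathrm{O} =
\left(
  \frac{\bigl(^{18}\mathrm{O}/^{16}\mathrm{O}\bigr)_{\mathrm{sample}}}
       {\bigl(^{18}\mathrm{O}/^{16}\mathrm{O}\bigr)_{\mathrm{standard}}} - 1
\right) \times 1000
```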
The last million years: glacial-interglacial cycles
The dominant feature of late-Quaternary climate (the past ~2.6 million years) is the glacial-interglacial cycles: alternations between cold "glacial" periods with extensive Northern Hemisphere ice sheets and warm "interglacial" periods like the present (the Holocene, the past ~11,700 years). The cycles are paced by Milankovitch forcing — variations in Earth's orbital eccentricity (~100,000-year period), axial tilt (~41,000 years), and precession of the equinoxes (~23,000 years) — that change the seasonal and latitudinal distribution of incoming solar radiation. The cycles' approximately 100,000-year period is dominated by eccentricity, but the eccentricity forcing itself is small; the actual climate response involves substantial amplification by ice-albedo and CO₂ feedbacks. Atmospheric CO₂ varied between ~180 ppm (glacial) and ~280 ppm (interglacial) in lockstep with temperature, providing one of the cleanest paleoclimate constraints on CO₂-temperature coupling.
Deep-time climates
Earth's pre-Quaternary climate history includes several intervals with substantially-different climate states. The Paleocene-Eocene Thermal Maximum (PETM, ~56 million years ago) was a rapid warming event of ~5°C in ~20,000 years, driven by a massive carbon release of likely volcanic or methane-clathrate origin — the closest deep-time analogue for current anthropogenic warming, although the rate of current change is much faster. The Miocene (~23–5 Ma) was substantially warmer than today with CO₂ similar to current values, providing constraints on long-term equilibrium response. The Cretaceous (~145–66 Ma) was much warmer with no permanent polar ice, demonstrating the wide range of climate states the Earth system can support. The Snowball Earth events of the Neoproterozoic (~720–635 Ma) saw the planet nearly entirely glaciated, illustrating the strength of ice-albedo feedback when triggered. Each interval provides a different test case for climate models and a different constraint on long-term sensitivity.
The Holocene and the recent record
The current interglacial — the Holocene, roughly the past 11,700 years — has been climatically stable by Quaternary standards, with global mean temperature varying by perhaps ~0.5 K. The remarkable stability is what allowed agriculture to develop and human civilisation to emerge, and it makes recent anthropogenic warming (already ~1.2 K above pre-industrial as of 2024) unprecedented in the entire span of human civilisation. The "hockey stick" reconstructions of Northern Hemisphere temperature over the past 1,000–2,000 years (Mann et al. 1998 and the various subsequent updates) are the single most-controversial piece of paleoclimate methodology politically, but the underlying empirical pattern — millennial-scale stability followed by rapid 20th-century warming — has been independently confirmed by many subsequent studies.
What paleoclimate buys for AI
Paleoclimate provides three things AI methods need: training data for emergent-constraint methodology (using past climate states to constrain future projections), independent benchmarks for climate-model evaluation (do GCMs reproduce paleoclimate states given the right boundary conditions?), and process understanding for parameterisation development (what does the climate system do when pushed substantially out of the modern observational range?). AI methods for paleoclimate-data integration, proxy-system modelling, and model-data comparison are an active research area, with substantial overlap with the ML-for-time-series methodology of Part XIII.
Observational Systems and Reanalysis
Modern climate science rests on a global observation system that has matured over a century and a half. Understanding what is measured, where, and how often is essential for evaluating both traditional climate methods and AI methods that consume the resulting data.
The instrumental record
Direct atmospheric measurements began in earnest in the late 19th century with surface weather stations, and the modern surface temperature record extends from about 1850 with progressively-improving global coverage. Land surface stations (~10,000 globally as of 2025) measure temperature, pressure, humidity, wind, and precipitation, with most sites providing multiple observations per day. Ocean surface measurements historically came from ship-based observations (which are heterogeneous and biased toward shipping lanes) and increasingly from moored buoys, drifting buoys, and Argo floats (described below). Radiosondes launched twice daily from ~800 stations worldwide profile the atmosphere through ~30 km of altitude, providing the in-situ vertical structure data that's been continuously available since the 1940s.
Satellite observations
The satellite era began with TIROS-1 in 1960 and matured through the 1970s. The modern fleet falls into two categories. Geostationary satellites (GOES-East, GOES-West, Meteosat, Himawari, FY-4) sit at ~36,000 km above the equator, viewing roughly a hemisphere continuously, and produce multispectral imagery every ~10 minutes — the substrate of weather forecasting and severe-weather monitoring. Polar-orbiting satellites (NOAA's JPSS series, EUMETSAT's MetOp, NASA's Aqua/Terra/Aura, the various CubeSats) sweep past every point on Earth ~twice daily, with denser coverage at high latitudes; they carry sounders that profile temperature and humidity, microwave radiometers for cloud and precipitation, scatterometers for ocean winds, altimeters for sea-surface height, and (more recently) lidar for cloud and aerosol structure. The combined satellite data stream is multiple terabytes per day; ML methods for processing it (cloud detection, precipitation retrieval, atmospheric state retrieval) are increasingly central.
Argo and the ocean
The Argo array of ~4,000 autonomous profiling floats has been the major ocean-observation breakthrough of the past two decades. Each float drifts at ~1,000 m depth, descends to 2,000 m every ~10 days, then ascends to the surface measuring temperature and salinity profiles, transmits data via satellite, and repeats. The full array provides ~12,000 profiles per month, distributed roughly uniformly across the world ocean, and has produced the first systematic measurements of ocean heat content and mid-depth circulation. Extensions (Deep Argo to 6,000 m, biogeochemical Argo with oxygen and other sensors) are gradually expanding coverage. Argo data is the primary substrate for monitoring ocean heat uptake — the largest single component of the Earth's energy imbalance — and is essential for closing the climate-system energy budget.
Reanalysis: the synthetic data product
Reanalysis products combine all available historical observations with a frozen modern weather-prediction model to produce a gridded, dynamically-consistent reconstruction of past atmospheric state. The most-used products are ERA5 (ECMWF, 1940–present, ~30 km resolution, hourly, ~5 PB), MERRA-2 (NASA, 1980–present, ~50 km), and JRA-55/JRA-3Q (JMA, 1958–present). Reanalysis is not the same as observation — the model fills in where observations are sparse — but it is internally consistent, evenly gridded, and freely available, which makes it the default training and evaluation substrate for AI weather and climate methods. ERA5 in particular is the substrate of essentially every modern AI weather forecasting paper since GraphCast (Lam et al. 2023). Its limitations matter: pre-satellite data (before ~1979) is much lower quality, biases inherited from the underlying model can propagate, and any AI method trained purely on reanalysis inherits those biases.
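For readers who want to touch the data, a minimal sketch of pulling an ERA5 slice with xarray, assuming access to a cloud-hosted analysis-ready copy such as the WeatherBench 2 Zarr store; the store path and the variable and coordinate names below are illustrative assumptions and should be checked against the current documentation (reading a gs:// path also requires gcsfs).

```python
import xarray as xr

# Illustrative placeholder path in the style of the WeatherBench 2 ERA5 Zarr stores;
# the exact bucket/dataset name is an assumption -- consult the WeatherBench 2 docs.
ERA5_ZARR = "gs://weatherbench2/datasets/era5/<resolution-and-version>.zarr"

ds = xr.open_zarr(ERA5_ZARR)

# Pull 500-hPa geopotential for one year as a training/evaluation slice.
# Variable and coordinate names follow the WeatherBench convention; verify against ds.
z500 = ds["geopotential"].sel(level=500, time=slice("2020-01-01", "2020-12-31"))
print(z500.sizes)   # e.g. {'time': ..., 'latitude': 721, 'longitude': 1440}
```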
The data deluge and the AI opportunity
Climate observations are the petabyte-scale data stream that AI methods are uniquely positioned to engage with. The modern flagship satellites produce multi-spectral, multi-channel data at rates traditional analysis pipelines struggle to keep up with. AI methods for cloud detection, precipitation estimation, atmospheric profile retrieval, and trace-gas concentration measurement are increasingly deployed in operational processing chains. Climate reanalysis production itself is a substantial AI application area — neural-network-based assimilation methods promise to substantially improve on the variational data-assimilation methods that have dominated for thirty years. Sections 10–19 develop the AI-for-climate methodology in detail; the substrate is the observational network this section has surveyed.
From Climate to ML: An Orientation
The previous nine sections established the climate science. This one is the bridge to the methodology that follows. Several properties of the climate-AI subfield make it methodologically distinctive: the operational benchmarking culture inherited from fifty years of NWP development, the canonical training-data substrate (ERA5), the physics-vs-data tension that pure deep-learning approaches don't always handle well, and direct operational deployment stakes that distinguish climate AI from many AI-for-Science domains. This section orients the ML practitioner; Sections 11–19 develop the methods within that frame.
The benchmarking culture
Climate AI is benchmarked against operational numerical weather prediction (NWP) — the most-mature, most-rigorously-evaluated forecasting system humans have ever built. The European Centre for Medium-Range Weather Forecasts' IFS (Integrated Forecasting System) and NOAA's GFS (Global Forecast System) run twice daily on dedicated supercomputers, produce forecasts evaluated against reanalysis ground-truth, and have been continuously improving for fifty years. The community has shared evaluation infrastructure (WeatherBench, originally Rasp et al. 2020, with WeatherBench 2 being the modern reference), standardised verification metrics (RMSE on Z500, T850, U850, V850, MSLP, Q700 at multiple lead times), and a public scorecard culture that makes AI-method-vs-NWP comparisons unambiguous. Few other AI-for-Science domains have this benchmarking discipline; it shapes the methodology substantially because every method gets compared against the same operational baseline using the same metrics on the same lead times.
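The headline deterministic metrics are simple enough to state in code; a sketch of the latitude-weighted RMSE and anomaly correlation coefficient (ACC) used on WeatherBench-style lat-lon grids, with climatology handling simplified to a single field.

```python
import numpy as np

def lat_weights(lats_deg: np.ndarray) -> np.ndarray:
    """Area weights proportional to cos(latitude), normalised to mean 1."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()

def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a (lat, lon) field."""
    w = lat_weights(lats_deg)[:, None]
    return np.sqrt(np.mean(w * (forecast - truth) ** 2))

def weighted_acc(forecast, truth, climatology, lats_deg):
    """Latitude-weighted anomaly correlation coefficient."""
    w = lat_weights(lats_deg)[:, None]
    fa, ta = forecast - climatology, truth - climatology
    num = np.sum(w * fa * ta)
    den = np.sqrt(np.sum(w * fa**2) * np.sum(w * ta**2))
    return num / den

# Toy usage with random fields standing in for Z500 forecast/analysis:
lats = np.linspace(-90, 90, 181)
truth = np.random.randn(181, 360)
forecast = truth + 0.1 * np.random.randn(181, 360)
print(weighted_rmse(forecast, truth, lats), weighted_acc(forecast, truth, np.zeros_like(truth), lats))
```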
ERA5 as the canonical substrate
Most modern AI weather and climate methods train on ERA5 reanalysis (Hersbach et al. 2020) rather than on raw observations. ERA5 is a gridded ~30 km hourly reconstruction of atmospheric state from 1940 to present, produced by ECMWF using their operational data-assimilation system to combine all available historical observations with a frozen modern weather model. The substrate has practical consequences: AI methods inherit ERA5's biases (the underlying model has its own systematic errors, particularly for tropical convection and polar regions); training cannot easily extend pre-1940 (no satellite data, sparse surface observations); and adding new observation types requires either re-running ERA5 or developing separate assimilation pipelines. That said, ERA5 is internally consistent, evenly gridded, freely available, and large enough (~5 PB) to support modern foundation-model-scale training. The combination has made it the default substrate for AI weather and climate methodology, and the limitations matter as design constraints throughout the chapter.
The physics-vs-data tension
A specific methodological tension shapes the field. Pure data-driven methods can produce excellent in-distribution forecasts — GraphCast on ERA5 reproduces the climatology, the major modes of variability, and the day-to-day weather statistics — but may fail under conditions outside the training distribution. The very situations climate change creates (unprecedented heatwaves, novel weather regimes, sea-surface temperatures beyond the recorded range) are precisely the cases where pure data-driven extrapolation is least trustworthy. Physics-informed methods (incorporating conservation laws, the radiative-transfer equations, or the primitive equations as constraints) sacrifice some in-distribution accuracy for principled extrapolation behaviour, but the empirical performance gap is real. The field has gradually moved toward hybrid approaches — neural networks within physics-based frameworks (Section 15's ML parameterisations) or physics-based components within neural-network architectures (NeuralGCM, Section 14) — but the tension is unresolved and is one of the major methodological discussion topics as of 2026.
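One common compromise keeps a purely data-driven backbone but adds soft physics penalties to the training loss; a schematic sketch in which the conservation term is a stand-in (real constraints range from column-integrated mass and energy budgets to full differentiable dynamical cores).

```python
import torch

def hybrid_loss(pred, target, dry_mass_pred, dry_mass_init, lam=0.1):
    """MSE forecast loss plus a soft global dry-mass-conservation penalty.

    pred, target        : predicted / reference atmospheric-state tensors
    dry_mass_pred/init  : globally integrated dry-air mass after and before the step
    lam                 : weight of the physics penalty (a tuning choice, not a law)
    """
    mse = torch.mean((pred - target) ** 2)
    conservation = (dry_mass_pred - dry_mass_init) ** 2
    return mse + lam * conservation
```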
The operational stakes
Unlike most AI domains, climate AI has direct operational stakes. Weather forecasts protect lives, inform aviation routing, drive agricultural decisions, and shape emergency-management responses. Climate projections inform multi-decadal infrastructure investment, insurance pricing, and international policy negotiations. Both contexts have substantial regulatory, validation, and reproducibility requirements that academic ML methodology often does not. Production deployment at the major weather services (ECMWF, NOAA, Met Office, JMA) has its own technical and procedural requirements: numerical stability across thousands of forecast cycles, calibrated uncertainty quantification, fail-safe behaviour under hardware failure, and substantial documentation. Most successful AI-for-climate methods have either been adopted by major operational centres (GraphCast at ECMWF, Pangu-Weather variants at multiple centres) or are en route to that adoption; the methodology is unusually deployment-shaped from the start.
The data-substrate richness
Climate has more data than most AI domains can absorb. ERA5 alone is ~5 PB. The active satellite fleet generates multiple terabytes per day. The Argo array produces ~12,000 ocean profiles per month. The CMIP ensembles aggregate dozens of models' outputs at ~50–100 km resolution across centuries of integration. The challenge is rarely "do we have enough data" — it's "can we usefully process it given finite compute and meaningful evaluation." The 2024–2026 wave of climate-foundation-model efforts (Aurora from Microsoft, Earth-2 from NVIDIA, the various others) is partly about scaling models large enough to extract more from the available data; whether the scaling laws that have driven progress in language modelling apply to climate data is an open empirical question that the chapter returns to.
The evaluation realities
Climate evaluation has its own subtleties that distinguish it from typical ML benchmarking. Spatial spectra matter: a forecast that's accurate in RMSE but produces overly-smooth outputs (the typical failure mode of MSE-trained AI weather models) is operationally less useful than one that preserves the energy at small scales, even if its point-wise accuracy is slightly worse. Probabilistic calibration matters: ensemble forecasts must produce realistic uncertainty estimates, not just point predictions. Extreme-event skill matters: the operationally important forecasts are precisely the rare cases that are under-represented in the training data and therefore hardest for a learned model to predict. Long-term stability matters for climate emulation: a model that drifts after 100 years of integration is useless for climate projection. Each of these requires evaluation methodology beyond the standard ML benchmarking practice, and Section 13 returns to ensemble-and-probabilistic evaluation, Section 17 to extreme-event skill, and Section 14 to long-term stability.
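The over-smoothing failure mode is easy to diagnose by comparing the zonal power spectrum of a forecast against the verifying analysis; a toy sketch, with a box-filtered random field standing in for a blurred forecast.

```python
import numpy as np

def zonal_power_spectrum(field: np.ndarray) -> np.ndarray:
    """Mean power per zonal wavenumber for a (lat, lon) field (FFT along longitude)."""
    coeffs = np.fft.rfft(field, axis=-1)
    return np.mean(np.abs(coeffs) ** 2, axis=0)

truth = np.random.randn(181, 360)
# A 9-point running mean along longitude mimics an over-smoothed forecast.
blurred = np.apply_along_axis(lambda row: np.convolve(row, np.ones(9) / 9, mode="same"), -1, truth)

ratio = zonal_power_spectrum(blurred) / zonal_power_spectrum(truth)
print(ratio[:5])    # near 1: large scales are retained
print(ratio[-5:])   # well below 1: small-scale variance has been lost
```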
To recap the constraints that shape the methodology: operational benchmarking against fifty years of NWP development; training on a single canonical substrate (ERA5), with all the biases that imports; a real physics-vs-data tension that pure deep-learning approaches don't always handle well; and direct operational deployment stakes. The methodology in this chapter is shaped by these constraints; the headline architectures (transformers, graph nets, diffusion models) are familiar from other chapters, but the surrounding evaluation and engineering practice differs substantially.
AI Weather Forecasting: GraphCast, Pangu, and Successors
The 2022–2024 wave of AI weather-forecasting models is the watershed AI-for-climate result, comparable to AlphaFold's role in structural biology. This section tells the story in detail, both because most readers have heard of these systems and because understanding what they did, how, and why is the best on-ramp to the rest of the chapter.
The pre-AI baseline
Operational numerical weather prediction has been continuously improving for ~50 years. The fundamental approach: solve the primitive equations (Section 7) on a global grid using current observations as initial conditions, integrate forward using semi-implicit numerical methods, run multiple ensemble members with perturbed initial conditions, and post-process the output for forecasts. ECMWF's IFS has been the gold-standard global model for decades, with deterministic forecasts at ~9 km resolution and ensemble forecasts at ~18 km running on dedicated supercomputers. NOAA's GFS is the comparable American system. Both have steadily improved by ~1 day of useful forecast skill per decade — a hard-won, expensive trajectory. The standard deterministic-skill metric is the 500-hPa geopotential height anomaly correlation coefficient (Z500 ACC); IFS reaches ACC = 0.6 (the operational threshold for "useful") at about 9 days of lead time, with the curve gradually flattening beyond that.
The AI watershed: 2022–2023
The watershed came in three papers across 2022–2023. FourCastNet (Pathak et al. 2022, NVIDIA) was the first widely-noticed AI weather model: a Fourier neural operator (FNO, Li et al. 2020) trained on ERA5, producing 6-hour to 14-day forecasts at 0.25° (~25 km) resolution. The paper demonstrated that the methodology was at least competitive with operational NWP at certain lead times and variables, while running ~10⁵× faster on a single GPU. Pangu-Weather (Bi et al. 2023, Huawei Cloud, Nature) was the first AI model to clearly exceed IFS on standard verification scores: a 3D Earth-Specific Transformer (a vision transformer adapted for the spherical-and-stratified atmosphere) trained on ERA5 1979–2017 produced deterministic forecasts that beat IFS on most variables and lead times, evaluated rigorously against the ECMWF operational scorecard. GraphCast (Lam et al. 2023, Google DeepMind, Science) used a graph neural network on a multi-resolution icosahedral mesh (six refinement levels, encoder-processor-decoder architecture) trained on ERA5 1979–2017; it produced deterministic 6-hour forecasts at 0.25° resolution, autoregressive out to 10 days, and exceeded IFS on 90% of the ECMWF scorecard variables. The paper documented a level of operational rigour that established the credibility of AI weather forecasting beyond reasonable dispute.
The architectural moves
Three architectural choices recurred across the successful methods. First, spherical-aware backbones: standard CNNs and ViTs assume a flat grid; the atmosphere lives on a sphere, and regular latitude-longitude grids have polar singularities, which methods handle either by switching to graph-based architectures (GraphCast) or by using Earth-aware positional encoding (Pangu's "Earth-Specific Transformer" with separate latitude bands). Second, multi-resolution processing: GraphCast's six-level icosahedral mesh and Pangu's hierarchical 3D windows both reflect the empirical reality that atmospheric phenomena span scales from ~10 km (mesoscale) to ~10,000 km (planetary), and the architectures are designed to exchange information across these scales. Third, recursive autoregressive rollout: training on 6-hour predictions and applying the model recursively for longer leads, with curriculum learning that gradually exposes the model to longer rollouts during training. The recursive rollout is where pure-MSE training tends to produce overly-smooth long-lead forecasts (the model averages over uncertainty rather than producing physically realistic variability) — a problem Section 13's ensemble methods address.
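The rollout-plus-curriculum recipe is worth seeing as pseudocode. A schematic sketch, assuming a `model` callable that maps one atmospheric state to the state six hours later and a list of ground-truth `targets`; the schedule shown is a generic curriculum, not the exact one any particular paper used:

```python
import numpy as np

def rollout_loss(model, x0, targets, n_steps):
    """Accumulate MSE over an n_steps-long autoregressive rollout.

    model: callable mapping a state array to the predicted state 6 h later.
    x0: initial state; targets: list of the next n_steps true states.
    """
    state, total = x0, 0.0
    for step in range(n_steps):
        state = model(state)                           # feed the prediction back in
        total += np.mean((state - targets[step]) ** 2)
    return total / n_steps

def curriculum_rollout_length(epoch, max_steps=12):
    """Gradually lengthen the rollout seen during training."""
    return min(1 + epoch // 2, max_steps)
```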
The operational deployment
Deployment of AI weather models in operational settings has progressed quickly. ECMWF's AIFS (their Artificial Intelligence Forecasting System) launched in early 2024 — a GraphCast-style graph neural network trained internally and run alongside the traditional IFS on the operational schedule, with the AI version typically delivering forecasts within ~1 minute of analysis time vs the traditional model's ~2 hours. The AI-and-traditional outputs are produced in parallel and made available to downstream users. NOAA deployed an AI-based statistical post-processing system in 2023 and is evaluating full AI replacement of GFS components. Met Office and several other national services have made similar moves. The pattern as of 2026 is that AI methods are increasingly central to operational forecasting at major centres, often complementing rather than replacing traditional NWP — the AI methods deliver fast, skilled, deterministic forecasts; the traditional methods deliver carefully-calibrated ensembles with mature uncertainty quantification.
The 2024–2026 generation
Subsequent developments have continued the trajectory. GenCast (Price et al. 2024, DeepMind) extended GraphCast's deterministic methodology to ensemble forecasting using diffusion models — Section 13 covers it in detail. Aurora (Bodnar et al. 2024, Microsoft) is a foundation-model-style architecture trained on multiple Earth-system datasets with fine-tuning to specific tasks, with results spanning weather forecasting, air-quality prediction, and ocean wave forecasting. FengWu and FuXi (Chinese groups) extended training to longer rollouts (up to 6-week and seasonal scales). MetNet-3 (Andrychowicz et al. 2023, Google) targeted the 0–24 hour nowcasting regime with multi-scale neural networks consuming radar and satellite directly. The methodology continues to mature, and the operational adoption continues to expand.
What AI weather doesn't do (yet)
Despite the empirical wins, AI weather has open limitations. Initial conditions still come from traditional data assimilation; AI methods that replace the assimilation pipeline (the various 2024–2026 "AI 4D-Var" efforts) are research-stage rather than operational. Historical extension: ERA5 only reaches back to 1940, and reanalysis quality degrades before the satellite era, so ERA5-trained methods cannot easily be pushed further into the past. Out-of-distribution behaviour: methods trained on the recent climatology may fail under unprecedented conditions; the empirical evidence on this is mixed and the question is unresolved. New observation types: incorporating novel observations (new satellite missions, ground-based radars not in the training period) requires retraining or fine-tuning. Section 19 returns to these as research frontiers.
Architectures for Atmospheric Prediction
The successful AI weather models share architectural patterns but differ in important specifics. This section develops the architectural landscape for atmospheric prediction in detail, including the design choices that make atmosphere prediction distinctive from generic ML.
Graph neural networks on icosahedral meshes
GraphCast's multi-resolution icosahedral mesh is the canonical graph-based atmospheric architecture. The icosahedron is the regular polyhedron with the most-uniform vertex distribution on a sphere; recursive subdivision (each triangle split into four) produces meshes at progressively finer resolutions. GraphCast refines the base icosahedron six times and operates on a multi-mesh that retains the edges of every refinement level, with the finest level reaching roughly 40,000 nodes (about 1° spacing). The encoder maps the 0.25° grid points onto the mesh; the processor performs message-passing over the multi-resolution mesh, so long edges from coarse levels carry information across the globe in a few hops while short edges from fine levels handle local structure; the decoder maps back to grid points. The methodology has multiple advantages: nearly-uniform spatial coverage (no polar singularities), explicit multi-scale structure, and a relatively small parameter count given the geographic coverage. The 36-million-parameter GraphCast is small by 2024 foundation-model standards yet produces operational-quality forecasts.
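A back-of-the-envelope view of why recursive subdivision gives a multi-scale hierarchy: each refinement quadruples the faces, and the node count follows V_r = 10·4^r + 2. The sketch below (illustrative arithmetic, not GraphCast code) prints node counts and a rough inter-node spacing per refinement level:

```python
import math

EARTH_RADIUS_KM = 6371.0

def icosahedral_mesh_stats(refinements):
    """Node count and rough node spacing after r subdivisions of an icosahedron
    projected onto the sphere (vertex count V_r = 10 * 4**r + 2)."""
    nodes = 10 * 4**refinements + 2
    area_per_node = 4 * math.pi * EARTH_RADIUS_KM**2 / nodes
    return nodes, math.sqrt(area_per_node)   # spacing ~ sqrt(area per node)

for r in range(7):
    n, spacing = icosahedral_mesh_stats(r)
    print(f"level {r}: {n:6d} nodes, ~{spacing:6.0f} km spacing")
```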
3D transformers on Earth-specific grids
Pangu-Weather's 3D Earth-Specific Transformer takes a different approach. The atmosphere is partitioned into 3D windows (latitude × longitude × pressure-level), and a hierarchical Swin-Transformer-style attention pattern moves information across windows. The "Earth-Specific" components include separate model parameters for different latitude bands (recognising the substantially-different dynamics in tropics vs mid-latitudes vs poles), absolute and relative positional encoding aware of the spherical geometry, and a vertical pressure-level structure that respects the atmospheric stratification. Pangu trains four separate models for 1, 3, 6, and 24-hour prediction and combines them at inference: a 7-day forecast is simply seven applications of the 24-hour model, while intermediate lead times chain the longest-step models first (as sketched below) — a pragmatic approach that limits autoregressive error accumulation.
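The chaining logic is a simple greedy decomposition. A minimal sketch in the spirit of Pangu's hierarchical temporal aggregation; the function name and exact tie-breaking are illustrative:

```python
def greedy_chain(lead_time_hours, step_sizes=(24, 6, 3, 1)):
    """Decompose a lead time into the fewest model applications, always
    preferring the longest-step model first, to limit autoregressive error
    accumulation."""
    plan, remaining = [], lead_time_hours
    for step in step_sizes:
        while remaining >= step:
            plan.append(step)
            remaining -= step
    return plan

print(greedy_chain(7 * 24))   # [24, 24, 24, 24, 24, 24, 24] - a 7-day forecast
print(greedy_chain(31))       # [24, 6, 1] - three model applications, not 31
```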
Fourier neural operators
FourCastNet uses the Fourier Neural Operator (FNO) architecture: each layer transforms inputs to the frequency domain, applies a learned kernel multiplicatively, and transforms back to the spatial domain. The methodology is mathematically motivated by the fact that solutions to PDEs (which the atmosphere obeys) often have natural representations in spectral space. Empirically, FNOs are computationally efficient on large grids, capture long-range dependencies through the spectral representation, and have been extended to sphere-aware variants (spherical Fourier neural operators, Bonev et al. 2023) that respect the underlying geometry. FourCastNet's performance is somewhat behind GraphCast and Pangu on standard benchmarks but the architecture remains influential in climate emulation (Section 14) where its computational efficiency at long rollouts matters.
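The core spectral operation is easy to show in one dimension. A stripped-down, single-channel sketch (real FNO layers do this per channel pair in 2-D or 3-D, add a pointwise linear path, and apply a nonlinearity; all names here are illustrative):

```python
import numpy as np

def fourier_layer_1d(u, spectral_weights, n_modes):
    """One simplified Fourier-layer pass on a periodic 1-D field.

    u: (n,) real field; spectral_weights: complex (n_modes,) learned kernel;
    n_modes must not exceed n // 2 + 1. Low frequencies are retained and
    multiplied by the learned kernel; higher frequencies are truncated.
    """
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * spectral_weights
    return np.fft.irfft(out_hat, n=u.shape[0])
```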
Spherical CNNs and equivariance
A natural extension is making architectures explicitly equivariant to rotations on the sphere. Spherical CNNs (Cohen et al. 2018; the various 2022–2024 successors) build the rotational symmetry into the architecture: a network's response to a rotated input is the same as the rotated response to the original input. The methodology produces models that, in principle, generalise across geographic locations more reliably than non-equivariant alternatives. Empirically, spherical CNNs have been competitive on weather-forecasting benchmarks but have not produced clear wins over the simpler graph-network and transformer architectures. The equivariance machinery (Ch 01 §8) connects to similar ideas in protein-AI (Ch 03 §11–12) and chemistry (Ch 02), where rotational symmetry has been substantially more impactful.
The state-space and operator-learning frontier
The 2024–2026 wave has explored alternatives. State-space models like Mamba (selective state-space; Gu & Dao 2023) have been adapted for atmospheric prediction with claims of better long-context behaviour than transformers. Neural operators beyond FNOs — graph neural operators, DeepONet, and the various 2024–2026 successors — explore different ways of learning maps between function spaces (which is what atmosphere prediction fundamentally is). Diffusion-model-based weather generation (CorrDiff, GenCast) addresses the spectral-bias problem of MSE-trained models. The architectural landscape is more diverse than the protein-AI landscape (where AlphaFold-style architectures dominate), and the empirical question of which architecture best suits which atmospheric prediction problem remains open.
The variable-and-coordinate handling
A specific atmospheric design problem that distinguishes climate AI from generic ML: the prediction variables are heterogeneous (temperature in K, humidity in g/kg, pressure in hPa, wind in m/s), spatially-distributed across thousands of grid points, vertically stratified across dozens of pressure levels, and coupled non-linearly across all of these. The successful methods all use careful per-variable normalisation, per-level treatment of vertical structure, and coordinate systems that reflect physical geometry (pressure rather than height as the vertical coordinate, the various map projections for different applications). Production deployments often include Z-score normalisation per variable per level, with statistics computed from the training-period climatology; getting this right is a substantial part of why one method works and another doesn't.
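A minimal sketch of that normalisation step, assuming fields are held in a dictionary keyed by (variable, pressure level) with arrays of shape (time, lat, lon); real systems differ in data layout and in how they handle residual-vs-absolute targets:

```python
import numpy as np

def fit_normaliser(train_fields):
    """Per-(variable, level) mean and standard deviation from training data.

    train_fields: dict mapping (variable, level) -> (time, lat, lon) array.
    """
    return {key: (x.mean(), x.std()) for key, x in train_fields.items()}

def normalise(fields, stats):
    """Z-score each (variable, level) field with its own training statistics."""
    return {key: (x - stats[key][0]) / stats[key][1] for key, x in fields.items()}
```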
Ensemble Forecasting and Probabilistic AI
Deterministic forecasts are useful for many purposes; probabilistic forecasts are essential for high-stakes decision making. The classical NWP ensemble methodology runs ~50 perturbed initial conditions through the same model; the AI-equivalent has had to develop its own probabilistic methodology, with diffusion-based ensemble methods now reaching parity with operational ensemble systems.
Why ensembles matter
Lorenz's 1963 demonstration that the atmosphere is deterministic-but-chaotic established that small initial-condition uncertainties grow exponentially, limiting deterministic forecast skill to ~14 days. Ensemble forecasting addresses this by running multiple forecasts with perturbed initial conditions (and increasingly perturbed model physics), producing a probability distribution over outcomes rather than a point estimate. The ensemble spread is itself meaningful: small spread indicates a confident forecast, large spread indicates substantial uncertainty. ECMWF's ensemble system (the ENS) runs 51 members at ~18 km resolution; NOAA's GEFS runs 31 members. The ensemble products drive most operational decisions: hurricane tracks, severe-weather watches, emergency-management timing.
The MSE-training problem
The first generation of AI weather models (FourCastNet, GraphCast, Pangu) trained with mean-squared-error loss against ERA5. MSE is minimised by the conditional mean, but the conditional mean over an uncertain future is overly smooth — it averages across the spread of possible outcomes, producing forecasts that look like blurry versions of plausible realisations. The resulting blurring is often described as spectral bias, and it interacts with the double-penalty problem in verification: forecasts that capture the right large-scale structure but lack realistic small-scale variability. For deterministic skill at the variables the models are evaluated on (Z500, T850 RMSE), the smoothness is acceptable; for many operational purposes (precipitation extremes, wind variability, cloud structure), it is not. The 2022–2023 generation of AI weather models was widely critiqued for this, and the 2024 generation addressed it through ensemble methodology.
GenCast and diffusion-based ensembles
GenCast (Price et al. 2024, DeepMind) is the canonical solution. The methodology applies conditional diffusion — the same diffusion-model technology that produces images from text prompts (Part X) and proteins from structural targets (Ch 03 §13) — to next-step weather forecasting. A denoising network, conditioned on the previous atmospheric state and the diffusion timestep, is trained to remove noise from corrupted samples of the next state. At inference, the model samples from the conditional distribution of next-state-given-current by reversing the diffusion process; running this many times produces an ensemble of plausible forecasts with realistic spread and small-scale variability. The empirical results are strong: GenCast's 50-member ensemble exceeds ECMWF's ENS on most variables and lead times in the 2024 evaluation, with computational cost an order of magnitude lower per ensemble member.
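To make the sampling loop concrete, here is a generic conditional denoising-diffusion sampler in the textbook DDPM form: a hedged sketch of the idea, not GenCast's actual formulation (which uses a different noise parameterisation and sampler). `eps_model` is an assumed callable returning predicted noise given the noisy state, the diffusion step, and the conditioning (previous) atmospheric state:

```python
import numpy as np

def sample_next_state(eps_model, prev_state, betas, shape, rng):
    """Ancestral sampling from a conditional denoising-diffusion model.

    eps_model(x_t, t, prev_state) -> predicted noise (assumed callable).
    betas: per-step noise schedule. Repeating this call with fresh random
    draws yields an ensemble of plausible next states.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                     # start from pure noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, prev_state)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                      # add noise except at the end
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# members = [sample_next_state(model, x0, betas, x0.shape, np.random.default_rng(i))
#            for i in range(50)]
```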
Calibration and reliability
Probabilistic forecasts require careful evaluation beyond point-prediction metrics. Calibration measures whether claimed probabilities match observed frequencies — if the model says "80% chance of rain," does it actually rain 80% of the time on those forecasts? Reliability diagrams plot predicted vs observed frequencies; Brier scores measure squared error of probability forecasts; CRPS (Continuous Ranked Probability Score) generalises Brier to continuous variables and is the dominant verification metric for ensemble forecasts. Spread-error consistency is another key check: is the ensemble spread similar in magnitude to the actual forecast error? GenCast and successors have been designed around CRPS as the primary training objective, which is part of why they outperform MSE-trained deterministic models on probabilistic verification scores.
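CRPS has a simple empirical estimator for an m-member ensemble at a single point, CRPS = E|X - y| - 0.5 E|X - X'|, which is what the sketch below computes (in practice it is averaged over grid points, variables, and lead times, with area weighting):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of a scalar ensemble forecast against one observation.

    members: (m,) array of ensemble values; obs: scalar.
    Implements CRPS = mean|X - y| - 0.5 * mean|X - X'| over ensemble pairs.
    """
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2
```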
Other ensemble methodologies
Several alternatives to diffusion have been explored. Initial-condition perturbation: train a deterministic AI model and run multiple forecasts with perturbed initial conditions (the classical NWP approach applied to AI models); the ensemble spread tends to be too small because the AI model is a deterministic function rather than a chaotic dynamical system. Model-parameter perturbation: run the same initial conditions through models with perturbed weights (dropout-based ensembles, Bayesian neural-network methods). Generative-adversarial methods: train a generator-discriminator pair to produce realistic next-state distributions (the various 2023–2024 GAN-based weather generators). Combined classical-and-AI ensembles: run AI deterministic forecasts and traditional ensemble methods in parallel, combining their outputs. The diffusion-based methods have largely won the empirical comparisons since 2024, but the methodology continues to evolve.
Ensemble post-processing
A separate AI-application area is ensemble post-processing: taking outputs from a traditional NWP ensemble and applying ML methods to correct biases, improve calibration, and produce statistically-better products. The methodology is mature (commercial deployments at most major weather services) and has been substantially improved by the recent generation of neural-network-based methods. The post-processing methodology often combines NWP and AI-model outputs into hybrid ensembles, which can outperform either component alone — an important practical observation for production deployment.
Climate Emulators and Long-Range Projection
Weather forecasting is a 0–14 day problem; climate projection is a decadal-to-centennial problem. The two regimes have different dynamics (initial-condition predictability for weather, boundary-condition forced response for climate) and require different AI methodology. This section develops the climate-emulator problem and the methods that have begun to solve it.
The emulator problem
A climate emulator is a fast neural-network surrogate for an expensive General Circulation Model (Section 7). The classical use case: a CMIP6 climate model takes ~1 month of supercomputer time to integrate 100 years; a trained emulator can produce equivalent outputs in seconds. The emulator can then be run thousands of times to explore parameter space, scenario space, or initial-condition space — kinds of analysis that would be infeasible with the GCM directly. Practical applications include integrated-assessment modelling (where economists need climate response across many emissions scenarios), uncertainty quantification (where statisticians need the model's response to parametric perturbations), and adaptation planning (where engineers need spatially-resolved projections at much higher density than CMIP provides).
The long-stability problem
The fundamental challenge of climate emulation is long-time stability: the model must integrate stably over centuries without drifting to non-physical states. AI models trained on limited data can produce excellent short-term forecasts but accumulate errors over long rollouts; the small-scale errors that don't matter for 7-day weather become catastrophic for 100-year climate. The classical NWP-style models avoid this by being grounded in conservation laws (energy, mass, momentum are exactly conserved by the numerical scheme); pure data-driven models have no such grounding and can produce arbitrarily-bad long-time behaviour. Section 15's hybrid approaches partly address this; the explicitly-emulator methods take various other approaches.
NeuralGCM
NeuralGCM (Kochkov et al. 2024, DeepMind/Google, Nature) is the most-influential modern climate emulator. The methodology embeds a learned neural-network "physics" component within a traditional dynamical core: a JAX-implemented spectral dynamics model handles the large-scale fluid dynamics (which obey conservation laws and are well-understood), and a neural network handles the parameterised physics (radiation, clouds, convection, boundary-layer turbulence — the "subgrid" processes that classical GCMs handle through hand-tuned parameterisations). The combined model is trained end-to-end against ERA5 reanalysis with a multi-scale loss, and produces stable integrations over decades. The empirical results are strong: NeuralGCM's deterministic forecast skill approaches AI models like GraphCast at medium range, while its long-time climate statistics match ERA5 climatology better than most CMIP6 GCMs. The methodology represents a substantial conceptual advance — climate emulation that respects physics where physics matters and uses ML where ML scales.
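The structural idea is compact enough to sketch. Below is a schematic time step in the NeuralGCM spirit, not the actual implementation (which works in spectral space, uses an implicit-explicit solver, and is trained by differentiating through many such steps); `dynamical_core` and `learned_physics` are assumed callables returning tendencies:

```python
def hybrid_step(state, dt, dynamical_core, learned_physics):
    """One schematic time step of a hybrid physics-ML atmosphere model.

    dynamical_core: resolved, conservation-respecting fluid-dynamics tendencies.
    learned_physics: neural-network tendencies for the subgrid processes
                     (radiation, convection, clouds, turbulence).
    """
    resolved = dynamical_core(state)
    subgrid = learned_physics(state)          # the learned parameterisation
    return state + dt * (resolved + subgrid)  # simple explicit update for clarity
```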
ClimaX and foundation-model approaches
ClimaX (Nguyen et al. 2023) takes a different approach: train a single foundation-model-style architecture on multiple datasets (ERA5 plus CMIP6 ensemble outputs plus various other climate datasets) and fine-tune for specific tasks (weather forecasting, climate projection, seasonal prediction). The methodology has produced a model that performs reasonably across many tasks without being best-in-class at any single one, but the foundation-model framing is appealing because it suggests scaling laws may apply: a sufficiently-large model trained on sufficient climate data should improve with scale, parallelling the language-modelling experience. Aurora (Microsoft) extends this further, with explicit multi-scale architecture and substantial data scaling. Whether the scaling-law approach proves out for climate as it has for language remains an empirical question.
ACE and pure-emulator approaches
ACE (Watt-Meyer et al. 2024, AI2) is a pure-data-driven climate emulator: a deep neural network trained on long simulations from a high-resolution traditional GCM, producing climate-statistics-faithful integrations at much lower compute cost. The methodology has demonstrated stable 100-year integrations and reasonable climate-statistics reproduction without an explicit physics core. The trade-off vs NeuralGCM is the usual one: ACE is simpler architecturally and easier to scale, but provides weaker out-of-distribution guarantees. The 2024–2026 wave of pure-emulator approaches continues to produce competitive results, suggesting that careful training and sufficient data may substitute for explicit physics constraints — at least for in-distribution applications.
CMIP emulation and scenario exploration
A specific practical use of climate emulators is CMIP emulation: training a fast surrogate that reproduces the CMIP6 ensemble's response to specified emissions scenarios. The methodology takes emissions scenarios (the SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 used by IPCC AR6) plus other forcing inputs as conditioning, and produces gridded climate outputs (temperature, precipitation, sea-level rise, etc.) at decadal to centennial scales. The trained emulator runs in seconds rather than months, enabling integrated-assessment-modelling pipelines that would be impossible with GCMs directly. Several systems are in production use — CMIP-emulator efforts at multiple climate-modelling centres, the FaIR simple-climate-model successors, and the various commercial offerings — and they have begun to influence how climate-policy analysis is conducted.
ML Parameterisations within Traditional Models
An alternative to replacing GCMs with ML is augmenting them with ML. ML parameterisations use neural networks to represent specific subgrid-scale processes within a traditional climate model, replacing hand-tuned physical parameterisations with learned ones. The methodology has a longer history than full AI weather models and remains an active research direction.
The parameterisation problem
Traditional GCMs (Section 7) cannot resolve sub-grid-scale processes — convection (~1 km), cloud microphysics (~10 m), turbulence (~1 m). These processes matter for the resolved-scale dynamics, so GCMs use parameterisations: simplified models that take grid-cell-mean state variables as input and return tendencies for the resolved fields. Parameterisations are hand-crafted by domain experts, validated against observations and high-resolution simulations, and tuned to keep the model's overall radiative balance approximately correct. Cloud and convection parameterisations are particularly difficult and are the dominant source of inter-model spread in climate sensitivity (Section 6). The hope of ML parameterisations is to produce learned subgrid models that improve on the hand-crafted versions.
Rasp et al. and the foundational result
The foundational paper is Rasp, Pritchard & Gentine 2018 (PNAS): a deep neural network trained on data from a high-resolution simulation that resolved convection explicitly, deployed as a convection parameterisation in a coarser-resolution host model. The methodology demonstrated that the learned parameterisation reproduced the high-resolution behaviour at far lower compute cost than running the high-resolution model directly. The paper kicked off an entire research direction: a 2024 review (the various Geophysical Research Letters and Journal of Advances in Modeling Earth Systems papers) catalogued 50+ subsequent ML-parameterisation efforts spanning convection, microphysics, cloud-aerosol interactions, ocean-surface fluxes, sea-ice dynamics, and land-surface processes.
The stability problem
The persistent challenge with ML parameterisations is numerical stability: a neural network plugged into a host GCM can produce small errors that the host model amplifies into instabilities, leading to model crashes or non-physical behaviour. The original Rasp 2018 implementation suffered from this. The 2020–2024 wave of methods has substantially improved stability through several techniques: stability-aware training (adding stability constraints to the loss function), online learning (training the parameterisation while it's coupled to the host model rather than offline on simulation data), enforcing physical constraints (energy and mass conservation built into the architecture or as regularisation), and hybrid physics-ML approaches (the ML correction supplements rather than replaces the physical parameterisation). The 2024 generation of methods produces ML-parameterised GCMs that integrate stably over decades, though the methodology is not yet quite operational-grade.
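One of those tricks, the conservation-aware loss term, is easy to sketch. The snippet below is illustrative only: the variable names, budget terms, and the way the expected column totals are supplied are assumptions, and real implementations differ in which budgets they close and whether constraints are hard (architectural) or soft (penalties):

```python
import numpy as np

def conservation_penalty(dT_dt, dq_dt, layer_mass,
                         expected_energy, expected_moisture,
                         cp=1004.0, lv=2.5e6):
    """Soft penalty on violations of column energy and moisture budgets.

    dT_dt, dq_dt: predicted per-layer temperature (K/s) and humidity (kg/kg/s)
    tendencies; layer_mass: per-layer mass per unit area (kg/m^2);
    expected_energy, expected_moisture: column budget targets implied by the
    host model's resolved fluxes (assumed to be supplied).
    """
    column_energy = np.sum((cp * dT_dt + lv * dq_dt) * layer_mass)
    column_moisture = np.sum(dq_dt * layer_mass)
    return ((column_energy - expected_energy) ** 2
            + (column_moisture - expected_moisture) ** 2)

# total_loss = tendency_mse + lam * conservation_penalty(...)
```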
Recent advances
Several specific results from 2024–2026 worth knowing. ClimSim (Yu et al. 2024) is a benchmark dataset of 5.7 billion samples from a high-resolution climate simulation, enabling systematic ML-parameterisation development. CAM-Net (Kochkov et al. and others) demonstrated stability over multi-year integrations with learned cloud-and-radiation parameterisations. CESM2-AI (NCAR's effort) integrates multiple ML parameterisations into the Community Earth System Model. The methodology is increasingly seen as a complement to pure AI emulation rather than a competitor — different parts of the climate model are best handled by different approaches, with the integration as the methodological challenge.
The hybrid models
The cleanest expression of the hybrid approach is the NeuralGCM architecture (Section 14): physical dynamical core + learned subgrid physics, trained end-to-end. Several groups are pursuing similar architectures: PINNs-for-climate (physics-informed neural networks; Karniadakis et al.'s methodology applied to atmospheric prediction), M2LINES (a major collaboration between climate-modelling centres on ML-improved climate models), and the various closure-learning approaches that use ML to correct truncation errors in low-resolution dynamical cores. The methodology is computationally heavier than pure AI emulation (because it includes the physical dynamical core) but provides much stronger out-of-distribution guarantees, making it the favoured approach for climate-projection applications where reliability under unprecedented forcings matters.
Remote Sensing and Observational AI
The modern climate observation system produces multiple terabytes of satellite and in-situ data per day. AI methods are uniquely positioned to extract climate-relevant signal from this data deluge, and the resulting outputs feed both operational forecasting and long-term climate monitoring.
The satellite data substrate
Modern climate-observing satellites carry instruments far beyond simple imagery. Multispectral imagers (the visible-and-infrared imagers on GOES, Himawari, MetOp, Sentinel-2, Landsat) measure radiance in dozens of wavelength bands tuned for specific purposes (vegetation indices, fire detection, cloud properties, ocean colour). Sounders (the Cross-track Infrared Sounder on JPSS, the Infrared Atmospheric Sounding Interferometer on MetOp) profile atmospheric temperature and humidity. Microwave radiometers see through clouds for sea-surface temperature and atmospheric water content. Radars (the Global Precipitation Measurement mission, EarthCARE) profile precipitation and clouds in 3D. Lidars (CALIPSO and successors) profile aerosol and cloud structure. Altimeters (Jason-3, Sentinel-6) measure sea-surface height to centimetre accuracy. Scatterometers measure ocean winds. The combined data stream is multiple terabytes per day, far beyond what traditional retrieval methods can absorb.
Cloud detection and retrieval
The most-mature AI-for-remote-sensing application is cloud detection: distinguishing clouds from clear-sky in satellite imagery, which is the prerequisite for essentially all atmospheric retrieval. Traditional threshold-based methods worked well for some sensors but failed at high latitudes, over snow/ice surfaces, and for thin cirrus. CNN-based cloud-detection methods (the various 2017–2024 papers) substantially exceeded traditional accuracy, with deployment now standard at the major operational centres. Cloud-property retrieval (cloud optical depth, particle size, phase) extends this to physical-property estimation; ML methods are increasingly deployed alongside traditional optimal-estimation methods. The combined cloud detection-and-retrieval pipeline is central to operational radiative-transfer calculations in NWP.
Precipitation estimation
Precipitation estimation from satellite is one of the most-impactful remote-sensing applications because most of the world has limited ground-based precipitation measurement. The challenge is that satellites don't measure precipitation directly — they measure brightness temperatures or radar reflectivities, from which precipitation must be inferred. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the operational gold standard, combining multiple satellite inputs into half-hourly global precipitation estimates. ML methods improve specific aspects: convolutional networks better detect organised precipitation systems, recurrent networks capture temporal dependencies, and recent transformer-based methods (the various 2024–2026 systems) integrate multiple sensors into unified estimates. Production deployments at NASA, JAXA, and other space agencies are ongoing.
Trace-gas and air-quality monitoring
Modern satellites measure atmospheric concentrations of greenhouse gases (CO₂, methane) and air pollutants (NO₂, SO₂, ozone). TROPOMI (on Sentinel-5P) and MethaneSAT are the modern flagship missions for high-resolution methane monitoring; their data has been used to identify previously-unknown methane "super-emitter" sources at oil-and-gas facilities and waste sites. AI methods process the high-volume satellite imagery for source attribution, plume tracking, and emission-rate estimation. The methodology is increasingly relevant for climate policy, where verification of national emissions claims is becoming a treaty-related concern.
Vegetation and land-surface monitoring
Satellite measurements of the land surface — vegetation indices (NDVI, EVI), soil moisture, snow cover, land-surface temperature — feed both operational weather forecasting and long-term climate monitoring. ML methods process these data streams for everything from deforestation detection (Global Forest Watch and the various successors process Sentinel and Landsat imagery to flag forest loss in near-real-time) to crop yield prediction (combining vegetation indices with weather data for agricultural forecasting) to wildfire detection and prediction (the various 2023–2026 wildfire-AI systems process geostationary infrared imagery to detect new fires and predict spread). The methodology has substantial overlap with the broader Earth-observation AI domain and connects to the agricultural and ecological applications that this chapter does not develop in detail.
Reanalysis and data assimilation
The data assimilation step that produces reanalysis combines observations with a model background to produce gridded analyses. Traditional methods (variational 4D-Var, ensemble Kalman filters) have been continuously improved at the major centres for thirty years. AI data assimilation is an emerging frontier: methods that use neural networks for parts of the assimilation pipeline, full neural-network-based assimilation systems, and hybrid AI-traditional approaches. The 2024–2026 wave includes FuXi-Weather (Chen et al. 2024) which integrates AI weather forecasting with AI-driven assimilation, and the various ECMWF and DeepMind efforts on AI-assisted reanalysis. The methodology promises substantially-faster reanalysis production than traditional methods, with implications for climate-data infrastructure that the field is still working out.
Extreme Events and Attribution
Climate change shows up most viscerally in extreme events — heatwaves, floods, droughts, hurricanes, wildfires. AI methods have produced substantial advances in extreme-event detection, prediction, and the attribution of specific events to anthropogenic climate change. The methodology has both scientific and increasingly legal-and-policy stakes.
The extreme-event challenge
Extreme events are by definition rare, which creates several challenges. Limited training data: the events of interest are scarce in the historical record, complicating both model training and validation. Non-stationarity: climate change is shifting the distributions, so extremes that were once rare are becoming more common, and methods trained on historical data may underestimate future risk. Tail-behaviour estimation: standard ML methods optimised for average-case performance often produce poorly-calibrated predictions in the tail of the distribution, exactly where extreme events live. The methodology has had to develop specific extreme-value-statistics-aware approaches (incorporating Generalised Extreme Value distributions, peaks-over-threshold methods, and the various extreme-value adaptations of standard ML) to produce trustworthy predictions for extremes.
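A minimal example of the extreme-value machinery: fitting a GEV to annual maxima and reading off a return level. The sketch assumes a stationary climate (non-stationarity needs covariate-dependent parameters or moving windows) and uses SciPy's genextreme distribution:

```python
import numpy as np
from scipy import stats

def gev_return_level(annual_maxima, return_period_years):
    """T-year return level from a GEV fit to annual maxima."""
    shape, loc, scale = stats.genextreme.fit(annual_maxima)
    # The value exceeded with probability 1/T in any given year:
    return stats.genextreme.isf(1.0 / return_period_years, shape, loc, scale)

# e.g. level_100 = gev_return_level(tmax_annual_maxima, 100)  # 100-year event
```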
Heatwave detection and prediction
Heatwaves are the deadliest weather extreme — the 2003 European heatwave killed ~70,000 people; the 2010 Russian heatwave ~50,000. AI methods improve heatwave forecasting through several mechanisms. Pattern recognition in atmospheric circulation: CNN-based methods identify circulation patterns (blocking highs, ridge configurations) that precede heatwaves, providing extended-range warning. Sub-seasonal prediction: ML methods bridge the predictability gap between weather (~14 days) and seasonal (~3 months) forecasting, where heatwave preconditioning often appears. Compound-event detection: heatwaves often coincide with droughts or air-quality crises; ML methods that detect compound configurations are more useful operationally than single-variable predictors. The 2024–2026 generation of heatwave-AI methods produces 2-3 week lead-time forecasts that are operationally useful for public-health agencies and emergency management.
Hurricane and tropical cyclone prediction
Tropical cyclones (hurricanes in the Atlantic, typhoons in the Pacific, cyclones in the Indian Ocean) are the canonical high-impact weather events, and tropical-cyclone forecasting is one of the longest-standing operational AI applications in weather science. Modern methods include AI-based track forecasting (the various 2023–2024 systems matching or exceeding NHC operational track-error metrics), intensity forecasting (a notoriously hard problem because it depends on small-scale processes that AI methods can sometimes capture better than coarse traditional models), rapid-intensification detection (identifying the specific configurations that precede wind-speed increases of roughly 30 knots in 24 hours, the standard rapid-intensification threshold; rapid intensification is the deadliest forecast failure mode), and storm-surge prediction (combining track forecasts with topography and bathymetry to predict coastal flooding). Operational deployments at NHC, JTWC, and other centres are ongoing.
Flood forecasting
Flood forecasting combines weather forecasting with hydrological models that route water through river basins. The 2022 Google Flood Hub launch deployed AI-based flood forecasting across India, Bangladesh, and several other countries with limited traditional flood-forecast infrastructure, providing operational warnings to ~460M people. The methodology combines weather-forecast precipitation with neural-network-based hydrological models trained on global river-discharge data. Subsequent 2024–2026 expansions extend coverage to additional basins and increase lead times. The Flood Hub is one of the most-direct examples of AI-for-climate methodology producing operational humanitarian impact.
Wildfire risk and detection
The 2020s have seen increasingly severe wildfire seasons globally — California in 2020 and 2025, Australia 2019–2020, Canada 2023, Greece 2018 and 2023. AI methods address wildfires across the timeline. Risk forecasting: combining weather forecasts, vegetation moisture estimates, and historical fire patterns to predict regional fire risk days to weeks in advance. Early detection: processing geostationary satellite infrared imagery to detect new ignitions within minutes (the various GOES-based fire-detection AI systems). Spread prediction: combining wind forecasts, terrain, and fuel data to project fire perimeters; modern methods produce forecasts with substantially better skill than traditional methods. Smoke and air-quality forecasting: predicting smoke transport from active fires for public-health warning. The methodology is operationally deployed in fire-prone regions and is expanding rapidly.
Extreme-event attribution
Extreme-event attribution quantifies how much climate change made a specific event more likely or more intense than it would have been in a pre-industrial climate. The traditional methodology (the World Weather Attribution group's standard approach, established ~2015) runs targeted climate-model experiments comparing the present and counterfactual pre-industrial climates and reports probability ratios. The methodology takes weeks-to-months and is published after the event. The AI-attribution methodology compresses this timeline substantially: pre-trained climate emulators (Section 14) run the counterfactual experiments in seconds, AI-based pattern matching identifies the relevant climate-change signal, and integrated systems produce attribution statements within days of an event rather than months. The 2024–2026 systems are increasingly deployed in real-time after major events to provide quantitative attribution that informs both public communication and litigation. The methodology has both scientific and policy stakes — climate-litigation cases increasingly cite attribution evidence, and the courts have begun to engage with the methodology directly.
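The headline attribution quantities reduce to two ratios once the factual and counterfactual exceedance probabilities are in hand (from large ensembles or, increasingly, fast emulators). A minimal sketch with illustrative names:

```python
def attribution_metrics(p_factual, p_counterfactual):
    """Probability ratio and fraction of attributable risk for an event class.

    p_factual: probability of exceeding the event threshold in today's climate.
    p_counterfactual: the same probability in a pre-industrial counterfactual.
    """
    probability_ratio = p_factual / p_counterfactual
    far = 1.0 - p_counterfactual / p_factual       # fraction of attributable risk
    return probability_ratio, far

# attribution_metrics(0.10, 0.01) -> (10.0, 0.9): the event is ten times more
# likely, and 90% of its present-day probability is attributable to the change.
```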
Downscaling, Bias Correction, and Regional Projection
Global climate models produce projections at ~50–100 km resolution; many practical decisions require ~1–10 km regional information. AI methods for downscaling and bias correction translate coarse global projections into actionable regional information, with substantial implications for adaptation planning.
The downscaling problem
Statistical downscaling takes coarse climate-model outputs and produces fine-resolution regional projections, typically by learning a relationship between coarse model output and high-resolution observations during a training period and applying that relationship to future projections. The traditional methodology uses statistical methods (regression, analogue methods, weather-typing) and produces useful but limited results. AI methods extend this in several directions: deep-learning downscaling (CNN-based super-resolution adapted for climate variables), conditional generative methods (GAN-based and diffusion-based methods producing physically-realistic high-resolution fields conditioned on coarse inputs), and multi-variable downscaling (jointly producing temperature, precipitation, wind, and humidity in physically-consistent combinations). The methodology has matured substantially since 2020 and is increasingly deployed in operational adaptation planning.
CorrDiff and diffusion-based downscaling
CorrDiff (Mardani et al. 2025, NVIDIA) is a representative modern method: a conditional diffusion model that takes coarse weather-prediction outputs and produces high-resolution regional fields. The methodology was demonstrated for Taiwan (where regional topography produces complex fine-scale patterns that coarse models don't resolve) and substantially exceeded prior downscaling methods on standard verification scores. The diffusion-model framing addresses the spectral-bias problem that plagued earlier deterministic super-resolution methods — diffusion produces realistic small-scale variability rather than overly-smooth deterministic outputs. Several 2024–2026 systems extend CorrDiff to other regions and to climate-projection (rather than just weather-forecast) downscaling.
Bias correction
Even excellent climate models have systematic biases — too much precipitation in some regions, too few hurricanes in others, too-cold temperatures in specific seasons. Bias correction uses statistical or ML methods to align model outputs with observational distributions before passing them to downstream applications. The classical methods (quantile mapping, the various distributional-matching approaches) work reasonably well for some variables but fail for others (particularly precipitation extremes and compound events). AI-based bias correction methods learn richer relationships between model outputs and observations, including spatial-and-temporal patterns of bias rather than just marginal-distribution corrections. The methodology is operationally deployed at most major climate-services providers and increasingly for climate-projection-based adaptation planning.
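The classical baseline that the ML methods are measured against is empirical quantile mapping, which is short enough to sketch. Assumptions: one variable, one grid point, and enough historical overlap to estimate quantiles; the function names are illustrative:

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_future, n_quantiles=100):
    """Empirical quantile mapping of future model output onto observations.

    Learns the model-vs-observation quantile relationship over a historical
    period and applies it to future model values (a marginal correction only;
    spatial and temporal bias structure is exactly what ML methods add).
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    model_q = np.quantile(model_hist, q)    # assumed effectively increasing
    obs_q = np.quantile(obs_hist, q)
    ranks = np.interp(model_future, model_q, q)   # future value -> model quantile
    return np.interp(ranks, q, obs_q)             # model quantile -> observed value
```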
Regional climate models and ML hybrids
Regional climate models (RCMs) — high-resolution climate models run over a specific region with boundary conditions from a global GCM — have been the traditional approach to high-resolution climate projection. RCMs are computationally expensive (running an RCM for one region across the standard CMIP scenarios takes weeks of supercomputer time), and their methodology is mature. AI methods are increasingly integrated with RCMs in several ways: RCM emulation (training fast surrogates for RCMs that produce equivalent outputs in seconds), ML-improved RCM parameterisations (the regional version of the GCM-improvement methodology of Section 15), and hybrid RCM-AI workflows (using RCMs for the most-uncertain regions and AI methods for additional resolution everywhere). The combined methodology is producing the next generation of climate-services products.
Adaptation-relevant outputs
Practical adaptation planning needs specific outputs that climate models don't directly produce. Return-period changes: how does the 100-year flood become a 30-year flood under climate change? Compound-event statistics: how does the likelihood of compound heat-and-drought change? Sectoral indicators: heating and cooling degree-days, growing-season length, wildfire-weather indices, drought-severity indicators. AI methods process climate-model and downscaled output to produce these adaptation-relevant outputs, often combining climate data with sectoral models (hydrological models for flood, building-energy models for HVAC, crop models for agriculture). The methodology bridges climate science and applied decision-support, and is the entry point through which AI-for-climate often connects to operational adaptation planning.
The propagation-of-uncertainty problem
A specific methodological challenge is propagating uncertainty through the downscaling-and-bias-correction pipeline. The original GCM has structural uncertainty (different models give different answers); the downscaling adds further uncertainty (different downscaling methods give different fine-scale patterns); the bias correction adds another layer; and the impact model compounds it further. Naive AI methods often produce confident-looking high-resolution outputs that obscure the underlying uncertainty. The 2024–2026 generation increasingly uses probabilistic methods (ensemble downscaling, conditional-distribution methods) that preserve uncertainty estimates through the pipeline. Whether decision-makers use these uncertainty estimates appropriately is its own substantial science-policy question that the field is gradually working through.
The Frontier and the Operational Question
AI for climate has matured substantially over the past three years. This final section surveys the frontier — the open methodological problems, the active research directions, and the operational questions that will shape the field over the next several years.
Foundation models for Earth-system AI
The 2024–2026 wave of Earth-system foundation models aspires to do for climate what GPT did for language and AlphaFold did for protein structure: produce a single large model trained on substantial Earth-system data that can be fine-tuned for many downstream tasks. Aurora (Microsoft) and Earth-2 (NVIDIA) are the most-developed examples; both train on multiple Earth-system datasets (ERA5, CMIP, satellite data, ocean reanalyses) and produce useful results across weather forecasting, climate emulation, and air-quality prediction. ClimaX (Nguyen et al. 2023) was an earlier foundation-model effort with smaller scale. Whether Earth-system foundation models will produce the same kind of qualitative breakthrough that language and protein foundation models did, or whether the climate domain's specific structure (smaller data per task, harder evaluation, physics-based constraints) limits the foundation-model approach, remains an open empirical question for 2026.
The seasonal-to-decadal gap
Traditional weather forecasting is skilful out to ~14 days; climate projection is skilful at multi-decadal scales. The intermediate seasonal-to-decadal regime is the substantial predictability gap — useful information exists (ENSO state has substantial influence on the next 6 months; ocean-heat-content patterns shape multi-year temperature trends) but extracting it has been difficult. AI methods are increasingly bridging this gap: FengWu demonstrated useful skill out to 6-week lead times; the various decadal-prediction AI systems are producing skilful 1–5-year temperature predictions. Whether AI methods can extend this further — into the seasonal-to-multi-year predictability that's most directly useful for agricultural planning, water-resource management, and infrastructure decisions — is the major open question for the next several years.
Tipping-point detection and early warning
Climate science is increasingly concerned with potential tipping points in the Earth system: thresholds beyond which large-scale, possibly-irreversible changes occur (Section 6 introduced the concept). The AI methodological challenge is detecting early-warning signals from observational time series — increasing variance, autocorrelation, or skewness in a signal that precedes a tipping-point crossing. Theoretical work on early-warning signals has been substantial; AI methods have begun to apply the methodology to specific tipping elements (Atlantic Meridional Overturning Circulation, Greenland ice sheet, Amazon dieback). The empirical challenge is that tipping points have happened so rarely in the observational record that training-data scarcity is severe; methods are increasingly using simulated tipping events from coupled climate models for training. The frontier is unresolved but rapidly progressing.
Storm-resolving climate
The next generation of climate models — storm-resolving or k-scale models running at ~1 km horizontal resolution — promises to substantially-improve climate-projection skill by explicitly resolving processes that current models parameterise. The DYAMOND intercomparison (DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains) ran multiple centres' k-scale models in coordinated experiments. The compute requirements are extreme (~exascale machines integrating for years to produce centennial projections), and the data outputs are correspondingly large (~petabytes per experiment). AI methods for storm-resolving climate have multiple applications: emulation (running a fast surrogate that reproduces k-scale outputs), parameterisation (using k-scale simulations as training data for ML closures in coarser models), and analysis (the data volumes are too large for traditional analysis methods). The methodology represents the substantial frontier for the next several years.
Climate AI for policy
A substantive methodological frontier is integrating AI climate methods with the broader analytical infrastructure that informs climate policy. Integrated Assessment Models (IAMs) couple climate-emulators with economic-and-energy-system models to evaluate emissions trajectories, mitigation costs, and adaptation needs. Damage functions translate climate-projection outputs into economic-impact estimates. Real-world evidence frameworks for climate-related litigation increasingly require AI-based attribution. Energy-system planning uses climate-AI outputs for renewable-resource projection and grid-reliability analysis. The methodology is messy because it sits at the boundary of multiple disciplines (climate, economics, policy analysis, decision theory), but it's where AI for climate has its most-direct policy impact.
The physics-vs-data debate, restated
Section 10 introduced the physics-vs-data tension; the operational reality is that the field is gradually converging on hybrid approaches. Pure data-driven methods produce excellent in-distribution forecasts but raise concerns about out-of-distribution behaviour; pure physics-based methods produce robust extrapolation but cannot match the in-distribution skill of AI methods. The hybrid methods — physics-informed neural networks, NeuralGCM-style architectures, ML parameterisations within physical models — are increasingly the practical answer. The 2024–2026 generation increasingly uses conservation-aware training (energy conservation enforced as loss-function regularisation), dimensional-analysis-aware architectures (input variables transformed to dimensionless combinations), and physics-informed evaluation (testing models on physical consistency in addition to accuracy). The methodological maturation is substantial.
The operational deployment question
The single most-important operational question for the field is the rate at which AI methods replace, augment, or coexist with traditional NWP at the operational forecasting centres. As of 2026, the pattern is increasingly "AI for fast deterministic forecasts plus traditional methods for ensemble and assimilation," but this could shift further toward AI in either or both directions over the next few years. The decision involves substantial operational considerations: scientific credibility (peer-reviewed and reproducible), computational cost (AI methods are dramatically cheaper for inference but expensive to train), reliability (decades of NWP uptime vs months of AI-system uptime), and downstream-user interfaces (the operational user community is built around traditional NWP products). The transition is underway but uncertain in pace and depth.
What this chapter does not cover
Several adjacent areas are out of scope. The substantial atmospheric-chemistry literature (ozone modelling, aerosol-radiation interactions, biogeochemistry beyond carbon) is touched only briefly. Cryosphere modelling (sea-ice, ice-sheet, glacier, snow-cover) has its own substantial AI applications that this chapter does not develop. The economics of climate impacts and mitigation is acknowledged in passing but is properly the domain of integrated-assessment modelling. Solid-Earth and geophysical applications (seismology, volcanic monitoring) are out of scope. The chapter focused on the methodological core of AI for atmospheric-and-oceanic climate; the broader Earth-system-AI landscape is genuinely vast.
Further reading
A combined reading list for climate, Earth-system science, and AI. The climate-foundation references — Hartmann's Global Physical Climatology, Wallace & Hobbs's Atmospheric Science, the IPCC AR6 WGI report, Talley's Descriptive Physical Oceanography, Pierrehumbert's Principles of Planetary Climate, Bender's Paleoclimate, the ERA5 reference paper, and Weart's history of the greenhouse effect — establish the climate-science substrate. The AI-methodology references — Lam et al.'s GraphCast, Bi et al.'s Pangu-Weather, Price et al.'s GenCast, Kochkov et al.'s NeuralGCM, and the various others — establish the methodology. The field is rapidly evolving as of 2026 and the AI-for-climate literature is accumulating fast.
-
Global Physical ClimatologyThe standard graduate-level climate textbook. Comprehensive coverage of the energy budget, atmospheric and oceanic circulation, the hydrologic cycle, climate variability, and climate change. Mathematically rigorous but readable. The right starting reference for any AI reader engaging seriously with climate science. The reference climate-science textbook.
-
Atmospheric Science: An Introductory SurveyThe standard atmospheric-science textbook, in print continuously since 1977. Comprehensive coverage of atmospheric composition, thermodynamics, dynamics, radiation, cloud microphysics, and weather systems. The natural complement to Hartmann for understanding the atmosphere specifically. The reference atmospheric-science textbook.
-
IPCC AR6 Working Group I: The Physical Science BasisThe most-recent comprehensive synthesis of climate-science evidence by the international scientific community. Covers observations, attribution, modelling, projections, and regional analyses. The Summary for Policymakers is the most-cited document in modern climate science; the underlying ~3,000-page report is the substantive reference. The natural reading for understanding the current consensus on climate change. The reference for climate-change science.
-
Descriptive Physical OceanographyThe standard physical-oceanography textbook. Comprehensive coverage of ocean basins, water masses, circulation, heat and salt budgets, and the major regional systems. The right reading for understanding the ocean component of the climate system in depth. The reference physical-oceanography textbook.
-
Principles of Planetary ClimateA rigorous, broadly-scoped textbook on planetary climate physics, with substantial attention to the radiative transfer that underpins the greenhouse effect. Covers Earth, Mars, Venus, and exoplanetary atmospheres in a unified framework. The natural reading for an AI reader who wants to understand the radiative-transfer physics underlying climate models. The reference for planetary-climate radiative physics.
-
PaleoclimateA modern textbook on paleoclimate methodology and findings. Covers the major proxies, the glacial-interglacial cycles, deep-time climates, and the methodology of converting proxy measurements into climate quantities. The right reading for understanding the paleoclimate evidence underpinning climate-sensitivity estimates. The reference paleoclimate textbook.
-
The ERA5 global reanalysisThe reference paper for ERA5, the most-used modern climate-reanalysis product. Documents the methodology, data assimilation framework, observational inputs, and quality assessment. ERA5 is the substrate of essentially every modern AI weather and climate paper, including GraphCast and successors; the reference paper is the right starting point for understanding what reanalysis produces and what its limitations are. The reference for modern reanalysis methodology.
-
A Modern History of the Greenhouse EffectThe definitive history of climate-change science from Fourier through the 21st century. Covers the gradual scientific accumulation of evidence, the political and institutional context, and the consensus that emerged over decades. The right reading for understanding how climate science arrived at its current state and why the political response has lagged the science by decades. The reference for the history of climate science.
-
Learning skillful medium-range global weather forecasting (GraphCast)The GraphCast paper. Demonstrates that a graph neural network on a multi-resolution icosahedral mesh trained on ERA5 reanalysis can match or exceed traditional numerical weather prediction at medium range, running ~10⁵× faster. The watershed AI-for-climate paper, comparable to AlphaFold's role in structural biology. The natural starting point for the modern AI weather-forecasting methodology of Section 11. The reference AI-for-weather paper.
-
Accurate medium-range global weather forecasting with 3D neural networks (Pangu-Weather)The Pangu-Weather paper. Demonstrates that a 3D Earth-Specific Transformer trained on ERA5 produces medium-range forecasts that exceed the operational ECMWF IFS on standard verification scores. Cited alongside GraphCast as one of the two foundational modern AI-weather papers, with substantially-different architectural choices (3D transformer vs graph net) that both work. The reference for transformer-based weather forecasting.
-
Probabilistic weather forecasting with machine learning (GenCast)
The GenCast paper. Extends GraphCast's deterministic methodology to ensemble forecasting using diffusion models, producing 50-member probabilistic forecasts that exceed ECMWF's operational ensemble (ENS) on most variables and lead times. Establishes the diffusion-based ensemble methodology that is increasingly central to operational AI weather forecasting. The natural reading for the probabilistic-AI material of Section 13. The reference for diffusion-based ensemble forecasting.
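The key conceptual move is that the next state is sampled from a conditional diffusion model, so an ensemble is simply many independent samples. The sketch below is a minimal caricature under that framing; the "denoiser" is a stand-in function, and the schedule and update rule are generic rather than the paper's.

```python
# Toy sketch of diffusion-based ensemble forecasting: repeat a conditional
# reverse-diffusion sampler to get an N-member ensemble of next states.
import numpy as np

rng = np.random.default_rng(2)
state_dim = 512
current_state = rng.normal(size=state_dim)
sigmas = np.geomspace(80.0, 0.03, 30)          # noise schedule, high -> low

def denoiser(x_noisy, sigma, condition):
    """Stand-in for the learned network predicting the clean next state."""
    return 0.9 * condition + 0.1 * x_noisy / (1.0 + sigma)

def sample_next_state(condition, rng):
    """Reverse diffusion from pure noise down to a clean conditional sample."""
    x = rng.normal(size=state_dim) * sigmas[0]
    for hi, lo in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoiser(x, hi, condition)        # predicted clean state
        x = x0_hat + (x - x0_hat) * (lo / hi)      # shrink noise to the next level
    return x

# A 50-member ensemble = 50 independent conditional samples from the same model.
ensemble = np.stack([sample_next_state(current_state, rng) for _ in range(50)])
print(ensemble.mean(axis=0).shape, float(ensemble.std(axis=0).mean()))
```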
-
Neural general circulation models for weather and climate (NeuralGCM)
The NeuralGCM paper. The most-influential modern climate emulator: combines a JAX-implemented spectral dynamical core with neural-network parameterisations for subgrid physics, trained end-to-end on ERA5. Produces stable multi-decade integrations with weather-forecast skill approaching GraphCast and climate-statistics fidelity matching or exceeding most CMIP6 models. The natural reading for the climate-emulator material of Section 14 and the hybrid physics-ML methodology of Section 15. The reference for hybrid physics-ML climate models.
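A hedged toy of the hybrid structure described above: a cheap resolved-dynamics step plus a learned subgrid tendency, composed every timestep and rolled out for many steps. Both components are stand-ins; the real model couples a differentiable spectral core written in JAX with trained networks.

```python
# Toy hybrid step on a periodic 1-D grid: "dynamical core" (diffusion) plus a
# small "learned" tendency, applied repeatedly to mimic a long integration.
import numpy as np

n_grid, dt = 256, 0.01
rng = np.random.default_rng(3)
W1 = rng.normal(scale=0.05, size=(n_grid, 64))
W2 = rng.normal(scale=0.05, size=(64, n_grid))

def dynamical_core_step(u):
    """Stand-in for resolved dynamics: explicit diffusion on a periodic ring."""
    return u + 0.2 * (np.roll(u, 1) - 2 * u + np.roll(u, -1))

def learned_physics_tendency(u):
    """Stand-in NN parameterisation: tendency from unresolved (subgrid) processes."""
    return np.tanh(u @ W1) @ W2

def hybrid_step(u):
    return dynamical_core_step(u) + dt * learned_physics_tendency(u)

u = rng.normal(size=n_grid)
for _ in range(1000):                 # long, stable rollouts are the point of the design
    u = hybrid_step(u)
print(float(np.abs(u).max()))         # stays bounded for this toy system
```

The design choice to notice is the split: the core carries the conservation laws and large-scale dynamics, while the learned term only supplies what the resolved physics cannot.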
-
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators
The FourCastNet paper. The earliest widely-noticed modern AI weather model and the entry point that demonstrated the methodology was at least competitive with operational NWP. Uses adaptive Fourier neural operators (AFNOs) on regular lat-lon grids; subsequent work extended to spherical FNOs and a wide variety of architectural variants. The natural starting reference for understanding the FNO-based methodology that complements the graph-net and transformer approaches. The reference for the early modern AI weather wave.
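The spectral-convolution idea behind the FNO family can be caricatured in a few lines: transform the field to spectral space, multiply a truncated block of low-frequency modes by learned complex weights, and transform back. Mode counts and the single channel are assumptions; the real AFNO mixes channels blockwise and adds nonlinearities.

```python
# Toy Fourier layer: truncated spectral multiplication on a single 2-D field.
import numpy as np

rng = np.random.default_rng(4)
n_lat, n_lon, n_modes = 128, 256, 24
weights = rng.normal(scale=0.1, size=(n_modes, n_modes)) + 1j * rng.normal(
    scale=0.1, size=(n_modes, n_modes)
)

def fourier_layer(field):
    """Spectral convolution: scale the lowest n_modes x n_modes modes, zero the rest."""
    spec = np.fft.rfft2(field)                        # (n_lat, n_lon // 2 + 1), complex
    out = np.zeros_like(spec)
    out[:n_modes, :n_modes] = spec[:n_modes, :n_modes] * weights
    return np.fft.irfft2(out, s=field.shape)          # back to grid space

field = rng.normal(size=(n_lat, n_lon))
print(fourier_layer(field).shape)                     # (128, 256)
```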
-
WeatherBench 2: A benchmark for the next generation of data-driven global weather models
The WeatherBench 2 paper. The community standard for benchmarking AI weather-forecasting methods, with standardised data, metrics, and evaluation infrastructure. Substantially extends the original WeatherBench (Rasp et al. 2020) with operational-NWP comparisons, finer time resolution, and probabilistic-evaluation methodology. The natural reading for understanding the benchmarking culture that distinguishes climate AI from many other AI-for-Science domains. The reference benchmark for AI weather forecasting.
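One reason the benchmark matters is that scores are defined precisely. The sketch below shows the latitude-weighted RMSE that headlines deterministic evaluation in WeatherBench-style protocols: cos-latitude area weights, so polar grid points count less than equatorial ones. Grids and fields here are synthetic placeholders.

```python
# Latitude-weighted RMSE over a regular lat-lon grid, on synthetic fields.
import numpy as np

lats = np.linspace(-89.5, 89.5, 180)                      # grid-cell centre latitudes
weights = np.cos(np.deg2rad(lats))
weights /= weights.mean()                                 # normalise to mean 1

def lat_weighted_rmse(forecast, truth):
    """RMSE over (lat, lon) with cos-latitude weights applied along the lat axis."""
    sq_err = (forecast - truth) ** 2
    return np.sqrt((sq_err * weights[:, None]).mean())

rng = np.random.default_rng(5)
truth = rng.normal(size=(180, 360))
forecast = truth + 0.5 * rng.normal(size=(180, 360))
print(round(float(lat_weighted_rmse(forecast, truth)), 3))   # ~0.5 by construction
```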
-
Aurora: A Foundation Model for the Earth System
The Aurora paper. A foundation-model-style architecture trained on multiple Earth-system datasets, with fine-tuning to specific tasks (atmospheric forecasting, air-quality prediction, ocean-wave forecasting, tropical-cyclone tracking). The most-developed Earth-system foundation-model effort as of 2024, and the natural reading for the foundation-model frontier of Section 19. The reference for Earth-system foundation models.
-
Deep learning to represent subgrid processes in climate models
The foundational paper on neural-network parameterisations for subgrid-scale processes in climate models. Demonstrates that a neural network trained on a high-resolution simulation can replace a traditional convection parameterisation in a coarser-resolution host model. The substrate of subsequent work on AI-improved parameterisations and a key methodological reference for the hybrid-models direction. The natural reading for the parameterisation material of Section 15. The reference for ML-based parameterisations.
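A hedged sketch of the workflow the entry describes, with synthetic data standing in for the high-resolution simulation: fit a small network from a coarse column state to a diagnosed subgrid tendency, then call it where the host model would call its traditional scheme.

```python
# Toy supervised parameterisation: MLP from coarse column state to subgrid tendency,
# trained with plain minibatch SGD on synthetic (state, tendency) pairs.
import numpy as np

rng = np.random.default_rng(6)
n_samples, n_levels = 20_000, 30
X = rng.normal(size=(n_samples, n_levels))                      # coarse-grid column state
y = np.tanh(X @ (0.3 * rng.normal(size=(n_levels, n_levels))))  # "diagnosed" tendency

W1 = 0.1 * rng.normal(size=(n_levels, 64)); b1 = np.zeros(64)
W2 = 0.1 * rng.normal(size=(64, n_levels)); b2 = np.zeros(n_levels)

def net(x):
    """Two-layer ReLU MLP standing in for the learned parameterisation."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

lr, batch = 1e-2, 256
for step in range(500):
    idx = rng.integers(0, n_samples, size=batch)
    h = np.maximum(X[idx] @ W1 + b1, 0.0)
    err = (h @ W2 + b2) - y[idx]                                # dMSE/dpred (up to a constant)
    gW2, gb2 = h.T @ err / batch, err.mean(0)
    dh = (err @ W2.T) * (h > 0)
    gW1, gb1 = X[idx].T @ dh / batch, dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# In the host climate model, this call replaces the traditional convection scheme:
subgrid_tendency = net(rng.normal(size=(1, n_levels)))
print(subgrid_tendency.shape)                                   # (1, 30)
```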
-
Skilful precipitation nowcasting using deep generative models of radar (DGMR)
The DGMR paper. A conditional generative-adversarial network for short-range (0–90 minute) precipitation forecasting from radar data, with substantially better skill than traditional radar-extrapolation methods. The methodology has been operationally deployed at the Met Office and substantially shaped the modern nowcasting landscape. The natural reading for understanding the high-resolution short-range AI weather-forecasting methodology that complements the medium-range methods of Section 11. The reference for AI nowcasting.
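A hedged caricature of the conditional-generative setup: the generator consumes recent radar frames plus a latent noise vector and emits a short sequence of future frames, so repeated draws yield an ensemble of plausible futures. The "generator" here is a persistence-plus-noise stand-in, not DGMR.

```python
# Toy conditional-generative nowcasting: different latent draws give different futures.
import numpy as np

rng = np.random.default_rng(7)
H = W = 64
past_frames = rng.random(size=(4, H, W))                    # last 4 radar scans

def generator(past, z):
    """Stand-in generator: persistence of the latest frame plus noise-driven growth."""
    base = past[-1]
    future = []
    for t in range(1, 19):                                  # 18 x 5 min = 90 minutes
        perturb = 0.02 * t * z.reshape(H, W)
        future.append(np.clip(base + perturb, 0.0, None))   # rain rates stay >= 0
    return np.stack(future)

# An ensemble of nowcasts = repeated calls with fresh latent noise.
samples = [generator(past_frames, rng.normal(size=H * W)) for _ in range(8)]
print(samples[0].shape)                                     # (18, 64, 64)
```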
-
AI-based remote sensing of atmospheric composition and air quality (representative paper cluster)
A representative bundle of recent papers on AI-based remote sensing for atmospheric chemistry, air-quality monitoring, and trace-gas retrieval. The methodology spans cloud detection, aerosol retrieval, methane plume detection (TROPOMI/MethaneSAT), and air-quality forecasting. The natural starting point for the remote-sensing AI material of Section 16. The reference cluster for remote-sensing AI.
-
Generative residual diffusion modeling for km-scale atmospheric downscaling (CorrDiff)
The CorrDiff paper. Establishes the diffusion-based methodology for kilometre-scale atmospheric downscaling that has become the dominant approach since 2024. Conditional diffusion produces realistic small-scale variability that deterministic super-resolution methods miss. The natural reading for the downscaling material of Section 18. The reference for AI downscaling.
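A hedged two-stage sketch of the residual recipe: a deterministic regressor supplies the mean fine-scale field, and a stochastic sampler adds small-scale detail on top. Both stages are crude stand-ins (block upsampling and smoothed noise) rather than the paper's regression-plus-diffusion pair.

```python
# Toy residual downscaling: deterministic mean + sampled residual = one realisation.
import numpy as np

rng = np.random.default_rng(8)
coarse = rng.normal(size=(45, 90))                     # ~2-degree input field

def regression_mean(coarse):
    """Stand-in deterministic super-resolution: block upsampling by 8x."""
    return np.kron(coarse, np.ones((8, 8)))            # (360, 720) smooth field

def sample_residual(shape, rng, steps=20):
    """Stand-in stochastic sampler producing a correlated small-scale residual."""
    x = rng.normal(size=shape)
    for _ in range(steps):                              # crude smoothing as "denoising"
        x = 0.5 * x + 0.25 * (np.roll(x, 1, 0) + np.roll(x, 1, 1))
    return x - x.mean()

mean_field = regression_mean(coarse)
realisations = [mean_field + sample_residual(mean_field.shape, rng) for _ in range(4)]
print(realisations[0].shape)                            # an ensemble of fine-scale fields
```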
-
Skilful nowcasting of extreme precipitation with NowcastNet
The NowcastNet paper. Extends the DGMR-style methodology to extreme precipitation, adding explicit physical-conservation constraints and substantially improving extreme-event skill. Cited here as a representative example of AI methods for extreme-event prediction (Section 17). The reference for extreme-precipitation nowcasting.
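To make "physical-conservation constraints" concrete under loudly stated assumptions, the sketch below advects the current radar field along a motion vector (so rain is moved rather than invented) and adds a zero-sum intensity adjustment; the real model learns both the motion field and the residual.

```python
# Toy advection-plus-residual evolution step: the domain total stays (near-)conserved.
import numpy as np

rng = np.random.default_rng(9)
rain = np.clip(rng.normal(size=(64, 64)), 0.0, None)     # current radar frame
motion = (2, 1)                                           # stand-in motion: 2 px south, 1 px east

def advect(field, shift):
    """Move the whole field along the motion vector (periodic for simplicity)."""
    return np.roll(field, shift=shift, axis=(0, 1))

def intensity_residual(field):
    """Stand-in learned correction to growth/decay; sums to zero by construction."""
    return 0.05 * (field.mean() - field)

advected = advect(rain, motion)
next_rain = np.clip(advected + intensity_residual(advected), 0.0, None)
print(float(rain.sum()), float(next_rain.sum()))          # totals stay close
```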