Astronomy & Astrophysics & AI, where deep learning meets the data deluge.

Astronomy is the oldest observational science and one of the most data-rich. Modern astrophysics fuses electromagnetic observations across radio, infrared, optical, ultraviolet, X-ray, and gamma-ray bands with gravitational-wave detections, cosmic-ray and neutrino observations, and an enormous body of computational simulation. Astronomy has been one of the most-active AI-for-Science application areas for two decades, and the operational case has only sharpened. The Vera C. Rubin Observatory's LSST began full operations in 2025, generating roughly ten million transient alerts per night. Gaia DR3 (2022) and the upcoming DR4 (2026) provide a billion-star astrometric catalogue. JWST has been delivering early-universe galaxies at unexpected redshift since 2022. Gravitational-wave detection now routinely uses neural networks for rapid event detection and posterior estimation. This chapter develops both the working astronomy vocabulary an AI reader needs (Sections 2–9: units and coordinates, the cosmic distance ladder, stellar evolution and the H-R diagram, galaxies and large-scale structure, ΛCDM cosmology, the major surveys, exoplanets, and gravitational waves) and the AI methodology that has reshaped the field (Sections 10–19). Section 10 is the bridge that orients an ML practitioner to the astronomy-AI landscape.

Prerequisites & orientation

This chapter assumes the physics vocabulary of Ch 07 (general relativity at conceptual level, statistical mechanics, basic quantum mechanics). The vocabulary half (Sections 2–9) is at undergraduate-introductory level in astronomy; mathematical comfort with logarithms and order-of-magnitude reasoning is essential because almost everything in astronomy is logarithmic. Readers without prior astronomy coursework can skim historical detail and focus on the conceptual structure. The methodology half (Sections 11–19) assumes the working machinery of modern deep learning (CNNs and transformers from Part VI for §13–§15 on images and light curves, GNNs from Part XIII Ch 05 for catalogue-structure work, the Bayesian-deep-learning material of Part XIII Ch 07 for §18 simulation-based inference and posterior estimation, the time-series methodology of Part XIII Ch 02–03 for transit search and light-curve classification, and generative-model machinery from Part X for the foundation-model and diffusion-based denoising work in §19).

Three threads run through the chapter. The first is the distance-age coupling: in astronomy, looking far means looking back in time, and the cosmic distance ladder simultaneously calibrates spatial scales and the universe's age. The second is the multi-messenger imperative: the same astrophysical event radiates across many bands and many messengers, and the science usually comes from cross-matching detections across instruments. The third is the volume-vs-rarity tension: astronomical surveys produce billions of objects but the scientifically-interesting events (Type Ia supernovae, kilonovae, strong gravitational lenses, exoplanet transits, anomalies) are typically per-million or per-billion rare; ML methods must handle severe class imbalance and the operational demand for very low false-positive rates. Section 10 frames these themes explicitly; they recur throughout the chapter.

01

Why Astronomy, and Why Astronomy-AI

Astronomy pairs the oldest observational tradition in science with some of its richest data, fusing electromagnetic observations from radio through gamma-ray with gravitational waves, neutrinos, cosmic rays, and an enormous body of computational simulation. The data volumes are extraordinary — Vera Rubin's LSST began full operations in 2025 and is generating about 20 terabytes of raw data per night with around 10 million transient alerts per night, while Gaia DR3 catalogues nearly 2 billion stars and JWST has been delivering early-universe galaxies since 2022. Astronomy was an early adopter of machine learning and is now one of the most operationally embedded AI-for-Science domains: photometric classifiers run on every alert stream, simulation-based inference has replaced classical likelihoods for cosmological-parameter estimation, neural posterior estimation produces gravitational-wave parameter estimates in seconds rather than CPU-days, and foundation models trained across multimodal astronomical data are reshaping the methodological default. This chapter develops both the working astronomy vocabulary an AI reader needs (Sections 2–9) and the AI methodology that has transformed the field (Sections 10–19). Section 10 is the bridge that frames what makes astronomy-AI methodologically distinctive; this section maps the territory itself.

The distance-age coupling

The most useful framing of astronomy for an AI reader is the distance-age coupling: in astronomy, looking far means looking back in time, and the cosmic distance ladder simultaneously calibrates spatial scales and the universe's age. Every measurement of a remote object is a measurement of the past, and reconstructing cosmological history from contemporary observations is the central inferential task. Section 3 develops the distance ladder in detail; the AI methods of Section 14 (photo-z) operationalise it at survey scale, and the cosmological inference of Section 18 (SBI) uses it to constrain ΛCDM parameters.

The multi-messenger imperative

The same astrophysical event radiates across many bands and many messengers (photons, gravitational waves, neutrinos, cosmic rays), and the science usually comes from cross-matching detections, joint inference across instruments, and the temporal correlations of transient events. AI methods are essential for the cross-matching, the photometric classification (Section 13), the rapid alert triage that supports gravitational-wave electromagnetic follow-up (Section 17), and the joint inference that connects theory to observation. The methodology of this chapter is partly the systematic engineering response to multi-messenger imperatives.

Why this is one chapter, not two

The vocabulary and the methods are tightly intertwined. Photometric redshift estimation (Section 14) only makes sense once the cosmic distance ladder is understood (Section 3). Galaxy morphology ML (Section 15) only makes sense once the Hubble sequence and modern galaxy surveys are understood (Sections 5, 7). Gravitational-wave parameter estimation (Section 17) only makes sense once general relativity, compact-object mergers, and the detector network are understood (Sections 4, 9 of this chapter and §6 of Ch 07 on GR). Simulation-based inference (Section 18) only makes sense once ΛCDM, structure formation, and the large-scale-structure observables are understood (Section 6). Reading just the AI half without the vocabulary leaves an AI practitioner unable to evaluate methodological choices; reading just the vocabulary leaves an astronomer unaware of how the field is being reshaped. The 19-section structure is therefore deliberate: §2–9 develop the vocabulary, §10 bridges to the methodology, §11–19 develop the methods.

02

Units, Coordinates, and Magnitudes

Astronomy uses domain-specific units that are best memorised early. Distances run from astronomical units within the Solar System through parsecs and kiloparsecs for stars and galaxies to megaparsecs and gigaparsecs for cosmology. Brightness uses logarithmic magnitudes inverted in sign. Coordinates work in spherical systems tied to either Earth (equatorial), the ecliptic, the Galaxy, or extragalactic frames. Stellar spectral classification uses an alphabetical sequence (OBAFGKM, plus L, T, Y for cool dwarfs) with decimal subdivisions. The notation is dense but consistent.

Distance units

The astronomical unit (AU) is the mean Earth-Sun distance, ≈1.496 × 10¹¹ m. Useful within the Solar System; planetary semi-major axes range from 0.4 AU (Mercury) to 30 AU (Neptune), with the Kuiper belt out to ~50 AU. The parsec (pc) is the distance at which 1 AU subtends 1 arcsecond — ≈3.086 × 10¹⁶ m or about 3.26 light-years. The nearest stars are parsecs away (Proxima Centauri is 1.3 pc); most of the Galaxy's stars lie at distances of kiloparsecs (kpc); the Milky Way's disc has a scale length of ~3 kpc and a radius of ~15 kpc; the Sun sits ~8.2 kpc from the Galactic centre. Galaxies are at megaparsecs (Mpc); the Andromeda galaxy is ~0.78 Mpc away, the Virgo cluster ~16 Mpc, and the observable universe extends to ~14 gigaparsecs (Gpc) of comoving radius. Light-years appear in popular writing but parsecs are the working unit; the conversion is 1 pc ≈ 3.26 ly. Within the Solar System, distances are sometimes given in light-minutes or light-hours; the Sun is ~8 light-minutes from Earth.
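As a sanity check on these conventions, the unit definitions above can be encoded directly; a minimal sketch using the rounded constants quoted in this section:

```python
# Distance-unit constants from the definitions above.
AU_M = 1.495978707e11   # astronomical unit in metres
PC_M = 3.0857e16        # parsec in metres
LY_M = 9.4607e15        # light-year in metres

def pc_to_ly(d_pc: float) -> float:
    """Convert parsecs to light-years (1 pc is about 3.26 ly)."""
    return d_pc * PC_M / LY_M

def au_per_pc() -> float:
    """AU per parsec: about 206,265, the number of arcseconds in a radian,
    which follows from the parsec's definition (1 AU subtending 1 arcsec)."""
    return PC_M / AU_M

print(pc_to_ly(1.0))   # ~3.26
print(au_per_pc())     # ~2.06e5
```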

Magnitudes

Astronomical brightness is reported in magnitudes, a logarithmic scale inherited from Hipparchus and quirky for two reasons: it is inverted (smaller magnitudes are brighter) and the base is 100^(1/5) ≈ 2.512 per magnitude. The apparent magnitude m measures observed flux at Earth (Vega ≈ 0, the Sun ≈ −26.7, Sirius ≈ −1.5, the faintest naked-eye stars ≈ +6, the deepest JWST detections ≈ +30). The absolute magnitude M is what the apparent magnitude would be if the source were placed at 10 parsecs; the distance modulus m − M = 5 log₁₀(d/10 pc) connects the two. Magnitudes come in photometric bands (U, B, V, R, I in the optical; J, H, K in the near-infrared; u, g, r, i, z in the SDSS system; broadband and narrowband filters in many surveys) and the band-to-band differences (colours, e.g. B−V or g−r) carry stellar-type, redshift, and dust-extinction information that drives most photometric inference.
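The magnitude relations above are compact enough to implement directly; a minimal sketch (the Sun's apparent magnitude and the 1 AU-in-parsecs value are the figures quoted in this section):

```python
import math

def distance_modulus(d_pc: float) -> float:
    """m - M = 5 log10(d / 10 pc)."""
    return 5.0 * math.log10(d_pc / 10.0)

def absolute_magnitude(m: float, d_pc: float) -> float:
    """Absolute magnitude: what m would be at 10 pc."""
    return m - distance_modulus(d_pc)

def flux_ratio(m1: float, m2: float) -> float:
    """Flux of source 1 relative to source 2: 100**((m2 - m1) / 5)."""
    return 100.0 ** ((m2 - m1) / 5.0)

# A 5-magnitude difference is exactly a factor of 100 in flux.
print(flux_ratio(0.0, 5.0))                  # 100.0
# The Sun (m = -26.7) at 1 AU = 4.848e-6 pc recovers M near +4.8.
print(absolute_magnitude(-26.7, 4.848e-6))
```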

Spectral types and classification

Stars are classified by spectral features into the Harvard sequence O, B, A, F, G, K, M, in order of decreasing surface temperature (~30,000 K for O down to ~3,000 K for M). Each letter is subdivided 0–9 (the Sun is G2). Cool dwarfs and brown dwarfs extend the sequence to L, T, Y. The mnemonic "Oh Be A Fine Girl/Guy, Kiss Me" is universally remembered. The system is descriptive (line strengths in particular bands) but maps cleanly to physical temperature and broadly to mass on the main sequence. Luminosity classes (Roman numerals I through V — supergiants through main sequence dwarfs, with sub-classes) further refine classification. The Morgan-Keenan (MK) system combines spectral type and luminosity class (e.g. the Sun is G2V).

Coordinates and time

The dominant coordinate system is equatorial: right ascension (α or RA, measured eastward in hours, minutes, seconds along the celestial equator) and declination (δ or Dec, measured in degrees north and south of the equator). Equatorial coordinates are tied to a particular epoch (J2000 is universal); the Earth's precession means that fixed stars slowly migrate in α, δ, requiring updates. Galactic coordinates (l, b — longitude and latitude in the Galactic plane and pole) are convenient for studying our Galaxy. Ecliptic coordinates are convenient for Solar-System work. Time uses UTC, TAI (atomic time), TT (terrestrial time), and JD (Julian Date — a continuous day count from 4713 BCE; JD 2,460,000 ≈ February 2023). Time-domain astronomy uses MJD (Modified Julian Date = JD − 2,400,000.5) for compactness.
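The JD/MJD bookkeeping is a frequent source of off-by-half-day bugs; a minimal sketch via the Unix epoch (ignoring leap seconds, which is fine for feature engineering but not for timing-precision work):

```python
from datetime import datetime, timezone

def jd_to_mjd(jd: float) -> float:
    """MJD = JD - 2,400,000.5."""
    return jd - 2400000.5

def datetime_to_jd(dt: datetime) -> float:
    """JD from a UTC datetime, using JD 2440587.5 = 1970-01-01T00:00 UTC
    (the Unix epoch). Leap seconds are ignored."""
    return 2440587.5 + dt.timestamp() / 86400.0

# JD 2,460,000 falls in late February 2023, as quoted above.
dt = datetime(2023, 2, 24, 12, 0, tzinfo=timezone.utc)
jd = datetime_to_jd(dt)
print(jd)             # 2460000.0
print(jd_to_mjd(jd))  # 59999.5
```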

Practical implications for ML

The unit conventions matter for AI work in several concrete ways. Magnitude differences (colours) rather than absolute magnitudes are the natural feature for many tasks because they cancel distance dependence. Photometric inputs are best fed to networks as either fluxes (in linear units) or magnitudes; mixing the two in one feature vector creates inconsistent scales, and marginal detections can have negative measured fluxes with no valid magnitude at all. RA/Dec coordinates have wraparound at α = 24h = 0h that must be handled with sin/cos features or at least padded ranges. Galactic-latitude information is essential when training on full-sky data because the Galactic plane has very different population statistics than the high-latitude sky. JD/MJD time stamps appear as features for transient classification; converting to days-since-epoch or phase-folded representations is task-dependent.
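The RA wraparound fix mentioned above is most cleanly done by mapping coordinates onto the unit sphere rather than using raw sin/cos of RA alone; a minimal sketch:

```python
import math

def encode_sky_coords(ra_deg: float, dec_deg: float) -> tuple:
    """Map (RA, Dec) to 3D unit-sphere features, so that RA = 359.9 deg and
    RA = 0.1 deg land next to each other in feature space, removing the
    24h = 0h seam entirely."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

a = encode_sky_coords(359.9, 0.0)
b = encode_sky_coords(0.1, 0.0)
gap = math.dist(a, b)          # tiny: the two positions straddle the seam
naive_gap = abs(359.9 - 0.1)   # huge if raw RA degrees were the feature
print(gap, naive_gap)
```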

03

The Cosmic Distance Ladder

Astronomy's hierarchy of distance measurement — the cosmic distance ladder — is the calibration backbone of cosmology and the source of the recently sharpened "Hubble tension". Each rung calibrates the next, beginning with geometric parallax for nearby stars and extending through standard candles to the furthest supernovae and the cosmic microwave background. The ladder's structural property — that systematics propagate upward through the rungs — is why modern AI methods, from machine-learning Cepheid classifiers to neural distance estimators, typically target individual rungs, where systematic errors can be isolated and controlled before they contaminate everything above.

Parallax and the geometric foundation

Parallax is the apparent angular shift of a nearby star against the distant background as Earth orbits the Sun. The shift is small — 1 arcsecond at 1 parsec, smaller for more distant stars. Pre-Gaia parallaxes were limited to ~100 pc with reasonable accuracy. The Gaia mission changed this: DR3 (2022) provides parallaxes good to roughly 20–30 microarcseconds for bright stars, degrading to around a milliarcsecond at the catalogue faint end (G ≈ 20–21), giving useful distances out to a few kiloparsecs. Gaia is the modern foundation of the distance ladder; almost every other rung is now calibrated against Gaia parallaxes. DR4, expected in 2026, will further sharpen these measurements with five additional years of observations.
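Parallax inversion is trivial at high signal-to-noise but biased when the fractional parallax error is large; a minimal sketch with an illustrative quality cut (the 20% threshold is a common rule of thumb, not a Gaia-mandated value):

```python
def parallax_distance_pc(parallax_mas: float) -> float:
    """d [pc] = 1000 / parallax [mas]; Gaia reports parallaxes in mas."""
    return 1000.0 / parallax_mas

def safe_distance_pc(parallax_mas: float, error_mas: float,
                     max_frac_err: float = 0.2):
    """Naive inversion only when the fractional error is small; otherwise
    return None — noisy or negative parallaxes need a Bayesian distance
    estimate rather than direct inversion."""
    if parallax_mas <= 0 or error_mas / parallax_mas > max_frac_err:
        return None
    return parallax_distance_pc(parallax_mas)

print(safe_distance_pc(10.0, 0.03))  # 100.0 pc: a well-measured nearby star
print(safe_distance_pc(0.1, 0.05))   # None: 50% error, do not invert
```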

Main-sequence fitting and the cluster ladder

For star clusters too distant for direct parallax, main-sequence fitting uses the cluster's H-R diagram. If a cluster's main sequence is well-defined (which requires knowing which stars belong to the cluster — proper-motion data from Gaia is essential), comparing the cluster's apparent main-sequence brightness to a calibrated absolute main sequence yields the distance modulus. The Hyades and Pleiades are the foundational calibrators because Gaia parallaxes now fix their distances at the percent level.

Standard candles: Cepheids and RR Lyrae

Cepheid variables are pulsating supergiants with a tight period-luminosity relation: longer-period Cepheids are brighter, with a relation calibrated to a few percent in the near-infrared. Cepheids visible in nearby galaxies (out to ~30–40 Mpc with HST and JWST) calibrate the next rung. RR Lyrae are similar but fainter pulsators in old stellar populations (globular clusters, Galactic halo); they are crucial for the Galactic-scale ladder. The Tip of the Red Giant Branch (TRGB) provides an alternative standard candle based on the maximum luminosity of red giants in old populations; it is increasingly used because it is largely insensitive to dust and metallicity in a way Cepheids are not.
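A Cepheid distance estimate chains the period-luminosity (Leavitt) relation with the distance modulus; the slope and zero-point below are rough V-band values used purely for illustration, not a calibration to use in practice:

```python
import math

def cepheid_abs_mag(period_days: float,
                    slope: float = -2.43, zero_point: float = -4.05) -> float:
    """Illustrative Leavitt-law form M = slope*(log10 P - 1) + zero_point.
    The coefficients are rough V-band numbers for demonstration only."""
    return slope * (math.log10(period_days) - 1.0) + zero_point

def cepheid_distance_pc(m_app: float, period_days: float) -> float:
    """Invert the distance modulus m - M = 5 log10(d / 10 pc)."""
    M = cepheid_abs_mag(period_days)
    return 10.0 ** ((m_app - M + 5.0) / 5.0)

# A 10-day Cepheid observed at apparent magnitude 10 sits at a few kpc.
print(cepheid_distance_pc(10.0, 10.0))
```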

Type Ia supernovae

Type Ia supernovae (SN Ia) are thermonuclear explosions of white dwarfs in close binary systems. They are extraordinarily luminous (peak absolute magnitude ≈ −19.3) and have a tight peak-luminosity-decline relation (the Phillips relation: faster-declining SN Ia are intrinsically fainter), making them effective standard candles after light-curve correction. SN Ia are visible to z ~ 1.5 with HST and JWST; they were the basis of the late-1990s discovery of accelerating cosmic expansion (Riess et al. 1998; Perlmutter et al. 1999; 2011 Nobel Prize in Physics). Modern SN Ia samples — Pantheon+, Union3, the DES SN sample — combine hundreds to thousands of light curves and are central to dark-energy constraints. The light-curve fitting that converts observed photometry to a standardised distance estimate is increasingly done with ML methods.

Baryon acoustic oscillations and the high end

At cosmological scales, the baryon acoustic oscillation (BAO) feature provides a "standard ruler". Pressure waves in the photon-baryon plasma of the early universe froze in at recombination, leaving a characteristic correlation length (~150 Mpc comoving today) imprinted on the galaxy distribution and the cosmic-microwave-background anisotropy. Measuring the BAO scale at different redshifts (BOSS, eBOSS, DESI 2024–2026 results) calibrates distance vs. redshift directly and is independent of the SN Ia ladder.

The Hubble tension

The Hubble tension is the current 4–6σ discrepancy between distance-ladder measurements of H₀ (the local expansion rate, currently giving ≈73 km/s/Mpc from SH0ES) and CMB-based measurements assuming ΛCDM (Planck giving ≈67.4 km/s/Mpc). The tension has resisted explanation since it became sharp around 2018. JWST observations of Cepheids in 2024–2025 reduced but did not eliminate the tension. Whether the tension reflects new physics (early dark energy, modified recombination, evolving dark energy) or unidentified systematics in one of the rungs is an active area; AI methods feature in cross-validation efforts (independent ML-trained Cepheid classifiers, ML-based light-curve standardisation for SN Ia, neural emulators for cosmological likelihoods).
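The quoted significance is simple Gaussian arithmetic on two independent measurements; the error bars below are approximate published values, treated here as assumptions:

```python
import math

def tension_sigma(x1: float, err1: float, x2: float, err2: float) -> float:
    """Gaussian tension between two independent measurements, in sigma:
    |x1 - x2| / sqrt(err1^2 + err2^2)."""
    return abs(x1 - x2) / math.sqrt(err1**2 + err2**2)

# Roughly: SH0ES 73.0 +/- 1.0, Planck 67.4 +/- 0.5 km/s/Mpc (illustrative
# uncertainties). The ~5.6 km/s/Mpc gap then corresponds to ~5 sigma.
print(tension_sigma(73.0, 1.0, 67.4, 0.5))
```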

04

Stellar Evolution and the H-R Diagram

A star is a self-gravitating sphere of plasma in approximate hydrostatic equilibrium, supported by thermal pressure derived from nuclear fusion in its core. Stellar evolution traces the lifetime of a star from formation through nuclear burning to its remnant state — white dwarf, neutron star, or black hole. The Hertzsprung-Russell diagram, plotting luminosity versus surface temperature (or absolute magnitude vs. colour), is the master visualisation; stars cluster on a main sequence, evolve off it, and end as compact remnants whose populations encode the Galaxy's star-formation history.

The Hertzsprung-Russell diagram

The H-R diagram plots luminosity (vertical axis, log-scale) against effective surface temperature (horizontal axis, conventionally inverted so that hot stars are on the left). Most stars lie on the main sequence — a band running diagonally from hot, luminous massive stars at upper-left to cool, faint low-mass stars at lower-right. The main sequence is the locus of hydrogen-burning stars in core fusion equilibrium. Above and to the right of the main sequence are red giants, subgiants, and supergiants — evolved stars whose envelopes have expanded and cooled. Below the main sequence are white dwarfs — the dense, hot remnant cores of low-mass stars. Empirical H-R diagrams of star clusters reveal the cluster's age (the main-sequence turnoff migrates to lower mass with age), while H-R diagrams of all-sky catalogs (Gaia DR3 produced perhaps the most-cited H-R diagram in the literature) show the population structure of the Galaxy.

Main-sequence physics

On the main sequence, stars fuse hydrogen into helium in the core via the proton-proton chain (low-mass stars) or the CNO cycle (higher-mass stars where carbon catalyses fusion). The mass-luminosity relation L ∝ M^α with α ≈ 3.5 over much of the main sequence means a 10-solar-mass star is ~3,000 times more luminous than the Sun; since lifetime scales as fuel over burn rate, t ∝ M/L ∝ M^(1−α), it exhausts its hydrogen ~300× sooner despite having ~10× more fuel. Main-sequence lifetimes therefore range from ~10 Myr for O stars to >10¹² yr for the lowest-mass M dwarfs (longer than the age of the universe). The Sun, at 4.6 Gyr old, is roughly halfway through its ~10 Gyr main-sequence lifetime.
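The scaling argument above is a two-liner; using α = 3.5 and a 10 Gyr solar main-sequence lifetime:

```python
def luminosity_solar(mass_solar: float, alpha: float = 3.5) -> float:
    """L/Lsun ~ (M/Msun)^alpha over much of the main sequence."""
    return mass_solar ** alpha

def ms_lifetime_gyr(mass_solar: float, t_sun_gyr: float = 10.0) -> float:
    """Lifetime ~ fuel / burn rate ~ M / L, normalised to the Sun."""
    return t_sun_gyr * mass_solar / luminosity_solar(mass_solar)

# A 10 Msun star: ~3,000x the Sun's luminosity, ~30 Myr lifetime.
print(luminosity_solar(10.0))
print(ms_lifetime_gyr(10.0))
```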

Post-main-sequence evolution

When core hydrogen is exhausted, the star ascends the red giant branch: the inert helium core contracts and heats while a hydrogen-burning shell drives envelope expansion. Helium ignition (the helium flash in low-mass stars) starts helium-burning via the triple-alpha process, fusing helium to carbon. The star settles on the horizontal branch. Subsequent shell burning produces the asymptotic giant branch (AGB) phase with strong mass loss. For low-mass stars (<~8 M☉), the AGB ends with envelope ejection (forming a planetary nebula) and a white dwarf remnant. For higher-mass stars, advanced burning stages (carbon, neon, oxygen, silicon) culminate in iron-core collapse and a core-collapse supernova, leaving a neutron star or black hole.

Compact remnants

White dwarfs are Earth-sized objects of solar mass, supported by electron degeneracy pressure. The Chandrasekhar limit (~1.4 M☉) is the maximum mass; above it, electron degeneracy fails and the star collapses. White dwarfs cool slowly over Gyr timescales; the white-dwarf luminosity function is a Galactic chronometer. Neutron stars are ~1.4–2.2 M☉ objects of ~10 km radius, supported by neutron degeneracy and nuclear repulsion, with extreme magnetic fields and rapid rotation (millisecond to second periods); observable as pulsars, magnetars, or X-ray binaries. Black holes are stellar-mass (~5–100 M☉ from core collapse, plus the merger products LIGO has been cataloguing) or supermassive (~10⁵–10¹⁰ M☉ in galactic centres). Neutron-star and stellar-black-hole binary mergers are the gravitational-wave sources discussed in §9.

Nucleosynthesis and the chemical history of the Galaxy

The elements heavier than hydrogen and helium were synthesised in stars: stellar nucleosynthesis built carbon, nitrogen, oxygen, and the iron-peak elements through fusion in massive-star interiors. Neutron-capture nucleosynthesis built the heavier elements: the s-process (slow neutron capture in AGB stars) and the r-process (rapid neutron capture in core-collapse supernovae and neutron-star mergers — the latter spectacularly confirmed by GW170817 in 2017). The chemical-abundance pattern of a star encodes the nucleosynthetic history of the gas it formed from; galactic chemical evolution traces this history across cosmic time and is a major target of large spectroscopic surveys like APOGEE, GALAH, and 4MOST.

05

Galaxies, Galactic Dynamics, and Large-Scale Structure

Galaxies are gravitationally bound systems containing 10⁷ to 10¹³ stars, plus interstellar gas, dust, and dark matter. They span a morphological sequence from elliptical through spiral and irregular, span a luminosity range of millions, and are organised into groups, clusters, and the cosmic web — a vast filamentary structure on scales of hundreds of megaparsecs. Galactic dynamics, particularly the rotation curves of disk galaxies, were the principal early evidence for dark matter; large-scale-structure observations are now central to cosmological-parameter estimation.

Galaxy morphology and the Hubble sequence

Hubble's 1936 morphological classification — the Hubble sequence or "tuning fork" — divides galaxies into ellipticals (E0 through E7, increasing flattening), spirals (Sa, Sb, Sc, with successively more open arms and prominent bars in SBa, SBb, SBc), lenticulars (S0, intermediate), and irregulars. The classification is descriptive but correlates with physical properties: ellipticals are predominantly old, gas-poor, dispersion-supported; spirals are gas-rich, star-forming, rotation-supported with bulge and disk components. Modern morphological classification (Galaxy Zoo crowdsourcing, then deep-learning classifiers) extends the system but the basic structure remains useful. Galaxies also differ in size (effective radius), colour (red sequence vs. blue cloud — bimodal in colour-magnitude diagrams), and star-formation rate (the "main sequence of star-forming galaxies" relating SFR to stellar mass).

The Milky Way and galactic structure

Our Galaxy is a barred spiral with a disk (~15 kpc radius, ~300 pc thin-disk scale height plus a thicker thick-disk component), a central bulge with a bar, an extended stellar halo (which includes the globular cluster system), and a much larger dark-matter halo (~200 kpc virial radius, ~10¹² M☉). The Sun sits ~8.2 kpc from the Galactic centre, orbiting at ~230 km/s with a period of ~225 Myr. Gaia has revolutionised Galactic dynamics, providing the velocity-space substructure that revealed the Gaia-Enceladus/Sausage merger ~10 Gyr ago and the on-going Sagittarius dwarf tidal stream.

Rotation curves and dark matter

The rotation curve of a spiral galaxy plots circular orbital velocity vs. radius. For a Keplerian system (mass concentrated at the centre), velocity should fall as v ∝ r^(−1/2). Observed rotation curves are flat at large radii — Vera Rubin and Kent Ford's 1970s observations established this in detail — implying an extended mass distribution well beyond the visible stars. This was the principal early evidence for dark matter: an unseen mass component, ~5× the baryonic mass, with a roughly isothermal density profile (ρ ∝ r^(−2)) over the radii probed by rotation curves; N-body simulations favour the Navarro-Frenk-White profile, which goes as r^(−1) in the inner halo and steepens to r^(−3) at large radii. Direct detection of dark-matter particles has not yet succeeded; whether dark matter is cold dark matter (CDM, the standard assumption — heavy non-relativistic particles) or has more exotic properties (warm, fuzzy, self-interacting, primordial black holes, axions) is a persistent open question.
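The flat-rotation-curve phenomenology follows directly from the NFW enclosed mass; a minimal sketch, where the halo parameters are illustrative Milky-Way-like numbers (assumptions, not a fit):

```python
import math

G = 4.30091e-6  # gravitational constant in kpc * (km/s)^2 / Msun

def nfw_mass_enclosed(r_kpc: float, rho0: float, rs_kpc: float) -> float:
    """M(<r) for an NFW halo: 4*pi*rho0*rs^3 * [ln(1+x) - x/(1+x)], x = r/rs."""
    x = r_kpc / rs_kpc
    return 4.0 * math.pi * rho0 * rs_kpc**3 * (math.log(1.0 + x) - x / (1.0 + x))

def circular_velocity_kms(r_kpc: float, rho0: float, rs_kpc: float) -> float:
    """v_c = sqrt(G * M(<r) / r) for the dark halo alone."""
    return math.sqrt(G * nfw_mass_enclosed(r_kpc, rho0, rs_kpc) / r_kpc)

# Illustrative parameters: characteristic density in Msun/kpc^3, scale radius in kpc.
rho0, rs = 8.5e6, 20.0
for r in (8.2, 20.0, 50.0, 100.0):
    # v_c stays within a narrow band from ~8 to ~100 kpc: a flat rotation curve.
    print(r, circular_velocity_kms(r, rho0, rs))
```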

Clusters of galaxies

Galaxies are organised into groups (a few galaxies, ~Mpc scale, ~10¹³ M☉ — the Local Group with the Milky Way and Andromeda is one), clusters (hundreds to thousands of galaxies, ~few Mpc scale, ~10¹⁴–10¹⁵ M☉), and superclusters (clusters of clusters). The intracluster medium is hot (~10⁷–10⁸ K) X-ray-emitting plasma that dominates the cluster's baryonic mass. Cluster masses, calibrated via gravitational lensing, X-ray hydrostatic equilibrium, or dynamical methods, are sensitive to the matter density Ω_m and the amplitude of density fluctuations σ₈; cluster cosmology is one of the major routes to cosmological-parameter constraints.

The cosmic web and large-scale structure

On scales above ~10 Mpc the galaxy distribution becomes cosmic web: filaments of galaxies separating voids (largely galaxy-free regions, ~50–100 Mpc across), with clusters at filament intersections. The web reflects the gravitational growth of small density perturbations in the early universe (themselves observable in the CMB anisotropy) into the highly nonlinear structure observed today. Large galaxy redshift surveys (SDSS, BOSS, eBOSS, DESI, Euclid) measure the galaxy two-point correlation function and power spectrum to constrain cosmology; weak gravitational lensing surveys (DES, KiDS, HSC, LSST) measure the matter distribution directly, including the dark component.

Active galactic nuclei

About 10% of galaxies host an active galactic nucleus (AGN) — accretion onto the central supermassive black hole, with luminosity that can exceed the entire stellar luminosity of the host. The AGN zoo includes quasars (high-luminosity, often radio-quiet), Seyferts (lower-luminosity, in nearby spirals), blazars (relativistic jets pointed at Earth), and radio galaxies (kpc-to-Mpc scale jets). The unified model relates these classes to viewing angle. AGN are the most-distant directly-observable objects (quasars detected to z ≈ 7.6, with the Event Horizon Telescope's M87 image and Sgr A* image providing the first direct images of supermassive-black-hole event horizons in 2019 and 2022). AI methods feature in AGN classification, redshift estimation, and variability classification.

06

Cosmology and the ΛCDM Model

Modern cosmology is built on the ΛCDM model: a spatially flat universe dominated by dark energy (Λ, ~68%) and cold dark matter (~27%) with baryonic matter (~5%), evolving from an inflationary epoch through nucleosynthesis, recombination, and structure formation to today. ΛCDM fits an enormous range of data — CMB anisotropy, large-scale structure, BAO, weak lensing, SN Ia distance moduli — with only six parameters. The ongoing project is to test whether the simplest ΛCDM is in fact the correct theory or whether tensions like the H₀ discrepancy point to richer physics.

The expanding universe

Hubble's 1929 observation that more-distant galaxies recede faster (Hubble's law: v = H₀ d for nearby galaxies, with H₀ the present expansion rate ≈ 67–73 km/s/Mpc depending on method) is the founding observation of cosmology. In general-relativistic terms, the universe expands according to the Friedmann equations, with the scale factor a(t) (defined as a = 1 today) tracking the expansion. The redshift, defined by 1 + z = 1/a, relates the observed wavelength shift to the scale factor at emission. Comoving distance factors out the expansion; luminosity distance and angular-diameter distance are observed quantities related to comoving distance via the expansion history.
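The distance-redshift relation is a one-dimensional integral over the expansion history; a minimal flat-ΛCDM sketch with Planck-like parameters, using a simple trapezoid rule (real pipelines would use astropy.cosmology):

```python
import math

def comoving_distance_mpc(z: float, h0: float = 67.4,
                          omega_m: float = 0.315, n: int = 10000) -> float:
    """Flat-LCDM comoving distance D_C = (c/H0) * integral_0^z dz'/E(z'),
    with E(z) = sqrt(Om*(1+z)^3 + OL) and OL = 1 - Om (flatness).
    Trapezoid integration with n steps."""
    c = 299792.458  # speed of light, km/s
    omega_l = 1.0 - omega_m

    def inv_e(zz: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    dz = z / n
    s = 0.5 * (inv_e(0.0) + inv_e(z)) + sum(inv_e(i * dz) for i in range(1, n))
    return (c / h0) * s * dz

print(comoving_distance_mpc(1.0))  # ~3.4 Gpc for these parameters
```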

The cosmic microwave background

The cosmic microwave background (CMB) is the relic radiation from recombination at z ≈ 1090 (about 380,000 years after the Big Bang), when the universe cooled enough for electrons and protons to combine into neutral hydrogen and the universe became transparent to photons. The CMB is observed today as a 2.725 K blackbody with anisotropies at the 10^(−5) level. The acoustic peaks in the CMB angular power spectrum encode the universe's geometry, baryon density, dark-matter density, and expansion history. Planck (2009–2013, results through 2018) provided the highest-precision CMB measurements to date; ground-based experiments (SPT, ACT, the Simons Observatory, CMB-S4) are now extending to small angular scales and to polarisation. The CMB-derived ΛCDM parameters are the reference standard against which other measurements are compared.

Big Bang nucleosynthesis

In the first few minutes after the Big Bang, with the universe at ~10⁹ K, light nuclei formed in Big Bang nucleosynthesis (BBN). Predicted abundances of deuterium (D/H ~ 2.5 × 10⁻⁵), helium-3, helium-4 (~25% by mass), and lithium-7 are sensitive to the baryon-to-photon ratio. Observed abundances (deuterium in high-redshift quasar absorbers, helium in low-metallicity HII regions) match BBN prediction with the same baryon density inferred from the CMB acoustic peaks — one of the most striking quantitative successes of cosmology. The lithium problem (observed Li-7 below predicted) remains a puzzle.

Inflation

Cosmic inflation is the early-universe phase of accelerated expansion (proposed by Guth in 1981) that solves several otherwise puzzling features of standard cosmology: the horizon problem (why the CMB temperature is uniform across causally-disconnected regions), the flatness problem (why the spatial geometry is so close to flat), and the absence of relic monopoles. Inflation also generates the primordial density perturbations (with a slightly red-tilted scalar power spectrum) and primordial gravitational waves (with amplitude characterised by the tensor-to-scalar ratio r) that seed all subsequent structure. CMB polarisation experiments search for the B-mode signature of primordial gravitational waves; current upper bounds (r < 0.036) constrain the inflationary energy scale.

Dark matter and dark energy

Dark matter is non-baryonic, gravitationally interacting matter making up ~27% of the cosmic energy density; it is required by galactic rotation curves, cluster dynamics, the CMB acoustic peaks, large-scale structure, and gravitational lensing. The standard hypothesis is cold dark matter (CDM): heavy, non-relativistic, weakly-interacting particles. Direct detection (xenon time-projection chambers, cryogenic detectors), indirect detection (gamma-rays, antimatter), and collider production have not produced an unambiguous signal; alternative candidates include axions (now a major experimental focus) and primordial black holes. Dark energy is the form of energy responsible for the accelerated expansion of the universe at low redshift; it makes up ~68% of the cosmic energy density. The simplest form is the cosmological constant Λ, the term Einstein added to his field equations and later disavowed, but evolving dark energy with equation of state w(z) is an active possibility, with DESI's 2024 results hinting at a deviation from the constant w = −1.

Structure formation and the linear-to-nonlinear transition

The growth of structure under gravity is well-understood while perturbations remain small (linear regime, treated with linear perturbation theory of the Friedmann equations), and is calculated semi-analytically and tested against the CMB and large-scale structure observations. In the nonlinear regime (galaxies, clusters, the cosmic web on small scales), N-body simulations are required: the IllustrisTNG, EAGLE, and FLAMINGO simulations (and many others) provide the simulated counterparts against which observations are compared. Simulation-based inference and ML-trained emulators are increasingly important for connecting these simulations to data efficiently.

07

Surveys and Instruments

The currently-active survey landscape is the operational substrate for AI-for-astronomy methods. JWST has been delivering science since 2022. Gaia has mapped the Milky Way. The Vera C. Rubin Observatory's LSST began full operations in 2025 and will produce 20 TB of data per night for ten years. Euclid launched in 2023. Roman is targeted for 2027. Each of these has specific data products, alert streams, and inference targets that AI methods are being adapted to. This section summarises the major active instruments and survey campaigns.

James Webb Space Telescope (JWST)

JWST launched in December 2021 and began science operations in mid-2022. It is a 6.5-metre infrared telescope at the Sun-Earth L2 point, with four science instruments (NIRCam, NIRSpec, MIRI, NIRISS) covering 0.6 to 28.5 micron with imaging, spectroscopy, and integral-field-spectrograph modes. JWST has been particularly productive at discovering early-universe galaxies (the JADES, COSMOS-Web, and CEERS surveys have catalogued many galaxies at z > 10 — earlier than expected, generating active discussion about early-universe star and galaxy formation), characterising exoplanet atmospheres (notably WASP-39b and the TRAPPIST-1 system), and resolving stellar populations in nearby galaxies. JWST archive data is publicly available after a one-year proprietary period and increasingly drives ML-based source detection, photometric-redshift estimation, and morphology classification.

Gaia

Gaia is an ESA astrometric mission that operated from 2014 to early 2025 (its science mission ended due to fuel depletion). Gaia produced positions, parallaxes, and proper motions at microarcsecond-level precision (tens of μas for bright stars) for ~1.8 billion stars in the Milky Way; the data is released in successive Data Releases (DR1 2016, DR2 2018, DR3 2022, DR4 expected 2026, DR5 expected 2030). Gaia is the foundation of modern Galactic dynamics, distance-ladder calibration, stellar-population studies, and binary-star statistics; it has revolutionised cluster membership identification, halo substructure mapping, and dwarf-galaxy detection.

Vera C. Rubin Observatory and LSST

The Vera C. Rubin Observatory in Chile, with its 8.4-metre LSST telescope and 3.2-gigapixel camera, began science operations in 2025 with the Legacy Survey of Space and Time (LSST). Over ten years it will image the entire southern sky every few nights in six bands (u, g, r, i, z, y), producing ~20 terabytes of raw data per night, ~10 million transient alerts per night, and ultimately a catalogue of ~20 billion galaxies and ~17 billion stars. LSST's alert stream is the operational target for transient classification: each detection is broadcast as an alert packet within ~60 seconds of imaging, and downstream "broker" systems (ANTARES, ALeRCE, Lasair, Fink) classify the alerts in real time. Photometric classification of LSST alerts is the canonical AI-for-astronomy operational problem.

Euclid and Roman

Euclid (ESA, launched July 2023) is a 1.2-metre space telescope conducting a wide-field optical+near-infrared survey of ~14,000 deg² for cosmology, particularly weak gravitational lensing and galaxy clustering. First Euclid data releases came in 2024–2025 with full survey results expected through 2030. The Nancy Grace Roman Space Telescope (NASA, targeted launch 2027) is a 2.4-metre space telescope (HST-equivalent aperture but with 100× wider field) for wide-field surveys and exoplanet microlensing. Both missions have substantial expected ML-method development for shear measurement, photometric redshifts, and cosmological inference.

Spectroscopic surveys

The current and upcoming spectroscopic survey landscape includes DESI (Dark Energy Spectroscopic Instrument, in operation since 2021, targeting 40 million galaxy and quasar redshifts; DESI's 2024 first-year cosmology results suggested possible evolving dark energy), SDSS-V (the latest in the long-running Sloan series, with multi-object spectroscopy in both hemispheres), 4MOST (a southern multi-object spectrograph beginning 2025–2026), and PFS (Subaru Prime Focus Spectrograph, in commissioning). Spectroscopic data is a critical training and validation source for photometric-redshift estimation.

Radio arrays and high-energy missions

The radio-astronomy landscape includes ALMA (Atacama Large Millimeter/submillimeter Array, sub-mm interferometry), VLA (Very Large Array, cm-wave imaging in New Mexico, with the ngVLA planned as its successor), MeerKAT (in South Africa, en route to the SKA), and the Square Kilometre Array (SKA) under construction in South Africa and Australia, with SKA-Low's first phase expected to begin early operations around 2026. Radio surveys produce enormous volumes of correlator data and depend heavily on automated source extraction and classification. High-energy missions include Chandra (X-ray, since 1999), XRISM (X-ray spectroscopy, launched 2023), Athena (next-generation X-ray, 2030s), and Fermi (gamma-ray, since 2008).

08

Exoplanets and Their Detection

The exoplanet field went from zero confirmed planets around main-sequence stars in 1994 to over 5,800 confirmed planets in 2025, and the discovery rate continues to accelerate. The methods — radial velocity, transit photometry, microlensing, direct imaging, astrometry — each have characteristic biases that shape what we know about the population. AI methods feature in transit search, false-positive vetting, atmospheric characterisation, and habitability scoring; the field's data complexity (tens of thousands of light curves, complex systematics, mixed planetary and stellar variability) is well-suited to ML.

Radial velocity

The radial-velocity (RV) or Doppler method detects the wobble of a star induced by an orbiting planet, observed as a periodic blueshift/redshift in the stellar spectrum. The RV semi-amplitude scales as planet-mass × sin(inclination) / orbital-period^(1/3) / star-mass^(2/3), so the method is most sensitive to massive planets in close orbits. The 1995 discovery of 51 Pegasi b (a hot Jupiter on a 4.2-day orbit) by Mayor and Queloz, the first exoplanet around a main-sequence star (Nobel Prize 2019), used this method. Modern high-precision spectrographs (HARPS, HARPS-N, ESPRESSO, EXPRES) reach 10–30 cm/s precision — in principle sufficient to detect Earth-mass planets in habitable zones around quiet stars, though stellar activity (spots, faculae, oscillations) creates systematic noise floors that ML methods are increasingly used to model and subtract. RV detection yields planet mass × sin(i), orbital period, eccentricity, and the host-star mass — but not radius or atmosphere.
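The scaling above can be made concrete with the full Keplerian expression for the RV semi-amplitude. A minimal sketch, assuming a circular orbit; the 51 Pegasi b inputs are approximate published values, and the helper name is ours:

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30       # kg
M_JUP = 1.898e27       # kg
DAY = 86400.0          # s

def rv_semi_amplitude(m_p_sini_mjup, m_star_msun, period_days, ecc=0.0):
    """K = (2πG/P)^(1/3) · M_p sin i / (M_* + M_p)^(2/3) / sqrt(1 - e^2), in m/s."""
    m_p = m_p_sini_mjup * M_JUP
    m_s = m_star_msun * M_SUN
    p = period_days * DAY
    return ((2 * math.pi * G / p) ** (1 / 3)
            * m_p / (m_s + m_p) ** (2 / 3)
            / math.sqrt(1 - ecc ** 2))

# 51 Pegasi b: M_p sin i ≈ 0.47 M_Jup, M_* ≈ 1.05 M_sun, P ≈ 4.23 d
k_51peg = rv_semi_amplitude(0.47, 1.05, 4.23)   # ≈ 57 m/s, close to the
                                                # measured ~56 m/s amplitude
```

The P^(−1/3) and M_*^(−2/3) dependence in the prose falls straight out of this expression once the constants are factored away.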

Transit photometry

The transit method detects the dip in a star's brightness when an orbiting planet passes in front. The transit depth (planet radius squared / star radius squared) is small (~1% for a Jupiter, ~0.01% for an Earth), the duration is hours, and the period is the orbital period. The transit method is most sensitive to large planets with short orbits, and requires the orbital plane to be near the line of sight (the geometric probability of transit is roughly star-radius / orbital-radius). The Kepler mission (2009–2018) revolutionised the field by continuously monitoring ~150,000 stars and detecting thousands of transit candidates; its successor TESS (2018–present) surveys the entire sky in shorter sectors. PLATO (ESA, targeted launch 2026) will extend the method to longer-period orbits. ML methods are central to TESS and Kepler false-positive vetting (eclipsing binaries vs. true planets, stellar variability vs. transits).
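A back-of-envelope sketch of the two geometric quantities in this paragraph — transit depth and transit probability — using solar-system radii (constants and helper names are illustrative):

```python
# Transit depth (R_p/R_*)^2 and geometric transit probability ~ R_*/a.
R_SUN_KM = 6.957e5
R_JUP_KM = 7.149e4
R_EARTH_KM = 6.371e3
AU_KM = 1.496e8

def transit_depth(r_planet_km, r_star_km=R_SUN_KM):
    """Fractional flux dip for a planet crossing its star's disc."""
    return (r_planet_km / r_star_km) ** 2

def transit_probability(a_km, r_star_km=R_SUN_KM):
    """Geometric probability that a randomly oriented orbit transits."""
    return r_star_km / a_km

depth_jup = transit_depth(R_JUP_KM)       # ~1.1e-2: the "~1%" Jupiter dip
depth_earth = transit_depth(R_EARTH_KM)   # ~8e-5: the "~0.01%" Earth dip
p_earth = transit_probability(AU_KM)      # ~0.5%: most geometries never transit
```

The ~0.5% Earth-analogue probability is why transit surveys must monitor very many stars at once, as Kepler and TESS do.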

Direct imaging

Direct imaging resolves the planet from its host star using high-contrast adaptive optics, coronagraphs, or starlight-suppressing apertures. It is most sensitive to young, hot, wide-orbit planets (still glowing from formation). Notable direct-imaging successes include the HR 8799 system (four resolved planets) and Beta Pictoris b. JWST has produced direct images of substellar companions and is opening characterisation possibilities. Future ground-based extremely-large telescopes (ELT, GMT, TMT) and dedicated space missions (the Habitable Worlds Observatory, NASA's planned flagship for the 2040s) target Earth-twin direct imaging.

Microlensing and astrometry

Gravitational microlensing detects planets by the additional brightness peak when a planetary mass perturbs a stellar microlensing event. It is the only method sensitive to distant (kpc-scale), wide-orbit, and free-floating planets. Astrometry detects planets by the wobble of the host star's position (the Gaia DR4 release in 2026 is expected to produce the first substantial astrometric exoplanet catalogue, with thousands of new detections particularly of wider-orbit, longer-period planets that are difficult for RV and transit).

Atmospheric characterisation

Once a transiting exoplanet is detected, transmission spectroscopy (during transit, starlight passes through the planetary atmosphere) and emission spectroscopy (the planet's own thermal emission, observable from the secondary-eclipse depth) reveal atmospheric composition. JWST has been transformative: detection of carbon dioxide, sulfur dioxide, water, and methane in hot-Jupiter and sub-Neptune atmospheres. Atmospheric retrieval is a Bayesian inverse problem (chemistry × temperature-pressure × clouds × stellar activity) that increasingly uses neural posterior estimation in addition to traditional MCMC.

The habitable zone and biosignatures

The habitable zone (HZ) is the orbital range where a rocky planet could in principle support liquid surface water. Its boundaries depend on stellar luminosity, planetary atmosphere, and modelling assumptions; the conservative HZ for Sun-like stars runs ~0.95 to ~1.7 AU. Biosignatures are atmospheric or surface signatures that would indicate life (oxygen + methane disequilibrium being the canonical robust pair, though false-positive scenarios are a substantial concern). The TRAPPIST-1 system (seven Earth-sized planets, three in the HZ around an M dwarf) has been a high-priority JWST target. Whether any HZ exoplanet has detectable biosignatures is unresolved as of 2026; the recent K2-18b dimethyl-sulfide claim has been the object of substantial discussion and is not consensus-confirmed.
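Since equilibrium temperature depends on incident flux, the HZ boundaries scale roughly with the square root of stellar luminosity, so the conservative solar HZ quoted above can be rescaled to other stars. A sketch, assuming that simple scaling; the TRAPPIST-1 luminosity is an approximate literature value:

```python
import math

def hz_bounds_au(l_star_lsun, inner_sun=0.95, outer_sun=1.7):
    """Rescale the conservative solar HZ (0.95-1.7 AU) by sqrt(L/L_sun)."""
    scale = math.sqrt(l_star_lsun)
    return inner_sun * scale, outer_sun * scale

# TRAPPIST-1: an M dwarf with L ≈ 5.5e-4 L_sun (approximate value)
inner, outer = hz_bounds_au(5.5e-4)   # ≈ 0.022-0.040 AU
# TRAPPIST-1e orbits at ~0.029 AU — inside this scaled conservative HZ.
```

The tiny HZ distances for M dwarfs are why TRAPPIST-1's HZ planets have orbital periods of days, making them comparatively easy transit targets.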

09

Gravitational Waves and Multi-Messenger Astronomy

The first direct detection of gravitational waves (GW150914, 14 September 2015, by LIGO; Nobel Prize 2017) opened a new observational window on the universe. The catalogue of detected events is now well over a hundred and growing; mergers are detected weekly during observing runs. GW astronomy probes physics inaccessible to electromagnetic observation — black-hole and neutron-star mergers, the dynamics of strong gravity, the equation of state of nuclear matter — and combined with electromagnetic and neutrino observations, defines the multi-messenger paradigm. AI methods, particularly deep learning for rapid detection and matched filtering acceleration, are operationally embedded.

Sources and detection principles

A gravitational wave is a propagating perturbation in spacetime curvature, sourced by accelerating masses (in a quadrupolar or higher pattern). Strong sources are compact-binary mergers: two black holes, two neutron stars, or one of each spiralling together over ~10⁸ years and merging in a burst lasting fractions of a second. The waveform during inspiral, merger, and ringdown is calculable from general relativity (with numerical-relativity simulations for the non-perturbative merger phase) and template banks of expected waveforms are matched against detector data. The strain amplitude at Earth from a GW150914-class merger is ~10⁻²¹, requiring kilometre-scale interferometers stabilised at sub-attometer precision.

The detector network

LIGO operates two 4-km-arm interferometers in Hanford, Washington and Livingston, Louisiana; Virgo operates a 3-km interferometer in Italy; KAGRA in Japan joined the network during O3 (the third observing run) with cryogenic operation. The current observing run O4 began in May 2023, with a sensitivity upgrade producing a detection rate of roughly one binary-black-hole merger per week. Most events are binary black hole (BBH) mergers; binary neutron star (BNS) mergers and neutron-star–black-hole (NSBH) mergers are rarer and scientifically richer because they may produce electromagnetic counterparts.

GW170817 and the multi-messenger era

The 17 August 2017 binary-neutron-star merger GW170817 was a landmark event: gravitational waves were detected by LIGO/Virgo, a short gamma-ray burst was detected 1.7 seconds later by Fermi-GBM and INTEGRAL, and the optical/UV/IR kilonova AT2017gfo was localised within hours by ground-based follow-up to the host galaxy NGC 4993 at 40 Mpc. The kilonova spectrum showed the signature of r-process nucleosynthesis, confirming neutron-star mergers as a major source of heavy elements (gold, platinum, lanthanides). GW170817 also provided a standard-siren measurement of the Hubble constant. The event remains the canonical multi-messenger detection; substantial AI-driven follow-up infrastructure (rapid alert networks, ML-driven candidate vetting, sky-localisation-aware tiling strategies) was developed in its aftermath.

Pulsar timing arrays and the nanohertz GW background

Pulsar timing arrays (PTAs — NANOGrav, EPTA, PPTA, the Indian PTA, and the international IPTA combination) use precise timing of millisecond pulsars to detect gravitational waves at nanohertz frequencies, sourced by supermassive-black-hole binary inspirals across cosmological volume. NANOGrav's June 2023 announcement of evidence for a stochastic gravitational-wave background at the ~3σ level was the major mid-decade GW event; the signal is consistent with the expected supermassive-binary background, with continuing analyses tightening the constraints.

Future GW observatories

Planned next-generation ground-based detectors (Einstein Telescope, Cosmic Explorer, both 2030s+) will reach an order of magnitude more sensitivity than current detectors. Space-based LISA (Laser Interferometer Space Antenna, ESA-led, targeted 2035 launch) will detect lower-frequency waves (mHz) from supermassive-black-hole mergers across cosmological distances and from the inspiral phase of stellar-mass binaries decades before merger. The science forecasts include strong tests of general relativity in the strong-field regime, binary-black-hole population synthesis at extragalactic scale, and the detailed merger history of supermassive black holes.

AI methods in GW analysis

GW data analysis depends on matched-filter searches against template banks, Bayesian parameter estimation, and electromagnetic-followup coordination — all of which now use machine learning extensively. Convolutional networks for rapid burst detection, normalising flows for waveform-parameter posterior estimation (replacing many-CPU-day MCMC with sub-second neural inference), and neural classifiers for glitch identification (transient detector artefacts that mimic real signals) are all in production use. As event rates grow with detector improvements, ML-driven analysis is increasingly the only way to keep up.
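A minimal time-domain sketch of the matched-filter idea behind these searches: correlate a known template against the data at every offset and normalise by the template norm. Real pipelines work in the frequency domain with coloured-noise weighting and large template banks; all signal parameters below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy chirp-like template: rising frequency under a Gaussian envelope.
fs = 1024.0
t = np.arange(0, 0.25, 1 / fs)
template = np.sin(2 * np.pi * (50 + 200 * t) * t) * np.exp(-((t - 0.2) / 0.1) ** 2)

# Whitened data: unit-variance noise with the template injected at a known offset.
data = rng.normal(0.0, 1.0, 4096)
inject_at = 1500
data[inject_at:inject_at + template.size] += 4.0 * template

# Matched-filter SNR time series: sliding inner product / template norm.
corr = np.correlate(data, template, mode="valid")
snr = corr / np.linalg.norm(template)
peak = int(np.argmax(np.abs(snr)))   # recovers the injection offset
```

The normalising-flow posterior estimators mentioned above sit downstream of this step, turning a detected candidate's strain data into source-parameter posteriors.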

10

From Astronomy to ML: An Orientation

The previous nine sections established the astronomical vocabulary: units and coordinates, the cosmic distance ladder and the Hubble tension, stellar evolution, galactic dynamics and large-scale structure, ΛCDM cosmology and the CMB, the major surveys (JWST, Gaia, LSST/Rubin, Euclid, Roman, DESI), exoplanet methods, and gravitational-wave astronomy. This section is the bridge to the methodology that follows. Astronomy was an early adopter of machine learning, and the operational case has only grown stronger — survey data volumes exceed traditional analysis pipelines, rare-event scarcity creates field-specific challenges, the simulation-reality gap shapes inference methodology, and real-time alert pipelines impose hard latency constraints. This section orients the ML practitioner; Sections 11–19 develop the methods within that frame.

What separates AI for astronomy from generic image and time-series ML

Most architectures used in astronomy ML — CNNs, transformers, GNNs, normalising flows, diffusion models — are the same ones used in mainstream computer vision and ML. What makes the field distinctive is the data structure and operational context. Survey images carry well-characterised noise (photon shot noise, atmospheric turbulence systematics, charge-coupled-device artefacts, cosmic-ray hits) and instrument-specific PSFs that must be respected by methods. Light curves are sparse, irregularly sampled, multi-band, and contaminated by stellar variability and instrument systematics. Spectra carry redshift-dependent telluric absorption, sky-line residuals, and flux-calibration uncertainties. Catalogues encode object cross-matching, completeness functions, and selection effects that are essential to inference. The methodology of this chapter is the systematic adaptation of standard ML to these astronomical data realities.

[Figure: the AI-for-astronomy methodology stack. Three method layers — DETECT & CLASSIFY (§3–4: image differencing, photometric classification, photo-z estimation, transit search; CNNs, transformers, MDNs), INFER & ESTIMATE (§5, 7, 8: galaxy morphology, GW posteriors, cosmological SBI, atmospheric retrieval; normalising flows, CNNs, NPE), and DISCOVER (§9: anomaly detection, foundation models, multimodal embeddings, cross-survey transfer; contrastive learning, autoencoders, CLIP-style models) — span the data domains (§3 light curves, §5 images, §6 exoplanets, §7 GW, §10 frontier) and rest on an application layer of real-time alert pipelines and cosmological-inference workflows.]

What astronomy-AI demands of ML practice

Several methodological demands recur across the chapter. Calibrated uncertainties are essential: photo-z estimates feed into cosmological inference and a biased posterior corrupts the science. Out-of-distribution robustness is central: training on simulated catalogues or on one survey and deploying on another exposes models to systematic shifts that point estimators handle poorly. Real-time latency is a hard constraint for transient and gravitational-wave alert pipelines: inference must run in seconds. Class-imbalance handling is universal: kilonovae, strong lenses, and exoplanet-transit candidates are extremely rare in the survey population. Cross-survey transfer is a standing requirement: a model trained on Pan-STARRS data must generalise to LSST without expensive retraining. The methodologies of this chapter are partly the systematic engineering response to these demands.

The downstream view

The AI-for-astronomy pipeline today looks like this: telescopes produce images and time-series; image-differencing and source-extraction pipelines produce alerts; brokers (ANTARES, ALeRCE, Lasair, Fink, AMPEL) ingest alerts and apply ML classifiers; high-priority candidates are sent to spectroscopic-follow-up planners; in parallel, accumulated photometric data feeds catalogue-level inference (photo-z, morphology, structural parameters); cosmological inference takes calibrated catalogues and either applies SBI against simulation grids or feeds traditional likelihood-based MCMC. ML methods appear at every stage. The remainder of this chapter develops each piece: §11 the data substrate, §12 transient detection, §13 photo-z, §14 galaxy imaging, §15 exoplanets, §16 gravitational waves, §17 cosmological SBI, §18 anomaly and foundation models, §19 the operational frontier.

11

The Astronomical Data Substrate for ML

Before discussing methods, it is worth understanding the data substrate concretely: what the inputs look like, what the noise structure is, what the labels actually mean, and what cross-validation means in this domain. The substrate is image data (most of the volume), time-series light curves (the basis of transient and exoplanet science), spectra (the basis of redshift, stellar parameters, and atmospheric retrieval), catalogues (the joined products), and alert streams (the real-time pipeline output).

Images

Astronomical images are pixel arrays in FITS format (Flexible Image Transport System — the field's universal file format), typically with 16-bit or 32-bit pixel depth and accompanying metadata describing the World Coordinate System (WCS) mapping pixel to sky position, the date and time of observation, the filter band, the exposure time, and the calibration reference. Standard preprocessing steps include bias subtraction, dark-current removal, flat-fielding (correcting for pixel-to-pixel sensitivity variation), and cosmic-ray rejection. The point spread function (PSF) — the response of the instrument to a point source — is band- and position-dependent, varies with atmospheric seeing for ground-based observations, and is essential for both source detection and photometry. ML methods for image-based inference (galaxy morphology, strong-lens detection, weak-lensing shear measurement, image deblending) operate on these calibrated images, often after cutout extraction around individual sources.
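The calibration steps named above reduce, at their core, to per-pixel arithmetic. A minimal sketch on synthetic frames (real data would be loaded from FITS files, e.g. with astropy.io.fits; the frame values here are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for the calibration frames of a small CCD.
truth = np.full((64, 64), 100.0)          # true sky signal: flat 100 counts
bias = np.full((64, 64), 500.0)           # constant electronic offset
dark = np.full((64, 64), 10.0)            # thermal signal for this exposure time
flat = rng.uniform(0.9, 1.1, (64, 64))    # pixel-to-pixel sensitivity variation

# What the detector records: sky modulated by sensitivity, plus dark and bias.
raw = truth * flat + dark + bias

# Standard reduction: calibrated = (raw - bias - dark) / flat.
calibrated = (raw - bias - dark) / flat   # recovers the sky frame
```

Cosmic-ray rejection and PSF modelling then operate on these calibrated frames before any ML stage sees a cutout.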

Light curves

A light curve is a time series of brightness measurements for a single object, typically across multiple photometric bands. Light curves are usually sparse (a survey revisits a given field every few days, not continuously) and irregular (visit cadence depends on weather, telescope scheduling, and survey strategy). Each measurement carries a magnitude (or flux) value, an uncertainty, a band identifier, and an MJD timestamp. Light curves come from time-domain surveys (ZTF, ATLAS, ASAS-SN, LSST) for transients and variables, from dedicated exoplanet missions (Kepler, K2, TESS, PLATO) for transit detection, and from individual-target follow-up programmes. ML inputs for light curves typically use either tabular features (mean magnitude, period, amplitude, kurtosis, skew, number of detections) extracted by feature-engineering pipelines (FATS, feets, and similar feature-extraction libraries) or directly the multi-band time series fed into RNN or transformer architectures.
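A sketch of the tabular-feature route, computing a few of the features named above from one band of an irregularly-sampled light curve (the exact feature set and values are illustrative):

```python
import numpy as np

def light_curve_features(mjd, mag, err):
    """A few classic tabular features from one band of a light curve."""
    mjd, mag, err = (np.asarray(a, dtype=float) for a in (mjd, mag, err))
    resid = mag - mag.mean()
    std = mag.std()
    return {
        "n_det": int(mag.size),
        "mean_mag": float(mag.mean()),
        "amplitude": float(mag.max() - mag.min()),
        "std": float(std),
        "skew": float((resid ** 3).mean() / std ** 3),
        "kurtosis": float((resid ** 4).mean() / std ** 4 - 3.0),
        "median_err": float(np.median(err)),
        "timespan_days": float(mjd.max() - mjd.min()),
    }

# Toy single-band light curve: five visits over ~20 days.
mjd = [60000.0, 60003.2, 60009.8, 60012.1, 60020.5]
mag = [18.2, 18.9, 17.6, 18.1, 18.4]
err = [0.05, 0.08, 0.04, 0.05, 0.06]
feats = light_curve_features(mjd, mag, err)
```

Feature vectors like this feed gradient-boosted trees and random forests; the raw (mjd, mag, band) triples feed the RNN and transformer architectures directly.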

Spectra

A spectrum is a flux measurement as a function of wavelength, typically with thousands of wavelength bins (the resolution R = λ/Δλ runs from ~1,000 for low-resolution surveys to ~50,000 for high-precision exoplanet RV work). Spectra reveal absorption and emission features that identify chemical species and physical conditions, encode redshift through wavelength shifts of recognisable features, and provide the gold-standard training labels for photo-z estimation. Spectroscopic surveys (DESI, SDSS, GAMA, the various 4MOST programmes, JWST/NIRSpec) produce hundreds of millions of spectra. Preprocessing includes flux calibration, telluric correction, sky subtraction, and continuum normalisation. ML methods on spectra include redshift estimation, stellar-parameter regression, and increasingly transformer-based foundation-model embedding.
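The redshift-from-wavelength-shift relation is one line of arithmetic: z = λ_obs / λ_rest − 1. A sketch with illustrative values:

```python
def redshift(lambda_obs_angstrom, lambda_rest_angstrom):
    """Redshift from the observed wavelength of a known rest-frame feature."""
    return lambda_obs_angstrom / lambda_rest_angstrom - 1.0

# H-alpha (rest wavelength 6562.8 Å) observed at 7874 Å → z ≈ 0.2
z_halpha = redshift(7874.0, 6562.8)

# The 4000 Å break redshifted to 8000 Å → z = 1.0
z_break = redshift(8000.0, 4000.0)
```

Automated spectroscopic pipelines do this by cross-correlating the full spectrum against templates rather than measuring one line, but the underlying relation is the same.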

Catalogues

A catalogue is a tabular product describing detected sources in a survey, with columns for position (RA, Dec), photometric measurements across all bands, derived parameters (stellar effective temperature, gravity, metallicity; galaxy morphology indicators; redshift estimates), and association IDs that link the catalogue to images and spectra. Major catalogues — Gaia DR3, SDSS DR18, the Pan-STARRS catalogue, the LSST DPDD catalogue, the JWST archive catalogues — are the primary working products that ML pipelines consume. Cross-matching between catalogues (joining a Gaia entry with an SDSS entry with a 2MASS entry) is non-trivial because of position uncertainties, multi-component sources, and proper motion; ML methods including learned embeddings are increasingly used.
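A toy nearest-neighbour cross-match illustrating the position-join step, using the flat-sky small-angle approximation (ΔRA·cos Dec, ΔDec), which is adequate at arcsecond separations; production pipelines use spherical geometry and spatial indexing (e.g. astropy's match_to_catalog_sky). All coordinates below are invented:

```python
import numpy as np

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """For each source in catalogue 1, return the index of its nearest
    catalogue-2 neighbour within the match radius, or -1 for no match."""
    matches = []
    for r, d in zip(ra1, dec1):
        dra = (np.asarray(ra2) - r) * np.cos(np.radians(d))  # flat-sky ΔRA
        ddec = np.asarray(dec2) - d
        sep_arcsec = np.hypot(dra, ddec) * 3600.0
        j = int(np.argmin(sep_arcsec))
        matches.append(j if sep_arcsec[j] <= radius_arcsec else -1)
    return matches

cat1_ra, cat1_dec = [150.1000, 150.2000], [2.2000, 2.3000]
cat2_ra, cat2_dec = [150.2001, 150.1002, 149.0000], [2.3001, 2.2001, 2.0000]
m = crossmatch(cat1_ra, cat1_dec, cat2_ra, cat2_dec)   # → [1, 0]
```

Real cross-matching must additionally handle proper motion (moving a Gaia epoch position to the match epoch) and multi-component sources, which is where the learned-embedding approaches mentioned above come in.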

Alert streams

An alert stream is a real-time broadcast of detected changes in the sky: each new image is differenced against a reference, sources whose flux has changed beyond a detection threshold are packaged as alert packets, and the packets are streamed via a protocol (Apache Kafka in the Rubin/LSST era) to downstream consumers. An LSST alert packet contains the differenced-image cutout, the recent light-curve history, photometric measurements, and metadata. Brokers (ALeRCE, ANTARES, Lasair, Fink, AMPEL) consume alert streams, apply ML classifiers, and surface candidates to scientists. Alert-stream throughput is the tightest operational constraint: LSST will produce ~10 million alerts per night, and broker classifiers must process them within minutes.
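A minimal sketch of broker-side triage over alert packets, using a hypothetical dict schema loosely based on the fields named above (real LSST alerts are Avro records with many more fields; the field names, thresholds, and function name here are ours):

```python
def select_candidates(alerts, rb_min=0.7, mag_max=20.0):
    """Keep alerts that look real (high real-bogus score) and are bright
    enough for spectroscopic follow-up. Schema is hypothetical."""
    return [a for a in alerts
            if a["rb_score"] >= rb_min and a["mag"] <= mag_max]

alerts = [
    {"id": 1, "rb_score": 0.95, "mag": 18.2},   # real and bright: kept
    {"id": 2, "rb_score": 0.20, "mag": 17.0},   # likely bogus: dropped
    {"id": 3, "rb_score": 0.90, "mag": 21.5},   # too faint: dropped
]
kept = select_candidates(alerts)
```

Production brokers chain many such filters with ML classifiers and catalogue cross-matches, under the per-night throughput budget described above.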

Labels and ground truth

The label-availability landscape varies substantially across tasks. Spectroscopic redshifts are the gold-standard label for photo-z training but are expensive (a few minutes of large-telescope time per target) and selection-biased toward bright sources. Transient classifications require spectroscopic follow-up that is increasingly the bottleneck; classifier training relies heavily on simulated light curves (e.g. SNANA-simulated supernovae). Galaxy morphology labels come from Galaxy Zoo crowdsourcing (now extended to deep-learning-aided labelling). GW signal labels are simulated waveforms; only a small subset of detected events have been confirmed astrophysical signals. The mismatch between abundant simulated labels and scarce real labels is a recurring methodological issue.

12

Transient Detection and Photometric Classification

Transient detection — finding objects whose brightness has changed — and photometric classification — assigning event types to detections — are the canonical AI-for-astronomy operational problems. The pipeline runs in real time: image differencing identifies candidate sources, ML classifiers triage them, and the highest-priority candidates trigger spectroscopic follow-up before the event fades. The methodology has matured through several generations and now reliably classifies major transient types (Type Ia/II/Ibc supernovae, tidal disruption events, kilonovae, microlensing) at survey scale.

Image differencing and the bogus-vs-real problem

The detection pipeline begins with image differencing: subtracting a deep reference image of a field from a newly-acquired image, leaving only sources whose flux has changed. Difference images contain real sources (true transients and variables) and bogus sources (cosmic rays, image registration artefacts, satellite trails, hot pixels, residual PSF mismatches, edge effects) at roughly 1:100 real-to-bogus ratios. The first ML task is the real-bogus classifier: a binary CNN that takes a small image cutout (typically 21×21 pixels) and outputs a real-bogus probability. The earliest deployed real-bogus classifiers used random forests on engineered features; modern deployments use small CNNs. Typical performance (~95% recall on real sources at <1% bogus contamination) has made the alert stream substantially more tractable.

Photometric classification of transients

Once a source is real, the next task is photometric classification: from the multi-band, sparse, irregularly-sampled light curve, predict the transient type. The 2018 PLAsTiCC Kaggle challenge (Photometric LSST Astronomical Time-series Classification Challenge) standardised the problem: 14 transient classes, ~3.5 million simulated light curves, evaluation by weighted log loss. Winning entries combined gradient-boosted trees on engineered features (Boone 2019, Ishida et al. 2019) with deep learning on raw light curves. Subsequent work has consolidated around two architectural families: RNN-based approaches (RAPID, SuperRAENN) that handle the irregular sampling natively, and transformer-based approaches (the various PELICAN, LightCurve-Transformer, and Astromer designs) that have generally outperformed RNNs in 2023–2025.

Operational deployment in brokers

Alert brokers run photometric classifiers in production. ALeRCE (Chile) classifies ZTF alerts in real time using a hierarchical classifier (early classifier on the first detection, light-curve classifier for accumulated detections). ANTARES (NOIRLab) uses a similar two-stage approach. Lasair (Edinburgh) combines ML classification with real-time cross-matching to context catalogues. Fink (LAL Orsay) emphasises the kilonova-search use case. Latency budgets are tight: from image arriving at the broker to classifier output is targeted at <30 seconds. Production systems use simplified architectures (smaller transformers, ensembled gradient-boosted trees) to meet the latency constraint.

The early-classification problem

For science cases that demand spectroscopic follow-up before the event peak (kilonovae fading on hour timescales, tidal disruption events with characteristic early colours), early-classification ML — predicting the eventual type from the first one to three detections — is increasingly important. The 2023–2025 ELASTICC challenge extended PLAsTiCC to the early-classification regime. State-of-the-art classifiers reach reasonable accuracy on the first 5–10 days of light curve, but rare classes (kilonovae, fast blue optical transients) remain difficult.

Variable star classification

Beyond transients, the same ML machinery handles variable star classification: stars whose brightness varies periodically (Cepheids, RR Lyrae, eclipsing binaries, Mira variables, delta Scuti) or aperiodically (T Tauri stars, cataclysmic variables, AGN). Period extraction (Lomb-Scargle, conditional entropy, deep-learning-based period regressors) is a preprocessing step; classification is then a multi-class problem on the period-folded light curve plus colour features. The OGLE, Kepler, ZTF, and Gaia variable-star catalogues are the major training resources. AI methods are now central to keeping pace with the production rate of new variable-star detections.
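A sketch of the Lomb-Scargle preprocessing step on a synthetic irregularly-sampled variable, using scipy's implementation (which takes angular frequencies; production work typically uses astropy.timeseries.LombScargle instead). All signal parameters are invented:

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)

# Synthetic periodic variable: 120 irregular epochs over 100 days.
true_period = 2.5                                   # days
t = np.sort(rng.uniform(0.0, 100.0, 120))
y = np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.1, t.size)

# Periodogram over a trial period grid (lombscargle wants angular freqs).
periods = np.linspace(1.5, 4.0, 5000)
omega = 2 * np.pi / periods
power = lombscargle(t, y - y.mean(), omega)
best_period = periods[np.argmax(power)]             # recovers ~2.5 days
```

The period-folded light curve at best_period, plus colour features, is then the input to the multi-class variable-star classifier described above.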

Evaluation realities

Evaluation in this domain is non-trivial. Class imbalance (Type Ia supernovae outnumber kilonovae by 10⁵:1) makes accuracy useless; weighted log loss (PLAsTiCC's metric) is the standard but encodes specific class-importance assumptions. Per-class purity at fixed completeness (e.g., what fraction of "predicted-Ia" candidates are actually Ia, given that the classifier finds 90% of true Ia) is the operationally relevant metric. Out-of-distribution behaviour matters: classifiers trained on simulated light curves often fail on real data with subtly different cadence, noise, or systematic structure; cross-validation strategies (training on one survey and validating on another) are now standard practice.
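A weighted log loss of the PLAsTiCC flavour — per-class mean log loss averaged with class weights, so rare classes are not swamped by abundant ones — can be implemented in a few lines. The weights and toy predictions below are illustrative, not the challenge's published values:

```python
import numpy as np

def weighted_log_loss(y_true, probs, weights, eps=1e-15):
    """y_true: (N,) int class labels; probs: (N, C) predicted probabilities;
    weights: (C,) per-class importance weights."""
    probs = np.clip(probs, eps, 1.0)
    y_true = np.asarray(y_true)
    loss = 0.0
    for c, w in enumerate(weights):
        mask = y_true == c
        if mask.any():
            # Mean negative log-probability of the true class, within class c.
            loss += w * -np.log(probs[mask, c]).mean()
    return loss / np.sum(weights)

y = [0, 0, 1]                                   # class 1 is the rare one
p = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
ll = weighted_log_loss(y, p, weights=[1.0, 2.0])
```

Because each class contributes its own mean before weighting, a classifier cannot buy a good score by being accurate only on the common classes — exactly the failure mode plain accuracy hides.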

13

Photometric Redshift Estimation

A spectroscopic redshift requires minutes to hours of large-telescope time per target; a photometric redshift is computable from broad-band photometry alone, in milliseconds, for hundreds of millions of sources. The trade-off is precision and bias: photo-z estimates are noisier than spectro-z, can have catastrophic outliers, and inherit the calibration errors of the photometry. Photo-z is essential for any cosmological survey that targets billions of objects (LSST, Euclid, Roman, SDSS-V) and is the canonical example of an ML-driven inference pipeline that feeds directly into precision cosmology.

The physical basis and template fitting

Galaxies have characteristic spectral features (the 4000 Å break, emission lines from star-forming regions, the Lyman break for high-redshift sources) that shift with redshift, and broad-band photometry samples the spectral energy distribution at multiple wavelengths. Given a galaxy's photometry in u, g, r, i, z, y bands and a library of galaxy spectral templates, the redshift can be inferred by finding the template-redshift combination that best matches the observed colours. This is template fitting (BPZ, EAZY, LePHARE), the pre-ML standard. Template fitting is interpretable, handles photometric uncertainties properly, and produces redshift posteriors, but it depends on the quality of the template library and is computationally expensive at survey scale.
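A deliberately simplified template fit: shift a fake rest-frame SED with a 4000 Å-style break through a redshift grid, synthesise band fluxes at fixed effective wavelengths, and minimise chi². Real codes (BPZ, EAZY, LePHARE) integrate full filter curves and fit template type and amplitude; the template shape and band wavelengths here are made up:

```python
import numpy as np

def template_flux(lam):
    """Fake rest-frame SED: a smooth step upward across 4000 Å, a cartoon
    of the 4000 Å break (entirely illustrative)."""
    lam = np.asarray(lam, dtype=float)
    return 1.0 + 2.0 / (1.0 + np.exp(-(lam - 4000.0) / 300.0))

bands_lam = np.array([3550.0, 4770.0, 6230.0, 7630.0, 9130.0])  # ugriz, roughly

def model_fluxes(z):
    """Band fluxes of the template observed at redshift z."""
    return template_flux(bands_lam / (1.0 + z))

z_true = 0.5
obs = model_fluxes(z_true)              # noiseless mock observation
err = 0.1 * np.ones_like(obs)

z_grid = np.linspace(0.0, 1.5, 301)
chi2 = np.array([np.sum(((obs - model_fluxes(z)) / err) ** 2) for z in z_grid])
z_best = z_grid[int(np.argmin(chi2))]   # recovers z_true on the grid
```

Exponentiating −chi²/2 over the grid (with a prior) gives the redshift posterior that template codes report; the ML methods of the next subsection learn that mapping from data instead.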

Machine-learning approaches

ML photo-z replaces the template library with training on a sample of galaxies that have both photometry and spectroscopic redshifts. The methodological progression has been: k-NN and random-forest regression (the early random-forest photo-z methods of Carrasco-Kind and Brunner 2013); neural networks for point estimates (ANNz, ANNz2); mixture density networks (MDNs) for posterior estimation (Bishop's 1994 architecture, applied to photo-z by D'Isanto and Polsterer 2018 and refined extensively); and flexible conditional-density estimators — FlexZBoost and the conditional normalising-flow architectures of 2023–2025 — that produce well-calibrated posteriors. The current state of the art combines flexible posterior estimation with careful handling of the training-vs-application sample mismatch.

Calibration and the cosmological imperative

For cosmology, individual photo-z point estimates are less important than the redshift distribution n(z) of a galaxy sample. A small mean-redshift bias in n(z) produces a substantial bias in inferred cosmological parameters (Ω_m, σ_8, w₀), so precision cosmology requires calibrated redshift posteriors, not just accurate point estimates. Photo-z calibration techniques include cross-correlation calibration (using the angular cross-correlation of a photometric sample with a spectroscopic sample to recover n(z)), self-organising-map redshift calibration (the Buchs et al. 2019 method, used by DES), and simulation-based calibration (training on hydrodynamic simulations with known truth). The DES Y3 photo-z calibration paper (Myles et al. 2021) is the methodology benchmark.

Out-of-distribution challenges

The training-vs-application mismatch is particularly acute for photo-z. Training samples are drawn from spectroscopic-redshift catalogues, which are biased toward bright, blue, low-redshift galaxies. Application samples (LSST, Euclid) extend to fainter, redder, higher-redshift sources where the training data is sparse. Coverage (does training data span the application photometry?) and label-shift handling are central concerns. The 2024–2026 Rubin DESC photo-z efforts (the RAIL framework and the working group's benchmark suite) explicitly score methods on OOD robustness.
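A minimal coverage check, on synthetic data, makes the idea concrete: flag any application object whose nearest training neighbour in photometry space is farther than the training set's own typical nearest-neighbour distance. Production OOD scores are more sophisticated, but this is the core question they answer.

```python
import numpy as np

rng = np.random.default_rng(1)

def coverage_flag(train_X, query_X, q=0.99):
    """Flag query objects whose nearest training neighbour is farther than the
    q-quantile of the training set's own nearest-neighbour distances."""
    def nn_dist(A, B, exclude_self=False):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        if exclude_self:
            np.fill_diagonal(d, np.inf)      # a point is not its own neighbour
        return d.min(axis=1)
    threshold = np.quantile(nn_dist(train_X, train_X, exclude_self=True), q)
    return nn_dist(query_X, train_X) > threshold

# Training photometry (bright spectroscopic sample) vs an application sample
# containing objects well outside the training support.
train = rng.normal(0.0, 1.0, (1000, 4))
query_in = rng.normal(0.0, 1.0, (200, 4))      # in-distribution objects
query_out = rng.normal(6.0, 1.0, (200, 4))     # e.g. fainter, redder sources
frac_in = coverage_flag(train, query_in).mean()
frac_out = coverage_flag(train, query_out).mean()
```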

Photometric stellar parameters

The same methodology applies to stellar work: from broad-band photometry plus Gaia parallax, predict stellar parameters (effective temperature, surface gravity, metallicity, alpha-element abundance) without needing spectroscopy. Stellar-parameter ML pipelines feed Galactic-archaeology studies and are increasingly important as the Gaia catalogue approaches two billion stars. The Galactic-archaeology spectroscopic surveys (APOGEE, GALAH) provide training labels.

Practical deployment patterns

Photo-z deployment in surveys typically involves: a primary ML estimator (trained, validated, with posterior calibration), one or more cross-check methods (template-fitting in parallel for systematics tracking), an n(z) calibration procedure (cross-correlation, SOM, or simulation-based), and a regular reprocessing schedule that updates redshift estimates as new data is acquired. Photo-z catalogues are typically released alongside primary survey catalogues; quality flags (e.g. redshift posterior width, OOD-detection score) are essential downstream filters.

14

Galaxy Morphology and Image-Based Inference

Galaxy images carry information about morphology (spiral, elliptical, irregular), size, structural parameters (Sérsic profile, bulge-disk decomposition), and gravitational-lensing-induced shape distortions. Three image-based inference tasks have become canonical AI applications: galaxy morphology classification, strong-lens detection, and weak-lensing shear measurement. Each illustrates a different methodological structure.

Galaxy morphology classification

The Galaxy Zoo crowdsourcing project (Lintott et al. 2008) labelled hundreds of thousands of SDSS galaxies into the Hubble morphological sequence using volunteer classifications. The labels became training data for the first generation of CNN-based morphology classifiers (Dieleman, Willett, Dambre 2015), which reached human-level performance. Subsequent work has scaled to LSST-class data volumes (Galaxy Zoo DECaLS, Walmsley et al. 2022), shifted to multi-task architectures (predicting multiple Galaxy-Zoo question responses jointly), and increasingly used self-supervised pretraining to reduce label requirements. Morphology classifiers are now standard production tools in survey pipelines, feeding galaxy-evolution studies, AGN-host identification, and merger-rate measurements.

Strong-lens detection

A strong gravitational lens is a foreground galaxy or cluster whose mass distorts the image of a background source into multiple images, arcs, or rings. Strong lenses are scientifically extremely valuable (probing dark-matter substructure, providing time-delay cosmography, enabling magnified high-redshift studies) but rare — perhaps 10⁻⁵ of galaxies. Pre-ML detection used visual inspection or spectroscopic surveys; ML changed the field. The 2017 Strong Gravitational Lens Finding Challenge (Metcalf et al.) established the benchmarks; CNN-based detection at scale (Petrillo et al. 2017, Jacobs et al. 2019, the various 2020–2024 deep-learning surveys) has produced thousands of strong-lens candidates. The major operational challenge is the false-positive rate: with one true lens per ~10⁵ galaxies, even a model that rejects 99% of non-lenses yields on the order of 10³ false-positive candidates per true lens, so expert visual inspection of all candidates remains necessary.
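The base-rate arithmetic behind that challenge is worth making explicit (all numbers below are illustrative):

```python
# Expected ratio of false to true candidates in a rare-object search, given
# the prevalence, the classifier's true-positive rate, and its
# false-positive rate.
def false_per_true(prevalence, tpr, fpr):
    return fpr * (1.0 - prevalence) / (tpr * prevalence)

# One lens per 1e5 galaxies: rejecting 99% of non-lenses still leaves ~1.1e3
# false candidates per true lens; candidate purity demands fpr nearer 1e-4.
r_99 = false_per_true(1e-5, 0.90, 1e-2)      # ~1.1e3
r_9999 = false_per_true(1e-5, 0.90, 1e-4)    # ~11
```

This is why lens-finder papers report false-positive rates at fixed completeness rather than raw accuracy: at 10⁻⁵ prevalence, a 99%-accurate classifier is operationally useless.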

Weak gravitational lensing and shear measurement

Weak gravitational lensing is the small (typically ~1%) shape distortion of background galaxies by intervening matter. Statistical analysis of these distortions across millions of galaxies maps the matter distribution, including the dark-matter component, with high precision; weak lensing is one of the major routes to cosmological-parameter constraints. The methodological challenge is accurately measuring galaxy shear (the small added ellipticity due to lensing) in the presence of much larger intrinsic ellipticity, PSF distortion, and pixel noise. Pre-ML methods (KSB, lensfit, the various model-fitting techniques) reached percent-level shear accuracy. ML approaches — first metacalibration enhancements, then end-to-end deep-learning shear measurement (Ribli et al. 2019, the Euclid-specific SHE pipeline, the LSST DESC shear-measurement working group efforts) — are now reaching the requirements for Stage IV cosmology surveys.

Image deblending and source extraction

In deep imaging, sources frequently overlap (blending) — particularly in crowded fields. Image deblending separates the flux from overlapping sources; this is essential for accurate photometry and is non-trivial when source profiles are similar. ML deblending (the various BlendingToolKit-based methods, the 2024–2026 transformer-based deblenders) has improved over classical methods (which struggle when sources are very close). LSST Data Management's source-extraction pipeline includes ML deblending as a production component.

Foundation models for galaxy images

The Multimodal Universe project, AstroCLIP (Parker et al. 2024), and the various 2024–2026 follow-ups use contrastive pretraining on multi-modal galaxy data (images, spectra, structural parameters) to learn general-purpose galaxy embeddings. The empirical result is that a pretrained galaxy-foundation model fine-tuned on small downstream task labels often outperforms task-specific architectures trained from scratch. This is becoming the new methodological default for galaxy ML.

15

Exoplanet Detection and Characterisation

Exoplanet science is a transit-detection-and-vetting pipeline at scale: monitor hundreds of thousands of stars, identify periodic flux dips, vet false positives (eclipsing binaries, stellar variability, instrument systematics), and characterise the resulting confirmed planets. ML methods have moved from optional accelerators to operational infrastructure; the Kepler and TESS pipelines now embed deep-learning classifiers as primary vetting components.

Transit search and the box-least-squares baseline

Transit search identifies periodic flux dips in a light curve. The classical method is the Box Least Squares (BLS) algorithm: scan over period and transit duration, fit a box-shaped dip, return the best signal-to-noise candidates. BLS is fast, well-calibrated, and remains the workhorse for initial signal detection. ML enters at the next stage — vetting — where the candidate set is contaminated by eclipsing binaries (deep, U-shaped eclipses), stellar pulsations (sinusoidal variability), instrument-artefact periodics (Kepler's "season-folded" systematics, TESS's momentum-dump periodics), and cross-talk from other sources.
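A stripped-down box search conveys the BLS structure: fold, bin, slide a box, score the depth against the scatter. This toy (synthetic light curve, a handful of trial durations, no proper BLS noise weighting) is not the production algorithm, for which see e.g. astropy's BoxLeastSquares, but the period-duration scan is the same.

```python
import numpy as np

rng = np.random.default_rng(2)

def bls_scan(t, flux, periods, n_phase=200, durations=(0.02, 0.05, 0.1)):
    """Minimal box search: phase-fold at each trial period, bin the flux, slide
    boxes of a few trial widths, and score the deepest box in units of the
    binned scatter. Durations are fractions of the orbital phase."""
    best_snr, best_period = -np.inf, None
    for P in periods:
        phase_bin = np.minimum(((t % P) / P * n_phase).astype(int), n_phase - 1)
        counts = np.maximum(np.bincount(phase_bin, minlength=n_phase), 1)
        binned = np.bincount(phase_bin, weights=flux, minlength=n_phase) / counts
        resid = binned - np.median(binned)        # out-of-transit level ~ 0
        for dur in durations:
            w = max(1, int(dur * n_phase))        # box width in phase bins
            box = np.convolve(resid, np.ones(w) / w, mode="same")
            snr = -box.min() / (resid.std() + 1e-12)
            if snr > best_snr:
                best_snr, best_period = snr, P
    return best_period, best_snr

# Synthetic light curve: 3.0 d period, 1% depth, 0.2% white noise.
t = np.arange(0.0, 90.0, 0.02)                    # 90 d at ~30 min cadence
in_transit = (t % 3.0) / 3.0 < 0.03
flux = 1.0 - 0.01 * in_transit + rng.normal(0, 0.002, t.size)
P_hat, snr = bls_scan(t, flux, np.linspace(2.5, 3.5, 201))
```

The vetting stage described above starts from exactly this kind of output: a best period and a detection statistic, plus the folded views that the CNN classifiers consume.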

The Kepler/TESS deep-learning vetters

Shallue and Vanderburg (2018) introduced the AstroNet architecture: a 1D CNN that classifies Kepler threshold-crossing events as planet vs false positive using both a local (transit-centred) and a global view of the phase-folded light curve. AstroNet found two new planets (Kepler-90i, Kepler-80g) in re-examined Kepler data that previous vetting had missed, demonstrating the operational case. Subsequent work (Yu et al. 2019's TESS adaptation of the architecture, Valizadegan et al. 2022's ExoMiner, the TRICERATOPS statistical-validation framework for TESS) refined the methodology, and deep-learning vetters are now embedded in the Kepler and TESS DV (Data Validation) pipelines.

Atmospheric retrieval

Once a transiting exoplanet is confirmed, transmission spectroscopy (during transit, the host star's light passes through the planet's atmosphere) and emission spectroscopy (the planet's own thermal emission, observable during secondary eclipse) reveal atmospheric composition. Atmospheric retrieval is the inverse problem: from the observed spectrum, infer the temperature-pressure structure, chemical abundances, and cloud properties. Classical retrieval used MCMC over a parameterised forward model (Madhusudhan and Seager 2009 lineage). The forward model is expensive (radiative transfer through layered atmospheres), making MCMC slow. Neural posterior estimation (NPE) — training normalising flows on simulated spectra to approximate the posterior — has accelerated retrieval by 3–4 orders of magnitude (Yip et al. 2021, Vasist et al. 2023, the Ariel-mission preparation work). This methodology is now the default for JWST-era exoplanet atmospheric work.

Stellar-activity disentanglement

RV exoplanet detection requires precise stellar-activity removal because spots, faculae, and oscillations produce apparent radial-velocity shifts that mimic or mask planetary signals. Gaussian-process regression on the activity component (combined with Keplerian models for the planets) is the established framework. Recent ML work uses neural-network parameterisations of stellar-activity time series, joint training on RV and contemporaneous photometry, and convolutional approaches that learn activity signatures across stellar types. The 2024 EPRV (Extreme Precision RV) Working Group report identified ML-driven stellar-activity handling as central to detecting Earth-twin RV signals.

Direct imaging and high-contrast post-processing

Direct-imaging detection of exoplanets requires extreme starlight suppression. Post-processing pipelines (KLIP, ANDROMEDA, the various PCA-based methods) subtract stellar PSF residuals from coronagraph images to reveal planetary point sources. ML methods (autoencoder-based post-processing pipelines, GAN-based PSF-residual modelling, transformer-based approaches under development) are progressively replacing classical PCA. Performance is measured in terms of the contrast curve: the planet-to-star flux ratio detectable as a function of separation from the star.

Habitability scoring and biosignature interpretation

For confirmed habitable-zone exoplanets, AI methods feature in habitability scoring (multivariate models predicting habitability indices from stellar and planetary parameters) and biosignature interpretation (Bayesian frameworks that combine multiple potential biosignatures with abiotic-false-positive scenarios). The K2-18b dimethyl-sulfide claim of 2025 prompted substantial methodological work on SBI-based atmospheric inference and false-positive accounting; the eventual operational frameworks for biosignature claims will substantially incorporate ML methods.

16

Gravitational-Wave Detection and Parameter Estimation

Gravitational-wave data analysis depends on three operational tasks: rapid detection of candidate signals, parameter estimation for confirmed events, and rejection of detector glitches. All three increasingly use machine learning: deep-learning detectors run alongside matched filtering, normalising-flow posterior estimators have replaced multi-CPU-day MCMC, and ML classifiers identify the various glitch families that contaminate detector data.

Matched filtering and the deep-learning alternative

The classical GW-detection pipeline is matched filtering: cross-correlate the detector data with a bank of pre-computed waveform templates (of order 10⁵–10⁶ templates, spanning the binary-black-hole and binary-neutron-star parameter space), and detect candidates above a signal-to-noise threshold. Matched filtering is computationally expensive (template-bank construction is heavy, and online matched filtering uses substantial CPU resources) and depends on accurate waveform modelling. Deep-learning detection (Gabbard et al. 2018, George and Huerta 2018, the various 2020–2025 follow-ups) trains CNNs on simulated GW signals plus noise to detect signals; the methodology can be 10–100× faster than matched filtering and handles cases where templates are uncertain (eccentric binaries, precessing systems). Production deployment is increasingly hybrid: ML for first-pass detection, matched filtering for high-confidence candidates.
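The matched-filter principle itself is compact. A toy white-noise version (real pipelines whiten against the measured noise power spectrum and work in the frequency domain) shows why it is optimal for known waveforms: correlating against the template concentrates the signal's power into a single statistic.

```python
import numpy as np

rng = np.random.default_rng(3)

def matched_filter_snr(data, template, noise_sigma):
    """Time-domain matched filter for white noise: correlate the data against
    a unit-norm template; the output is the SNR at each candidate time."""
    tmpl = template / np.sqrt(np.sum(template**2))
    return np.correlate(data, tmpl, mode="valid") / noise_sigma

# Toy 'chirp' template: a sinusoid sweeping upward in frequency, windowed.
fs = 1024                                    # samples per second
t = np.arange(0, 1.0, 1.0 / fs)
chirp = np.sin(2 * np.pi * (30 * t + 40 * t**2)) * np.hanning(t.size)

# Bury the signal at t = 3 s in unit-variance white noise at amplitude 0.5:
# invisible sample-by-sample, but detectable after filtering.
data = rng.normal(0, 1.0, 8 * fs)
data[3 * fs : 3 * fs + t.size] += 0.5 * chirp
snr = matched_filter_snr(data, chirp, noise_sigma=1.0)
peak = int(np.argmax(np.abs(snr)))           # sample index of the best match
```

The cost the section describes comes from repeating this correlation for every template in the bank; the CNN detectors amortise that search into a single forward pass.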

Parameter estimation with normalising flows

Once a GW event is detected, parameter estimation infers the source properties: component masses, spins, distance, sky location, inclination. Classical PE uses MCMC or nested sampling on the GW likelihood, which can take days of CPU time per event. DINGO (Dax et al. 2021) introduced normalising-flow posterior estimation for GW parameter estimation: train a conditional normalising flow on simulated waveforms plus noise to approximate p(θ | strain), and apply at inference time in seconds. Subsequent work (DINGO-IS, DINGO-BNS, the LIGO/Virgo pipeline integrations) has made normalising-flow PE production-ready. The latency improvement (CPU-days to seconds) is operationally transformative for multi-messenger follow-up: the sky-localisation posterior is now available rapidly enough for electromagnetic follow-up teams to slew telescopes during the event.

Glitch classification

GW detectors are subject to glitches — non-Gaussian transient noise events that mimic real signals. Glitches come from many sources: scattered light, suspension resonances, cosmic-ray hits, anthropogenic noise, and various unidentified populations. The Gravity Spy project (Zevin et al. 2017) crowdsourced glitch classification and trained CNN classifiers; the system runs in production at LIGO. Glitch classification feeds directly into detector characterisation and signal-veto pipelines, reducing the false-positive rate for GW candidates.

Multi-messenger triggers

For multi-messenger GW events (binary-neutron-star mergers in particular), the operational pipeline is: GW detection → rapid sky-localisation posterior → alert broadcast → electromagnetic follow-up (galaxy-targeted or wide-field tiling) → counterpart identification → joint analysis. ML enters at every stage: detection, localisation, alert prioritisation, candidate-counterpart vetting (the kilonova-vs-supernova classification problem in the EM data is itself an ML task). The GW170817 follow-up in 2017 prompted substantial infrastructure development; the resulting tooling (the GW-counterpart frameworks at the various survey brokers) is now mature.

Population synthesis and inference

With a catalogue of 100+ detected mergers, population inference — what is the underlying distribution of binary-black-hole masses, spins, formation channels — is increasingly important. Hierarchical Bayesian inference combines per-event posteriors with selection-function modelling. Neural posterior estimation has been adapted to the population setting (the Mould et al. 2022 work, subsequent extensions). The result is a steadily-improving constraint on stellar-evolution models, dynamical-formation channels, and the contribution of various binary-formation pathways to the merger rate.

The LISA frontier

The space-based LISA mission (targeted 2035 launch) will detect gravitational waves at millihertz frequencies, including supermassive-black-hole mergers across cosmological distances and stellar-mass binaries inspiraling for years before merger. The data-analysis challenge is qualitatively different from ground-based GW: simultaneous-source detection (potentially hundreds of overlapping signals), much longer signal durations, and a substantially-different noise budget. The LISA Data Challenge series has been a methodological forcing function; ML methods (transformers for long-time-series, normalising flows for joint posterior estimation, contrastive pretraining for glitch identification) are central to the developing analysis stack.

17

Simulation-Based Inference for Cosmology

Cosmological inference traditionally used likelihood-based MCMC over analytic or semi-analytic theory predictions. As theory predictions become simulation-based — N-body simulations, hydrodynamic simulations, lensing maps with parameters varying across the simulation grid — the likelihood becomes intractable, and simulation-based inference (SBI, also called likelihood-free inference) becomes essential. SBI is now the methodological frontier of computational cosmology, and one of the major areas where modern AI tools have transformed scientific practice.

The likelihood-free framing

Suppose we have a forward simulator that, given cosmological parameters θ (Ω_m, σ_8, h, w, ...), produces simulated observables x (galaxy power spectra, weak-lensing maps, cluster counts). The Bayesian inference problem is p(θ | x_obs) ∝ p(x_obs | θ) p(θ). Classical inference requires an explicit likelihood p(x | θ), which is intractable when the forward model is a multi-Gpc N-body simulation with subgrid baryonic physics. Simulation-based inference sidesteps the explicit likelihood by training neural networks on simulator-output samples to approximate the posterior, the likelihood, or the likelihood ratio directly.
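The conceptually simplest likelihood-free method, rejection ABC, makes the framing concrete: draw parameters from the prior, run the simulator, and keep only draws whose summary statistic lands near the observation. The kept draws approximate the posterior without a single likelihood evaluation. The toy simulator below (a Gaussian with unknown mean) stands in for the N-body code.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulator(theta, n=50):
    """Stand-in for an expensive forward simulation with no tractable likelihood."""
    return rng.normal(theta, 1.0, n)

x_obs = simulator(0.7)                       # 'observed' data from theta = 0.7
s_obs = x_obs.mean()                         # compressed summary statistic

# Rejection ABC: keep prior draws whose simulated summary lands within a
# tolerance of the observed one.
prior_draws = rng.uniform(-3.0, 3.0, 20_000)
posterior = np.array([th for th in prior_draws
                      if abs(simulator(th).mean() - s_obs) < 0.05])
```

Rejection ABC wastes almost every simulation (here ~98% of draws are discarded), which is exactly the inefficiency that the neural methods of the next subsections address: they fit a density to all the simulator output rather than thresholding it.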

Neural posterior estimation

Neural posterior estimation (NPE) trains a conditional density estimator (typically a normalising flow) on (θ, x) pairs from the simulator: at inference time, conditioning on x_obs gives an approximation to p(θ | x_obs). NPE is amortised: one trained network handles many real datasets without retraining. The methodology is now mature, with widely-used libraries (sbi, lampe, swyft) providing the implementation. Cosmological NPE applications include cosmological-parameter inference from weak-lensing maps (Jeffrey et al. 2024, the various 2024–2026 LSST-DESC follow-ups), the SDSS BOSS galaxy power-spectrum analysis, and 21-cm cosmology.
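NPE can be shown in miniature under strong simplifying assumptions: a Gaussian conditional density family whose mean is linear in the data stands in for the conditional flow, so the maximum-likelihood fit reduces to linear regression on simulator draws. The toy simulator is conjugate, giving an analytic posterior to check against; real problems need the flow precisely because the posterior is not Gaussian.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate (theta, x) pairs: theta from the prior, x from the forward model.
n = 100_000
theta = rng.normal(0.0, 1.0, n)              # prior N(0, 1)
x = theta + rng.normal(0.0, 0.5, n)          # simulator: x | theta ~ N(theta, 0.5^2)

# Fit q(theta | x) = N(a*x + b, s^2) by maximum likelihood, which for this
# Gaussian family is exactly linear regression of theta on x.
A = np.stack([x, np.ones(n)], axis=1)
coef, *_ = np.linalg.lstsq(A, theta, rcond=None)
post_sd = (theta - A @ coef).std()

# Amortisation: the same fitted estimator serves any observation instantly.
x_obs = 1.3
post_mean = coef[0] * x_obs + coef[1]
# Conjugate-Gaussian ground truth: mean 0.8 * x_obs, sd sqrt(0.2) ~ 0.447.
```

Note what was never needed: an evaluation of p(x | θ). The fit uses only forward draws, which is the entire point when the forward model is a simulation.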

Neural likelihood and likelihood-ratio estimation

Alternatives to NPE include neural likelihood estimation (fitting a conditional density estimator to the simulator's p(x | θ), which can then be used inside standard MCMC) and neural ratio estimation (training a binary classifier to distinguish samples from p(x, θ) vs. p(x) p(θ), whose logit estimates the likelihood ratio). The choice of method depends on the problem geometry: ratio estimation avoids explicit density estimation and pairs naturally with MCMC, while NPE yields amortised posteriors directly. The sbi package from the Macke group has been a central reference implementation.
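The classifier trick behind ratio estimation is easy to demonstrate on a toy Gaussian problem where the true log-ratio is known. Hand-crafted quadratic features stand in for a small network (for correlated Gaussians the exact log-ratio is linear in them), and plain gradient descent stands in for a deep-learning optimiser.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulator draws give joint samples (theta, x); shuffling x breaks the
# pairing and gives samples from the product of marginals p(theta) p(x).
n = 20_000
theta = rng.normal(0.0, 1.0, n)
x = theta + rng.normal(0.0, 1.0, n)
x_shuffled = rng.permutation(x)

def feats(th, xx):
    """Quadratic features stand in for a small network; here the true
    log-ratio is exactly x*th - th^2/2 - x^2/4 + const."""
    return np.stack([th * xx, th**2, xx**2, th, xx, np.ones_like(th)], axis=1)

X = np.vstack([feats(theta, x), feats(theta, x_shuffled)])
y = np.concatenate([np.ones(n), np.zeros(n)])    # 1 = joint, 0 = marginals

w = np.zeros(X.shape[1])
for _ in range(5000):                            # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.2 * X.T @ (p - y) / y.size

accuracy = ((p > 0.5) == y).mean()
# The trained classifier's logit estimates log p(x,theta)/(p(x)p(theta)),
# so the learned weight on the theta*x feature should approach 1.
```

The classifier never sees a density; distinguishing paired from shuffled samples is enough to recover the likelihood ratio, which is what makes the method simulator-friendly.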

The CAMELS project and simulation-grid training data

The CAMELS project (Cosmology and Astrophysics with MachinE Learning Simulations, Villaescusa-Navarro et al. 2021) provides ~5,000 cosmological hydrodynamic simulations spanning a six-dimensional parameter grid (Ω_m, σ_8, plus four feedback parameters). CAMELS is the de-facto training set for cosmological SBI involving baryonic physics. The CAMELS-Multifield project extends this to multiple observable channels (gas density, dark matter, stars, hot gas), enabling joint SBI across modalities.

Robustness, calibration, and validation

SBI inherits the simulation-reality gap: if the simulator is wrong, the posterior is wrong. Robustness testing in SBI involves cross-validation across simulation suites (training on IllustrisTNG, validating on EAGLE), simulation-based calibration tests (the Talts et al. 2018 SBC procedure for testing posterior calibration), and "leave-out-suite" validation. The 2024–2026 LSST-DESC SBI working group has developed a working set of validation practices that will likely become field standards.

Beyond cosmological parameters

SBI is increasingly applied to broader astrophysical inference: stellar-population synthesis (inferring star-formation histories from spectroscopy), dwarf-galaxy substructure detection (inferring the dark-matter substructure that produces gravitational-lensing perturbations), and exoplanet-population inference. The methodology is general; the field-specific work is in tailoring summary statistics, simulation grids, and validation procedures to each application.

18

Anomaly Detection and Foundation Models

Astronomy's discovery space — finding objects unlike anything in catalogues — is well-suited to anomaly-detection methods. And the field's fragmented data substrate (multiple surveys with overlapping but non-identical observation modalities) is well-suited to multimodal foundation models that learn shared representations across observation types. These two methodological themes are increasingly merging: foundation models trained on multimodal data provide the embeddings against which anomaly scoring operates.

Anomaly-detection methodology

Astronomical anomaly detection has used a wide methodological palette: isolation forests on engineered features, autoencoder reconstruction error on light curves and images, density- and boundary-based methods (one-class SVMs, normalising flows on the data manifold), and increasingly contrastive embedding approaches where outlier-ness is measured in learned feature space. The choice of method depends on the data substrate: image anomaly detection benefits from CNN autoencoders or contrastive image embeddings; light-curve anomaly detection benefits from RNN or transformer-based encoders; spectroscopic anomaly detection often uses dimensionality reduction followed by density estimation in the reduced space.
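The simplest embedding-space anomaly score, mean distance to the k nearest neighbours, already captures the pattern shared by all of these methods: objects that sit in sparse regions of feature space score high. A toy version on a synthetic embedding (brute-force distances, fine at this scale; production systems use approximate nearest-neighbour indices):

```python
import numpy as np

rng = np.random.default_rng(7)

def knn_anomaly_score(embeddings, k=10):
    """Mean distance to the k nearest neighbours in feature space; isolated
    objects score high."""
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)

# A 'normal' population plus five outliers in a toy 8-d embedding space.
normal = rng.normal(0.0, 1.0, (1000, 8))
outliers = rng.normal(0.0, 1.0, (5, 8)) + 6.0
X = np.vstack([normal, outliers])
scores = knn_anomaly_score(X)
flagged = np.argsort(scores)[-5:]                # indices of the top-5 scores
```

The hard part in practice is not the score but the embedding it operates in, which is why this theme merges with the foundation-model work below.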

Operational deployments

Production anomaly-detection systems exist across several operational contexts. Anomaly screening of the ZTF alert stream flags unusual transients; SNAD (Pruzhinskaya et al. 2019, ongoing) has produced multiple peer-reviewed discoveries of new variable-star types and unusual transients. SDSS-V screens spectra for unusual stellar types. The FBOT (fast blue optical transient) class was first surfaced as an anomaly population in survey data before being recognised as a distinct class.

Foundation models for astronomy

The 2023–2026 wave of foundation models for astronomy is the major methodological development of recent years. AstroCLIP (Parker et al. 2024) trains a contrastive image-spectrum model that produces aligned embeddings. AstroLLaMA applies LLaMA-style language modelling to astronomical literature. The Multimodal Universe project (Pan et al. 2024–2025) trains a joint model across photometric images, spectra, light curves, and tabular catalogues. Astromer (Donoso-Oliva et al. 2023) is a transformer foundation model for light curves. The empirical case for foundation models — that a single pretrained model fine-tuned on small downstream-task labels outperforms task-specific architectures trained from scratch — is increasingly compelling and is reshaping the methodological default.

Self-supervised pretraining strategies

The pretraining objectives that work for astronomical foundation models include: masked autoencoding on light curves and spectra (mask out time stamps or wavelength bins, predict masked values), contrastive pretraining (positive pairs from the same object across modalities, negative pairs from different objects), jigsaw and rotation-prediction (rotation invariance for galaxy images), and simulation-augmented pretraining (using simulated data to provide additional training pairs). The combination of self-supervised pretraining with a small amount of labelled fine-tuning data is the dominant pattern.
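Of those objectives, the contrastive one has a compact canonical form, the InfoNCE loss: rows of the two modality batches that describe the same object are positives, all other rows are negatives. A numpy sketch on random embeddings (real systems compute this on learned encoder outputs and symmetrise over both directions):

```python
import numpy as np

rng = np.random.default_rng(8)

def info_nce(img_emb, spec_emb, temperature=0.1):
    """Contrastive (InfoNCE) loss: row i of each modality is a positive pair,
    all other rows in the batch are negatives. Embeddings are L2-normalised."""
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # cosine-similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))          # image-to-spectrum direction

# Well-aligned cross-modal embeddings score far below unrelated ones.
z = rng.normal(0.0, 1.0, (64, 64))                 # batch of 64, 64-d embeddings
loss_aligned = info_nce(z, z + 0.05 * rng.normal(0.0, 1.0, (64, 64)))
loss_random = info_nce(z, rng.normal(0.0, 1.0, (64, 64)))
```

Minimising this loss is what produces the aligned image-spectrum embeddings that AstroCLIP-style models expose for downstream tasks.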

Discovery via anomaly detection

The anomaly-detection literature has produced several confirmed-discovery cases: new tidal-disruption-event subtypes surfaced by ZTF, unusual eclipsing-binary geometries (heartbeat stars, complex-EB systems), the first FBOTs, and various peculiar AGN. The discovery science is clearest when the anomaly population becomes large enough to characterise systematically; isolated anomalies often turn out to be data artefacts. Operational deployments therefore typically combine ML anomaly scoring with cross-checks (multiple-method agreement, follow-up observations, cross-survey confirmation).

Challenges in foundation-model deployment

Several challenges remain in deploying foundation models in astronomy. Compute requirements: training a Multimodal-Universe-class model requires substantial GPU time, and updating it as new survey data accumulates is non-trivial. Distribution shift: a foundation model trained pre-LSST may need substantial retraining once LSST data accumulates, since the data distribution will shift. Interpretability: foundation-model embeddings are opaque, and explaining why a particular object is anomalous (or why a particular morphology classification was assigned) is non-trivial. The methodological work in foundation-model deployment is now substantial enough to be its own subfield.

19

The Frontier and the Operational Question

AI for astronomy is methodologically mature and operationally embedded. The frontier as of 2026 has several dimensions: scaling foundation models to LSST-era data; closing the simulation-reality gap for SBI; operationalising real-time pipelines for mature surveys; and the ongoing methodological work around interpretability, calibration, and robustness. This section traces the open methodological questions and the directions the field is moving in.

The LSST-era operational frontier

LSST began full operations in 2025, and the alert-rate, image-volume, and downstream-inference demands are the central operational stress test of the next decade. Real-time alert classification has reached production maturity with the existing brokers, but the long-term challenge is the integration of LSST data with Gaia, Euclid, Roman, and the JWST archive into joint inference workflows. The 2025–2030 period will see the methodological focus shift from per-survey ML to cross-survey ML: foundation models that operate uniformly across the major sky surveys, joint inference frameworks that combine LSST and Euclid for cosmology, and unified catalogue-quality flags.

Gaia DR4 and the astrometric frontier

Gaia DR4, expected in 2026, will provide the first substantial astrometric exoplanet catalogue (thousands of new detections, particularly wider-orbit and longer-period planets), substantially-improved stellar parameters from BP/RP spectra, and a much-expanded variable-star catalogue. The methodological work for DR4 ingestion is well underway; the operational question is whether the existing exoplanet ML pipelines (Kepler/TESS heritage) will transfer cleanly to the astrometric-detection setting, where the data structure is qualitatively different.

SBI maturity and the simulation-reality gap

Cosmological SBI is now the methodological default for many parameter-inference problems, but the simulation-reality gap — the systematic difference between simulated training data and real observations — remains the central robustness concern. The 2024–2026 work on simulation-based calibration, cross-suite validation (training on IllustrisTNG, validating on EAGLE), and model-misspecification-aware SBI methods is moving toward operational deployment. Whether SBI will eventually become the standard for all cosmological inference (replacing likelihood-based MCMC entirely) or coexist with classical methods is genuinely open.

The biosignature question and exoplanet atmospheric retrieval

The K2-18b dimethyl-sulfide claim of 2025, and the broader question of biosignature interpretation, has highlighted the need for robust atmospheric-retrieval methods that handle the systematic uncertainties (stellar contamination, modelling assumptions, false-positive scenarios) properly. JWST's continued atmospheric work, plus the ARIEL mission (ESA, launching 2029), will produce hundreds of exoplanet atmospheric measurements over the next decade. The methodological question — how to combine atmospheric retrieval, biosignature scoring, and false-positive accounting into a robust framework — is unresolved and is one of the highest-stakes open problems in AI for astronomy.

The LISA frontier

LISA's 2035 launch is far enough in the future that the methodological work is preparatory rather than operational, but as discussed in the gravitational-wave chapter sections above, the data-analysis challenge differs qualitatively from ground-based GW: simultaneous detection of overlapping sources, much longer signal durations, and a substantially different noise budget. The 2025–2035 methodological programme (transformers for long time series, joint-source SBI, ML alternatives to precomputed waveform template banks) will be a major preparatory effort.

What this chapter has not covered

Several areas are out of scope. The substantial pulsar-timing-array methodology has not been developed in detail beyond §11. Solar physics and heliophysics ML are out of scope. Cosmic-ray and neutrino astronomy ML are touched only briefly. Specific cosmological-model-comparison work (e.g., the early-dark-energy / modified-gravity / evolving-w analyses underway in 2024–2026) is not developed. The chapter aimed at the methodological core of AI for astronomy and astrophysics; the broader landscape of astronomical AI is genuinely vast, and review-paper resources should be used for any specific subfield.

Further reading

The further-reading library spans astronomy fundamentals (Carroll & Ostlie, Ryden & Peterson, Dodelson & Schmidt, Binney & Tremaine, the Gaia DR3 / JWST / LSST mission papers, Planck 2018, SH0ES H₀, GW170817, NANOGrav) and the AI methodology that has reshaped the field (Shallue & Vanderburg's AstroNet, Dieleman et al.'s galaxy CNN, Boone's PLAsTiCC-winning classifier, Dax et al.'s DINGO, Villaescusa-Navarro et al.'s CAMELS, Parker et al.'s AstroCLIP, and the photo-z and lensing literature). The Annual Review of Astronomy and Astrophysics remains the single best ongoing review source. Read the textbooks for the conceptual structure developed in Sections 2–9, and the methods papers for the architectures developed in Sections 11–19.