The Compendium · 2026 Edition

A full map of modern AI & data science — with each chapter its own essay.

Eighteen parts, from the mathematics underneath to the governance questions overhead. Each chapter is intended to stand alone as a self-contained essay — long enough to teach, short enough to read in an evening. What's ready is marked; everything else is on the way.

What this is

A working table of contents for a longer project. Each part below groups a handful of closely related chapters, and each chapter — when written — will explain its topic from first principles, with worked examples, diagrams where they help, and pointers to the canonical references for going deeper.

Topics are ordered roughly by foundation: the mathematics and programming that underpin everything, then the classical and modern machine-learning methods built on top, then the application areas, infrastructure, and the hard questions of safety and governance. Skip around freely — the parts are designed to be readable out of order.

Part I

Mathematical Foundations

01

Linear Algebra · Available
vectors, matrices, decompositions, eigenvalues
02

Calculus & Differential Equations · Available
multivariable calculus, ODEs, PDEs as relevant to physics-informed ML
03

Optimization Theory · Available
convexity, gradient descent, Lagrangians, constrained optimization
04

Probability Theory · Available
random variables, distributions, expectation, concentration inequalities
05

Statistics & Statistical Inference · Available
frequentist inference, hypothesis testing, regression, experimental design
06

Information Theory · Available
entropy, KL divergence, mutual information, compression
07

Bayesian Reasoning · Available
Bayes' theorem, priors, posteriors, conjugacy, hierarchical models
08

Signal Processing · Available
Fourier transforms, convolution, filtering, sampling — prerequisite for audio and time series

Part II

Programming & Software Engineering

01

Python for Data Science · Available
Python idioms, data manipulation with pandas, NumPy
02

Scientific Computing · Available
SciPy, numerical methods, linear algebra libraries
03

Algorithms & Data Structures · Available
complexity, trees, graphs, hashing — what ML practitioners actually need
04

Software Engineering Principles · Available
clean code, testing, design patterns, documentation
05

Databases & SQL · Available
relational databases, query optimization, NoSQL overview
06

Version Control & Collaborative Development · Available
Git, code review, branching strategies

Part III

Data Engineering & Systems

01

Data Collection & Acquisition · Available
web scraping, APIs, data procurement, synthetic data
02

Data Storage & Warehousing · Available
data lakes, warehouses, columnar formats, Parquet, Delta Lake
03

Data Pipelines & Orchestration · Available
Airflow, Prefect, dbt, batch vs. stream pipelines
04

Streaming & Real-Time Data · Available
Kafka, Flink, event-driven architectures
05

Distributed Computing · Available
Spark, MapReduce, distributed data processing
06

Cloud Platforms & Infrastructure · Available
AWS, GCP, Azure — services relevant to data and ML
07

Data Quality, Governance & Metadata · Available
data contracts, lineage, cataloging, observability

Part IV

Classical Machine Learning

01

Supervised Learning: Regression · Available
linear and polynomial regression, regularization, generalized linear models
02

Supervised Learning: Classification · Available
logistic regression, decision trees, Naive Bayes, kNN
03

Ensemble Methods · Available
bagging, boosting, random forests, gradient boosting, XGBoost
04

Unsupervised Learning: Clustering · Available
k-means, DBSCAN, hierarchical clustering, Gaussian mixture models
05

Dimensionality Reduction · Available
PCA, ICA, t-SNE, UMAP, autoencoders
06

Probabilistic Graphical Models · Available
Bayesian networks, Markov random fields, HMMs
07

Kernel Methods & Support Vector Machines · Available
the kernel trick, SVMs, Gaussian processes seen from above
08

Feature Engineering & Selection · Available
encoding, interaction terms, mutual information, wrapper and filter methods
09

Model Evaluation & Selection · Available
cross-validation, metrics, calibration, overfitting, leakage

Part V

Deep Learning Foundations

01

Neural Network Fundamentals · Available
perceptrons, backpropagation, activation functions, MLPs
02

Training Deep Networks · Available
optimizers (SGD, Adam, scheduling), initialization, batch size
03

Regularization & Generalization · Available
dropout, weight decay, data augmentation, early stopping
04

Convolutional Neural Networks · Available
convolutions, pooling, receptive fields, classic architectures
05

Sequence Models · Available
RNNs, LSTMs, GRUs, vanishing gradients, sequence-to-sequence
06

Attention Mechanisms · Available
soft and hard attention, self-attention, cross-attention, multi-head attention
07

Transfer Learning & Pretraining · Available
fine-tuning, domain adaptation, representation learning

Part VI

Natural Language Processing & Large Language Models

01

NLP Fundamentals · Available
tokenization, morphology, POS tagging, parsing, linguistic structure
02

Classical NLP · Available
bag of words, TF-IDF, n-grams, named entity recognition, information extraction
03

Word Embeddings & Distributional Semantics · Available
Word2Vec, GloVe, fastText, contextualized representations
04

The Transformer Architecture · Available
encoder, decoder, positional encoding, layer norm, architecture variants
05

Pretraining Paradigms · Available
masked LM, causal LM, encoder-only, decoder-only, encoder-decoder
06

Large Language Models: Scale & Emergent Capabilities · Available
scaling laws, emergent behaviors, capabilities and limitations
07

Instruction Tuning & Alignment · Available
RLHF, DPO, Constitutional AI, preference learning
08

Fine-Tuning & Parameter-Efficient Adaptation · Available
full fine-tuning, LoRA, prefix tuning, adapters, model merging
09

Retrieval-Augmented Generation · Available
dense retrieval, hybrid search, RAG architectures, long-context tradeoffs
10

LLM Evaluation · Available
benchmarks, contamination, human evaluation, critique of leaderboards

Part VII

Computer Vision

01

Image Representation & Classical Vision · Available
pixel statistics, color spaces, edge detection, classical feature descriptors
02

Modern Image Classification & Architectures · Available
ResNets, EfficientNets, Vision Transformers, scaling
03

Object Detection & Instance Segmentation · Available
YOLO, Faster R-CNN, DETR, SAM
04

Video Understanding · Available
temporal modeling, optical flow, action recognition, video transformers
05

3D Vision & Spatial Understanding · Available
depth estimation, point clouds, NeRF, 3D reconstruction
06

Vision-Language Models · Available
CLIP, image captioning, visual question answering, grounding

Part VIII

Speech, Audio & Music

01

Audio Signal Processing · In progress
waveforms, spectrograms, MFCCs, mel filterbanks
02

Automatic Speech Recognition · In progress
CTC, attention-based, Whisper, streaming ASR
03

Text-to-Speech & Voice Synthesis · In progress
WaveNet, Tacotron, neural vocoding, voice cloning
04

Speaker Recognition & Diarization · In progress
speaker embeddings, verification, who-spoke-when
05

Audio Classification & Sound Understanding · In progress
environmental sound, music tagging, sound event detection
06

Music Generation & Music AI · In progress
symbolic music, audio generation, MusicLM-style models

Part IX

Reinforcement Learning

01

RL Fundamentals · In progress
MDPs, Bellman equations, value functions, policies, exploration
02

Tabular RL · In progress
Q-learning, SARSA, dynamic programming, model-based planning
03

Deep Q-Networks & Value-Based Methods · In progress
DQN, double DQN, dueling networks, Rainbow
04

Policy Gradient & Actor-Critic Methods · In progress
REINFORCE, A3C, PPO, SAC, TD3
05

Model-Based RL & World Models · In progress
Dyna, Dreamer, MBPO, planning with learned models
06

Multi-Agent Reinforcement Learning · In progress
cooperative, competitive, emergent behavior
07

Offline RL & Imitation Learning · In progress
behavior cloning, inverse RL, conservative Q-learning
08

Preference Learning & RLHF · In progress
reward modeling, human feedback, RLAIF

Part X

Generative Models

01

Variational Autoencoders · In progress
ELBO, reparameterization, disentanglement, latent spaces
02

Generative Adversarial Networks · In progress
training dynamics, mode collapse, StyleGAN, progressive training
03

Normalizing Flows · In progress
change of variables, RealNVP, Glow, discrete flows
04

Diffusion Models · In progress
DDPM, score matching, classifier-free guidance, latent diffusion
05

Autoregressive Generative Models · In progress
PixelCNN, WaveNet, GPT as generative model
06

Image & Video Generation · In progress
Stable Diffusion, DALL-E, Sora-style video, consistency models
07

3D & Multimodal Generation · In progress
3D-aware generation, NeRF-based synthesis, any-to-any models
08

Multimodal Foundation Models · In progress
GPT-4V, Gemini, Flamingo — architectures that jointly process modalities

Part XI

AI Agents & Autonomous Systems

01

Agent Fundamentals · In progress
sense-plan-act loops, agent taxonomies, environments, PDDL
02

LLM-Based Agents · In progress
ReAct, chain-of-thought, tool-augmented agents, cognitive architectures
03

Tool Use & Function Calling · In progress
APIs, code execution, browser use, structured outputs
04

Memory & Knowledge Management · In progress
episodic, semantic, working memory — RAG vs. in-context vs. parametric
05

Planning & Reasoning · In progress
tree-of-thought, MCTS, decomposition, verification
06

Multi-Agent Systems · In progress
coordination, communication, role specialization, debate, emergent behavior
07

Agent Evaluation & Benchmarking · In progress
task success, efficiency, safety, trajectory evaluation

Part XII

Robotics & Embodied AI

01

Robot Perception & Sensing · In progress
cameras, lidar, IMU fusion, SLAM, sensor calibration
02

Motion Planning & Control · In progress
path planning, trajectory optimization, PID, model predictive control
03

Learning from Demonstration & Imitation · In progress
behavior cloning, DAgger, teleoperation datasets
04

Sim-to-Real Transfer · In progress
domain randomization, physics simulators, gap mitigation
05

Foundation Models for Robotics · In progress
RT-2, generalist manipulation policies, vision-language-action models
06

Autonomous Vehicles · In progress
perception stack, prediction, planning, safety, regulatory context

Part XIII

Specialized ML Methods

01

Time Series Analysis & Forecasting · In progress
ARIMA, exponential smoothing, temporal CNNs, Transformers for time series
02

Anomaly Detection · In progress
statistical methods, isolation forest, autoencoders, contextual vs. collective anomalies
03

Causal Inference · In progress
potential outcomes, DAGs, do-calculus, IV methods, difference-in-differences
04

Causal Machine Learning · In progress
causal discovery, uplift modeling, double ML, heterogeneous treatment effects
05

Graph Neural Networks · In progress
message passing, GCN, GAT, GraphSAGE, heterogeneous graphs
06

Survival Analysis & Event Modeling · In progress
Kaplan-Meier, Cox regression, neural survival models
07

Bayesian Deep Learning · In progress
Bayesian neural nets, Monte Carlo dropout, deep GPs, Laplace approximation
08

Meta-Learning & Few-Shot Learning · In progress
MAML, prototypical networks, in-context learning as meta-learning
09

Continual & Lifelong Learning · In progress
catastrophic forgetting, EWC, progressive networks, replay methods
10

Federated Learning & Privacy-Preserving ML · In progress
federated averaging, differential privacy, secure aggregation
11

Neurosymbolic AI · In progress
logic plus learning, knowledge graphs, program synthesis, neuro-symbolic reasoning

Part XIV

Applied Domains

01

Recommender Systems · In progress
collaborative filtering, content-based, matrix factorization, sequential recommendation
02

Search & Information Retrieval · In progress
BM25, dense retrieval, learning to rank, neural search
03

Financial ML & Quantitative Methods · In progress
alpha research, risk modeling, high-frequency, fraud detection
04

Healthcare & Clinical AI · In progress
medical imaging, EHR modeling, clinical NLP, trial design, regulatory considerations
05

AI for Cybersecurity · In progress
intrusion detection, malware classification, adversarial robustness in security contexts
06

AI for Education & Personalization · In progress
knowledge tracing, adaptive learning, intelligent tutoring
07

AI for Manufacturing & Operations · In progress
predictive maintenance, quality control, supply chain optimization
08

Human-AI Interaction & UX · In progress
interface design, cognitive load, trust calibration, feedback collection

Part XV

AI for Science

01

Scientific Machine Learning · In progress
data-driven discovery, surrogate models, physics-informed neural networks
02

AI for Biology & Genomics · In progress
sequence modeling, variant effect prediction, single-cell analysis
03

AI for Drug Discovery & Molecular Design · In progress
molecular representations, generative chemistry, docking, ADMET prediction
04

AI for Protein Science · In progress
AlphaFold, structure prediction, protein design, function prediction
05

AI for Climate & Earth Systems · In progress
weather forecasting, climate emulators, remote sensing
06

AI for Physics, Materials & Astronomy · In progress
neural operators, materials property prediction, simulation surrogates

Part XVI

MLOps & Production ML

01

Experiment Tracking & Reproducibility · In progress
MLflow, W&B, DVC, determinism, environment management
02

Feature Stores & Data Management for ML · In progress
online/offline stores, point-in-time correctness, Feast, Tecton
03

Model Deployment & Serving · In progress
REST, gRPC, batch vs. real-time, model registries, containerization
04

Model Monitoring & Drift Detection · In progress
data drift, concept drift, shadow deployment, alerting
05

CI/CD for Machine Learning · In progress
automated retraining, testing for ML, MLOps pipelines
06

A/B Testing & Causal Experimentation in Production · In progress
randomization, CUPED, multi-armed bandits
07

Responsible Release & Deployment Practices · In progress
staged rollouts, kill switches, incident response, documentation

Part XVII

AI Infrastructure & Systems

01

Hardware for ML · In progress
GPUs, TPUs, NPUs, memory bandwidth, roofline model
02

Distributed Training · In progress
data parallelism, model parallelism, pipeline parallelism, ZeRO, FSDP
03

Model Compression · In progress
pruning, quantization, knowledge distillation, structured vs. unstructured
04

Inference Optimization · In progress
batching, KV caching, speculative decoding, FlashAttention, serving frameworks
05

AI Chips & Custom Silicon · In progress
ASIC design philosophy, photonics, neuromorphic computing, the competitive landscape

Part XVIII

AI Safety, Alignment & Governance

01

AI Safety Fundamentals · In progress
problem framing, threat models, instrumental convergence, Goodhart's law
02

Technical Alignment Methods · In progress
scalable oversight, debate, amplification, interpretability-based approaches
03

Robustness & Adversarial ML · In progress
adversarial examples, certified defenses, distribution shift, red-teaming
04

Mechanistic Interpretability · In progress
circuits, features, superposition, probing, causal tracing
05

Explainability for Practitioners · In progress
SHAP, LIME, saliency maps, counterfactuals, when each method applies
06

Fairness, Bias & Equity · In progress
sources of bias, fairness definitions and tensions, auditing, mitigation
07

Privacy in ML · In progress
differential privacy, membership inference, model inversion, data deletion
08

AI Governance, Policy & Regulation · In progress
EU AI Act, executive orders, standards bodies, liability, international coordination

The compendium is a work in progress — chapters will land as they're written, and the table above will update with each release. If you have corrections, suggestions, or just want to tell me which chapter should be written next, you know where to find me.

— Alex