Pharmacology, Drug Discovery & AI: from the discovery pipeline to ML-driven molecular design.
Drug discovery is where AI methods meet the highest economic stakes in the life sciences. This chapter develops both the working pharmacology vocabulary an AI reader needs to engage with the field (Sections 2–9 — the discovery pipeline, drug targets, pharmacokinetics, pharmacodynamics, ADMET, clinical trials, regulation, modalities) and the AI methodology that has substantially reshaped early-stage discovery since 2018: molecular representations (Section 11), property and ADMET prediction (Section 12), virtual screening and docking (Section 13), binding-affinity prediction (Section 14), generative chemistry (Section 15), retrosynthesis and reaction prediction (Section 16), phenotypic AI (Section 17), industry deployment (Section 18), and the frontier (Section 19). The single chapter combines what the field treats as inseparable: the pharmacology that frames the problems and the AI methods that increasingly drive how candidates are proposed, evaluated, and triaged.
Prerequisites & orientation
This chapter is both a domain primer and an AI-methods chapter. The first half (Sections 1–9) assumes basic biology and chemistry but no pharmacology background; it builds on the chemistry vocabulary developed in Ch 02 and the protein-science material of Ch 03. The second half (Sections 10–19) assumes the working machinery of modern deep learning (Part VI), the protein-AI methodology of Ch 03 (AlphaFold structures, ESM embeddings, equivariant architectures — all of which feed into drug-discovery methods here), the graph-neural-network material of Part XIII Ch 05 (essential for Section 11), the diffusion-model material of Part X (essential for Section 15), and the foundation-model material of Part X (substrate for many methods throughout). Readers with a pharma background can skim Sections 2–9; readers with strong ML but no pharmacology should take their time with the first half before engaging with the second.
Three threads run through the chapter. The first is the discovery pipeline: drug development is a multi-stage funnel from target identification through preclinical, Phase I/II/III trials, and approval, with attrition at every stage. The second is the multi-objective nature of drug design: a successful drug candidate must simultaneously bind its target with high affinity, achieve adequate selectivity, satisfy ADMET constraints, be synthesisable, avoid intellectual-property conflicts, and survive scale-up to manufacturing — ML methods that optimise any single objective in isolation produce candidates that fail under the others. The third is the experimental loop: drug discovery is not a pure-prediction discipline; designs are synthesised and tested, results feed back into models, and the methodology is fundamentally interactive. Section 16 develops the active-learning machinery; the broader pattern recurs throughout the AI half of the chapter.
Why Pharmacology, and Why Pharma-AI
Drug discovery is the largest sustained applied-science effort in modern history: the global pharma industry spends roughly $250 billion annually on R&D, runs tens of thousands of clinical trials, and brings ~50 new drugs to market each year. The discipline operates at the intersection of biology, chemistry, medicine, statistics, regulation, and economics, and the methodology has its own substantial vocabulary that an AI reader needs in order to engage with the field. This chapter develops both the working pharmacology vocabulary (Sections 2–9) and the AI methodology that has substantially reshaped early-stage drug discovery since 2018 (Sections 10–19). Section 10 frames what makes drug-discovery AI methodologically distinctive from an ML perspective; this section maps the pharmacology itself.
The discovery funnel
Drug development is a multi-stage funnel, not a single procedure. A program begins with target identification: choosing a specific molecular target — usually a protein, sometimes a nucleic acid sequence — whose modulation should produce a therapeutic effect. Hit identification finds compounds that engage the target. Lead optimisation improves potency, selectivity, ADMET, and synthesisability over many design-make-test cycles. Preclinical development establishes safety in animal models. Clinical trials run in three phases on increasing populations of patients to establish safety and efficacy in humans. Regulatory approval by FDA, EMA, or comparable authorities depends on the cumulative evidence. The full cycle averages 10–15 years and ~$2.6 billion in fully-loaded R&D cost per approved drug, with attrition at every stage. Section 2 develops this in detail.
Why drug discovery matters for AI
Drug discovery is where AI methods meet the highest economic stakes in life sciences. Each approved drug represents billions of dollars of investment and decades of effort; each that reaches patients can transform standard of care for the disease it addresses. The early-stage chemistry — molecular design, property prediction, virtual screening, retrosynthesis — is where AI methods have produced their strongest empirical wins, with multiple AI-discovered candidates now in clinical trials. The later stages (clinical trial design, real-world evidence, post-market surveillance) are increasingly engaging with AI methods too, though regulatory caution and the high costs of failure shape the methodology differently from the early stages. The combination of clear scientific importance, abundant labelled training data (PubChem, ChEMBL, ZINC, the various proprietary HTS datasets), and tractable problem formulations has made drug discovery one of the most-active AI application areas of the past decade.
How this chapter is organised
Sections 2–9 develop the working pharmacology vocabulary an AI reader needs: the drug-discovery pipeline (Section 2), drug targets and mechanisms (Section 3), pharmacokinetics (Section 4), pharmacodynamics (Section 5), toxicology and the ADMET framework (Section 6), clinical trials (Section 7), regulatory frameworks (Section 8), and drug modalities and the industry landscape (Section 9). Section 10 turns to the AI methodology, framing what makes drug-discovery AI distinctive from a machine-learning perspective. Sections 11–19 develop the methods: molecular representations (11), property and ADMET prediction (12), virtual screening and docking (13), binding-affinity prediction (14), generative chemistry (15), retrosynthesis (16), phenotypic AI (17), industry deployment (18), and the frontier (19).
Drug discovery sits at the intersection of biology (which targets matter), chemistry (which molecules engage them), pharmacology (how molecules behave in living systems), clinical medicine (which interventions help patients), and economics (which programs survive the funding gauntlet). The pharmacology vocabulary developed in Sections 2–9 is the prerequisite for engaging with the AI methods of Sections 10–19 — the AI methodology has been substantially shaped by the constraints, attrition realities, and regulatory engagement of the field it serves.
The Drug-Discovery Pipeline
The canonical drug-discovery pipeline has been the same in broad outline for forty years: identify a target, find molecules that engage it, optimise those molecules into a drug candidate, test for safety in animals, then test for safety and efficacy in humans through three phases of clinical trials, then seek regulatory approval. The details have evolved substantially — particularly in the early stages — but the overall structure is stable enough that every drug-development project organises around it.
Target identification and validation
The pipeline begins with target identification: choosing a specific molecular target — usually a protein, sometimes a nucleic acid sequence or a metabolic pathway — whose modulation should produce a therapeutic effect. The choice draws on basic biology research, genetic evidence (genes whose disruption produces or prevents disease), pathway analysis, and increasingly omics-level data integration. Target validation follows: confirm that engaging this target actually has the desired effect in disease-relevant systems, typically through genetic knockdown or pharmacological tool compounds. Targets that pass validation become the starting point for a discovery program; those that fail get retired (often after substantial investment). The target-identification step is where Ch 04's AI-for-biology methods (genomic foundation models, perturbation prediction, multi-omics integration) most directly reach drug discovery.
Hit identification
Once a target is selected, the goal becomes finding small molecules that engage it. Hit identification screens libraries of compounds for binding or activity. High-throughput screening (HTS) tests millions of compounds against the target in automated assays, typically at a single concentration, identifying the few hundred to few thousand "hits" that show measurable activity. Fragment-based drug discovery (FBDD) screens smaller fragment libraries (typically 1,000–10,000 compounds, each <300 Da) to find low-affinity but ligand-efficient binders that can be grown into drugs. Virtual screening uses computational methods to score compound libraries against a target structure (Section 13 develops modern AI-based virtual screening). DNA-encoded libraries (DELs) physically link each molecule to a unique DNA barcode, enabling simultaneous screening of millions to billions of compounds. The methodology has expanded substantially since 2015; modern hit identification routinely combines multiple approaches.
Hit-to-lead and lead optimisation
Most hits are not drugs — they have low affinity, poor selectivity, or unfavourable physicochemical properties. Hit-to-lead optimisation transforms hits into "lead" compounds with improved binding affinity, selectivity, and drug-like properties. Lead optimisation further refines leads into preclinical-development candidates, balancing potency against ADMET properties (Sections 4–6), synthetic accessibility, and intellectual-property considerations. The combined hit-to-lead-to-candidate phase typically takes 2–4 years and consumes 1,000–10,000 synthesised compounds. The methodology is heavily iterative — synthesise, test, analyse, propose new structures — and is where modern AI methods (generative chemistry, multi-objective optimisation, ADMET prediction) have produced the strongest empirical wins.
Preclinical development
Before a candidate compound can be tested in humans, it must pass preclinical safety evaluation. The preclinical phase includes detailed pharmacokinetic and toxicology studies in at least two animal species (typically a rodent and a non-rodent), formulation development (how the drug will be made into a tablet, injection, or other dosage form), manufacturing-process scale-up, and substantial documentation. The output is an Investigational New Drug (IND) application to the FDA (or the equivalent CTA — Clinical Trial Application — to other regulators), with the regulator's approval required before human trials can begin. Preclinical typically takes 1–3 years and costs $5–20 million per candidate.
Clinical trials and approval
The clinical-trial system (Section 7 develops it in detail) consists of three phases of pre-approval trials (Phase I for safety, Phase II for efficacy and dose, Phase III for confirmatory efficacy and broader safety) plus post-approval Phase IV studies. After successful Phase III, the company files a New Drug Application (NDA, for small molecules) or Biologics License Application (BLA, for biologics) with the FDA. Review and approval take 10–12 months for standard reviews and roughly 6–8 months under priority or breakthrough designations. The full pre-approval timeline from target identification to approval averages 10–15 years.
Post-market and lifecycle management
Approval is not the end. Post-marketing surveillance monitors for rare adverse events that didn't appear in pre-approval trials. Phase IV studies may be conducted to expand approved indications, optimise dosing, or compare to standard of care. Lifecycle management includes formulation improvements (e.g., switching from immediate-release to extended-release), pediatric studies (often regulatory requirements), label expansions, and patent strategy as the original patent expires (typically ~20 years from filing, leaving 7–14 years of market exclusivity post-approval). Generic competition begins immediately at patent expiration and typically reduces brand revenue by 80% within 1–2 years.
Drug Targets and Mechanism of Action
A drug works by physically binding to a specific molecular target and altering its activity. Understanding what makes a good target, what classes of targets exist, and how drugs engage them is the foundation for everything that follows — and the substrate for AI methods that predict drug-target interactions.
Druggability
Not every protein in the body is a viable drug target. Druggability is the property of having a binding site that small molecules can occupy with sufficient affinity and selectivity. The classical druggable target has a defined binding pocket — a concave region of the protein surface, ~300–800 cubic Ångströms, with a mix of hydrophobic and hydrophilic surfaces appropriate for small-molecule binding. Most successful drugs target proteins with active-site or allosteric pockets that meet these criteria: enzymes (with substrate-binding pockets), G-protein-coupled receptors (with ligand-binding pockets), ion channels, and nuclear hormone receptors. Targets without obvious binding pockets (transcription factors, scaffold proteins) are historically called "undruggable," though modern methods (PROTACs, molecular glues, covalent inhibitors) are gradually expanding the druggable space.
The major target classes
Approved drugs concentrate around a small number of target classes. G-protein-coupled receptors (GPCRs) are the most-drugged class — over 30% of approved drugs target GPCRs, including many classical drug categories (beta-blockers, antihistamines, opioids, antipsychotics). They are membrane proteins with characteristic seven-transmembrane-helix architecture and large diversity (~800 GPCRs in humans). Enzymes are the second-largest class — kinase inhibitors (the various "-nib" drugs for cancer), HMG-CoA reductase inhibitors (statins), proteasome inhibitors, ACE inhibitors, and many others. Ion channels are major targets in cardiovascular disease, epilepsy, and pain (calcium-channel blockers, sodium-channel blockers, the various lidocaine-class anaesthetics). Nuclear hormone receptors bind small lipophilic ligands (sex hormones, thyroid hormones, vitamin D) and are targets for cancer (anti-estrogens, anti-androgens), metabolic disease, and inflammation. Transporters (SLC and ABC families) are emerging targets, with SGLT2 inhibitors for diabetes the most-prominent recent success.
Beyond proteins: other target types
Most drug targets are proteins, but not all. DNA is the target of many cancer chemotherapeutics (alkylating agents, topoisomerase inhibitors), though typically with broad selectivity and substantial toxicity. RNA is an emerging target class — antisense oligonucleotides (Spinraza for spinal muscular atrophy), small interfering RNAs (siRNA, exemplified by Onpattro for hereditary amyloidosis), and RNA-binding small molecules (risdiplam for SMA) all engage RNA. Pathogen-specific targets (viral proteins, bacterial cell-wall enzymes, fungal cell-wall components) are the substrate of antibiotics and antivirals. Microbiome-targeted therapeutics are an active research area but have produced few approved drugs as of 2026.
Agonists, antagonists, and modulators
How a drug engages its target determines its effect. Agonists bind the target and activate it (mimicking the natural ligand) — albuterol activates the β2-adrenergic receptor to relax airway muscle. Antagonists bind without activating, blocking the natural ligand from binding and producing the opposite effect — beta-blockers (propranolol, metoprolol) antagonise β-adrenergic receptors to slow the heart. Inverse agonists bind a constitutively-active target and reduce its baseline activity. Allosteric modulators bind sites distinct from the natural ligand-binding site and either potentiate (positive allosteric modulators) or attenuate (negative) the response. Covalent inhibitors form irreversible bonds with their targets — aspirin acetylates cyclooxygenase, the proton-pump inhibitors covalently modify gastric H+/K+-ATPase, the various BTK inhibitors covalently modify a specific cysteine in the kinase active site.
Selectivity and the off-target problem
A perfect drug binds only its intended target. Real drugs always bind multiple targets, with the relative affinities determining the balance between intended therapeutic effects and unintended off-target effects. The methodology of selectivity assessment is substantial: a candidate kinase inhibitor is typically profiled against a panel of 300–500 kinases to identify off-targets; the resulting selectivity profile shapes whether the candidate proceeds. Common off-target concerns include hERG (a cardiac potassium channel — binding it causes life-threatening arrhythmias and has killed multiple late-stage drug candidates), CYP450 enzymes (drug-metabolising enzymes whose inhibition causes drug-drug interactions), and various GPCRs whose accidental engagement produces side effects. AI methods for off-target prediction (Section 12 develops them) are increasingly central to early-stage candidate triage.
Mechanism of action
A drug's mechanism of action (MoA) is the full chain from target engagement to clinical effect: binds target X with affinity Y → alters target's function in way Z → produces downstream cellular effect W → treats disease V. The chain can be short and direct (statins inhibit HMG-CoA reductase → reduce cholesterol synthesis → lower LDL → reduce cardiovascular events) or long and incompletely understood (many psychiatric drugs work but their full MoA is unclear). Establishing MoA is a substantial part of drug development, and regulators increasingly require mechanistic understanding for novel approvals — particularly for biomarker-stratified therapies where the MoA explains why specific patient populations respond.
Pharmacokinetics: ADME
Pharmacokinetics (PK) is what the body does to the drug — how it is absorbed, distributed, metabolised, and excreted. The four-letter mnemonic ADME captures the framework, and ADME prediction is among the most-tractable AI applications in drug discovery because the underlying problems (molecular property prediction, metabolic-pathway prediction, transporter substrate prediction) map cleanly onto modern ML methods.
Absorption
Absorption describes how the drug enters the bloodstream from its site of administration. The dominant route is oral: the drug is swallowed, dissolves in gastric or intestinal fluid, and is absorbed across the intestinal epithelium into the portal venous system. Oral absorption depends on physicochemical properties (Ch 02 §7's Lipinski rules apply): adequate solubility (the drug must dissolve), appropriate lipophilicity (log P typically 0–5; too low and the drug doesn't cross membranes, too high and it doesn't dissolve), molecular weight (<500 Da generally), and avoidance of efflux transporters (P-glycoprotein in particular). Bioavailability measures the fraction of an oral dose that reaches systemic circulation; values vary widely (atorvastatin ~14%, propranolol ~25%, doxycycline ~95%), with low bioavailability often reflecting first-pass hepatic metabolism. Other administration routes (intravenous, intramuscular, subcutaneous, topical, inhaled, transdermal) bypass intestinal absorption and have their own pharmacokinetic characteristics.
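To make the rule-of-five concrete, here is a minimal sketch of a Lipinski-style filter using the open-source RDKit toolkit; the thresholds are the classic cut-offs, and the function name and aspirin test input are illustrative:

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors

def passes_rule_of_five(smiles: str) -> bool:
    """Classic Lipinski cut-offs: MW < 500, logP <= 5,
    <= 5 H-bond donors, <= 10 H-bond acceptors."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:          # unparseable SMILES
        return False
    return (Descriptors.MolWt(mol) < 500
            and Crippen.MolLogP(mol) <= 5
            and Descriptors.NumHDonors(mol) <= 5
            and Descriptors.NumHAcceptors(mol) <= 10)

print(passes_rule_of_five("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> True
```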
Distribution
Distribution describes how the drug spreads through the body. Key determinants include plasma protein binding (drugs that bind albumin or α1-acid glycoprotein with high affinity have less free drug available to act on tissues), tissue penetration (lipophilic drugs accumulate in adipose; hydrophilic drugs stay in extracellular water), and crossing of the blood-brain barrier (which excludes most drugs from the central nervous system unless specifically designed to cross). Volume of distribution (Vd) summarises the apparent volume the drug occupies — high Vd indicates extensive tissue distribution, low Vd indicates the drug stays largely in plasma. Vd values vary by orders of magnitude across drugs, from ~5 L (heparin, plasma-restricted) to thousands of L (chloroquine, accumulates in tissues).
Metabolism
Metabolism is the chemical transformation of the drug, primarily by the liver. The dominant enzymes are the cytochrome P450 (CYP) family — particularly CYP3A4 (handles ~50% of all drugs), CYP2D6, CYP2C9, CYP2C19, and CYP1A2. CYP enzymes typically perform phase I metabolism (oxidation, reduction, hydrolysis) that adds polar functional groups to drugs, making them more water-soluble and easier to excrete. Phase II metabolism (glucuronidation, sulphation, glutathione conjugation) attaches polar groups to further increase water solubility. The combined output is metabolites that are typically less active than the parent drug and are excreted in urine or bile.
CYP-mediated metabolism is a major source of drug-drug interactions. A drug that inhibits CYP3A4 (clarithromycin, ketoconazole) raises blood levels of any co-administered drug metabolised by CYP3A4 — sometimes to dangerous levels (the famous Seldane/terfenadine interaction with ketoconazole caused fatal arrhythmias and led to terfenadine's withdrawal). Drugs that induce CYP enzymes (rifampin, phenytoin, St John's wort) accelerate metabolism of co-administered drugs, reducing their efficacy. The CYP-interaction landscape is complex enough that predicting it is a substantial AI application area, with ML-based CYP-inhibition predictors part of routine ADMET evaluation.
Excretion
Excretion is the removal of drug and metabolites from the body. The dominant routes are renal (urine) and biliary/faecal. Renal clearance depends on glomerular filtration (which removes free drug from plasma at ~125 mL/min in healthy adults), tubular secretion (active transport into the tubule lumen), and tubular reabsorption (passive return of lipophilic drugs to plasma). Biliary excretion handles many drugs and metabolites; the resulting molecules can either be excreted in faeces or undergo enterohepatic recycling (deconjugation by gut bacteria, reabsorption, return to liver) which extends drug half-lives. Patients with impaired renal function (chronic kidney disease, age-related decline) require dose adjustments for renally-cleared drugs.
Half-life and dosing regimens
Combining absorption, distribution, and elimination produces a drug's half-life (t½) — the time for plasma concentration to decline by half. Half-life ranges from minutes (insulin, ~5 minutes) to days (atorvastatin, ~14 hours; warfarin, ~40 hours; cetuximab, ~5 days; dupilumab, ~21 days). The dosing interval is typically chosen to maintain steady-state plasma levels within a target therapeutic window — drugs with short half-lives need frequent dosing or extended-release formulations; drugs with long half-lives can be dosed weekly, monthly, or even less frequently. The trade-offs between half-life, dosing convenience, and ability to stop the drug if adverse events occur are major design considerations.
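The half-life is not an independent parameter; for a one-compartment drug it follows from the volume of distribution and clearance. A minimal sketch with illustrative parameter values:

```python
import math

def half_life_hours(vd_litres: float, clearance_l_per_h: float) -> float:
    """One-compartment elimination: ke = CL/Vd, so t1/2 = ln(2)/ke = ln(2)*Vd/CL."""
    return math.log(2) * vd_litres / clearance_l_per_h

# Illustrative values: Vd = 50 L, CL = 5 L/h gives t1/2 of about 6.9 h,
# consistent with a twice- or three-times-daily dosing regimen.
print(round(half_life_hours(50.0, 5.0), 1))
```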
Why ADME prediction matters
ADME failures are the second-largest source of attrition in drug development (after Phase II efficacy failures). A candidate compound with excellent target engagement but poor absorption, fast metabolism, or rapid excretion will not become a viable drug. Modern drug-discovery programs evaluate ADME properties early — the era of "we'll fix the PK later" is over, because the cost of redesigning a chemical series for better ADME is substantial. AI-based ADME prediction has been one of the most-mature drug-discovery applications, with commercial QSAR-based and ML-based predictors integrated into routine candidate triage at most pharma companies. Section 12 develops the methodology in detail.
Pharmacodynamics
Pharmacodynamics (PD) is what the drug does to the body — the relationship between drug concentration at the target and biological effect. Where pharmacokinetics is mostly about chemistry (how the drug moves through the body), pharmacodynamics is mostly about biology (how the body responds to the drug).
Receptor occupancy and dose-response
The fundamental PD relationship is between drug concentration and target engagement. As drug concentration rises, the fraction of target molecules bound by drug rises, following classical receptor-binding kinetics. The EC50 (half-maximal effective concentration) is the drug concentration at which 50% of the maximum effect is achieved; the IC50 is the analogous value for inhibition. The full dose-response curve typically follows a sigmoidal Hill equation: effect = Emax · [drug]ⁿ / (EC50ⁿ + [drug]ⁿ), where n is the Hill coefficient (typically ~1 for simple binding, >1 for cooperative binding). The slope of the dose-response curve, the maximum response (Emax), and the EC50 together characterise a drug's potency and efficacy.
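The Hill relationship is compact enough to state directly in code. A minimal sketch (the Emax, EC50, and Hill-coefficient defaults are illustrative):

```python
import numpy as np

def hill_effect(conc, emax=1.0, ec50=10.0, n=1.0):
    """Sigmoidal dose-response: effect = Emax * C^n / (EC50^n + C^n).
    conc and ec50 must share units (e.g., nM); n is the Hill coefficient."""
    conc = np.asarray(conc, dtype=float)
    return emax * conc**n / (ec50**n + conc**n)

# At C = EC50 the effect is half-maximal by construction.
print(hill_effect([1.0, 10.0, 100.0]))   # -> [~0.09, 0.5, ~0.91]
```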
Efficacy vs. potency
Potency and efficacy are distinct concepts, often confused. Potency is how much drug is needed to produce an effect — a drug with EC50 = 1 nM is more potent than one with EC50 = 100 nM. Efficacy is the maximum effect the drug can produce — a partial agonist has a maximum effect below 100% even at saturating concentrations. A highly potent drug with low efficacy may be clinically inferior to a less potent drug with full efficacy. Buprenorphine (a partial μ-opioid agonist) has carefully tuned efficacy that allows opioid-replacement therapy without the full euphoria of heroin; aspirin's antiplatelet efficacy at low doses is what makes it useful for cardiovascular protection.
Therapeutic window
Every drug has a range of plasma concentrations that produce therapeutic benefit without unacceptable toxicity — the therapeutic window. Drugs with wide therapeutic windows (penicillins, modern antihistamines) tolerate substantial dosing variability without harm; drugs with narrow windows (warfarin, lithium, digoxin, theophylline) require careful monitoring and individualised dosing. The ratio of toxic to effective dose is the therapeutic index; therapeutic indices <5–10 typically require therapeutic drug monitoring with periodic blood-level measurements. AI methods for predicting individual-patient therapeutic windows from sparse data are an active drug-development area.
Tolerance, desensitisation, and resistance
Drug responses are not static. Tolerance is the diminished response to a drug after repeated administration — the body adapts. Desensitisation refers to the receptor-level adaptation (down-regulation, internalisation, uncoupling from signalling) that produces tolerance. Tachyphylaxis is rapid tolerance over hours to days. Drug resistance in infectious disease and cancer arises through evolutionary pressure on the pathogen or tumour. Each phenomenon has therapeutic implications: opioid tolerance limits long-term efficacy and contributes to dependence; antibiotic resistance has reshaped infectious-disease medicine; cancer drug resistance is the dominant cause of late-stage chemotherapy failure.
Combining PK and PD: PK/PD modelling
PK and PD intersect at the body's response. PK/PD modelling combines the two — predicting plasma concentration over time (PK) and predicting biological effect from concentration (PD) — to optimise dosing regimens. The methodology spans simple compartmental models (where the body is approximated as 1, 2, or 3 well-mixed compartments) through complex physiologically-based pharmacokinetic (PBPK) models that explicitly model individual tissues. PBPK simulations are used to predict drug behaviour in special populations (paediatric, elderly, renally-impaired) without requiring clinical studies in each population. AI-augmented PK/PD modelling is an active frontier, with the methodology connecting to the time-series and Bayesian-deep-learning material of Part XIII.
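As a concrete example of the simplest end of that spectrum, here is a minimal sketch of a one-compartment model with first-order absorption and elimination (the classical Bateman equation); all parameter values are illustrative:

```python
import numpy as np

def oral_concentration(t, dose_mg, F=0.5, ka=1.0, ke=0.1, vd_litres=50.0):
    """Bateman equation: plasma concentration after a single oral dose.
    F: bioavailability; ka, ke: absorption/elimination rate constants (1/h)."""
    return (F * dose_mg * ka) / (vd_litres * (ka - ke)) * (
        np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0, 24, 25)                 # hours post-dose
c = oral_concentration(t, dose_mg=100.0)   # mg/L
print(f"peak ~{c.max():.2f} mg/L at ~{t[c.argmax()]:.0f} h post-dose")
```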
Biomarkers and surrogate endpoints
For many drugs, the clinical outcomes of interest emerge too slowly or too rarely for direct measurement to be tractable. Biomarkers are measurable substitutes for clinical outcomes — LDL cholesterol as a biomarker for cardiovascular disease risk, HbA1c for diabetes management, viral load for HIV treatment, tumour shrinkage for cancer therapy. Surrogate endpoints are biomarkers accepted by regulators as substitutes for clinical outcomes in trials — a substantial fraction of drug approvals rely on surrogate endpoints rather than mortality reduction (which would require massive trials). The validation of surrogates is a substantial epidemiological and regulatory enterprise; AI methods for biomarker discovery and validation are increasingly integrated into the methodology.
Toxicology and the ADMET Framework
Drugs cause harm as well as benefit. Toxicology is the science of drug-induced harm, and adding "T" to ADME produces the dominant industry framework: ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity). ADMET evaluation is the gauntlet most candidate molecules fail to clear, and ADMET prediction is the most-mature AI-for-drug-discovery application area.
The major toxicity categories
Toxic effects of drugs cluster into several recognised categories. Hepatotoxicity (liver toxicity) is the leading cause of acute liver failure and a major reason for drug withdrawal — troglitazone, bromfenac, and most recently several kinase inhibitors have been withdrawn or label-restricted for liver toxicity. The mechanism varies (direct toxicity, idiosyncratic reactions, mitochondrial damage), and prediction is harder than for most other tox categories. Cardiotoxicity includes hERG-mediated QT prolongation (a leading cause of late-stage drug failure), heart-failure exacerbation, and cardiomyopathy from cancer drugs (anthracyclines, trastuzumab). Genotoxicity measures DNA-damaging potential, evaluated through Ames tests (bacterial mutagenicity), micronucleus tests (chromosomal damage), and increasingly in-silico predictors. Carcinogenicity studies (typically two-year rodent studies) evaluate cancer risk from chronic exposure. Reproductive toxicity studies evaluate effects on fertility, fetal development, and lactation.
Predictive toxicology
ADMET prediction methods range from simple to sophisticated. Structural alerts are functional groups associated with specific toxicities — aromatic nitro groups for genotoxicity, anilines for hepatotoxicity, nitrofurans for various toxicities. QSAR (quantitative structure-activity relationship) models predict toxicity from molecular descriptors — physicochemical properties, fingerprints, learned representations. Read-across methods predict toxicity by analogy to similar compounds with known toxicity profiles. Mechanistic models simulate specific tox endpoints (e.g., hERG channel binding) using physical simulation or QSAR. The 2020s wave of ML-based ADMET predictors (Section 12 develops them) has substantially improved performance, particularly for toxicities where labelled training data is abundant (hERG, CYP inhibition, hepatotoxicity).
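The structural-alert approach is directly implementable. A minimal sketch using RDKit SMARTS matching; the two alert patterns are illustrative stand-ins, and production alert sets (e.g., the Brenk and PAINS collections) contain hundreds of curated patterns:

```python
from rdkit import Chem

# Illustrative alerts only; real alert libraries are far larger and curated.
ALERTS = {
    "aromatic nitro (genotoxicity)": Chem.MolFromSmarts("[c][N+](=O)[O-]"),
    "aniline (hepatotoxicity)": Chem.MolFromSmarts("c[NX3;H2]"),
}

def structural_alerts(smiles: str) -> list[str]:
    """Return the names of all alert patterns present in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [name for name, patt in ALERTS.items()
            if mol is not None and mol.HasSubstructMatch(patt)]

print(structural_alerts("Nc1ccccc1"))        # aniline -> flags one alert
print(structural_alerts("CC(=O)Oc1ccccc1"))  # phenyl acetate -> no alerts
```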
Idiosyncratic vs. dose-dependent toxicity
Two qualitatively different toxicity mechanisms. Dose-dependent toxicity is predictable from drug concentration — high enough doses of any drug cause harm; the issue is keeping doses below the harmful level. Idiosyncratic toxicity is unpredictable, dose-independent, and rare — a few patients in millions experience severe reactions through mechanisms that aren't fully understood (often immune-mediated). Idiosyncratic reactions are why some severe drug toxicities only emerge after market launch and millions of patient-exposures: pre-approval trials with thousands of patients can't detect 1-in-100,000 reactions. The methodology of post-market surveillance (FDA's MedWatch, EMA's EudraVigilance, the various national systems) is what catches these.
Drug-drug interactions
Many adverse events arise from drug-drug interactions rather than individual-drug toxicity. The major mechanisms: pharmacokinetic interactions (CYP enzyme inhibition or induction, transporter inhibition — see Section 4), pharmacodynamic interactions (additive or synergistic effects on the same receptor or pathway), and physical/chemical interactions in formulation. Modern drug development includes systematic in vitro evaluation of CYP and transporter interactions, with the resulting label warnings driving prescribing practice. AI methods for drug-drug interaction prediction are a substantial application area, particularly for complex polypharmacy in elderly patients.
The 3R principles and animal studies
Preclinical toxicology requires animal studies — at least one rodent and one non-rodent species, typically rats and dogs. The 3R principles (Replacement, Reduction, Refinement, codified by Russell & Burch 1959) guide ethical animal use: replace with non-animal alternatives where validated, reduce numbers where possible, refine procedures to minimise suffering. Modern in-vitro and in-silico methods are gradually replacing some animal studies — the FDA Modernization Act 2.0 (2022) formally allowed alternatives to animal testing for IND applications. AI methods are central to the replacement effort, with computational toxicology increasingly supplementing animal studies for specific endpoints.
The ICH guidelines
The International Council for Harmonisation (ICH) coordinates regulatory expectations across the major pharmaceutical markets (US, EU, Japan, plus increasingly other countries). ICH guidelines specify what preclinical and clinical studies must be conducted before approval — ICH M3 (preclinical safety), ICH S7 (cardiovascular and respiratory safety pharmacology), ICH S2 (genotoxicity), ICH E6 (good clinical practice), and many others. Compliance with ICH guidelines is essentially mandatory for companies seeking approval in any major market, and the guidelines themselves shape what ADMET evaluations must be done. AI methods entering this space have to engage with ICH expectations as a first-class concern.
Clinical Trials
Clinical trials are the formal evaluation of a drug in humans. The four-phase system (Phases I through IV) is the dominant framework, with each phase answering specific questions and gatekeeping access to subsequent phases. Understanding the phase structure, the trial designs that operate within it, and the regulatory and ethical layers around it is essential context for drug-discovery AI.
Phase I: First-in-human safety
Phase I is the first administration of a new drug to humans, typically in 20–100 healthy volunteers (or patients, for drugs too toxic to give to volunteers — e.g., cancer chemotherapy). The primary objectives are safety (identifying acute toxicity, characterising dose-limiting toxicities) and pharmacokinetics (measuring how the drug behaves in humans). Trials typically use ascending doses — a few subjects at a low dose, monitor for safety, escalate to higher doses — to identify the maximum tolerated dose or pharmacologically-active dose. Phase I lasts 6–12 months and costs $5–20 million. The historical attrition rate from Phase I to Phase II is ~30–40% — meaning ~60–70% of drugs entering Phase I successfully advance.
Phase II: Efficacy and dose
Phase II tests whether the drug actually works in patients with the target disease. Trials typically enrol 100–300 patients and run for months to years. The objectives are efficacy signal (does the drug work?), dose finding (what dose is optimal?), and further safety characterisation (what adverse events emerge in disease populations that weren't visible in healthy volunteers?). Phase II is the critical "proof of concept" stage for a drug — it's where most candidates fail (the historical attrition rate is ~50–70%, the highest of any stage). The Phase II failure mode is typically lack of efficacy: the drug looked promising in animals but doesn't actually treat the disease in humans. This failure mode is the central problem of drug discovery, and reducing Phase II attrition is the single most-valuable thing AI methods could deliver.
Phase III: Confirmatory trials
Phase III is the large-scale confirmatory evaluation. Trials typically enrol 1,000–10,000+ patients across multiple sites and countries, with double-blind randomisation against placebo or standard-of-care comparators. The objectives are definitive efficacy, broader safety characterisation (rare adverse events that need large samples to detect), and generation of the evidence base for regulatory approval and clinical practice. Phase III runs 2–5 years and costs $50–500 million per trial. The attrition rate in Phase III is ~30–40% — substantial, often driven by safety signals or insufficient efficacy at the population level. Successful Phase III studies form the core of the New Drug Application (NDA) submission.
Phase IV: Post-approval
Phase IV studies happen after approval and serve various purposes: post-marketing surveillance for rare adverse events, label-expansion studies to add new indications, comparative-effectiveness studies against alternatives, and FDA-required post-approval commitments for accelerated-approval drugs. Phase IV studies range from passive surveillance through randomised controlled trials, with the methodology increasingly drawing on real-world-evidence (RWE) data from electronic health records, claims databases, and patient registries.
Trial design considerations
Several design choices shape what a clinical trial can demonstrate. Randomisation assigns patients to treatment arms randomly to control for confounding. Blinding (single-blind or double-blind) prevents bias from patient or physician knowledge of treatment assignment. Placebo control is the gold standard for trials where no effective treatment exists; active control (vs. existing drug) is used when withholding treatment would be unethical. Endpoints are the outcomes measured (overall survival is the gold standard for cancer trials; surrogates like progression-free survival are increasingly accepted). Adaptive designs allow modifications during the trial based on interim analyses (sample size re-estimation, dose modification, early stopping for efficacy or futility); they are increasingly common but require careful pre-specification to maintain statistical validity.
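The sample-size arithmetic behind these design choices is standard. A minimal sketch using statsmodels (the response rates, alpha, and power are illustrative):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Patients per arm to detect a 50% -> 60% response-rate improvement
# at two-sided alpha = 0.05 with 80% power.
effect = proportion_effectsize(0.60, 0.50)    # Cohen's h (arcsine transform)
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, ratio=1.0)
print(round(n_per_arm))   # ~387 per arm with these illustrative numbers
```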
Special populations and ethics
Several patient populations require special consideration. Paediatric studies are required by the FDA's Pediatric Research Equity Act for most drugs being approved for adults. Geriatric studies and renal/hepatic-impairment studies address dose modifications for vulnerable populations. Pregnant and lactating patients are typically excluded from initial trials but require careful post-approval study because most drugs end up being prescribed to some pregnant patients. The Declaration of Helsinki (originally 1964, revised many times) is the foundational ethical framework for clinical research; the ICH-GCP (Good Clinical Practice) standards operationalise it for industry trials. Institutional Review Boards (IRBs, called ethics committees outside the US) review and approve every clinical trial protocol.
The clinical-trial data landscape
Clinical-trial data is increasingly accessible to AI methods. ClinicalTrials.gov registers more than 500,000 trials globally, with results reporting required for many. The EU Clinical Trials Register is the European equivalent. Vivli and similar platforms provide controlled access to anonymised individual-patient data from many trials. The Yale Open Data Access (YODA) project facilitates academic access to industry data. AI methods for trial design optimisation, patient stratification, and surrogate-endpoint development are an active research area, with increasingly substantial deployment at major pharma companies.
FDA, EMA, and Regulatory Frameworks
Drug regulation is the political and legal framework under which all pharmaceutical activity operates. Understanding the major regulators, the approval frameworks, and the special pathways available is essential context — particularly for AI methods, where regulatory acceptance of algorithmically-derived evidence is an active and unsettled area.
The FDA
The U.S. Food and Drug Administration (FDA) regulates pharmaceuticals, biologics, devices, and food in the United States. The relevant divisions for drugs are CDER (Center for Drug Evaluation and Research) for small molecules and most biologics, and CBER (Center for Biologics Evaluation and Research) for vaccines, blood products, and gene-and-cell therapies. The FDA's drug-approval workflow runs through the IND (Investigational New Drug) application that authorises clinical trials, followed by the NDA (New Drug Application) or BLA (Biologics License Application) that authorises commercial sale. Standard review takes 10–12 months; priority review reduces it to 6–8 months for drugs offering significant improvements over existing therapies.
The EMA and other regulators
The European Medicines Agency (EMA) is the EU's centralised regulator. Drugs receive approval through the centralised procedure (mandatory for biologics, advanced therapies, and certain other categories) which produces an EU-wide approval, or through national procedures for some categories. The PMDA (Pharmaceuticals and Medical Devices Agency) is Japan's regulator. Health Canada, TGA (Australia), NMPA (China, formerly CFDA), and many others serve their respective markets. ICH guidelines (Section 6) substantially harmonise the major regulators' expectations, but each retains its own approval workflow, and approvals in one market do not automatically translate to others.
Accelerated approval pathways
Several FDA programs accelerate approval for high-medical-need drugs. Breakthrough Therapy designation provides intensive FDA guidance and rolling review for drugs showing substantial improvement over existing therapies in early trials. Fast Track designation facilitates communication with FDA throughout development. Accelerated Approval (subpart H/E for drugs/biologics) allows approval based on surrogate endpoints reasonably likely to predict clinical benefit, with required confirmatory trials post-approval. Priority Review Vouchers can be earned for tropical-disease and pediatric-rare-disease approvals and traded to expedite other applications. Each pathway has specific eligibility criteria and trade-offs; the methodology of selecting and pursuing the right pathway is a substantial drug-development discipline.
Orphan-drug designation
Drugs for rare diseases (defined in the US as affecting <200,000 patients) qualify for Orphan Drug Designation, providing 7 years of US market exclusivity, tax credits for clinical development, and waived FDA fees. The EU has similar provisions (10 years of EU exclusivity). The orphan designation has substantially reshaped pharmaceutical strategy since the 1983 Orphan Drug Act — many of the most-profitable modern drugs hold orphan designations alongside their larger-market indications (e.g., Humira, orphan-designated for juvenile idiopathic arthritis among other rare indications).
Biosimilars and generics
After patents expire, lower-cost competitors can enter the market. Generics are bioequivalent copies of small-molecule drugs, requiring only demonstration of similar pharmacokinetics rather than full clinical trials. Biosimilars are the analogous category for biologics — they are highly similar but not identical, due to the manufacturing complexity of biologic production, and require more substantial evaluation than generics. The Hatch-Waxman Act (1984) governs generic approval in the US; the Biologics Price Competition and Innovation Act (2010) governs biosimilars. Generic and biosimilar competition typically reduces brand revenue by 80–90% within 1–2 years of patent expiration, which substantially shapes pharmaceutical investment decisions.
Pricing, reimbursement, and access
Approval is necessary but not sufficient — drugs must also be priced and reimbursed to reach patients. The US has a fragmented payer system (Medicare, Medicaid, private insurers) with varying coverage decisions; price negotiations historically operated through pharmacy benefit managers (PBMs). The 2022 Inflation Reduction Act introduced Medicare price negotiation for select high-spend drugs, with the first negotiated prices effective 2026 and substantial implications for pharmaceutical economics. Outside the US, most countries have national health-technology-assessment bodies (NICE in the UK, IQWiG in Germany, HAS in France) that evaluate cost-effectiveness and inform reimbursement decisions. The methodology of health-technology assessment (HTA) is its own discipline, with cost-per-QALY (Quality-Adjusted Life Year) the dominant evaluation metric.
AI in regulatory submissions
A specific contemporary issue: how AI methods enter regulatory submissions. As of 2026 the FDA has issued discussion papers and guidance on AI/ML (a 2023 discussion paper on AI/ML in drug development; 2025 final guidance on AI for medical devices) but has not yet established comprehensive frameworks for AI-derived evidence in NDA submissions. The current state is that AI methods used internally for hit identification or candidate triage are uncontroversial; AI used to generate primary clinical evidence (e.g., AI-derived biomarkers, AI-stratified efficacy claims) requires substantial documentation and increasingly engages the FDA's Software as a Medical Device frameworks. The methodology of preparing AI-derived evidence for regulatory submission is rapidly evolving and is a substantial concern for any AI-for-drug-discovery deployment that aims at commercial impact.
Drug Modalities and the Industry Landscape
Most of this chapter has implicitly assumed traditional small-molecule drug discovery. The actual industry includes a much broader range of modalities, each with its own pharmacology, manufacturing, and regulatory considerations. Understanding the modality landscape is essential for understanding where AI methods are being deployed and why.
Small molecules
Small molecules remain the dominant modality by volume. Typical molecular weights of 200–500 Da, oral bioavailability, well-characterised manufacturing through traditional organic chemistry, and substantial cumulative knowledge in CYP-mediated metabolism and ADMET prediction. Most "classic" drugs (statins, antibiotics, beta-blockers, antihistamines, antidepressants) are small molecules. The methodology of small-molecule drug discovery has matured substantially over decades, and AI methods (Sections 11–19 develop them) have produced their strongest empirical wins here — the substrate of public chemistry data (PubChem, ChEMBL, ZINC) and the well-defined molecular-property prediction problems map cleanly onto modern ML.
Biologics: monoclonal antibodies and beyond
Biologics are protein-based therapeutics — substantially larger molecules (~100–200 kDa for typical antibodies), produced in mammalian or bacterial cell culture, and administered by injection rather than orally (their size prevents intestinal absorption). The dominant biologic class is monoclonal antibodies (the various "-mab" drugs), accounting for the largest share of pharmaceutical revenue by 2024. Examples include Humira/adalimumab (for autoimmune disease), Keytruda/pembrolizumab (cancer immunotherapy), Herceptin/trastuzumab (HER2-positive breast cancer), and the various COVID-19 antibodies. Antibody-drug conjugates (ADCs) attach cytotoxic small-molecule payloads to antibodies for targeted cancer therapy. Bispecific antibodies engage two different targets simultaneously. Fusion proteins, therapeutic enzymes, and peptide drugs round out the protein-therapeutic landscape. AI methods for antibody design and engineering (developed in Ch 03) are central to the biologics pipeline.
Cell and gene therapies
The 2017 approval of Kymriah (CAR-T cell therapy for leukaemia) marked the arrival of cell therapies as a commercial reality. Cell therapies modify or transplant living cells — autologous CAR-T cells (the patient's own T cells engineered to attack their cancer) being the most-developed category. Gene therapies deliver functional genes to patients with genetic diseases — Luxturna (for inherited retinal dystrophy, 2017), Zolgensma (for spinal muscular atrophy, 2019), and Casgevy (the first CRISPR therapy, 2023, for sickle cell disease) are landmark approvals. The methodology is technically and economically distinctive: prices per dose can reach $2–4 million, with manufacturing pipelines that resemble specialised biotech rather than traditional pharma. AI methods for cell-and-gene-therapy design (vector engineering, payload optimisation, manufacturing-process analytics) are active research areas.
RNA therapeutics
The 2018 approval of Onpattro (patisiran, an siRNA for hereditary amyloidosis) opened RNA therapeutics as a substantial drug class. Antisense oligonucleotides (Spinraza for SMA, the various Ionis assets) bind specific mRNAs and block translation or alter splicing. siRNAs recruit the RNA-induced silencing complex (RISC) to degrade target mRNAs. mRNA vaccines exploded into mainstream medicine via the 2020 COVID-19 pandemic — the Pfizer/BioNTech and Moderna mRNA vaccines were the first widely-deployed mRNA therapeutics, with substantial implications for the broader mRNA-therapeutic pipeline. The methodology is distinctive (RNA chemistry, lipid-nanoparticle delivery, immunogenicity considerations); AI methods for RNA structure prediction and antisense-oligo design are active areas.
The industry landscape
The pharmaceutical industry as of 2026 includes major established companies (Pfizer, Roche, Novartis, Merck, Johnson & Johnson, AstraZeneca, GSK, AbbVie, Sanofi, Bristol-Myers Squibb, Eli Lilly, Gilead, and others), specialised biotechs (Genentech, now part of Roche; Regeneron; Vertex; Biogen; and others), and a long tail of small biotechs with single-target programs. The 2024–2026 wave of AI-native biotechs (Recursion, Insilico Medicine, BenevolentAI, Atomwise, Generate Biomedicines, Cradle, Iambic, Isomorphic Labs, Xaira, Lila, among others) collectively raised tens of billions in funding and has produced the first wave of AI-discovered drugs entering clinical trials. Whether AI-discovered drugs will succeed at higher rates than traditional approaches is the open question that the next several years will answer.
The economics of drug discovery
The economic logic that shapes the industry: very high upfront R&D costs, very high failure rates, very long timelines, and patent-protected pricing on the small fraction of compounds that reach market. The widely-cited estimate from the Tufts Center for the Study of Drug Development (DiMasi et al. 2016) put the average cost of bringing a new drug to market (counting failures) at ~$2.6 billion in 2013 dollars, and the figure has likely risen substantially since. The economics has substantial consequences: drugs for common conditions in wealthy populations (statins, biologics for autoimmune disease) attract substantial investment; drugs for rare diseases require orphan-drug incentives to be economically viable; drugs for tropical diseases prevalent in poor countries struggle to attract investment regardless of medical need. AI methods that reduce R&D costs or improve success rates have the potential to substantially shift this economic landscape — which is why the AI-for-drug-discovery space has attracted such substantial investment.
From Drug Discovery to ML: An Orientation
The previous nine sections established the pharmacology. This one is the bridge to the methodology that follows. AI methods for drug discovery have several distinctive properties that shape the methodology — properties that an AI professional approaching the field for the first time should understand explicitly. Most ML applications optimise a single objective on static data; drug discovery optimises many objectives simultaneously, in tight loops with experiments, against a backdrop of expensive failure modes and substantial regulatory scrutiny. This section orients the ML practitioner; Sections 11–19 develop the methods within that frame.
The data substrate
The public chemistry-and-biology data substrate is enormous and well-organised. PubChem (NIH) holds ~120 million compounds with associated assay results. ChEMBL (EBI) holds ~2.4 million bioactive molecules with ~20 million measured activity values against ~15,000 protein targets. ZINC (UCSF) holds ~37 billion virtually-synthesisable molecules for screening. DrugBank catalogues approved and investigational drugs with detailed pharmacological annotations. PDB holds ~220K experimental protein structures with bound ligands; PDBbind curates the binding-affinity subset. MoleculeNet bundles standard benchmark tasks across QSAR, ADMET, quantum-chemistry, and physiology endpoints. The combined public data substrate is among the largest, cleanest, and best-organised in any AI application area.
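Much of this substrate is programmatically accessible. A minimal sketch using the chembl_webresource_client package (the target ID CHEMBL203 should correspond to human EGFR; the query requires network access and paginates over ChEMBL's REST API):

```python
from chembl_webresource_client.new_client import new_client

# IC50 measurements against human EGFR, restricted to two fields of interest.
activities = new_client.activity.filter(
    target_chembl_id="CHEMBL203",
    standard_type="IC50",
    standard_units="nM",
).only(["canonical_smiles", "standard_value"])

for rec in activities[:5]:   # lazy QuerySet; slicing fetches only one page
    print(rec["canonical_smiles"], rec["standard_value"])
```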
Pharma companies additionally hold substantial proprietary data: high-throughput screening results across millions of compounds, internal QSAR models trained on decades of internal projects, structure-activity relationships for chemical series under active development, ADMET data from internal experimental pipelines. The proprietary data is often higher-quality than the public data (consistent assay protocols, in-house controls) and is the substrate of competitive advantage at major pharma. AI-native biotechs that lack this data substrate have to either generate it (expensive) or partner with pharma (the dominant 2024–2026 strategy).
Multi-objective optimisation under uncertainty
A successful drug-discovery candidate must satisfy many constraints simultaneously: high binding affinity for the target, adequate selectivity against off-targets, drug-like physicochemical properties (Lipinski's rule of five and various refinements), acceptable ADMET profile, synthetic accessibility, freedom-to-operate (no patent conflicts), manufacturability at scale. ML methods that optimise any single objective in isolation produce candidates that fail under the others — a high-affinity binder that's not synthesisable, a drug-like molecule that doesn't bind the target, a synthesisable molecule that's hepatotoxic. The methodology of effective drug-discovery AI is fundamentally multi-objective optimisation under uncertainty, with the trade-offs between objectives often the central design problem.
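To make the tension concrete, here is a toy weighted-sum scalarisation of the kind a triage pipeline might use; every objective, weight, and threshold is illustrative, and production systems often prefer Pareto-based methods precisely because fixed weights hard-code the trade-offs:

```python
def triage_score(pki_target, pki_herg, logp, sa_score):
    """Toy scalarisation of competing objectives (all weights illustrative).
    pKi: higher = tighter binding; SA score: 1 (easy) to 10 (hard to make)."""
    potency = pki_target                   # want high affinity for the target
    herg_margin = pki_target - pki_herg    # want a wide hERG selectivity window
    lipophilicity = -abs(logp - 2.5)       # penalise deviation from logP ~2.5
    synthesisability = -sa_score           # penalise hard-to-make molecules
    return (1.0 * potency + 0.5 * herg_margin
            + 0.3 * lipophilicity + 0.2 * synthesisability)

# A tight binder that also hits hERG can score below a weaker, cleaner one.
print(triage_score(9.0, 7.5, 4.8, 6.0))   # potent but hERG-liable and greasy
print(triage_score(7.5, 4.0, 2.2, 3.0))   # weaker but cleaner overall profile
```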
The experimental loop
Drug discovery is not a pure-prediction discipline. Designs are synthesised, tested, and the results feed back into models. The methodology is fundamentally interactive — what computer science calls "active learning" or "Bayesian optimisation", but here with experimental synthesis-and-assay turnaround times of days to weeks. Production drug-discovery AI lives within tight feedback loops, and the methodology's empirical performance often depends as much on how the loop is closed (which compounds get tested next; how experimental results are integrated; what model-update strategy is used) as on the underlying ML architecture. Section 16 develops the active-learning machinery; the broader pattern recurs across every section.
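A minimal, fully synthetic sketch of the loop: a random-forest surrogate, greedy acquisition, and a hidden linear function standing in for the wet-lab assay. Everything here is a toy stand-in for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-ins: binary "fingerprints" for a 5,000-compound pool, and a
# hidden "assay" the loop is only allowed to query in small batches.
pool = rng.integers(0, 2, size=(5000, 128)).astype(float)
assay = pool @ rng.normal(size=128) + rng.normal(scale=0.5, size=5000)

tested = list(rng.choice(5000, size=64, replace=False))   # seed screen
model = RandomForestRegressor(n_estimators=200, random_state=0)

for cycle in range(5):
    model.fit(pool[tested], assay[tested])   # retrain on all results so far
    preds = model.predict(pool)
    preds[tested] = -np.inf                  # never re-test a compound
    batch = np.argsort(preds)[-32:]          # greedy: top-32 predicted actives
    tested.extend(batch.tolist())            # "synthesise and assay" the batch
    print(f"cycle {cycle}: best activity found = {assay[tested].max():.2f}")
```

Real deployments replace the greedy selection with uncertainty-aware acquisition (expected improvement, UCB, Thompson sampling) and batch-diversity constraints, since a synthesis batch of near-duplicates wastes most of a design-make-test cycle.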
Validation realities
A specific tension in drug-discovery AI is that predictions are easy but useful predictions require careful evaluation. A model that achieves 0.95 ROC-AUC on a pre-2020 ChEMBL dataset may produce useless candidates because the test set was drawn from the same distribution as training. The methodology requires careful attention to: scaffold-aware splits (training and test molecules should not share core scaffolds), temporal splits (test on data published after training cut-off), activity-cliff handling (molecules with similar structure but very different activity, which standard models systematically mispredict), and ultimately prospective experimental validation on novel compounds. Modern best practice has moved toward more-rigorous evaluation, but the gap between optimistic benchmark numbers and real prospective performance is genuine and well-documented.
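A minimal sketch of a scaffold-aware split using RDKit's Bemis-Murcko scaffolds, in the spirit of the MoleculeNet convention (the grouping heuristic and the tiny demo list are illustrative):

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, test_frac=0.2):
    """Assign whole scaffold groups to train/test so no core is shared."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        groups[MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)].append(i)
    train, test = [], []
    # Largest scaffold families go to train, smallest to test, so the test
    # set is dominated by rare, "novel" scaffolds (the common convention).
    for scaf in sorted(groups, key=lambda s: len(groups[s]), reverse=True):
        dest = train if len(train) < (1 - test_frac) * len(smiles_list) else test
        dest.extend(groups[scaf])
    return train, test

demo = ["CCO", "c1ccccc1O", "c1ccccc1N", "CCN"]   # two scaffold groups
print(scaffold_split(demo, test_frac=0.5))        # ([0, 3], [1, 2])
```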
The regulatory layer
Unlike most AI domains, drug-discovery AI eventually has to engage with regulators. As of 2026 FDA accepts AI-derived molecules, AI-driven candidate triage, and AI-based ADMET predictions in IND submissions without special scrutiny — the AI is essentially treated as a sophisticated computational tool that produces molecules and predictions, with the resulting drugs evaluated by the same standards as drugs from any other source. AI applied to clinical-trial design (patient stratification, biomarker development, AI-derived efficacy endpoints) faces more substantive regulatory scrutiny, with the methodology of acceptable AI-derived clinical evidence still being established. Section 8 developed the regulatory framework; this chapter assumes that context throughout.
These are the field's methodological signatures: multi-objective optimisation under uncertainty, tight experimental loops, validation realities that punish naive in-silico optimism, and regulatory engagement at the back end. AI for drug discovery has produced the strongest commercial activity of any AI-for-Science subdomain: AI-native biotechs have collectively raised tens of billions in funding and produced the first wave of AI-discovered drugs entering clinical trials by 2026. Whether they succeed at higher rates than traditional approaches is the empirical question of the next 3–5 years.
Molecular Representations
Every drug-discovery AI method begins with a representation of molecules. The choice of representation substantially constrains what the model can learn and how it generalises. Several representations dominate the field, each with characteristic strengths.
SMILES and SELFIES
SMILES (Simplified Molecular Input Line Entry System, Weininger 1988) serialises a molecular graph as a string. Methane is "C", ethanol is "CCO", benzene is "c1ccccc1" (lowercase c indicates aromatic), aspirin is "CC(=O)Oc1ccccc1C(=O)O". The encoding is compact, human-readable, and amenable to language-model architectures (autoregressive generation, masked-language-modelling). Major drawbacks: SMILES is non-canonical (multiple valid strings represent the same molecule), small string changes can produce invalid molecules, and the linear ordering imposes a search-history bias on generative models.
SELFIES (Self-Referencing Embedded Strings, Krenn et al. 2020) addresses SMILES's invalidity problem with a grammar in which every string corresponds to a chemically valid molecule. The methodology guarantees validity at the cost of slightly less interpretable strings, and it has substantially displaced SMILES for generative chemistry, where invalid outputs would be problematic. Generative molecular language models (Section 15) typically use SELFIES; classifiers and property predictors typically use SMILES because validity guarantees matter less when the input is human-curated.
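A small illustration of both points, assuming the `rdkit` and `selfies` packages are installed: RDKit canonicalisation collapses equivalent SMILES variants to one form, and a SELFIES round-trip always decodes to a valid molecule:

```python
from rdkit import Chem
import selfies as sf

aspirin = "CC(=O)Oc1ccccc1C(=O)O"

# SMILES is non-canonical: RDKit maps any valid variant to one canonical form.
variant = Chem.MolFromSmiles("O=C(O)c1ccccc1OC(C)=O")   # an equivalent SMILES
assert Chem.MolToSmiles(variant) == Chem.MolToSmiles(Chem.MolFromSmiles(aspirin))

# SELFIES round-trip: every SELFIES string decodes to a valid molecule,
# the property generative models exploit.
selfies_str = sf.encoder(aspirin)        # token string like '[C][C][=Branch1]...'
recovered = sf.decoder(selfies_str)      # a valid SMILES string
print(selfies_str, recovered)
```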
Molecular graphs
The most direct representation is the molecular graph: nodes are atoms (with element, formal charge, hybridisation, aromaticity, and valence attributes), edges are bonds (with bond-order, aromaticity, and stereochemistry attributes). The representation matches the underlying chemistry directly. Graph neural networks (Part XIII Ch 05) process molecular graphs naturally, and the methodology has dominated property prediction since roughly 2017.
The dominant graph-based architectures: message-passing neural networks (MPNN, Gilmer et al. 2017) iteratively update node representations by aggregating messages from neighbours, with the final per-atom features pooled into a per-molecule prediction; graph convolutional networks (GCN, Kipf & Welling 2017 adapted for chemistry) use spectral-graph-theory-inspired convolutions; graph attention networks (GAT) use attention rather than fixed aggregation; graph isomorphism networks (GIN, Xu et al. 2019) use sum aggregation with theoretical guarantees about expressiveness. Modern variants (D-MPNN, AttentiveFP, the various 2024 successors) refine the methodology with substantial empirical gains on standard benchmarks.
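For concreteness, a minimal message-passing layer in the Gilmer et al. style, sketched in plain PyTorch with a sparse edge list (real implementations batch molecules with libraries such as PyTorch Geometric; the tensor shapes here are illustrative):

```python
import torch
import torch.nn as nn

class MPNNLayer(nn.Module):
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.message = nn.Linear(2 * node_dim + edge_dim, node_dim)
        self.update = nn.GRUCell(node_dim, node_dim)

    def forward(self, h, edge_index, edge_attr):
        # h: [n_atoms, node_dim]; edge_index: [2, n_edges] (long);
        # edge_attr: [n_edges, edge_dim]
        src, dst = edge_index
        m = torch.relu(self.message(torch.cat([h[src], h[dst], edge_attr], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, m)   # sum messages per atom
        return self.update(agg, h)                        # GRU-style node update

# After several such layers, per-atom features are pooled (sum or mean)
# into a molecule-level vector that feeds the property-prediction head.
```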
Molecular fingerprints
Molecular fingerprints are fixed-length binary or count vectors capturing substructure information. ECFP (Extended Connectivity Fingerprint, also called Morgan fingerprint, Rogers & Hahn 2010) is the dominant variant: each atom is hashed together with its surroundings up to a specified radius, producing a multiset of substructure hashes that gets folded into a fixed-length vector (typically 1024 or 2048 bits). The methodology is fast, scalable, and produces representations that work surprisingly well for similarity search and basic QSAR. Other fingerprint variants — MACCS (a curated 166-bit fingerprint), pharmacophore fingerprints, atom-pair fingerprints — exist but are less widely used for ML.
Fingerprints predate modern deep learning and remain useful baselines. Combining fingerprints with deep neural networks (a feed-forward network on top of the fingerprint vector) often produces strong baselines that more sophisticated GNN methods must beat. The 2024–2026 wave of evaluation papers consistently shows fingerprint-based baselines competitive with GNN methods on many standard benchmarks, particularly when training data is limited (on the order of 10K compounds or fewer) and when the task is similarity-based.
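The baseline is short enough to show in full; a sketch of ECFP4 similarity search with RDKit, where the toy `library` stands in for a real compound collection:

```python
# ECFP4 (Morgan radius 2) bit vectors with Tanimoto similarity,
# the standard fast similarity-search baseline.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def ecfp4(smiles, n_bits=2048):
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)

query = ecfp4("CC(=O)Oc1ccccc1C(=O)O")                     # aspirin
library = ["CC(=O)Nc1ccc(O)cc1", "c1ccccc1", "OC(=O)c1ccccc1O"]  # toy library
scores = [(smi, DataStructs.TanimotoSimilarity(query, ecfp4(smi)))
          for smi in library]
print(sorted(scores, key=lambda x: -x[1]))                 # most similar first
```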
3D representations and conformers
Many drug-discovery problems depend on 3D geometry: docking and binding-affinity prediction, conformational analysis, pharmacophore matching. 3D representations include atomic coordinates (which require explicit equivariance handling — Ch 01 §8), distance matrices, internal coordinates (bond lengths, angles, dihedrals), and various invariant featurisations. The challenge is that small molecules have multiple low-energy conformers (rotational isomers separated by accessible rotation barriers); the "right" conformer for binding may differ from the lowest-energy gas-phase conformer, and this conformational flexibility complicates 3D ML methods.
Conformer generation tools (RDKit's ETKDG, OMEGA, the various ML-based methods like ConfGF and GeoMol) produce 3D coordinates from a 2D molecular graph. Modern methods often generate ensembles of conformers and either use the lowest-energy conformer for prediction or aggregate predictions across the ensemble. SE(3)-equivariant GNNs (DimeNet, NequIP, MACE, the various 2024 successors) operate on 3D coordinates directly while respecting rotational and translational symmetries — the methodology that Ch 01 §8 developed for materials and Ch 03 developed for proteins applies here too.
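A sketch of the standard RDKit conformer-ensemble workflow (ETKDG embedding followed by MMFF refinement); the seed and conformer count are illustrative:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"))
params = AllChem.ETKDGv3()
params.randomSeed = 0xf00d                       # reproducibility
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, params=params)

# Force-field refinement: returns (status, energy) per conformer,
# where status 0 means the optimisation converged.
results = AllChem.MMFFOptimizeMoleculeConfs(mol)
energies = [e for _, e in results]
lowest = min(range(len(energies)), key=energies.__getitem__)
print(f"{len(conf_ids)} conformers; lowest-energy conformer id {lowest}")
```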
Molecular language models
The 2022–2026 wave of molecular language models applies the protein-language-model paradigm to small molecules. ChemBERTa (Chithrananda et al. 2020) was the foundational paper: BERT-style masked-language-modelling on SMILES strings, producing molecular embeddings useful for downstream tasks. MolFormer (Ross et al. 2022, IBM) scaled to 1.1 billion molecules from PubChem and ZINC. GROVER uses graph-based transformer pretraining. Uni-Mol (Zhou et al. 2023) handles 3D molecular structures with SE(3)-equivariant pretraining. The 2024–2026 generation includes substantial commercial offerings (Iambic's NeuralPLexer, the various pharma-internal molecular foundation models) and increasingly multi-modal training (jointly on molecules, proteins, and assay data).
What representation when
A practical note: the choice of representation depends on the task. Similarity search against a large library: fingerprints (fast, scalable). Property prediction with abundant data: GNN or molecular language model. Property prediction with limited data: fingerprints or pretrained-LM embeddings often beat GNNs trained from scratch. Binding affinity prediction: 3D representations with equivariance (the structure of the bound complex matters). Generative chemistry: SELFIES (validity guarantees) or molecular graphs (natural for graph-based generators). The 2024–2026 trend is toward foundation models that produce representations useful across many tasks, with task-specific heads fine-tuned for each application.
Property Prediction and ADMET
The most-deployed AI applications in drug discovery are property predictors: given a molecule, predict its physicochemical, ADMET, or biological properties. The methodology has substantial pre-AI history (QSAR — quantitative structure-activity relationship — dating to the 1960s) that modern ML extends rather than replaces.
The QSAR tradition
Classical QSAR models predict a continuous activity (binding affinity, solubility, permeability) from molecular descriptors using linear regression, decision trees, support vector machines, or random forests. The methodology has a fifty-year empirical track record and produces predictors that are interpretable, computationally cheap, and often competitive with deep learning when training data is limited. Modern QSAR practice typically combines ECFP fingerprints with random forests or gradient-boosted trees as a baseline against which more-sophisticated methods must compete. The methodology is mature enough that it's standard infrastructure at most pharma companies; ML methods that don't beat well-tuned QSAR baselines have a hard time deploying.
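The baseline in a few lines, as a sketch: ECFP4 bit vectors into a scikit-learn random forest, with `smiles` and `y` assumed to be the dataset's SMILES strings and measured activities:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def featurise(smiles_list, n_bits=2048):
    X = np.zeros((len(smiles_list), n_bits), dtype=np.uint8)
    for i, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
        X[i, list(fp.GetOnBits())] = 1
    return X

X_train, X_test, y_train, y_test = train_test_split(featurise(smiles), y)
model = RandomForestRegressor(n_estimators=500, n_jobs=-1).fit(X_train, y_train)
print("R^2:", model.score(X_test, y_test))
# In production this split should be scaffold-aware, not random
# (Section 10's validation discussion).
```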
Standard property-prediction tasks
The major endpoints fall into several categories. Physicochemical properties: aqueous solubility (log S), lipophilicity (log P, log D), pKa, permeability. These are well-defined experimental quantities with substantial public training data. Pharmacokinetic properties: absorption (Caco-2 permeability, human intestinal absorption), distribution (plasma protein binding, blood-brain barrier penetration, volume of distribution), metabolism (CYP450 inhibition for CYP3A4/2D6/2C9/2C19/1A2; metabolic stability), excretion (renal clearance, biliary excretion). Toxicity endpoints: hepatotoxicity (DILI — drug-induced liver injury), cardiotoxicity (hERG inhibition is the dominant single endpoint), genotoxicity (Ames test prediction), carcinogenicity, skin sensitisation. Each has its own benchmark data and dominant ML methods.
The benchmark landscape
Several standard benchmarks anchor empirical evaluation. MoleculeNet (Wu et al. 2018) bundles 17 prediction tasks across regression and classification, physicochemical and biological endpoints, with standard scaffold splits. Tox21 (NIH NCATS) provides toxicity-prediction benchmarks across 12 nuclear-receptor and stress-response targets. ADMET-AI (2023) is a more-recent benchmark suite specifically targeting ADMET endpoints. Therapeutics Data Commons (TDC, 2021) is the most-comprehensive modern benchmark — 22 datasets covering ADMET, toxicity, drug-target interaction, and disease-area-specific tasks, with standardised splits and evaluation. Modern best practice evaluates against multiple benchmarks before deployment.
The CYP450 problem
A specific ADMET endpoint worth detail: CYP450 inhibition prediction. As Section 4 developed, CYP450 enzymes metabolise most drugs, and inhibition of CYP450 by one drug can dangerously raise blood levels of another. The five major CYP isoforms (CYP3A4, CYP2D6, CYP2C9, CYP2C19, CYP1A2) each have substantial labelled training data from in-vitro assays (typically tens of thousands of compounds per isoform). ML models predict CYP inhibition with ~85–90% accuracy for major isoforms — well enough that production drug-discovery pipelines routinely use CYP-inhibition predictions as triage filters. The methodology has matured to the point that any candidate molecule is run through CYP-inhibition predictors before serious investment, which has substantially shifted the chemistry community's de-risking expectations.
Hepatotoxicity and the rare-event problem
The hardest ADMET endpoint is hepatotoxicity, primarily because labelled data is scarce, biased, and noisy. Drug-induced liver injury (DILI) is rare (~1 in 10,000–100,000 patient-exposures for many drugs), idiosyncratic, and often only recognised post-market. Public DILI datasets (the DILIrank dataset, the FDA's drug-induced-liver-injury database) contain hundreds to a few thousand drugs with binary labels of varying confidence. The methodology has to handle severe class imbalance, noisy labels, and substantial mechanistic heterogeneity (DILI happens through multiple mechanisms — direct mitochondrial toxicity, immune-mediated reactions, bile-salt-export-pump inhibition, the various others). Modern ML methods perform modestly on DILI prediction (typical AUC ~0.7–0.8); the gap between modern methods and human pharmacologist judgement is smaller here than for most other endpoints.
The activity-cliff problem
A specific evaluation concern: activity cliffs. Two molecules with high structural similarity (Tanimoto on ECFP > 0.7) but very different activities (10-fold or greater difference in binding) are common in real chemical series and represent the most-important predictions for medicinal chemistry decision-making. Standard ML methods systematically mispredict activity cliffs because their similarity priors smooth over them. The methodology of activity-cliff-aware training, evaluation, and deployment is an active research area, with methods like contrastive-loss variants and attention-based local-feature highlighting attempting to handle them. Production drug-discovery pipelines routinely include activity-cliff analysis as a separate evaluation step.
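A sketch of the standard cliff-detection pass over a chemical series, assuming `series` is a list of (SMILES, pIC50) pairs; one log unit of activity corresponds to the 10-fold difference above:

```python
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def find_cliffs(series, sim_cutoff=0.7, delta_cutoff=1.0):
    """Flag pairs with high Tanimoto similarity but a large activity gap;
    delta_cutoff = 1.0 log unit, i.e. a 10-fold activity difference."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2)
           for s, _ in series]
    cliffs = []
    for (i, (s1, a1)), (j, (s2, a2)) in combinations(enumerate(series), 2):
        sim = DataStructs.TanimotoSimilarity(fps[i], fps[j])
        if sim > sim_cutoff and abs(a1 - a2) >= delta_cutoff:
            cliffs.append((s1, s2, sim, abs(a1 - a2)))
    return cliffs
```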
Virtual Screening and Molecular Docking
Virtual screening evaluates large compound libraries against a target of interest in silico, identifying candidates worth experimentally testing. Molecular docking is the central technique: predict how a small molecule would physically dock into a protein binding site, score the predicted complex, and rank candidates by score. The methodology has substantial pre-AI history (AutoDock, GOLD, Glide) that modern AI methods extend.
The classical docking pipeline
Classical molecular docking has three steps. (1) Protein preparation: take an experimental or AlphaFold structure, identify the binding site (manually or via cavity-detection algorithms like fpocket), prepare the protein (add hydrogens, assign protonation states, generate grid representations of the active site). (2) Ligand placement: search over the conformational and orientational degrees of freedom of the ligand within the binding site, scoring each pose. The search is computationally substantial — modern docking engines use heuristics (genetic algorithms in GOLD, Lamarckian search in AutoDock, exhaustive search in Glide HTVS) to explore the high-dimensional pose space. (3) Scoring: evaluate predicted poses with a scoring function that estimates binding free energy. The output is typically a ranked list of (pose, score) pairs.
The major classical docking engines: AutoDock Vina (open source, widely used in academia), Glide (Schrödinger, dominant in industry), GOLD (CCDC), DOCK (UCSF). They differ in scoring functions, search algorithms, and computational cost (HTVS modes for fast screening of millions of compounds vs. SP/XP modes for slower but more-accurate scoring). The methodology is mature; the limitations are real (scoring-function inaccuracy, conformational flexibility of the protein, water-mediated interactions, induced-fit effects).
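For orientation, a single docking run through AutoDock Vina's Python bindings (available in Vina 1.2+); the file names and box geometry are illustrative, and receptor/ligand preparation (pdbqt conversion, protonation) is assumed done with standard tools:

```python
from vina import Vina

v = Vina(sf_name="vina")                        # the Vina scoring function
v.set_receptor("receptor.pdbqt")                # prepared protein
v.set_ligand_from_file("ligand.pdbqt")          # prepared ligand
v.compute_vina_maps(center=[15.0, 12.0, -3.0],  # binding-site centre (angstroms)
                    box_size=[20.0, 20.0, 20.0])
v.dock(exhaustiveness=8, n_poses=9)             # pose search, keep top 9
v.write_poses("poses.pdbqt", n_poses=9)
print(v.energies())                             # per-pose scores (kcal/mol)
```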
Deep-learning docking
The 2022–2024 wave of deep-learning docking methods substantially changed the landscape. EquiBind (Stärk et al. 2022) was the foundational paper: a single end-to-end neural network that takes protein and ligand and outputs the bound pose, replacing the classical search-and-score pipeline with direct prediction. The methodology was substantially faster than classical docking but with somewhat lower accuracy.
DiffDock (Corso et al. 2023) extended this with a diffusion-based generative model: instead of predicting a single pose, DiffDock generates a distribution over plausible poses by reverse-diffusing from random initialisations. The methodology produces multiple candidate poses ranked by confidence, which is closer to how classical docking outputs are used in practice. Empirical performance on standard benchmarks (PDBbind, PoseBusters) substantially exceeded EquiBind and approached classical Glide-XP at much lower computational cost.
AlphaFold 3 (Ch 03) extended the structure-prediction methodology to handle protein-ligand complexes natively, with the diffusion-based decoder generating both protein and ligand coordinates simultaneously. The 2024 release was particularly impactful: AlphaFold 3 substantially outperformed dedicated docking methods on benchmark protein-ligand complex prediction tasks, reframing what's possible in the field. The 2024–2026 wave of "AlphaFold 3 era" docking methods (Boltz-1, RoseTTAFold-AllAtom, the various successors) continues this trajectory.
Pose-prediction evaluation
Standard benchmarks for docking evaluation include PDBbind (a curated subset of PDB with measured binding affinities), CASF (the Comparative Assessment of Scoring Functions benchmark), and the more-recent PoseBusters benchmark (Buttenschoen et al. 2023). PoseBusters is methodologically important because it explicitly checks for physically-implausible poses that traditional benchmarks miss — chirality flips, steric clashes, broken bonds. The 2023–2024 papers documented that several deep-learning docking methods produced high benchmark scores but systematically generated poses that fail PoseBusters validity checks. The lesson is that scalar benchmark scores can mislead when the prediction is a physical structure; evaluation methodology has tightened around this.
The cross-docking problem
A subtle but important evaluation issue: redocking vs. cross-docking. Redocking takes the protein from a known crystal structure with a known bound ligand, removes the ligand, and asks the docking method to recover the bound pose — relatively easy, since the protein conformation is already adapted to the ligand. Cross-docking docks the same ligand into a different conformation of the same protein (e.g., apo-form, or a structure with a different bound ligand) — substantially harder because the protein conformation may be incompatible with the ligand without induced-fit changes. Most published docking-method numbers come from redocking benchmarks; cross-docking performance is typically substantially worse, and the gap reflects how hard real prospective docking is.
Ultra-large library screening
Modern make-on-demand chemical libraries (Enamine REAL, ZINC22, the various commercial offerings) contain billions of synthesisable compounds. Classical docking against these libraries is computationally prohibitive at the pose-prediction-and-scoring level — even at 1 second per compound, a billion-compound library requires over 30 compute-years. The methodology has split: 2D-based pre-filters (fingerprint-based similarity to known actives, ML-based active-vs-inactive classifiers) reduce the library to ~10⁶ candidates that get docked; the docking-engine output then ranks the survivors for synthesis and testing. The 2024–2026 wave of "AI-first" library screening uses end-to-end neural rankers that bypass the docking step entirely, with empirical results suggesting comparable or better hit rates at substantially lower compute cost.
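A sketch of the 2D pre-filter stage, with `known_actives` and `billion_compound_stream` as assumed stand-ins for the project's actives and the library feed; only survivors reach the docking engine:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fp(smi):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi), 2)

active_fps = [fp(s) for s in known_actives]      # assumed list of known actives

def passes_prefilter(smiles, cutoff=0.4):
    """Cheap 2D filter: max Tanimoto to any known active above cutoff.
    Runs orders of magnitude faster than pose prediction and scoring."""
    f = fp(smiles)
    return max(DataStructs.BulkTanimotoSimilarity(f, active_fps)) >= cutoff

survivors = (s for s in billion_compound_stream if passes_prefilter(s))
# `survivors` then feeds the docking engine; only ~10^6 compounds remain.
```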
Binding Affinity Prediction
Predicting where a ligand binds (Section 13) is one problem; predicting how strongly it binds is a different and harder one. Binding affinity prediction has been a frontier of computational chemistry for decades, and the methodology spans physics-based free-energy calculations through machine-learning scoring functions to modern end-to-end neural predictors.
What binding affinity is
The thermodynamic measure of binding strength is the binding free energy ΔG_bind, related to the dissociation constant K_d (and the association constant K_a = 1/K_d) by ΔG_bind = RT ln(K_d) = −RT ln(K_a), with concentrations taken relative to the 1 M standard state. Lower K_d means stronger binding; typical drug binding is in the nM (10⁻⁹ M) range, with high-affinity drugs reaching pM (10⁻¹² M). The free energy decomposes as ΔG = ΔH − TΔS into enthalpic (ΔH, intermolecular interactions) and entropic (TΔS, conformational freedom and solvent effects) contributions; the entropic terms, particularly the entropy gained by displacing ordered waters from the binding site, often dominate.
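The conversion is worth seeing numerically; a small script using ΔG_bind = RT ln(K_d/c°) with the 1 M standard state:

```python
import math

R = 1.987e-3          # gas constant, kcal/(mol*K)
T = 298.15            # room temperature, K

def dg_from_kd(kd_molar):
    """Binding free energy in kcal/mol, relative to the 1 M standard state."""
    return R * T * math.log(kd_molar)

print(dg_from_kd(1e-6))    # 1 uM  -> about  -8.2 kcal/mol
print(dg_from_kd(1e-9))    # 1 nM  -> about -12.3 kcal/mol
print(dg_from_kd(1e-12))   # 1 pM  -> about -16.4 kcal/mol
```

A useful rule of thumb falls out: each 10-fold improvement in K_d is worth about 1.4 kcal/mol at room temperature.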
Experimentally, binding affinity is measured by isothermal titration calorimetry (ITC, gold standard but slow), surface plasmon resonance (SPR, fast and label-free), fluorescence-based assays (FRET, polarisation), and various functional assays. The major public dataset is PDBbind with ~20,000 protein-ligand complexes annotated with experimental affinities. Pharma companies hold proprietary affinity data of comparable or larger scale.
Free-energy perturbation
The most-rigorous physics-based method is free-energy perturbation (FEP) and the closely related thermodynamic integration (TI). The methodology runs molecular-dynamics simulations of the protein-ligand complex along an alchemical pathway that gradually transforms one ligand into another, computing the free-energy change directly. The empirical accuracy is among the best in computational chemistry — typical errors of ~1 kcal/mol on validated systems — but the cost is substantial (hours to days of GPU compute per ligand pair). Schrödinger's FEP+ is the most-deployed commercial implementation; the methodology is widely used at major pharma companies for late-stage lead optimisation, where precise affinity predictions justify the compute cost. The 2024 release of OpenFE and the various open-source FEP implementations has made the methodology more accessible to academic and smaller-biotech groups.
Machine-learning scoring functions
For high-throughput evaluation, ML-based scoring functions are the dominant alternative. RF-Score (Ballester & Mitchell 2010) was a foundational paper showing that random forests trained on simple atomic-pair-counting features outperformed classical empirical scoring functions on PDBbind. Subsequent neural-network-based scoring functions — Pafnucy (Stepniewska-Dziubinska et al. 2018), KDEEP, GNINA (Ragoza et al. 2017, integrating ML scoring into AutoDock), OnionNet — extended the methodology with various architectural innovations.
The empirical case for ML scoring functions is mixed. They consistently outperform classical scoring on benchmark sets like PDBbind, but the benchmark performance often doesn't translate to prospective use. The methodology has known weaknesses: substantial overfitting to PDBbind's specific composition (the test set often overlaps the training set in chemical-similarity space), poor generalisation to novel targets or chemotypes, and sensitivity to the protein-prep and ligand-prep choices that classical pipelines also depend on.
End-to-end neural binding prediction
End-to-end neural affinity prediction spans two styles. Structure-free methods such as DeepDTA (2018) take ligand SMILES and protein sequence directly and predict affinity without explicit 3D structure. The 2023–2024 wave of end-to-end neural docking-and-scoring methods integrates pose prediction with affinity prediction in a single network: Boltz-1 (2024) is an open AlphaFold-3-style multimodal model that generates complex structures, with successors extending toward joint structure-and-affinity prediction. The methodology promises to close the gap between docking accuracy and affinity prediction, with substantial empirical evidence accumulating in 2024–2026.
The benchmark validity problem
Affinity-prediction benchmarks have substantial validity problems that affect how published numbers should be interpreted. PDBbind's "general set" and "refined set" overlap with most ML training data (the test sets used in published papers often share scaffolds, target families, or even specific complexes with the training data). The 2023 wave of held-out benchmarks (HiQ-Bind, the various "no-leakage" benchmarks) shows substantially worse performance for most methods than the standard benchmarks suggest. The methodology of rigorous evaluation has lagged the methodology of method development; production deployment requires careful prospective evaluation that publication-track benchmarks may not support.
Selectivity prediction
A specific affinity-prediction subproblem is selectivity: given a target and several off-targets, predict the affinity ratios. Selectivity matters substantially for drug development — a kinase inhibitor that engages 50 kinases produces unpredictable side effects, while one that engages only one to three has cleaner efficacy and safety. The methodology of selectivity prediction is harder than single-target affinity because it requires accurate predictions across a panel of related proteins simultaneously. ML-based kinase-selectivity panels (KSP-LM, the various kinome-aware methods) are increasingly central to kinase-inhibitor design pipelines, with similar approaches developing for other privileged target classes.
Generative Chemistry and De Novo Design
Property prediction (Section 12) and docking (Section 13) evaluate existing molecules; generative chemistry proposes new ones. The methodology has matured substantially since 2017, with current methods routinely producing diverse, drug-like, synthetically-accessible molecules conditioned on specified target properties.
Why generative chemistry is hard
The chemical space accessible to small-molecule design is vast — estimates of "drug-like" molecules range from 10⁶⁰ to 10¹⁸⁰ depending on size and complexity constraints. Most of this space is uninteresting (unstable molecules, non-synthesisable structures, molecules with obviously-bad ADMET). The methodology of effective generative chemistry is fundamentally about constrained sampling: produce molecules from the small fraction of chemical space that is simultaneously novel, drug-like, synthesisable, and engages the desired target. Pure unconditional generation is uninteresting; useful generation always conditions on multiple objectives.
The generative-model landscape
Several generative architectures are in active use. Variational autoencoders (Gómez-Bombarelli et al. 2018, the foundational paper) encode molecules into a continuous latent space and decode back to molecules, enabling latent-space optimisation. Generative adversarial networks (MolGAN, De Cao & Kipf 2018) train a generator-discriminator pair on molecular graphs; they are less commonly used than VAEs because of training instability and mode collapse. Autoregressive models (RNN-based and transformer-based) generate molecules token-by-token in SMILES or SELFIES; ChemBERTa and the various molecular language models can be adapted for generation. Reinforcement-learning approaches optimise molecular generation against a scalar reward (affinity, drug-likeness, synthesisability); REINVENT (Olivecrona et al. 2017, refined extensively) is the canonical method, with substantial deployment at major pharma. Diffusion models are the modern dominant paradigm — adapted from image generation, applied to molecular graphs and 3D structures.
Diffusion-based molecular generation
The 2022–2026 wave of diffusion-based methods has substantially advanced generative chemistry. EDM (equivariant diffusion for molecule generation, Hoogeboom et al. 2022) was a foundational paper applying E(3)-equivariant diffusion to 3D molecular structures. GeoDiff generates conformations. MolDiff and DiffMol generate molecular graphs with simultaneous 2D-and-3D diffusion. The methodology connects directly to the diffusion-model material of Part X and to the protein-design diffusion methods of Ch 03 (RFdiffusion); the architectural patterns are similar, with chemistry-specific tokenisation and constraint handling.
Conditional generation: structure-based design
The most-impactful generative methods in 2026 are structure-based: given a target protein structure (often from AlphaFold) with a specified binding pocket, generate molecules that should bind there. Pocket2Mol and DrugGPT condition autoregressive generators on the binding-pocket geometry; DiffSBDD and TargetDiff use conditional diffusion. The methodology integrates protein structure (Ch 03) with molecular design directly, producing candidates pre-validated for binding-site complementarity. Empirical performance on standard benchmarks (CrossDocked2020, the various pocket-conditioned datasets) is substantially better than scaffold-based or unconditional methods.
The synthesisability problem
A persistent failure mode of generative chemistry is producing molecules that look reasonable but cannot actually be synthesised. Organic synthesis is genuinely hard, and ML methods trained on PubChem-derived molecules often produce structures that fail retrosynthetic analysis. Synthesisability scores — SAscore (Ertl & Schuffenhauer 2009), SCScore, the various neural-network-based variants — provide a quick numerical estimate that generators can use as a constraint. Retrosynthesis-aware generation (the various 2024–2026 methods) integrates retrosynthesis prediction (Section 16) into the generation loop, producing molecules with verified synthetic routes. The trade-off is between diversity and synthesisability — generators biased toward easy synthesis tend to produce conservative molecules in well-explored chemical space.
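Computing SAscore is mildly awkward because the scorer ships in RDKit's Contrib directory rather than as a regular module; a sketch:

```python
import sys, os
from rdkit import Chem
from rdkit.Chem import RDConfig
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # Ertl & Schuffenhauer's SAscore, from RDKit Contrib

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
print(sascorer.calculateScore(mol))   # ~1 (easy to make) up to ~10 (hard)
```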
Multi-objective optimisation
Real drug-design problems have many simultaneous objectives. Multi-objective generative models use various strategies: weighted-sum scalarisation (combine objectives into a single score, weight tunable), Pareto-front exploration (generate diverse molecules along the trade-off frontier between objectives), constraint-satisfaction (hard constraints on some properties, optimisation on others), and conditional generation with explicit property targets. The methodology connects to the broader multi-objective ML literature; specific drug-discovery deployments use combinations tailored to the discovery program's objectives. Production deployments at AI-native biotechs (Generate Biomedicines, Iambic, the various others) have substantial proprietary methodology in this area.
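A sketch of the Pareto-front strategy, assuming `scores` is an (n_molecules × n_objectives) array oriented so that higher is better for every objective (e.g. predicted affinity, QED, inverted SAscore):

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated molecules: those for which no other
    molecule is at least as good on every objective and strictly better
    on at least one."""
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dominated = np.all(scores >= scores[i], axis=1) & \
                    np.any(scores > scores[i], axis=1)
        if dominated.any():
            keep[i] = False
    return np.flatnonzero(keep)

# The front is what medicinal chemists triage: each member represents a
# different trade-off between the programme's objectives.
```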
Scaffold hopping and lead optimisation
Two specific deployment modes worth flagging. Scaffold hopping generates molecules with novel cores while preserving binding interactions — useful for circumventing patents on a known active scaffold. Lead optimisation generates structural modifications of a known lead compound, optimising specific properties (potency, selectivity, ADMET) while preserving the overall pharmacology. The methodology of effective lead optimisation is closer to local search than to global generation, with the model proposing small modifications to be evaluated experimentally.
Retrosynthesis and Reaction Prediction
A drug candidate that cannot be synthesised is not a drug. Retrosynthesis — proposing the synthetic route from commercially-available starting materials to a target molecule — is the central problem-solving discipline of synthetic organic chemistry, and it has been a substantial AI application area since 2017.
The retrosynthesis problem
Given a target molecule, propose a sequence of chemical reactions that would produce it from available starting materials. The methodology is a search problem: at each step, identify possible "disconnections" (retrosynthetic decompositions of the target into simpler precursors), recurse on each precursor until reaching commercially-available compounds. The search space is combinatorially large — even for moderate-complexity drug molecules, the retrosynthetic tree can have hundreds of possible routes — and the difficulty of each route varies substantially with reaction conditions, yields, selectivity, and step count.
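The skeleton of that search, as a depth-limited recursion; `propose_disconnections` (a one-step retrosynthesis model) and `purchasable` (a stock lookup) are hypothetical stand-ins for real components:

```python
def find_route(target, max_depth=6):
    """Return one synthesis route (a list of reactions) or None."""
    if purchasable(target):              # commercially available: buy it
        return []
    if max_depth == 0:                   # depth limit: dead end
        return None
    for reaction, precursors in propose_disconnections(target):
        sub_routes = [find_route(p, max_depth - 1) for p in precursors]
        if all(r is not None for r in sub_routes):
            # This disconnection works: sub-routes first, then this step.
            return [step for sub in sub_routes for step in sub] + [reaction]
    return None

# Production systems replace this depth-first recursion with Monte Carlo
# tree search and a learned policy over disconnections.
```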
Pre-AI retrosynthesis was a dedicated chemistry discipline practiced by experienced medicinal chemists, with computational tools (Synthia, ICSYNTH, the various rule-based programs) providing assistance but not replacement. Modern AI methods have substantially shifted this — the 2017–2022 wave of transformer-based methods produced systems that approach human-expert performance on standard benchmarks, and the 2023–2026 generation has increasingly deployed in production at major pharma.
Forward reaction prediction
The complement of retrosynthesis is forward reaction prediction: given reactants and conditions, predict the products. The Molecular Transformer (Schwaller et al. 2019, IBM) framed forward prediction as machine translation, with reactant SMILES as the source language and product SMILES as the target language. The methodology achieved >90% top-1 accuracy on standard USPTO benchmarks, established the transformer-based approach as dominant, and produced models that have been substantially extended in subsequent work. RXNFP (reaction fingerprints from Molecular Transformer embeddings) provides reaction-similarity searches; Chemformer and the various 2024 successors extend the methodology with multi-task pretraining.
Retrosynthesis methods
Retrosynthesis is harder than forward prediction because the problem is genuinely one-to-many (there are typically multiple valid disconnections at each step). Two methodological families dominate. Template-based methods (e.g., the original AiZynthFinder approach) extract reaction templates from large reaction databases, score templates against the target, and recursively apply them. The methodology is interpretable and produces routes that match known chemistry, but extends poorly to chemistry types underrepresented in training data. Template-free methods use transformer-based sequence-to-sequence models to predict retrosynthetic disconnections directly from SMILES; the methodology generalises better to novel chemistry but produces less-interpretable disconnections.
The dominant production tool is AiZynthFinder (Genheden et al. 2020, AstraZeneca), an open-source retrosynthesis system combining template-based disconnections with Monte Carlo tree search to navigate the retrosynthetic tree. The methodology has been substantially deployed at major pharma (AstraZeneca, BenevolentAI, the various others) and produces routes that synthetic chemists evaluate as comparable to expert-designed alternatives ~70% of the time. Modern variants (the 2024 release with diffusion-based template scoring, the various 2025 extensions) continue to improve.
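A sketch of AiZynthFinder's Python API as shown in its public documentation; the config file points at downloaded template and stock files, and the policy/stock names are those defined in that config:

```python
from aizynthfinder.aizynthfinder import AiZynthFinder

finder = AiZynthFinder(configfile="config.yml")
finder.stock.select("zinc")                  # purchasable-compound stock
finder.expansion_policy.select("uspto")      # learned template policy
finder.target_smiles = "CC(=O)Oc1ccccc1C(=O)O"
finder.tree_search()                         # MCTS over disconnections
finder.build_routes()
stats = finder.extract_statistics()
print(stats["is_solved"], stats.get("number_of_routes"))
```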
The Schwaller-IBM lineage
A particularly influential research lineage worth flagging is Philippe Schwaller's work at IBM Research and subsequently in academia (EPFL). The Molecular Transformer (Schwaller et al. 2019) for forward prediction; the RXN for Chemistry platform (https://rxn.res.ibm.com) deploying retrosynthesis methodology to public users; the various 2023–2026 extensions integrating reaction-condition prediction (which catalyst, solvent, temperature), yield prediction, and increasingly reaction-mechanism prediction. The lineage has substantially shaped the methodology and is a useful entry point for practitioners.
Reaction condition prediction
Beyond predicting which reactants produce which products, modern methods increasingly predict how the reaction is run: optimal catalyst, solvent, temperature, time. Yield prediction from reactants and conditions has been substantially advanced by ML methods, with implications for high-throughput experimentation. The methodology connects to the broader experimental-loop pattern (Section 10) — predicting conditions reduces the experimental search space for synthesis optimisation.
Computer-Aided Synthesis Planning
The integration of retrosynthesis, reaction prediction, and condition prediction into deployable systems is called Computer-Aided Synthesis Planning (CASP — same acronym as Critical Assessment of Structure Prediction, but unrelated). Modern CASP systems (AiZynthFinder, IBM RXN, Synthia from Merck KGaA, Manifold from PostEra, the various commercial offerings) take a target SMILES and produce ranked synthetic routes with predicted feasibility scores. Production drug-discovery pipelines increasingly use CASP outputs as filters during generative-chemistry screening — molecules without feasible routes get deprioritised before synthesis.
The autonomous lab connection
A specific frontier worth flagging: integration of retrosynthesis with autonomous chemistry labs. The Coley lab at MIT (and increasingly several commercial deployments — Strateos, Emerald Cloud Lab, the various others) operates synthesis robots that take a target SMILES, look up the retrosynthetic route, and physically execute the synthesis with minimal human intervention. The methodology connects retrosynthesis to robotics and process automation; the empirical state of the art in 2026 includes routine multi-step synthesis at small scale with substantially-reduced human chemist effort. The full vision of "specify target → autonomous lab synthesises and tests" is still aspirational but increasingly tractable.
Phenotypic AI: Cell Painting and Beyond
The previous sections developed target-based AI — methods that operate against a specified molecular target. Phenotypic drug discovery takes a different approach: screen compounds for their effects on cells or organisms without specifying a target a priori. The methodology has substantial pre-AI history and is increasingly central to modern AI-driven drug discovery.
Phenotypic vs. target-based discovery
The choice between phenotypic and target-based discovery is methodologically important. Target-based discovery starts with a hypothesis about the target (typically a protein) whose modulation will treat disease, then designs molecules to engage that target. Phenotypic discovery starts with a disease-relevant cellular phenotype (cells from patient tissue, disease-model cell lines, organoid systems) and screens compounds for desired effects on the phenotype, leaving target identification for later or skipping it entirely. Each has trade-offs: target-based discovery has cleaner intellectual structure but depends on target hypotheses being correct (often they aren't); phenotypic discovery is closer to the disease but produces compounds without clear mechanism.
A 2011 Nature Reviews Drug Discovery paper by Swinney & Anthony documented that of 75 first-in-class drugs approved 1999–2008, 28 emerged from phenotypic discovery and only 17 from target-based discovery (most of the remainder were biologics), suggesting phenotypic discovery may be more productive than the target-based dominance of pharma R&D investment would predict. The paper has been substantially debated and updated; the broader point that phenotypic discovery is empirically valuable remains widely accepted.
Cell painting
The dominant modern phenotypic-screening methodology is Cell Painting (Bray et al. 2016 at the Broad Institute). Cells are stained with six fluorescent dyes covering distinct organelles (nucleus, ER, mitochondria, plasma membrane, Golgi, cytoskeleton); high-content microscopy images each cell at multiple wavelengths; the resulting images are processed to extract hundreds to thousands of morphological features per cell (intensity, texture, shape, neighbour relationships). The output is a high-dimensional fingerprint of cellular morphology that responds to compound treatment.
The methodology has produced one of the largest publicly-available phenotypic datasets: the JUMP Cell Painting Consortium (2021–present, multi-institution effort) has imaged ~140,000 chemical and ~15,000 genetic perturbations across multiple cell lines, producing tens of millions of single-cell images with associated profiles. The data is freely available and has become a substrate for substantial ML-based analysis.
Image-based ML for phenotypic screening
AI methods for cell-painting data span several paradigms. Profile-based methods use the extracted morphological features as a fingerprint, applying classical ML (similarity search, classification, clustering) to identify compounds with profiles similar to known reference compounds. Image-based deep learning trains CNNs or vision transformers directly on the raw images, learning representations that often outperform hand-crafted morphological features. Mechanism-of-action prediction uses cell-painting profiles to predict which target a compound likely engages — a "reverse target identification" workflow that complements traditional target-based discovery. Cell-painting foundation models (Recursion's Phenom-1, the various 2024–2026 academic efforts) pretrain large models on cell-painting data and provide reusable embeddings for downstream tasks.
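A sketch of the profile-based variant, assuming `profiles`, `reference_profiles`, per-plate DMSO control statistics, and a `reference_moa` annotation list as inputs:

```python
import numpy as np

def standardise(X, mu, sigma):
    """Per-feature z-scoring against plate-level DMSO (vehicle) controls."""
    return (X - mu) / (sigma + 1e-8)

def cosine(a, B):
    """Cosine similarity of one profile against a bank of references."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8)

query = standardise(profiles[0], dmso_mean, dmso_std)           # one compound
refs = standardise(reference_profiles, dmso_mean, dmso_std)     # annotated panel
sims = cosine(query, refs)
top = np.argsort(-sims)[:5]
print([(reference_moa[i], round(float(sims[i]), 3)) for i in top])
# Nearest annotated references suggest the query compound's mechanism of action.
```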
Recursion as a case study
The most-prominent commercial deployment of phenotypic AI is Recursion Pharmaceuticals, an AI-native biotech founded in 2013 around large-scale cell-painting screening combined with ML. Recursion has produced one of the largest proprietary phenotypic-imaging datasets (multiple petabytes by 2024), built foundation models on top of it, and advanced multiple AI-discovered candidates into clinical trials. The 2021 partnership with Roche/Genentech and the 2024 acquisition of Exscientia substantially expanded its scope. Recursion's empirical record will be a major data point in the broader question of whether AI-native drug discovery succeeds at higher rates than traditional approaches.
Patient-derived models and organoids
A specific frontier is the use of patient-derived cell models — cells obtained directly from patients with the target disease, sometimes induced-pluripotent-stem-cell-derived (iPSC), sometimes organised into 3D organoid structures that recapitulate tissue-level biology. The methodology promises substantially more disease-relevant phenotypic screening than the immortalised cell lines (HeLa, HEK293, U2OS) that dominate cell-painting datasets. AI methods for organoid imaging, single-cell phenotyping in patient-derived models, and the integration with patient genomic data are active research areas with substantial overlap with Ch 06's single-cell methodology.
Connection to Ch 06's biology methods
Phenotypic drug discovery sits at the intersection of Ch 06 (AI for Biology & Genomics) and this chapter. Single-cell perturbation screens (Perturb-seq, the various variants) provide molecular-level phenotypic data complementing cell-painting's morphological data. Ch 06's perturbation-prediction methods (CPA, GEARS, the various foundation models) are increasingly deployed in drug-discovery contexts. The 2024–2026 wave of multi-modal phenotypic AI integrates imaging, molecular profiling, and physiological readouts into unified pipelines. The methodology represents one of the most-promising frontiers for AI-driven drug discovery to actually reduce attrition rates.
Industry Deployment and Empirical Validation
The previous sections developed the methodology. This section turns to the empirical record: which AI-discovered drugs have reached clinical trials, how they've performed, what the AI-native biotech landscape looks like in 2026, and what the early data is saying about whether AI methods change drug-discovery outcomes.
The AI-native biotech landscape
The AI-native drug-discovery sector includes substantial commercial activity. Recursion (founded 2013, IPO'd 2021, acquired Exscientia 2024) — phenotypic-screening focus with substantial pipeline. Insilico Medicine (founded 2014) — generative chemistry and target discovery with several candidates in clinical trials by 2025. Exscientia (founded 2012, IPO'd 2021, merged with Recursion 2024) — pioneered the "AI-designed molecule in clinical trials" milestone with DSP-1181 in 2020. BenevolentAI (founded 2013) — knowledge-graph-based target discovery. Atomwise (founded 2012) — virtual screening with deep-learning docking. Generate Biomedicines (founded 2018, Flagship-incubated) — protein-and-small-molecule generative platforms. Iambic Therapeutics (founded 2020) — physics-grounded molecular AI. Isomorphic Labs (DeepMind spin-out, 2021) — protein structure to drug discovery. Cradle Biosciences (founded 2021) — protein-design focus. Xaira Therapeutics (founded 2024, well-funded) — broad AI-discovery platform.
Beyond the AI-native biotechs, every major pharma company has substantial internal AI-drug-discovery efforts and partnerships. Roche-Recursion, Sanofi-Insilico, Bayer-Exscientia (now Recursion), Merck-Aitia, AstraZeneca-BenevolentAI, the various others. The scope of pharma-AI integration has grown substantially since 2020.
The empirical record so far
By 2026, several AI-discovered drugs have reached clinical trials. DSP-1181 (Exscientia, for OCD, Phase I 2020) was the first widely-recognised "AI-designed" molecule to enter human trials; it was discontinued after Phase I when it did not meet the programme's criteria. INS018_055 (Insilico, for idiopathic pulmonary fibrosis, Phase II 2024) is a more substantial test — primary endpoints reported mixed results in late 2024. REC-2282 (Recursion, for neurofibromatosis Type 2, Phase III) reached late-stage trials. EXS-21546 (Exscientia, for cancer immunotherapy) also reached Phase II by 2024. The 2025–2026 wave of AI-discovered candidates entering trials is substantially larger; the empirical base for evaluating whether AI changes drug-discovery outcomes is growing rapidly.
The 2024 wave of academic analyses of AI-discovered-drug success rates has been mixed. Some analyses suggest AI methods improve early-stage hit rates and reduce time-to-candidate; others note that the broader Phase II and Phase III attrition rates haven't yet shifted, suggesting AI may be accelerating the process without changing its fundamental success rate. The empirical question — does AI improve clinical-trial success rates? — remains genuinely unanswered as of 2026 but will be substantially clarified by 2028.
Where AI methods are clearly winning
Several specific applications have produced clear empirical wins by 2026. Hit rates from virtual screening: AI-driven virtual screening of large libraries routinely produces hit rates 10–100× higher than random selection, with substantial reduction in experimental compound consumption. ADMET filtering: ML-based ADMET predictors are deployed at every major pharma company, with substantial measurable reduction in attrition at preclinical stages. Lead optimisation cycle time: AI-augmented lead optimisation (combining generative chemistry, ADMET prediction, and AlphaFold-based docking) has reduced lead-optimisation timelines from 2–3 years to 6–12 months at well-deployed sites. Synthesis route generation: AiZynthFinder and similar tools have substantially reduced the chemist effort required for retrosynthetic planning. Target identification: ML-based analysis of human genetics, single-cell data, and pathway databases (Ch 06) has produced new drug-target hypotheses that are advancing through preclinical pipelines.
The integration question
A specific tension in the field as of 2026: are AI-native biotechs displacing traditional pharma, or is the methodology becoming standard infrastructure? The 2024–2026 partnership and acquisition wave suggests integration. Roche's multi-billion-dollar partnership with Recursion (signed 2021), Sanofi's collaborations with multiple AI biotechs, the various big-pharma acquisitions of AI-native companies (the 2024 Exscientia-Recursion merger, the various others) point toward AI methodology becoming standard pharma infrastructure rather than a competitive advantage held by specific companies. The empirical implication: methodology that's available to everyone produces less differentiation than methodology held by specific companies — which is consistent with the "AI accelerates everyone equally" hypothesis above.
Pharma-internal vs. AI-native methodology
A subtle but important distinction: pharma-internal AI deployment differs substantially from AI-native-biotech deployment. Pharma deploys AI methods within existing experimental-and-clinical infrastructure that has 50+ years of internal data, methodology, and institutional knowledge — AI is one tool among many. AI-native biotechs deploy AI as the core differentiator with experimental and clinical capabilities built around it. The two approaches have produced different results so far: pharma-internal AI shows clear incremental productivity gains; AI-native biotechs have produced more headline-grabbing molecules but less clear evidence of overall pipeline-productivity advantages. The methodology may converge as AI-native biotechs build experimental scale and pharma incorporates more AI-native methodology.
Open data, proprietary models, competitive moats
A specific structural pattern: most foundational AI-for-drug-discovery methodology (AlphaFold, ESM, the various generative-chemistry methods) is published openly, while production deployments increasingly depend on proprietary data and models. The competitive moat for specific companies is increasingly the data infrastructure (proprietary chemistry-and-biology datasets, internal experimental pipelines, accumulated SAR knowledge) rather than the underlying ML methodology. This is methodologically interesting because it means the public methodology will continue to advance rapidly while production differentiation lives in the data and integration layers.
The Frontier and the Pharma Integration Question
The previous sections developed the established methodology. This final section turns to the open frontiers and the central strategic question for the field: how AI methods integrate with the broader pharma industry over the next several years.
The active research frontier
Several frontier directions are particularly active in 2026.
Multi-modal foundation models — combining molecular, protein, biological-pathway, and clinical data into unified architectures. The 2024–2026 wave of "molecule-and-protein-and-everything" foundation models (Boltz-1, the various academic and industrial offerings) is producing substantial empirical gains over single-modality approaches, particularly for protein-ligand binding and complex-structure prediction.
Generative protein-ligand co-design — designing both a protein binding pocket and a ligand simultaneously, allowing for de novo therapeutic-protein design with custom small-molecule binding. The methodology integrates Ch 04's protein-design tools (RFdiffusion) with this chapter's molecular-design tools, with substantial implications for therapeutic-protein engineering.
Active learning at scale — systematic deployment of model-experiment feedback loops that drive synthesis decisions. The methodology has been used at moderate scale for years; the 2024–2026 wave is deploying at much larger scale (millions of compounds per cycle) with substantial improvements in efficient experimentation.
Autonomous discovery platforms — full-pipeline systems that combine generative chemistry, retrosynthesis, autonomous synthesis, and high-throughput screening with minimal human intervention. The 2024–2026 demonstrations (the Coley lab's autonomous platforms, several commercial offerings) have produced first-pass closed loops; the 2026–2028 frontier is making these systems robust enough for routine production use.
Clinical AI — applying ML methods to clinical-trial design, patient stratification, biomarker development, and AI-derived efficacy endpoints. The methodology faces substantially harder regulatory challenges than early-stage discovery but has potentially larger impact (Phase II/III trials are where the bulk of attrition lives).
Open problems
Several genuinely-open problems will likely shape the next several years.
Activity cliff handling — most ML methods systematically mispredict the most-important compounds in chemical-series optimisation. Better handling of activity cliffs would substantially improve practical utility.
Out-of-distribution generalisation — designs from chemical space far from training distribution often fail in unexpected ways. Methods that quantify and manage prediction uncertainty in OOD regimes are needed.
Validation rigour — published benchmark numbers consistently overstate real prospective performance. The methodology of publication-quality evaluation needs continued refinement.
Regulatory acceptance of AI-derived clinical evidence — the boundaries of what AI methods can produce as primary clinical evidence (vs. supportive evidence) are still being established, with substantial implications for what AI can deliver in drug development beyond the early stages.
What 2028 might look like
Speculative but useful. By 2028, several developments seem likely. AI-discovered drugs will be in increasing numbers in late-stage trials; the empirical record will be substantial enough to evaluate whether AI changes Phase II/III success rates. Multi-modal foundation models will be standard infrastructure. Autonomous discovery platforms will be operating at moderate scale at several major sites. Pharma-AI integration will be sufficiently mature that the "AI-native biotech vs. traditional pharma" framing is replaced by "pharma companies of various sizes with various levels of AI integration." The regulatory framework for AI in drug development will be substantially more mature, with established pathways for AI-derived clinical evidence and AI-driven regulatory submissions.
The specific question of whether AI improves clinical success rates remains the open empirical question that the next several years will answer. If yes, the economic implications are substantial — drug-development costs reduce by perhaps 30–50%, prices on certain disease classes drop, more diseases become economically tractable, the pharma industry restructures around the new productivity. If no, the AI-native biotech sector consolidates into pharma's standard methodology toolkit without producing the dramatic productivity gains its valuations have priced in. Either outcome is methodologically interesting; the field's trajectory depends substantially on which materialises.
What this chapter does not cover
Several adjacent areas are out of scope. The substantial process-chemistry and chemical-engineering literature on scaling drug synthesis from milligram to ton scale is its own discipline, with its own AI applications (process optimisation, yield prediction, the various others). The pharmaceutical commercial-and-marketing operations that translate approved drugs into prescriptions and revenue are skipped. The broader healthcare-AI material on clinical decision support, hospital operations, and electronic health records is in Ch 05 of Part XIV (Healthcare & Clinical AI) rather than here. The substantial regulatory science around clinical-trial design, real-world evidence, and post-market surveillance is touched only briefly. The methodology developed here is the practical AI-for-drug-discovery discipline focused on early-stage chemistry; the broader pharma landscape is genuinely vast.
Further reading
A combined reading list for pharmacology, drug discovery, and AI. The pharma-foundation references — Goodman & Gilman, Wermuth, Lipinski 1997, DiMasi 2016, Meinert's clinical-trials handbook, FDA's 2025 AI guidance, ICH E6 R3, the antibody-therapeutics literature, and Hill & Rang — establish the pharmacology substrate. The AI-methodology references — MoleculeNet, the Molecular Transformer, DiffDock, AlphaFold 3, Cell Painting, REINVENT, SELFIES, and the various others — establish the methodology. The field's literature is rapidly evolving as of 2026.
-
Goodman & Gilman's The Pharmacological Basis of TherapeuticsThe standard pharmacology textbook, in print continuously since 1941. Comprehensive coverage of drug mechanisms, therapeutic classes, and clinical applications. Roughly 1,400 pages but well-organised by therapeutic area; the right starting reference for any AI reader who needs to engage with pharmacology seriously. The reference pharmacology textbook.
-
The Practice of Medicinal ChemistryThe standard medicinal-chemistry textbook. Covers drug discovery from the chemistry side — molecular design, lead optimisation, ADMET considerations, and the practical realities of synthesising drug-like molecules. The natural complement to Goodman & Gilman for understanding the chemistry-meets-pharmacology bridge that this chapter operates on. The reference medicinal-chemistry textbook.
-
Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings (Lipinski's Rule of Five)The original "rule of five" paper. Establishes the empirical pattern that orally-bioavailable drugs share specific physicochemical properties (MW < 500, log P < 5, hydrogen-bond donors < 5, hydrogen-bond acceptors < 10). The rule has become canonical in pharmaceutical chemistry; the original 1997 paper remains the foundational citation. The reference for drug-like physicochemical properties.
-
Innovation in the pharmaceutical industry: New estimates of R&D costsThe most-cited modern estimate of pharmaceutical R&D costs. Establishes the ~$2.6 billion per approved drug figure widely cited in industry discussions, with detailed methodology for how the figure is calculated (capital costs, attrition, time-discounting). The natural reading for understanding the economic logic of drug discovery. The reference for drug-development costs.
-
Clinical Trials Handbook: Design and ConductThe standard practical reference for clinical-trial design and conduct. Covers protocol design, randomisation, blinding, sample-size calculation, statistical analysis, and the operational realities of running multi-site studies. The right reading for understanding the clinical-trial methodology of Section 7 in depth. The reference clinical-trials handbook.
-
FDA Guidance: Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-MakingFDA's 2025 final guidance on AI/ML in drug development and regulatory submissions. Establishes the current regulatory framework for AI-derived evidence in drug applications, including expectations for model documentation, validation, and ongoing monitoring. The natural reading for understanding the regulatory layer that AI-for-drug-discovery deployments must engage with. The reference for FDA AI policy.
- ICH E6 Good Clinical Practice (R3). The current revision of the foundational clinical-trial conduct standard. ICH E6 specifies expectations for trial protocols, investigator responsibilities, sponsor obligations, monitoring, data management, and reporting. Compliance is essentially mandatory for trials supporting regulatory submissions in the major markets. The natural reading for understanding the operational framework of industry-sponsored clinical research. The reference for clinical-trial conduct standards.
- Antibody Engineering and Therapeutics. The standard reference for the antibody-therapeutics landscape. Annual compilations cover the latest approved antibody drugs, the development pipeline, and methodology advances in antibody engineering. The natural complement to the protein-design material of Ch 03 for understanding therapeutic antibody development specifically. The reference for therapeutic antibody development.
- DiMasi Tufts Center Reports on R&D Productivity. The Tufts Center's annual analyses of pharmaceutical R&D productivity, attrition rates by phase, and trial-design trends. Provides the empirical basis for industry discussions of drug-development costs and timelines, and the methodology behind the ~$2.6 billion figure cited above. The reference for ongoing pharmaceutical economics data.
- Drug Discovery and Development: Technology in Transition. A modern textbook covering the full drug-discovery pipeline from target identification through regulatory approval. Substantially shorter than Goodman & Gilman, with a stronger focus on the discovery-and-development process than on therapeutic-area-specific pharmacology. The right reading for understanding the modern drug-discovery workflow as a whole, particularly for AI readers approaching the discipline from the methodology side. The reference modern drug-discovery textbook.
- MoleculeNet: A Benchmark for Molecular Machine Learning. The MoleculeNet benchmark paper. Establishes 17 standardised datasets across regression and classification covering physicochemical properties, ADMET endpoints, and biological activities. The substrate of essentially every published AI-for-chemistry property-prediction paper since 2018, and the natural starting reference for the property-prediction landscape of Section 12 (loading sketch after this list). The reference benchmark for molecular ML.
- Therapeutics Data Commons. The TDC benchmark suite. Covers 66 datasets across 22 learning tasks spanning ADMET, toxicity, drug-target interaction, and disease-area-specific problems, with standardised splits and evaluation. The most comprehensive modern benchmark for AI-for-drug-discovery and the natural successor to MoleculeNet for serious evaluation (loader sketch after this list). The reference modern drug-discovery benchmark.
- Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. The Molecular Transformer paper. Establishes the transformer-based methodology for forward reaction prediction (treating SMILES-to-SMILES translation as machine translation), achieving >90% top-1 accuracy on USPTO benchmarks. The substrate of subsequent retrosynthesis and reaction-condition methods, and the natural reading for understanding the reaction-prediction material of Section 16 (tokenisation sketch after this list). The reference for transformer-based chemistry.
- AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. The AiZynthFinder paper. The dominant open-source retrosynthesis tool, combining template-based disconnections with Monte Carlo tree search. Substantially deployed at major pharma companies and increasingly at AI-native biotechs. The natural reading for understanding production retrosynthesis methodology (usage sketch after this list). The reference open-source retrosynthesis system.
- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. The DiffDock paper. The dominant deep-learning docking method circa 2023–2024, using diffusion-based generative modelling to produce ranked pose distributions rather than single point estimates. The methodology connects diffusion methods (Part X) to structure-based drug discovery, and the diffusion approach was carried substantially further by AlphaFold 3 in 2024. The reference for deep-learning docking.
- Equivariant Diffusion for Molecule Generation in 3D. The EDM paper. Foundational paper on SE(3)-equivariant diffusion models for 3D molecule generation, establishing the methodology that the 2023–2026 wave of generative chemistry built on. The natural reading for understanding the diffusion-based generative-chemistry material of Section 15 and connecting it to the broader equivariance methodology (noising sketch after this list). The reference for 3D molecular diffusion.
- Cell Painting, a high-content image-based assay for morphological profiling. The Cell Painting paper. Establishes the standardised six-dye morphological-profiling assay that has become the dominant phenotypic-screening methodology and the substrate of the JUMP Cell Painting Consortium dataset. The natural reading for understanding the phenotypic-AI material of Section 17 and the empirical foundation for image-based drug discovery (profiling sketch after this list). The reference Cell Painting protocol.
- REINVENT 2.0 — an AI Tool for De Novo Drug Design. The REINVENT paper, documenting the dominant production reinforcement-learning method for de novo drug design: an RNN-based generator coupled with reward signals from QSAR models, scoring functions, and structural constraints. Substantially deployed across major pharma and the substrate for many subsequent generative-chemistry methods (loss sketch after this list). The reference for RL-based generative chemistry.
- Accurate structure prediction of biomolecular interactions with AlphaFold 3. The AlphaFold 3 paper. Cited again here because its extension of structure prediction to protein-ligand complexes substantially reshaped what's possible in structure-based drug discovery. The diffusion-based decoder generates protein-ligand complexes with accuracy substantially exceeding dedicated docking methods, and the methodology has been broadly adopted across both pharma and AI-native biotechs in 2024–2026. The reference for modern structure-based design.
- Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation. The SELFIES paper. Addresses the SMILES-validity problem with a grammar guaranteeing that every string corresponds to a valid molecule. Increasingly used for generative chemistry where output validity is essential. The natural reference for the molecular-representation material of Section 11 (round-trip sketch after this list). The reference for validity-guaranteed molecular strings.
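For the Lipinski entry: a minimal sketch of the rule as executable code, assuming the open-source RDKit toolkit is installed (pip install rdkit); the descriptor functions named here are RDKit's own.

```python
# Rule-of-five check. Assumes RDKit; descriptor names are RDKit's.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def rule_of_five_violations(smiles: str) -> int:
    """Count Lipinski violations for a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"unparseable SMILES: {smiles}")
    return sum([
        Descriptors.MolWt(mol) > 500,      # molecular weight
        Descriptors.MolLogP(mol) > 5,      # calculated log P
        Lipinski.NumHDonors(mol) > 5,      # hydrogen-bond donors
        Lipinski.NumHAcceptors(mol) > 10,  # hydrogen-bond acceptors
    ])

print(rule_of_five_violations("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> 0
```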
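For the DiMasi entry: a toy version of the capitalised-cost arithmetic. Every number below is an illustrative assumption, not DiMasi's actual input; the point is the mechanics: expected spend per pipeline entrant is compounded forward at the cost of capital, then spread over the few entrants that reach approval.

```python
# Toy capitalised cost per approved drug. All inputs are assumptions.
COST_OF_CAPITAL = 0.105  # annual discount rate (assumption)
P_APPROVAL = 0.118       # P(approval | entering preclinical) (assumption)

# (phase, out-of-pocket cost $M, P(phase is reached), years to approval)
phases = [
    ("preclinical", 40.0, 1.00, 10.0),
    ("phase 1",     25.0, 1.00,  7.0),
    ("phase 2",     60.0, 0.60,  5.0),
    ("phase 3",    250.0, 0.35,  3.0),
]

# Expected spend per pipeline entrant, compounded forward to approval date.
expected_capitalised = sum(
    cost * p_reached * (1 + COST_OF_CAPITAL) ** years
    for _, cost, p_reached, years in phases
)
# Divide by the approval probability to get cost per approved drug.
print(f"~${expected_capitalised / P_APPROVAL:,.0f}M per approved drug")
# With these toy inputs: roughly $2,800M, the same order as DiMasi's figure.
```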
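For the MoleculeNet entry: a minimal loading sketch, assuming the DeepChem library (pip install deepchem), whose dc.molnet module wraps the benchmark's datasets, featurisers, and splits.

```python
# Load a MoleculeNet task via DeepChem.
import deepchem as dc

# Tox21: twelve binary toxicity-classification tasks. The loader returns
# task names, featurised train/valid/test splits, and the transformers
# that were applied to the data.
tasks, (train, valid, test), transformers = dc.molnet.load_tox21(
    featurizer="ECFP", splitter="scaffold"
)
print(len(tasks), train.X.shape)
```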
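For the Therapeutics Data Commons entry: a minimal sketch using the PyTDC client (pip install PyTDC); ADME is one of TDC's single-instance-prediction problem classes, and Caco2_Wang is one of its permeability datasets.

```python
# Load a TDC ADMET dataset with its standardised split.
from tdc.single_pred import ADME

data = ADME(name="Caco2_Wang")             # Caco-2 permeability regression
split = data.get_split(method="scaffold")  # dict of train/valid/test frames
print(split["train"].shape, split["test"].shape)
```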
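For the Molecular Transformer entry: the preprocessing step the entry glosses over is SMILES tokenisation. The regex below follows the pattern published with the paper, keeping bracket atoms and two-letter elements (Cl, Br) as single tokens.

```python
# SMILES tokenisation in the style of the Molecular Transformer.
import re

SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # The tokeniser must account for every character of the input.
    assert "".join(tokens) == smiles, f"untokenisable SMILES: {smiles}"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))
```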
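For the AiZynthFinder entry: a sketch following the project's documented Python quick-start; config.yml and the stock/policy names stand in for the model and stock files the documentation tells you to download.

```python
# Retrosynthetic search with AiZynthFinder's Python interface.
from aizynthfinder.aizynthfinder import AiZynthFinder

finder = AiZynthFinder(configfile="config.yml")
finder.stock.select("zinc")               # purchasable-compound stock
finder.expansion_policy.select("uspto")   # template-based expansion model
finder.target_smiles = "Cc1ccc2nc(N)sc2c1"
finder.tree_search()                      # Monte Carlo tree search
finder.build_routes()                     # assemble ranked synthesis routes
print(finder.extract_statistics())
```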
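For the EDM entry: the translation-invariance trick at the heart of the method is that diffusion runs in the subspace where atomic coordinates have zero centre of mass. A minimal numpy sketch of that projection, under a generic alpha/sigma noise schedule:

```python
# Zero-centre-of-mass Gaussian noising, EDM-style. Assumes the clean
# coordinates x have already been centred (mean over atoms is zero).
import numpy as np

def com_free_noise(n_atoms: int, dim: int = 3, rng=None) -> np.ndarray:
    """Gaussian noise projected onto the zero-centre-of-mass subspace."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((n_atoms, dim))
    return eps - eps.mean(axis=0, keepdims=True)

def noise_coords(x: np.ndarray, alpha_t: float, sigma_t: float) -> np.ndarray:
    """One forward step: z_t ~ N(alpha_t * x, sigma_t^2 I) on that subspace."""
    return alpha_t * x + sigma_t * com_free_noise(*x.shape)

# Translating the molecule does not change its centred representation,
# which is what keeps the whole diffusion process translation-invariant.
```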
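For the Cell Painting entry: the downstream analysis pattern is to aggregate single-cell features into well-level profiles and normalise against plate controls. A minimal pandas sketch; the file and column names are hypothetical, not the CellProfiler/JUMP schema.

```python
# Morphological profiling, schematically: cells -> well profiles -> normalised.
import pandas as pd

cells = pd.read_csv("per_cell_features.csv")  # hypothetical input file
feature_cols = [c for c in cells.columns if c.startswith("feat_")]

# Median-aggregate single cells into one profile per well.
wells = cells.groupby(["plate", "well", "compound"])[feature_cols].median()

def normalise_plate(plate_df: pd.DataFrame) -> pd.DataFrame:
    """Robust z-score against the plate's DMSO (negative-control) wells."""
    ctrl = plate_df.xs("DMSO", level="compound")
    mad = 1.4826 * (ctrl - ctrl.median()).abs().median()
    return (plate_df - ctrl.median()) / (mad + 1e-8)

profiles = wells.groupby(level="plate", group_keys=False).apply(normalise_plate)
```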
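For the REINVENT entry: the core update pulls the agent's likelihood toward an "augmented" likelihood, the prior's log-probability plus a scaled score. A minimal PyTorch sketch, assuming both networks expose a log_prob method over token sequences (an assumption; the released code organises this differently).

```python
# REINVENT-style policy update: squared gap between the agent's
# log-likelihood and the reward-augmented prior log-likelihood.
import torch

def reinvent_loss(agent, prior, sequences, scores, sigma=60.0):
    with torch.no_grad():
        prior_ll = prior.log_prob(sequences)  # log P_prior(x), frozen
    agent_ll = agent.log_prob(sequences)      # log P_agent(x), trainable
    augmented = prior_ll + sigma * scores     # log P_aug(x) = prior + sigma*S(x)
    return ((augmented - agent_ll) ** 2).mean()
```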
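For the SELFIES entry: a round-trip sketch using the reference selfies package (pip install selfies). The guarantee runs in the direction that matters for generation: every syntactically possible SELFIES string decodes to a valid molecule, so random token edits inside a generative loop cannot produce invalid chemistry.

```python
# SMILES <-> SELFIES round trip with the reference implementation.
import selfies as sf

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
encoded = sf.encoder(smiles)       # SMILES -> SELFIES
decoded = sf.decoder(encoded)      # SELFIES -> SMILES, always a valid molecule
print(encoded)
print(decoded)
```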