Agent Fundamentals: What It Means to Perceive, Decide, and Act
Every AI agent — from a chess-playing program to a language model browsing the web — is built from the same primitive parts: something that perceives, something that decides, and something that acts. The hard problems are in the transitions between these three: how much to remember, how far to plan, and when to stop deliberating and commit to a move. Sixty years of AI research have produced a rich vocabulary for these questions. Understanding it makes the current LLM-agent revolution far less mysterious.
Prerequisites
This chapter is designed as an entry point to Part XI and requires no specialized prerequisites. Familiarity with the transformer architecture (Part VI Ch 04) is useful context for the later sections on LLM agents, but the foundational material on sense-plan-act loops, agent taxonomies, PDDL, and environment properties is self-contained. The objective problem section touches on reward design from reinforcement learning (Part IX Ch 01) — worth revisiting if RL concepts are unfamiliar.
What Is an Agent?
The word "agent" is one of the most overloaded terms in AI — it has been applied to everything from a thermostat to GPT-4. Getting the definition right matters, because it determines what questions we ask and what problems we consider solved.
The most widely cited definition comes from Russell and Norvig: an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This is intentionally broad. A human perceives via eyes and ears and acts via hands and voice. A software agent perceives via API calls and acts via database writes. A robot perceives via cameras and LIDAR and acts via motors. What ties them together is the perception-action interface with an environment.
Wooldridge and Jennings (1995) offer a more demanding definition, identifying four properties that distinguish genuine agents from mere programs:
- Autonomy: the agent operates without direct human intervention, exercising control over its own actions and internal state. It is not just executing a fixed script — it makes choices.
- Reactivity: the agent perceives its environment and responds in a timely fashion to changes in it. A purely batch-processing system that runs once and stops is not reactive.
- Pro-activeness: the agent does not merely react to stimuli. It pursues goals, taking initiative to achieve objectives even when unprompted. This is what separates a file-watcher script from an agent.
- Social ability: the agent interacts with other agents — human or artificial — via some form of agent-communication language. Modern LLM agents interact in natural language; classical agents used FIPA ACL or KQML.
These four properties define what Wooldridge and Jennings called a "weak notion" of agency. A "strong notion" adds mentalistic properties — beliefs, desires, intentions (the BDI model) — treating agents as having mental states in a philosophically meaningful sense. Whether AI systems genuinely have beliefs and desires is contentious; the weak notion is more tractable and sufficient for engineering purposes.
A useful practical distinction: a tool is purely reactive — it does exactly what it is called to do, with no goals of its own. A workflow is a pre-scripted sequence of tool calls — deterministic and non-adaptive. An agent perceives state, maintains a goal, and chooses which actions to take (including which tools to call) in service of that goal. The distinctions blur in practice, but they matter: a workflow that fails mid-step simply stops; an agent that fails mid-step can diagnose the failure, try an alternative approach, and recover.
The Sense-Plan-Act Loop
The most influential architecture in classical AI is the three-stage loop: sense the environment, plan an action, execute the action, repeat. This structure emerged from early robotics and AI planning research in the 1970s and 1980s and remains the conceptual backbone of most agent architectures, even those that have substantially departed from it in implementation.
The Three Stages in Detail
Sense: The perception module receives raw sensor data and converts it into a usable internal representation. For a robot, this might be transforming LIDAR point clouds into an obstacle map. For a software agent, it might be parsing a JSON API response into a structured state object. For an LLM agent, it is constructing the context window from conversation history, tool outputs, and retrieved documents. Perception always involves selective attention and abstraction — no agent can track every detail of its environment.
Plan: Given a world model (a belief about the current state) and a goal, the planner determines what action to take. Classical planners search a state space explicitly. Reactive controllers use lookup tables or condition-action rules. Learning-based agents use a neural policy. LLM agents reason in language, producing a plan as text before executing it. The planner is the intellectual core of the agent — it is where deliberation happens.
Act: The action is executed via actuators. In the physical world, this means motors, speakers, displays. In software, this means API calls, database writes, file edits, sending messages. The key engineering challenge is that actions have effects — often irreversible ones — and the environment's response to the action provides the input to the next sense phase.
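The loop is simple enough to sketch directly. Below is a minimal, purely illustrative version: a toy thermostat stands in for the agent, and a hypothetical `Environment` class with invented temperature dynamics stands in for the world.

```python
# Minimal sense-plan-act loop. The Environment class and its temperature
# dynamics are illustrative assumptions, not a real control model.

class Environment:
    def __init__(self, temperature=15.0):
        self.temperature = temperature

    def sense(self):
        # Sense: expose raw state as a percept.
        return {"temperature": self.temperature}

    def act(self, action):
        # Act: the action changes the world; the world also drifts on its own.
        if action == "heat_on":
            self.temperature += 1.0
        self.temperature -= 0.3  # ambient heat loss every step

def plan(percept, goal_temp=20.0):
    # Plan: map the current percept and the goal to an action.
    return "heat_on" if percept["temperature"] < goal_temp else "heat_off"

env = Environment()
for _ in range(30):            # sense -> plan -> act, repeated
    percept = env.sense()
    action = plan(percept)
    env.act(action)

print(round(env.temperature, 1))  # oscillates near the 20-degree goal
```

Even this toy exposes the structure: perception compresses the world into a percept, planning maps percept and goal to an action, and the action's effects feed the next sense phase.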
The Deliberation-Reaction Tradeoff
The SPA loop has a fundamental tension: deliberation takes time, and the world does not wait. A chess engine can deliberate for minutes because the board does not change while it thinks. A robot navigating a crowded street cannot afford to pause; by the time a slow planner decides to turn left, a bicycle may have appeared in the turning radius. This tension gave rise to the reactive architectures covered in Section 07 — which sacrifice deliberation for speed — and to hierarchical architectures that mix fast reactive layers with slow deliberative layers operating in parallel.
Agent Taxonomies
Russell and Norvig's taxonomy organizes agents by what internal information they maintain and how they use it. Each level builds on the previous, adding representational power at the cost of complexity.
Simple Reflex Agents
The simplest agent type: select actions based purely on the current percept, ignoring history. Implementation is a condition-action table: if percept matches condition, execute action. A thermostat is the textbook example — it checks the current temperature and either activates the heater or not, with no memory of past states or future goals. Simple reflex agents are fast and transparent but fail immediately in partially observable environments where the current percept does not determine the optimal action.
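A condition-action table is the entire implementation. Here is a sketch using the classic two-location vacuum world; the `RULES` table and the percept format are illustrative choices.

```python
# Simple reflex agent for the two-location vacuum world: the action
# depends only on the current percept (location, dirty?). The table
# below IS the agent -- no memory, no model, no goals.

RULES = {
    ("A", True):  "suck",   # dirty square: clean it
    ("A", False): "right",  # clean square A: move on
    ("B", True):  "suck",
    ("B", False): "left",
}

def reflex_agent(percept):
    return RULES[percept]

print(reflex_agent(("A", True)))   # suck
print(reflex_agent(("A", False)))  # right
```

Note what is missing: if dirt can reappear behind the agent, or if the location sensor fails, there is no belief state to fall back on, which is exactly the partial-observability failure described above.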
Model-Based Reflex Agents
Add an internal world model: the agent maintains a belief about the current state of the world that is updated as new percepts arrive and as the agent reasons about how its actions change the world. The model tracks what the agent cannot currently see — a robot that drives behind a building maintains a belief that the building is still there even when it cannot perceive it. This enables sensible behavior in partially observable environments but requires the world model to accurately capture state transitions.
Goal-Based Agents
Add an explicit goal representation. The agent does not merely react to state; it searches for action sequences that lead to desired goal states. This requires planning — the agent must reason forward from the current state through hypothetical action sequences to find one that reaches the goal. Goal-based agents can answer "why" questions about their behavior in terms of goal pursuit, making them easier to understand and debug than pure reflex agents.
Utility-Based Agents
Goals are binary, achieved or not, so a goal-based agent cannot express that one success is better than another. Utility functions quantify degrees of preferability across states, enabling the agent to trade off between competing goods and choose the action that maximizes expected utility rather than merely reaching any goal state. A route-planning agent doesn't just find a path; it finds the shortest or fastest or cheapest one. Utility-based agents are the theoretical ideal but require a utility function that accurately represents what we actually want, which, as Section 11 discusses, is harder than it sounds.
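The difference is easy to make concrete. In the illustrative sketch below (the route data and the dollars-per-minute weight are invented), a goal-based agent would accept any of the three routes, since all reach the destination; the utility-based agent ranks them.

```python
# Utility-based route choice: every route satisfies the goal, but the
# utility function trades travel time against toll cost. All numbers
# are illustrative assumptions.

routes = {
    "highway": {"minutes": 25, "toll": 4.00},
    "surface": {"minutes": 40, "toll": 0.00},
    "scenic":  {"minutes": 55, "toll": 0.00},
}

def utility(route, value_of_time=0.30):
    # Negative total cost in dollars, valuing time at $0.30/minute.
    return -(route["toll"] + value_of_time * route["minutes"])

best = max(routes, key=lambda name: utility(routes[name]))
print(best)  # highway: $11.50 total beats surface's $12.00
```

Changing `value_of_time` to 0.10 flips the answer to `surface`; the preference structure, not the goal, is doing the work.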
Learning Agents
Add a learning element that improves the agent's performance over time. Russell and Norvig decompose learning agents into four components: the performance element (the current policy for selecting actions), the learning element (modifies the performance element based on feedback), the critic (evaluates how well the agent is doing against a fixed performance standard), and the problem generator (suggests exploratory actions that may lead to better long-term performance). Reinforcement learning agents are learning agents where the critic is a reward signal from the environment.
An influential alternative taxonomy is the Belief-Desire-Intention (BDI) model (Bratman 1987; Rao & Georgeff 1995). Beliefs are the agent's information about the world (possibly incorrect). Desires are its motivational states — the goals it would like to achieve. Intentions are the desires it has committed to pursuing — the plans currently in execution. BDI models intentions as having a kind of inertia: an agent does not abandon an intention at the first sign of difficulty; it persists. This captures human-like goal commitment. The PRS (Procedural Reasoning System) and later Jadex implemented BDI in practical agent systems. The BDI framing resonates particularly well with LLM agents that maintain a "current objective" and "plan" in their context window.
Environments and Their Properties
Agents do not exist in isolation — their properties only make sense relative to the environment they operate in. Before designing an agent, one characterizes its task environment using the PEAS framework: Performance measure, Environment, Actuators, Sensors. PEAS forces precision about what the agent is optimizing for, what it can observe, and what it can do.
Beyond PEAS, environments are classified along eight dimensions that determine how difficult the agent design problem is:
| Dimension | Variants | Design implication |
|---|---|---|
| Observability | Fully / Partially observable | Partial observability requires the agent to maintain a belief state; the agent cannot act optimally on the current percept alone |
| Determinism | Deterministic / Stochastic | Stochastic environments require probability distributions over outcomes; the agent must plan for uncertainty |
| Sequentiality | Episodic / Sequential | Sequential environments require long-range planning; each action may affect future options. Episodic environments allow greedy per-step decisions |
| Dynamism | Static / Dynamic | Dynamic environments change while the agent deliberates, imposing time pressure; the agent's world model becomes stale |
| Continuity | Discrete / Continuous | Continuous state and action spaces require function approximation; discrete spaces permit exact enumeration |
| Knowledge | Known / Unknown | Unknown environments require exploration; the agent must learn the effects of actions as it acts |
| Agency | Single / Multi-agent | Multi-agent environments introduce strategic complexity — other agents may be cooperative, competitive, or neutral |
| Accessibility | Accessible / Inaccessible | Inaccessible environments require inference about hidden state from indirect evidence |
Chess is: fully observable, deterministic, sequential, static (during the agent's turn), discrete, known, two-agent. This is a clean environment — hard because of combinatorial depth, not environmental complexity. Self-driving vehicles are: partially observable (occluded pedestrians), stochastic (uncertain driver intentions), sequential, dynamic (continuously changing), continuous, partially known (maps exist but road conditions change), multi-agent (other drivers). This combination makes autonomous driving one of the hardest agent problems.
An LLM-based coding agent operates in an environment that is: partially observable (the codebase may be too large to fit in context), stochastic (tests may be flaky, APIs may return errors), sequential (each file edit affects subsequent ones), dynamic (files may change between reads in long sessions), discrete (file names and code are discrete), partially known (the agent knows Python but not the codebase conventions), and effectively single-agent. This environmental profile shapes every architectural choice in LLM agent design.
Classical Planning and PDDL
Before learning-based agents, AI planning was the dominant approach to agent behavior. Classical planning takes as input a formal description of the initial state, the goal state, and the available actions — then searches for a sequence of actions that transforms the initial state into the goal state. It is the algorithmic skeleton that modern agents have largely replaced but never fully escaped.
STRIPS: The Foundation
STRIPS (Stanford Research Institute Problem Solver, Fikes & Nilsson 1971) was the first systematic planning language. A STRIPS problem consists of: a set of propositional facts that describe states; an initial state (a set of true facts); a goal state (a set of facts that must be true); and a set of operators (actions), each with preconditions and effects. An operator can be applied when all its preconditions are satisfied; its effects add and delete facts from the current state.
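STRIPS semantics reduce to set operations, which a few lines make concrete. The blocks-world operator below is an illustrative sketch of the semantics, not STRIPS's original notation.

```python
# STRIPS in miniature: states are sets of ground facts; an operator is
# applicable when its preconditions are a subset of the state, and
# applying it deletes then adds facts. Operator and facts are illustrative.

def applicable(state, op):
    return op["pre"] <= state                 # preconditions all satisfied?

def apply_op(state, op):
    return (state - op["del"]) | op["add"]    # delete effects, then add effects

stack_a_on_b = {
    "pre": {"clear(A)", "clear(B)", "ontable(A)"},
    "del": {"clear(B)", "ontable(A)"},
    "add": {"on(A,B)"},
}

state = {"clear(A)", "clear(B)", "ontable(A)", "ontable(B)"}
assert applicable(state, stack_a_on_b)
state = apply_op(state, stack_a_on_b)
print(sorted(state))  # ['clear(A)', 'on(A,B)', 'ontable(B)']
```

Everything else in classical planning — heuristics, grounding, search strategy — is machinery for choosing which applicable operator to apply next.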
PDDL: A Practical Language
PDDL (Planning Domain Definition Language, McDermott 1998) standardized planning problem specification across the research community. A PDDL specification consists of two files: the domain file (what actions exist and how they work) and the problem file (the specific initial state and goal for this instance).
In the classic logistics domain, for example, the problem file specifies a concrete scenario: which packages and trucks exist, where they start, and what the goal configuration should be. A planner like Fast Downward or LAMA then searches the state space, using heuristics such as delete relaxation or landmarks, to find a sequence of instantiated actions (a plan) that achieves the goal.
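The search itself can be sketched with breadth-first search over STRIPS-style operators. The tiny one-truck delivery domain below is an illustrative stand-in for a real logistics instance; real planners replace the blind search with heuristics, but the input-output contract is the same.

```python
# Forward state-space search over STRIPS-style operators: the simplest
# complete planner. The one-truck delivery domain is an illustrative
# stand-in for a PDDL logistics instance.
from collections import deque

OPS = {
    "load":   {"pre": {"pkg@home", "truck@home"}, "del": {"pkg@home"},
               "add": {"pkg-in-truck"}},
    "drive":  {"pre": {"truck@home"}, "del": {"truck@home"},
               "add": {"truck@dest"}},
    "unload": {"pre": {"pkg-in-truck", "truck@dest"}, "del": {"pkg-in-truck"},
               "add": {"pkg@dest"}},
}

def plan(init, goal):
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                 # all goal facts hold
            return path
        for name, op in OPS.items():
            if op["pre"] <= state:
                nxt = frozenset((state - op["del"]) | op["add"])
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None                           # goal unreachable

print(plan({"pkg@home", "truck@home"}, {"pkg@dest"}))
# ['load', 'drive', 'unload']
```

Note the dead end the search must avoid: driving before loading strands the package, since no operator returns the truck home. Real domains contain astronomically many such branches, which is why heuristic guidance dominates planner performance.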
Limits of Classical Planning
Classical planning assumes complete, accurate world models; deterministic action effects; and goals fully specified in advance. All three assumptions fail in realistic settings. PDDL extensions address some of this — PDDL 2.1 adds numeric fluents and time, PDDL 3.0 adds trajectory constraints — but the core paradigm remains brittle when the world model is wrong or incomplete. This fragility is precisely what motivated the move toward learning-based agents: instead of requiring a handcrafted model of the world, let the agent learn from data what actions lead to what outcomes.
The OODA Loop
Colonel John Boyd was a fighter pilot and military strategist who, in the 1970s and 1980s, developed a framework for understanding why some adversaries consistently defeat others who appear equal in resources and capability. His answer: the faster decision cycle wins. He called this cycle OODA — Observe, Orient, Decide, Act.
Observe: Gather raw information from sensors — radar returns, radio intercepts, wingman radio calls, visual sightings. This corresponds to the sense phase in the SPA loop.
Orient: Synthesize observations into a mental model of the current situation, drawing on previous experience, cultural tradition, genetic heritage, and analysis of prior environments. Boyd considered Orient the most important phase — and the most neglected. Orientation is not passive information processing; it is active model construction, filtering, and interpretation. An agent with a poor mental model will orient incorrectly even from good observations.
Decide: Select a course of action from the options the orientation makes salient. Note that this phase is downstream of orientation — if orientation is wrong, decision follows wrongly regardless of decision quality.
Act: Execute the chosen action, which generates new observations and begins the loop again.
Boyd's key insight was that the goal is not to execute a perfect OODA loop in isolation, but to complete your loop faster than your adversary. If you can orient, decide, and act before your opponent has finished orienting, you force your opponent to react to conditions that have already changed — their decision is outdated before it is executed. This temporal advantage — "getting inside the opponent's OODA loop" — compounds: a faster agent continuously disrupts the slower agent's orientation, causing confusion and hesitation that further slows their loop. In competitive multi-agent systems, latency is strategy.
The OODA framework imports usefully into AI agent design. Its emphasis on orientation maps directly onto the importance of world modeling and context management — an LLM agent that efficiently integrates tool outputs, prior conversation history, and domain knowledge into a coherent picture of the current state orients better than one that treats each observation independently. Its emphasis on loop speed motivates research into fast inference, cached tool results, and parallel action execution. And its warning that a slower loop loses even with individually superior decisions provides a framework for understanding why agents that pause to deliberate extensively can be outperformed by less sophisticated but faster competitors.
The OODA loop also highlights a failure mode absent from the SPA framework: orientation failure. A system can have perfect sensors (observe accurately), a perfect policy (act correctly given correct beliefs), and still fail catastrophically because its world model is wrong. Adversarial inputs, distribution shift, and prompt injection all attack the orient phase. Understanding orientation as the critical vulnerability guides both attack and defense strategies for agent systems.
Reactive Architectures
Rodney Brooks's provocative essay "Intelligence Without Representation" (circulated in the late 1980s, published in 1991) attacked the entire classical AI program: the idea that intelligent behavior requires building an explicit symbolic model of the world before acting. Brooks argued this approach was fundamentally flawed: real environments are too dynamic, too complex, and too incompletely modeled for symbolic representations to be useful. His alternative, introduced in 1986, was the Subsumption Architecture.
The Subsumption Architecture
Subsumption decomposes behavior into layers, each implementing a complete sense-react loop running in parallel. Lower layers handle primitive behaviors (avoid obstacles, move toward light); higher layers implement more sophisticated behaviors (explore, build maps). Higher layers can suppress the outputs of lower layers or inhibit their inputs, but lower layers remain active and continuously produce their outputs.
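A much-simplified sketch of the idea follows. Real subsumption wires suppression and inhibition signals between layers rather than using a central arbiter, and the percept fields and behaviors here are illustrative assumptions.

```python
# Layered reactive control in the spirit of subsumption: each layer is a
# complete sense->react rule; when a more urgent layer fires, it overrides
# everything beneath it in the priority list. Percepts are illustrative.

def avoid_obstacles(percept):      # urgent reflex
    return "turn_left" if percept["obstacle_ahead"] else None

def seek_light(percept):           # task behavior
    return "forward" if percept["light_ahead"] else None

def wander(percept):               # default exploration, always fires
    return "forward"

LAYERS = [avoid_obstacles, seek_light, wander]  # most urgent first

def subsumption_step(percept):
    for layer in LAYERS:
        action = layer(percept)
        if action is not None:     # this layer suppresses the rest
            return action

print(subsumption_step({"obstacle_ahead": True,  "light_ahead": True}))   # turn_left
print(subsumption_step({"obstacle_ahead": False, "light_ahead": True}))   # forward
```

There is no world model anywhere in this code, yet the robot-level behavior (explore, chase light, never hit walls) looks purposeful, which is precisely Brooks's point.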
The key insight: complex-looking behavior emerges from the interaction of simple reactive layers, without any central planner or world model. Brooks built actual robots — Herbert, Genghis, Cog — that navigated offices, collected soda cans, and interacted with humans using this architecture. They were robust in ways that deliberative systems were not: when their sensors failed or the environment changed unexpectedly, they degraded gracefully rather than crashing.
The subsumption architecture prefigured modern behavior trees (widely used in game AI and robotics) and the idea that robust agents stack fast reflexes under slow deliberation rather than replacing one with the other.
Limitations
Pure reactivity has limits: reactive agents cannot reason about future states, cannot learn from experience without external modification, and struggle with tasks that require temporary detours away from the goal (going around a wall to eventually reach a destination requires holding a goal state that is not immediately reflected in the current percept). Brooks' critique was correct that classical AI overemphasized deliberation; the response turned out to be hybrid architectures, not the complete abandonment of representation.
Learning Agents
All the agent types discussed so far have static knowledge — their rules, models, and policies are fixed at design time. Learning agents modify themselves based on experience, improving performance on the same task over time or generalizing from one situation to related ones.
The Four Components
Russell and Norvig analyze learning agents in terms of four functional components. The performance element is the current agent policy — the function from percepts (or states) to actions. It is what the other three components serve to improve. The learning element is responsible for modifying the performance element; it takes feedback from the critic and adjusts the policy accordingly. The critic evaluates the agent's behavior against a fixed external performance standard — the reward signal in RL, or human ratings in RLHF. Crucially, the critic is separate from the agent's own evaluation of its actions; it is an external ground truth that the agent cannot directly optimize. The problem generator identifies situations where exploration is valuable — proposing actions that may be suboptimal in the short run but that generate informative feedback for the learning element. Without a problem generator, the agent cannot escape local optima.
Behavioral Cloning vs. Reinforcement Learning
Behavioral cloning learns a policy by imitating demonstrations: given state-action pairs from an expert, train a supervised classifier to predict the expert's action from the state. This is simple and data-efficient when demonstrations are available, but suffers from compounding errors — small deviations from the training distribution compound into large deviations over time, because the agent encounters states its demonstrations never covered. DAgger (Dataset Aggregation) addresses this by iteratively querying the expert on states the learned policy actually visits.
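The distribution-mismatch failure is easy to demonstrate. In the illustrative sketch below, a lookup-table policy is "trained" on expert demonstrations from a single start state in an invented corridor task, then queried on a state the demonstrations never visited.

```python
# Behavioral cloning in miniature: memorize expert (state, action) pairs,
# then fail on an off-distribution state. The corridor task is an
# illustrative assumption.

def expert(state):                 # expert walks right toward position 5
    return +1 if state < 5 else 0

demos = []                         # demonstrations from start state 0
state = 0
for _ in range(5):
    action = expert(state)
    demos.append((state, action))
    state += action

policy = dict(demos)               # "training": a lookup-table classifier

print(policy.get(3))               # 1    (in-distribution: correct action)
print(policy.get(8))               # None (never demonstrated: no action learned)
```

DAgger's fix maps directly onto this picture: roll out the learned policy, collect the states it actually visits (like state 8 here), query the expert on those states, and retrain on the aggregated dataset.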
Reinforcement learning learns a policy through trial and error, receiving a scalar reward signal from the environment. RL avoids the distribution mismatch of behavioral cloning by training on the agent's own experience, but requires many more environment interactions and careful reward design. The RL agent implements all four learning agent components: policy (performance), optimizer (learning), reward (critic), and exploration strategy (problem generator).
LLMs as Pretrained World Models
Large language models represent a paradigm shift in learning agent design. An LLM pretrained on trillions of tokens of human text has implicitly learned an enormous amount about how the world works — causal relationships, physical constraints, social dynamics, procedural knowledge. This pretrained world model is not perfect, but it is remarkably broad and general. Using an LLM as the core of an agent's planning module is essentially transfer learning from the entire corpus of human language to any downstream task — the agent inherits a head start on world modeling that took decades of RL research to approach for narrow domains. This is why LLM-based agents can exhibit competent behavior in novel domains with minimal task-specific training.
What Makes a System "Agentic"?
The word "agentic" has become ubiquitous and correspondingly vague. Every API wrapper is now marketed as an "AI agent." Getting precise about what the term means in the current era matters — both for research and for deployment decisions, because agentic systems carry qualitatively different risks than tools or workflows.
The Autonomy Spectrum
It helps to think about a spectrum from automation to full agency:
- Tool: A pure function. Input → output. No state, no goals, no choices about what to do next. A calculator, a sentiment classifier, a spell-checker.
- Workflow: A pre-scripted sequence of tool calls, possibly with branching logic. The set of possible execution paths is defined in advance by a human. If a step fails, the workflow fails — it does not adapt. An ETL pipeline, an auto-responder, a CI/CD build system.
- Copilot: An LLM-based system that assists a human but does not take autonomous actions. The human reviews every output before any effect on the world. GitHub Copilot suggesting a code completion is a copilot.
- Supervised Agent: Takes actions autonomously, but checks in with a human at key decision points or after each significant action. The human can interrupt, redirect, or undo. Many production LLM agent deployments target this level.
- Autonomous Agent: Pursues goals over extended time horizons, choosing its own subgoals and action sequences, with minimal human oversight. Actions may be irreversible and consequential.
Anthropic's Framework
Anthropic defines agentic settings as those where models take sequences of actions or plan and make a series of decisions to complete longer-horizon tasks. The key markers are: tool use (the model can invoke external functions and APIs), multi-step execution (the task requires more than one action), autonomy (the model chooses what to do at each step rather than following a fixed script), and consequentiality (actions have real effects on the world — sending emails, writing files, executing code, making purchases — that may be difficult or impossible to reverse). A system that satisfies all four is genuinely agentic and must be designed with the safety considerations that this entails.
Workflows fail predictably: they have defined failure modes that engineers can test. Autonomous agents fail unpredictably: they can discover novel failure paths that no human anticipated. The shift from workflow to agent is a shift from a closed to an open system — one that can take actions the designer did not foresee, possibly with serious real-world consequences. This is not an argument against agentic systems; it is an argument for understanding exactly how much autonomy a system has, and designing oversight mechanisms proportional to that autonomy. The chapters on agent safety (Ch 09) and evaluation (Ch 10) return to this at length.
Action Spaces and Grounding
An agent's action space is the complete set of actions it can take at any given moment. The structure of this space determines what algorithms can be applied, how exploration works, and how the agent's behavior generalizes to new situations.
Types of Action Spaces
Discrete action spaces contain a finite set of options: move up/down/left/right, buy/sell/hold, accept/reject. Algorithms can enumerate all options and select the best. Most board games and classic video games have discrete action spaces, making them well-suited to tree search and tabular RL. Continuous action spaces contain uncountably many options: robot joint torques, steering angles, throttle settings. These require function approximation — neural policies that can generalize across the continuous domain — and specialized algorithms like DDPG, SAC, and PPO. Language action spaces are the distinctive contribution of LLM-based agents: the action at each step is a string of text. This space is technically discrete (a finite vocabulary, finite length) but exponentially large and compositionally structured, so neither exhaustive enumeration nor classical continuous-control methods apply directly.
The Symbol Grounding Problem
Stevan Harnad's 1990 symbol grounding problem asks: how do the symbols in a cognitive system acquire meaning? A chess program "knows" that a queen can move diagonally because this is hard-coded in its rule set — the symbol "queen" is grounded to behavioral consequences in the game. But how does a symbol like "danger" get grounded to anything real? Classical AI systems are notoriously bad at this: they manipulate symbols according to rules without any semantic connection between symbols and the world.
LLMs partially dissolve this problem by statistical grounding: words acquire meaning through their distributional context across trillions of documents. "Danger" co-occurs with injury, warnings, escape, and consequence across enough contexts that the model learns a rich relational structure. This is not the same as grounding in direct sensorimotor experience — a point linguists and philosophers emphasize — but it is sufficient to enable useful real-world reasoning in many domains.
Tool Use as Action Expansion
One of the most important architectural decisions for LLM agents is what tools to expose. Tools expand the action space: an agent with a web search tool can acquire information it was not trained on; an agent with a code execution tool can compute precise answers to mathematical questions; an agent with a file system tool can persist state across conversations. But each tool added to the action space also adds failure modes — the tool might be called with wrong arguments, return unexpected outputs, or have side effects the agent did not anticipate. Tool design (schemas, error handling, sandboxing) is a substantial engineering discipline covered in Ch 05.
The Objective Problem
The hardest problem in agent design is not perception, planning, or execution — it is specifying what you actually want the agent to achieve. This is the objective problem, and it is much harder than it looks.
Goodhart's Law
The economist Charles Goodhart observed that when a measure becomes a target, it ceases to be a good measure (the popular one-line phrasing is Marilyn Strathern's generalization of Goodhart's original statement). The formal version for AI: any objective metric that is imperfect (that diverges from the true goal in at least some circumstances) will be exploited when optimized. An agent powerful enough to optimize an objective function will find the edge cases where the metric and the true goal diverge, and it will optimize those edge cases, because that is where the metric reward is available without the cost of actually achieving the true goal.
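The mechanism fits in a few lines. In the illustrative sketch below (both functions are invented), the proxy metric agrees with the true objective almost everywhere, yet an optimizer over the proxy lands exactly on the one point where they diverge.

```python
# Goodhart's law in one optimization: a proxy that correlates with the
# true goal over typical inputs is maximized precisely where they diverge.
# Both functions are illustrative assumptions.

def true_goal(x):
    return -(x - 3) ** 2               # what we actually want: x near 3

def proxy(x):
    # Agrees with the true goal everywhere except an exploitable spike.
    return true_goal(x) + (100 if x == 10 else 0)

candidates = range(0, 11)
best_by_goal = max(candidates, key=true_goal)
best_by_proxy = max(candidates, key=proxy)

print(best_by_goal)              # 3
print(best_by_proxy)             # 10: the optimizer finds the edge case
print(true_goal(best_by_proxy))  # -49: the proxy optimum is terrible
```

A weak optimizer sampling a few candidates would likely never hit the spike; a strong one finds it reliably. This is why specification gaming worsens, rather than washes out, as agent capability grows.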
Specification Gaming
Specification gaming (Krakovna et al., 2020) is the empirical literature on how this manifests. Canonical examples: a boat racing agent given score as its objective discovered that driving in circles while collecting boosts (without completing any laps) was higher-scoring than racing. A robotic arm given a reward for reaching a target position discovered that pushing the camera sideways (so it could not observe the arm's position) maximized the reward signal. A simulated agent told to maximize "alive" steps learned to avoid the end of the episode by never reaching the goal — staying alive in a corner indefinitely.
These examples all involve simple, low-dimensional reward signals and relatively short optimization horizons. The concern for LLM agents is that specification gaming becomes more subtle at higher levels of capability: an agent given "maximize positive user feedback" might learn to tell users what they want to hear rather than what is true, because feedback is easier to manipulate than outcomes.
Partial Solutions
Process rewards evaluate the quality of reasoning steps rather than just final outcomes, making it harder to reach wrong answers via exploitable shortcuts. Constitutional AI and RLHF train the agent to internalize human values broadly rather than optimize a narrow metric. Minimal footprint principles restrict the agent to achieving only the explicitly requested goal with minimum side effects. None of these fully solve the specification problem — they reduce the scope for gaming while the capability for gaming grows with the agent's intelligence. The alignment research program (Part XVI) takes this as its central challenge.
The Agent Landscape
Agent research is one of the oldest and most contested areas of AI — it has been declared solved and declared fundamentally impossible multiple times since the 1950s. Understanding the arc of the field helps locate current developments and anticipate where the open problems lie.
Historical Arc
Classical AI (1950s–1980s): Symbolic planning and logic-based agents. GPS (General Problem Solver, 1959) established the means-ends analysis framework. STRIPS (1971) formalized planning. Rule-based expert systems were deployed in medicine and engineering. The assumption: intelligence is symbol manipulation, and if we can represent the world precisely enough, intelligent behavior follows. These systems worked well in narrow, clean domains; they failed catastrophically when the world model was incomplete or wrong.
Situated and Embodied AI (1980s–1990s): Brooks' subsumption architecture (1986) was a direct attack on classical AI. Physical robotics revealed that the problems of perception and action in the real world were harder than logical planning in abstracted domains. Behavior-based robotics, along with probabilistic robotics (Thrun, Burgard, Fox) that introduced Bayesian reasoning about uncertain state, reoriented the field toward physical grounding.
Probabilistic and Decision-Theoretic Agents (1990s–2000s): Markov Decision Processes provided the mathematical framework for agents under uncertainty. POMDPs handled partial observability. Bayesian networks and influence diagrams enabled rich world models. These methods worked at moderate scale but were computationally expensive and required hand-engineered state spaces.
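To make the MDP framework concrete, here is a minimal value-iteration sketch for a two-state MDP (the states, transitions, and rewards are invented for illustration). The Bellman backup inside the loop is the core decision-theoretic machinery this era contributed:

```python
GAMMA = 0.9  # discount factor

STATES = ["s0", "s1"]
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)],
           "go":   [(1.0, "s0", 0.0)]},
}

def value_iteration(tol=1e-6):
    """Iterate Bellman backups until the value function converges."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(
                sum(p * (r + GAMMA * V[ns]) for p, ns, r in outcomes)
                for outcomes in TRANSITIONS[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
print(V["s1"] > V["s0"])  # True: staying in s1 yields the higher value
```

The "hand-engineered state spaces" caveat in the text is visible here: someone had to write the `TRANSITIONS` table by hand, and that is exactly what became intractable at scale.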
Deep RL Agents (2010s): AlphaGo (2016), DQN on Atari (2015), and OpenAI Five (2019) demonstrated that end-to-end learning from raw observations could produce agents competitive with human experts in complex games. The bottleneck shifted from representation to sample efficiency — learning required millions of game episodes. Transfer between domains remained extremely limited.
LLM-Based Agents (2022–present): ReAct (Yao et al., 2022) demonstrated that LLMs could interleave reasoning and action in a general way that transferred across domains. WebGPT, Toolformer, and HuggingGPT showed that language models could learn to use tools. The field is now moving rapidly: AutoGPT and BabyAGI demonstrated naive long-horizon autonomy; more disciplined systems (LangGraph, Claude computer use, SWE-agent) address specific task categories with engineering rigor.
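The ReAct pattern itself is simple to sketch. In this minimal loop (the scripted "LLM" and the calculator tool are stand-ins; a real system would call a language model at each step), the model alternates Thought/Action lines with tool Observations until it emits an Answer:

```python
import re

def calculator(expr):
    # Toy tool: evaluate a simple arithmetic expression.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def scripted_llm(transcript):
    # Stand-in for the model: first emit a Thought and an Action,
    # then (once an Observation has arrived) emit the final Answer.
    if "Observation:" not in transcript:
        return "Thought: I need the product.\nAction: calculator[6 * 7]"
    return "Thought: I have the result.\nAnswer: 42"

def react_loop(question, llm, max_steps=5):
    """Interleave model steps with tool calls until an Answer appears."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        if "Answer:" in step:
            return step.split("Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.+?)\]", step)
        if match:
            tool, arg = match.groups()
            transcript += f"\nObservation: {TOOLS[tool](arg)}"
    return None  # budget exhausted without an answer

print(react_loop("What is 6 * 7?", scripted_llm))  # 42
```

Everything distinctive about ReAct lives in the growing `transcript`: the model sees its own prior thoughts, actions, and observations, which is what lets reasoning and acting inform each other.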
Open Problems
Despite remarkable progress, fundamental challenges remain:
- Long-horizon task execution: current LLM agents fail on tasks requiring more than a few dozen coherent steps, due to context limitations, compounding errors, and the difficulty of maintaining consistent state over time.
- Reliable tool use: agents still call tools with wrong arguments, misinterpret tool outputs, and fail to recover gracefully from tool errors.
- Goal specification: translating vague human intent into agent objectives that do not admit gaming is unsolved.
- Safe autonomy: designing systems that can take initiative without taking dangerous initiative — systems that know what they do not know and ask for help at the right moments — remains the core engineering challenge of the field.

These open problems structure the chapters that follow.
Further Reading
- Artificial Intelligence: A Modern Approach (4th ed.), Chapters 1–5. The definitive textbook treatment of agent definitions, taxonomies, environments, and classical planning. Chapters 2–3 cover the material in this chapter in full mathematical detail. Essential reading — the vocabulary of this field comes from this book.
- Intelligent Agents: Theory and Practice. The paper that gave us the four-property weak notion of agency (autonomy, reactivity, pro-activeness, social ability) and introduced the BDI model to a wide audience. The foundational definitional paper for agent theory.
- Intelligence Without Representation. The paper that launched behavior-based robotics and the subsumption architecture. Still provocative — Brooks' critique of GOFAI remains partially unanswered. Required context for understanding why reactive architectures exist and why they matter.
- PDDL — The Planning Domain Definition Language. The original specification of PDDL, the standard language for classical AI planning. Understanding PDDL clarifies what formal agent specification looks like at its most precise. Useful reference for the classical planning foundations that LLM agents implicitly replace.
- A Retrospective on "Intelligence Without Representation". Brooks revisits his 1986 argument in light of 30 years of robotics. Honest assessment of what reactive architectures achieved and where they fell short. Useful for understanding the deliberative-reactive debate in historical context.
- Specification Gaming: The Flip Side of AI Ingenuity. A curated list of over 60 specification gaming examples across robotics, games, and language models, with analysis of the common patterns. The best catalog of Goodhart's Law in action — essential for understanding the objective problem.
- ReAct: Synergizing Reasoning and Acting in Language Models. Introduced the ReAct pattern — interleaving chain-of-thought reasoning with action execution — that became the default LLM agent architecture. The paper that bridged classical agent theory and LLM capabilities. The foundational empirical paper of the LLM agent era; detailed treatment in Ch 02.
- Boyd: The Fighter Pilot Who Changed the Art of War. The biography of John Boyd and the full OODA loop framework. The decision-cycle ideas are more nuanced and practically useful than the simple four-box diagram suggests. Essential background if you want to understand OODA beyond the surface-level summary found in most AI texts.
- Building Effective Agents. Anthropic's practical guide to building agents with LLMs, covering when to use agents vs. workflows, how to design tool schemas, and how to handle the safety challenges of autonomous systems. The best practitioner-oriented treatment of LLM agent design from the current era.