Motion Planning & Control: turning intent into motion.

Perception tells the robot where things are. Motion planning and control decide what to do about it: which path to take through the world, what trajectory to follow along that path, and how to drive actuators so the robot actually executes that trajectory under physics that does not always cooperate. Three layers, three time scales, and a set of mathematical tools that have shaped robotics for sixty years and are now being rewritten by learning.

Prerequisites & orientation

This chapter builds on the perception material of Chapter 01 (Robot Perception & Sensing), which gives the world model that planning consumes. It assumes familiarity with linear algebra (Part I Ch 01), basic optimization (Part I Ch 03), and ordinary differential equations (Part I Ch 02). Readers from a controls background will find the sampling-based-planning and learned-control sections the most novel; readers from an ML background will find the trajectory-optimization and MPC sections the most novel.

Three threads run through the chapter. The first is the conventional split into planning (where to go, computed slowly) and control (how to track that path, computed fast). The second is the rise of optimization-based approaches — trajectory optimization and model predictive control — that blur the line between planning and control by re-planning at every step. The third is the gradual encroachment of learned policies on territory that classical planning and control owned for decades.

01

Why Motion Planning Is Hard

"Move the arm to that mug" is, on the face of it, a trivial instruction. The robot has joint angles; the mug has coordinates; surely there is a function that takes one to the other. The trouble is that everything in between makes that mapping non-trivial: the arm has obstacles to avoid, joints with limits, motors with finite torque, and an end-effector whose position depends non-linearly on every joint configuration along the way. A motion that looks easy to a human is a constrained, high-dimensional optimisation problem on the inside.

Three time scales, three layers

The classical way to break the problem down is by time scale. Path planning runs slowly (tenths of a second to seconds) and answers "what spatial route should the robot take?" — a sequence of waypoints from start to goal that avoids static obstacles. Trajectory generation runs at medium rate (tens to hundreds of Hz) and answers "how fast should the robot move along that path, and how should the path be smoothed for the robot's dynamics?" — adding velocity, acceleration, and timing to the path. Control runs fastest (hundreds to thousands of Hz) and answers "what torques or motor commands does the robot need right now to track the trajectory in spite of disturbances?". Each layer assumes the layer above it is correct; each layer hands a more constrained problem to the one below.

[Figure: the three-layer stack. PATH PLANNING (geometric route, ~0.1–10 s) → TRAJECTORY (add timing & smoothness, 10–100 Hz) → CONTROL (drive actuators, 100–1000 Hz). Goal + map flow down through waypoint sequence and timed trajectory; state feedback flows up. Modern stacks re-plan continuously.]
The classical planning-control hierarchy. Path planning produces a geometric route; trajectory generation parameterises it with time, velocity, and smoothness; control drives the actuators to track the trajectory. The clean separation has eroded as MPC and learned policies blur the boundaries — but the time-scale split remains the right way to think about which technique applies to which problem.

The clean separation has been eroding for two decades. Model predictive control (Section 7) does planning and control jointly inside a single optimisation loop. Diffusion policies (Section 10) collapse all three layers into one neural network. But the time-scale logic still applies — fast control loops need predictable per-step latency, slow planning can afford richer search — and most production stacks still respect that division even when the algorithms in each layer are modern.

Why this is harder than it looks

Three properties make motion planning structurally hard. First, the problem is high-dimensional: a manipulator arm has six or seven joints, a humanoid has thirty or more, and an autonomous vehicle plans over position, velocity, heading, and time. Even modest dimensions make exhaustive search hopeless. Second, the constraints are geometric and dynamic at the same time: a path must avoid obstacles in 3D space (geometric) and must respect the robot's velocity, acceleration, and torque limits (dynamic). The two kinds of constraint interact — a kinematically valid path may be dynamically infeasible — and handling them together is harder than handling either alone. Third, the environment is rarely static: obstacles move, the goal changes, the robot's own state estimate is noisy. A plan that was optimal a second ago may be infeasible now.

Completeness, optimality, and the rest

Algorithms in this chapter will be characterised by a small set of properties. Completeness means the algorithm will find a solution if one exists, given enough time. Probabilistic completeness is the weaker but often-sufficient guarantee that the probability of finding a solution approaches one as runtime grows. Optimality means the solution found minimises some cost (path length, energy, time); asymptotic optimality is the weaker guarantee that the solution improves toward the optimum as runtime grows. Real planners trade these guarantees for speed: A* is complete and optimal but slow in high dimensions; RRT is probabilistically complete but not optimal; RRT* is asymptotically optimal but expensive. Knowing which guarantee a planner offers tells you when you can trust its output and when you cannot.

Planning vs. Control, in One Sentence

Planning decides where to go; control makes the robot actually go there. Planning runs slowly over a long horizon; control runs fast over a short one. When something goes wrong in robotics, the diagnostic question is almost always "is this a planning failure or a control failure?" — because the fix is different in each case.

02

Configuration Space and the Planning Problem

The most important conceptual move in motion planning is the shift from the workspace (the physical 3D world the robot moves in) to the configuration space, or C-space (the abstract space whose coordinates are the robot's joint angles, base pose, or whatever variables fully specify the robot's state). Once you are in C-space, the robot is a single point and every obstacle becomes a region of forbidden configurations. Planning becomes the classical problem of finding a path between two points that avoids forbidden regions — geometrically simple to state, even if computationally still hard to solve.

The C-space picture

For a 2D mobile robot with translation only, C-space is just the plane: every point in C-space corresponds to a position the robot can occupy. For a circular robot, a workspace obstacle becomes a C-space obstacle simply expanded by the robot's radius — the so-called Minkowski sum. For a 6-DOF manipulator arm, C-space is six-dimensional, and each workspace obstacle carves out a complicated, often non-convex region in that 6D space that corresponds to all joint configurations where the arm collides with that obstacle. The C-space picture remains conceptually clean even when the geometry is impossible to visualise.

[Figure: from workspace to C-space. Left: WORKSPACE — a robot of radius r, obstacles with their own shapes, and a goal. Right: C-SPACE — the robot reduced to a point (start), each obstacle inflated by r.]
From workspace to C-space. In the workspace (left), the robot has a radius and obstacles have their own shapes. In C-space (right), the robot is reduced to a point and obstacles are inflated by the robot's radius (the Minkowski sum). Once the transformation is done, planning becomes finding a path for a point through expanded obstacles — geometrically simpler, even when the robot is complicated.

Free space, obstacles, and the planning problem

Formally, C-space is partitioned into C-free (configurations where the robot is in a valid, collision-free state) and C-obs (configurations where the robot is in collision or violates a constraint). The motion planning problem is: given a start configuration qstart and a goal configuration qgoal, find a continuous curve from qstart to qgoal entirely within C-free. Constraints can be added — joint limits, kinematic chains, holonomic constraints — by carving additional regions out of C-free.

The key observation is that explicit construction of C-obs is intractable in high dimensions. Instead, all practical algorithms use a collision checker — a black-box function that, given a candidate configuration, returns true if it lies in C-free and false otherwise. The planning algorithm calls the collision checker many times and uses the structure of the answers to navigate. Collision checking is therefore the bottleneck of most planners, and a great deal of engineering effort goes into making it fast (broad-phase culling, hierarchical bounding volumes, GPU acceleration).
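The black-box checker can be sketched in a few lines for a circular robot among circular obstacles, with the Minkowski-sum inflation from the previous section folded in so the robot is checked as a point. The obstacle list, radii, and sampling resolution here are illustrative assumptions, not part of any specific library.

```python
import numpy as np

# Hypothetical obstacle set: circles given as (center_x, center_y, radius).
OBSTACLES = [(2.0, 2.0, 0.5), (4.0, 1.0, 0.8)]
ROBOT_RADIUS = 0.3

def in_c_free(q, obstacles=OBSTACLES, robot_radius=ROBOT_RADIUS):
    """Black-box collision check: True if configuration q = (x, y) is in C-free.

    Each circular obstacle is inflated by the robot's radius (the Minkowski
    sum), so the robot itself can be treated as a point in C-space.
    """
    q = np.asarray(q, dtype=float)
    for cx, cy, r in obstacles:
        if np.hypot(q[0] - cx, q[1] - cy) <= r + robot_radius:
            return False  # inside an inflated obstacle -> in C-obs
    return True

def motion_free(q_from, q_to, n_checks=20):
    """Check a straight-line motion by sampling intermediate configurations."""
    q_from, q_to = np.asarray(q_from, float), np.asarray(q_to, float)
    return all(in_c_free(q_from + t * (q_to - q_from))
               for t in np.linspace(0.0, 1.0, n_checks))
```

A planner only ever calls `in_c_free` and `motion_free`; it never needs C-obs explicitly — which is exactly why these two functions end up dominating the runtime.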

Holonomic vs. nonholonomic, and kinodynamic planning

If the robot can move in any direction in C-space at any moment — a free-flying drone, a manipulator arm — it is holonomic and the planner only has to find a path through C-free. If the robot has constraints on which directions it can instantaneously move — a car cannot move sideways, a unicycle cannot strafe — it is nonholonomic, and the planner must produce a path that respects those motion constraints. Adding velocity, acceleration, and torque limits puts you in kinodynamic planning territory, where the state includes both configuration and its time derivatives, the dimensionality doubles, and constraints are integrated over time rather than evaluated point-wise. Kinodynamic planning is what autonomous vehicles and high-speed drones actually do, and it is significantly harder than the holonomic case.
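The nonholonomic constraint is easiest to see in code. A minimal sketch of unicycle kinematics (Euler integration, illustrative step size): the state (x, y, θ) changes only along the current heading, so the robot can never translate sideways no matter what commands it receives. Kinodynamic planners extend their trees with exactly this kind of forward simulation.

```python
import numpy as np

def unicycle_step(state, v, omega, dt=0.01):
    """One Euler step of unicycle kinematics.

    The robot can drive forward (v) and turn (omega), but never strafe:
    y changes only through the heading theta -- the nonholonomic constraint
    x_dot * sin(theta) - y_dot * cos(theta) = 0.
    """
    x, y, theta = state
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + omega * dt])

def rollout(state, controls, dt=0.01):
    """Integrate a sequence of (v, omega) commands into a trajectory."""
    traj = [np.asarray(state, float)]
    for v, omega in controls:
        traj.append(unicycle_step(traj[-1], v, omega, dt))
    return np.array(traj)
```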

Why C-space Is the Right Abstraction

If you ever feel lost in a planning paper, ask: what is the C-space here? Once you can name the dimensions of C-space and what makes a configuration valid or invalid, the algorithms make sense. If you cannot name those things clearly, the algorithm description will be incomprehensible no matter how carefully you read it.

04

Sampling-Based Planning

Sampling-based planners abandon any attempt to discretise C-space exhaustively. Instead, they sample configurations randomly, check each for collision, and gradually build a graph (or tree) of feasible motions that connects start to goal. The methods scale to high-dimensional C-spaces — manipulator arms, humanoid robots, multi-robot systems — that graph-based methods cannot touch. They give up explicit completeness in exchange for probabilistic completeness: as the number of samples grows, the probability of finding a solution if one exists approaches one.

Rapidly-exploring Random Trees (RRT)

The dominant sampling-based planner is the Rapidly-exploring Random Tree (RRT), introduced by Steven LaValle in 1998. The algorithm is almost trivial to describe: maintain a tree rooted at the start configuration; at each iteration, sample a random configuration, find the nearest tree node, extend a small step from that node toward the sample, and add the result to the tree if it is collision-free. Repeat until a tree node reaches the goal region.

RRT — basic algorithm
T ← { qstart }
repeat:
  qrand ← uniform-random sample from C
  qnear ← nearest node in T to qrand
  qnew ← step from qnear toward qrand by a fixed distance ε
  if motion(qnear → qnew) is collision-free:
    add qnew to T with parent qnear
    if qnew is in goal region: return path from qstart to qnew
The step size ε is the most consequential tuning parameter: too small and the tree grows slowly; too large and the planner misses narrow passages. The Voronoi bias of nearest-neighbour selection is what gives RRT its rapid exploration property — the tree naturally pushes into unexplored regions of C-space.

RRT is probabilistically complete but not optimal — it returns whatever path it stumbles upon first, which is usually jagged and far from the shortest. It is also single-query: each new planning problem requires building a fresh tree.
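The pseudocode above translates almost line-for-line into Python. A minimal 2D sketch follows, with a hypothetical wall obstacle standing in for a real collision checker; for brevity only the endpoint of each extension is checked (the full algorithm checks the whole motion, which is cheap to add for small ε).

```python
import numpy as np

rng = np.random.default_rng(0)

def collision_free(q):
    """Hypothetical obstacle: a wall occupying [4, 6] x [0, 8]."""
    return not (4.0 <= q[0] <= 6.0 and 0.0 <= q[1] <= 8.0)

def rrt(q_start, q_goal, eps=0.5, goal_tol=0.5, max_iters=5000, bounds=10.0):
    q_goal = np.asarray(q_goal, float)
    nodes, parents = [np.asarray(q_start, float)], [-1]
    for _ in range(max_iters):
        q_rand = rng.uniform(0.0, bounds, size=2)            # sample C uniformly
        near = min(range(len(nodes)),                        # nearest tree node
                   key=lambda i: np.linalg.norm(nodes[i] - q_rand))
        d = q_rand - nodes[near]
        q_new = nodes[near] + eps * d / (np.linalg.norm(d) + 1e-12)  # step eps
        if collision_free(q_new):
            nodes.append(q_new)
            parents.append(near)
            if np.linalg.norm(q_new - q_goal) < goal_tol:    # reached goal region
                path, i = [], len(nodes) - 1
                while i != -1:                               # walk back to root
                    path.append(nodes[i])
                    i = parents[i]
                return path[::-1]
    return None  # failure after max_iters only means "not found yet"
```

Run it twice with different seeds and you will see the stochasticity the callout below describes: both paths are valid, but their lengths and shapes differ.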

RRT* and asymptotic optimality

RRT* (Karaman & Frazzoli, 2011) is the asymptotically optimal variant. The change is small but significant: when adding qnew, examine all nearby tree nodes and (a) connect qnew to whichever neighbour gives the lowest-cost path from the start, and (b) check whether routing each neighbour through qnew would lower its cost, and rewire if so. The result is a tree that progressively improves toward the optimal path as samples accumulate. RRT* costs more per iteration but produces dramatically better paths given enough time.

Probabilistic Roadmaps (PRM)

PRM is the multi-query alternative to RRT. Build a roadmap once: sample many random configurations, connect each to its k nearest neighbours with collision-free edges. The result is a graph that approximates C-free, on which any query (any start, any goal) can be answered cheaply by graph search. PRM is the right choice when the environment is fixed and the robot will solve many planning problems in it (industrial manipulation cells, warehouses), but it is wasteful when the environment changes between queries.
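The two PRM phases — build once, query many times — can be sketched as follows. The collision-check callbacks default to "everything is free" so the sketch is self-contained; a real roadmap would pass in the checker from its environment, and would use a spatial index rather than brute-force nearest neighbours.

```python
import heapq
import numpy as np

rng = np.random.default_rng(1)

def build_prm(n_samples=200, k=8, bounds=10.0, free=lambda q: True,
              edge_free=lambda a, b: True):
    """Build phase: sample configurations, keep the valid ones, and connect
    each to its k nearest neighbours with collision-free edges."""
    nodes = [q for q in rng.uniform(0, bounds, size=(n_samples, 2)) if free(q)]
    edges = {i: [] for i in range(len(nodes))}
    for i, q in enumerate(nodes):
        dists = np.array([np.linalg.norm(q - p) for p in nodes])
        for j in np.argsort(dists)[1:k + 1]:          # index 0 is q itself
            if edge_free(q, nodes[j]):
                edges[i].append((int(j), float(dists[j])))
                edges[int(j)].append((i, float(dists[j])))
    return nodes, edges

def shortest_path(edges, start, goal):
    """Query phase: Dijkstra on the roadmap -- the cheap, repeatable step."""
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            path = [u]
            while u != start:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, np.inf):
            continue                                  # stale queue entry
        for v, w in edges[u]:
            if d + w < dist.get(v, np.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return None
```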

Kinodynamic and informed variants

Several extensions handle problems vanilla RRT cannot. Kinodynamic RRT samples in state space (configuration plus velocity) and uses the robot's dynamic model to extend the tree, producing trajectories that respect velocity and acceleration limits. RRT-Connect grows two trees, one from the start and one from the goal, and tries to connect them — significantly faster than single-tree RRT in practice. Informed RRT* restricts sampling to the ellipsoid that could possibly contain a better path than the current best, which dramatically accelerates convergence after the first solution is found. BIT* and AIT* combine sampling with heuristic search for further speedups. The Open Motion Planning Library (OMPL) ships dozens of these variants; in practice, choosing the right one for your problem can be more impactful than tuning the algorithm parameters.

When Sampling-Based Planning Surprises You

Sampling-based planners are stochastic — run RRT twice on the same problem and you will get two different paths, sometimes very different in length and shape. This is correct behaviour, not a bug, but it means that "find a path" and "find a good path" are different questions. Production stacks usually run the planner many times in parallel and keep the best result, or run an asymptotically optimal variant with a fixed time budget. Either way, expect path length to vary across runs unless you have explicitly added a smoothing or shortening post-processing step.

05

Trajectory Optimization

A path from a planner is a sequence of waypoints in C-space. A trajectory is a time-parameterised function — at every instant t, the trajectory specifies where the robot should be, how fast, and (sometimes) how hard each motor should be pushing. Converting paths to trajectories used to be a separate post-processing step; modern trajectory optimisation does both jobs at once, producing smooth, dynamically feasible motions directly from a high-level objective.

What we want from a trajectory

A trajectory is good if it satisfies several conditions simultaneously: it stays in C-free; it respects the robot's joint limits, velocity limits, acceleration limits, and torque limits; it minimises some objective (path length, energy, time-to-goal, jerk); and it is smooth enough that the controller below can track it without saturating. These are usually competing — a minimum-time trajectory is jerky, a smooth trajectory is slow — so the optimisation problem is constrained, multi-objective, and non-convex.

The optimisation formulation

Trajectory optimisation (continuous form)
minimise   J(q(·), u(·)) = ∫₀ᵀ ℓ(q(t), u(t)) dt + Φ(q(T))
subject to:
  q̇(t) = f(q(t), u(t)) (dynamics)
  q(t) ∈ C-free (collision)
  qmin ≤ q(t) ≤ qmax (joint limits)
  umin ≤ u(t) ≤ umax (control limits)
  q(0) = qstart, q(T) = qgoal (boundary)
The running cost ℓ penalises whatever you want to minimise (energy, jerk, deviation from a reference). The terminal cost Φ rewards reaching the goal. The dynamics constraint says the trajectory must obey physics. Numerical methods — direct collocation, multiple shooting, single shooting — discretise this continuous problem into a finite-dimensional non-linear program (NLP) that an off-the-shelf solver (IPOPT, SNOPT, OSQP for QPs) can handle.
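A minimal flavour of direct transcription, under strong simplifying assumptions: one degree of freedom, fixed endpoints, a discrete acceleration cost built from second differences, and no collision or dynamics constraints — so the NLP collapses to a linear solve with numpy. A real problem would stack those constraints and hand the result to IPOPT or a similar solver.

```python
import numpy as np

def min_accel_trajectory(q0, qT, n=20):
    """Discretise a 1-DOF trajectory into n+1 knot points, fix the endpoints,
    and minimise the sum of squared second differences (a discrete
    acceleration cost). Unconstrained, the optimum drives every second
    difference to zero, so the square system solves exactly."""
    m = n - 1                        # free knots q[1..n-1]
    A = np.zeros((m, m))
    b = np.zeros(m)
    for i in range(m):
        # Row i encodes q[i] - 2 q[i+1] + q[i+2] = 0, with the fixed
        # endpoints q[0]=q0 and q[n]=qT moved to the right-hand side.
        A[i, i] = -2.0
        if i > 0:
            A[i, i - 1] = 1.0
        else:
            b[i] -= q0
        if i < m - 1:
            A[i, i + 1] = 1.0
        else:
            b[i] -= qT
    q_free = np.linalg.solve(A, b)
    return np.concatenate([[q0], q_free, [qT]])
```

With no obstacles the answer is, predictably, a straight line in C-space; the point of the exercise is the structure — decision variables at knot points, constraints as rows — which survives unchanged when the problem grows real dynamics and collision terms.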

The major methods

CHOMP and STOMP. Covariant Hamiltonian Optimisation for Motion Planning (CHOMP, Ratliff et al. 2009) frames trajectory optimisation as gradient descent on a cost functional that combines smoothness (a quadratic on path acceleration) with obstacle avoidance (a distance-field-based collision cost). STOMP (2011) is a stochastic variant that uses noisy rollouts instead of explicit gradients, which works when collision costs are non-differentiable. Both are useful for refining a path from a sampling-based planner into something smooth and dynamically feasible.

TrajOpt. A 2014 framework that directly poses trajectory optimisation as a sequential convex program, using sequential quadratic programming (SQP) with explicit linearisations of collision constraints. Faster than CHOMP and able to find paths in environments where CHOMP gets stuck in local minima, at the cost of more complex implementation.

Direct collocation and multiple shooting. The numerical methods of choice for kinodynamic problems where the dynamics are non-trivial (rocket landing, legged locomotion). Direct collocation discretises both the state and control trajectories at knot points and treats the dynamics as equality constraints; multiple shooting integrates the dynamics on segments and stitches them together with continuity constraints. Both produce large but sparse non-linear programs that solvers like IPOPT handle efficiently.

Spline parameterisations. When the trajectory just needs to be smooth (rather than dynamics-feasible), simple parameterised forms work well. B-splines, cubic splines, and minimum-snap polynomial trajectories — pick a basis, fit it through waypoints with continuity at the joins, and you have a smooth trajectory. Mellinger and Kumar's minimum-snap polynomial parameterisation is the standard for quadrotor flight, where the differentially flat structure of the dynamics makes this approach especially clean.
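The simplest useful instance of the spline idea is a single rest-to-rest segment: the quintic polynomial s(τ) = 10τ³ − 15τ⁴ + 6τ⁵, which satisfies s(0)=0 and s(1)=1 with zero velocity and acceleration at both ends — the classic minimum-jerk profile. A sketch:

```python
import numpy as np

def min_jerk_segment(q0, q1, T, t):
    """Quintic rest-to-rest trajectory from q0 to q1 over duration T.

    s(tau) = 10 tau^3 - 15 tau^4 + 6 tau^5 gives zero velocity and
    acceleration at both endpoints. Returns (position, velocity) at time t.
    """
    tau = np.clip(t / T, 0.0, 1.0)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    s_dot = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T
    return q0 + (q1 - q0) * s, (q1 - q0) * s_dot
```

Chaining such segments through waypoints, with continuity enforced at the joins, is the basic mechanism behind the minimum-snap quadrotor trajectories mentioned above (those use higher-order polynomials and optimise the segment timings as well).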

The local-minima problem

The honest weakness of all gradient-based trajectory optimisers is local minima. A trajectory that starts inside an obstacle cannot smoothly escape if the gradient pushes it deeper; a trajectory that needs to go around a tall obstacle from a different side than its initialisation will not find that solution by local descent. The standard mitigations are good initialisation (often from a sampling-based planner's path), random restarts, and the kind of stochastic exploration STOMP uses. In hard cases, the architectural pattern is to use a sampling-based planner to find a path, then a trajectory optimiser to make it good — combining the global reach of one with the local refinement of the other.

06

PID Control

Once a trajectory exists, something has to drive the actuators so the robot actually follows it. That is the job of control. The simplest controller — and still the most widely deployed in robotics, manufacturing, and industrial automation — is the PID controller. The acronym stands for Proportional-Integral-Derivative, three terms that together produce a control signal proportional to the current error, the accumulated past error, and the rate of change of error. Eight decades into its history, PID remains the default answer when a robot needs to track a setpoint.

The PID equation

PID controller
e(t) = r(t) − y(t) (error: reference − measured)

u(t) = Kp · e(t)  +  Ki · ∫₀ᵗ e(τ) dτ  +  Kd · de(t)/dt
The proportional term Kp e pushes harder the further off-setpoint you are. The integral term Ki ∫e accumulates past error and eliminates steady-state offset. The derivative term Kd de/dt anticipates: it pushes back when the error is changing fast, damping oscillation. The three gains Kp, Ki, Kd are chosen to balance speed, accuracy, and stability — the central trade-off of PID design.
[Figure: the feedback loop. Reference r(t) → summing junction (minus y(t)) → error e(t) → PID (Kp·e + Ki∫e + Kd·de/dt) → control u(t) → PLANT (robot dynamics) → output y(t), with a SENSOR on the feedback path.]
The classic PID feedback loop. The reference r(t) is compared to the measured output y(t); their difference e(t) drives the PID controller, which produces a control signal u(t) that the plant (the robot) responds to. The sensor on the feedback path measures the actual output, completing the loop. Every other controller in this chapter is conceptually a more sophisticated replacement for the PID block.

What each term actually does

Proportional (P). The most intuitive term: push harder the further from the setpoint you are. Pure proportional control gets close to the setpoint but typically leaves a small steady-state error — a balance point where the control effort is exactly cancelled by some external load. Too much P-gain causes overshoot and oscillation; too little leaves the system sluggish.

Integral (I). Accumulates past error. Even a tiny steady-state error grows the integral term until it produces enough control effort to eliminate the error entirely. The cost is added phase lag: integral action makes the system slower to respond to changes and prone to overshoot if not balanced by the other terms. Integrator windup — when error accumulates during actuator saturation and produces a large overshoot afterward — is the classical pitfall, mitigated by anti-windup schemes that clamp the integral when the actuator is saturated.

Derivative (D). Reacts to how fast the error is changing. Damps oscillation and improves response to disturbances. The catch: the derivative of a noisy signal is much noisier than the signal itself, so D-gain amplifies measurement noise. Real implementations filter the derivative, take it on the measurement rather than on the error (to avoid derivative kicks when the setpoint changes), or omit it entirely.
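A discrete-time PID incorporating the two standard fixes just described — conditional integration as a simple anti-windup scheme, and the derivative taken on the measurement to avoid the kick on setpoint changes — can be sketched as follows (gains and limits are whatever your plant needs; nothing here is specific to any library):

```python
class PID:
    """Discrete PID with anti-windup (conditional integration) and the
    derivative taken on the measurement rather than the error."""

    def __init__(self, kp, ki, kd, dt, u_min=float("-inf"), u_max=float("inf")):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_y = None

    def update(self, r, y):
        """One control step: reference r, measurement y -> control u."""
        e = r - y
        # derivative on measurement: -dy/dt instead of de/dt
        dy = 0.0 if self.prev_y is None else (y - self.prev_y) / self.dt
        self.prev_y = y
        u = self.kp * e + self.ki * self.integral - self.kd * dy
        if self.u_min < u < self.u_max:      # anti-windup: only accumulate
            self.integral += e * self.dt     # while the actuator is unsaturated
        return min(max(u, self.u_min), self.u_max)
```

A typical use: create one `PID` per joint, call `update` at the loop rate `1/dt`, and send the returned value to the actuator.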

Tuning

PID tuning is half art, half engineering. The classical Ziegler-Nichols method increases Kp until the system oscillates with constant amplitude, then sets the gains based on the amplitude and period of those oscillations — a recipe that produces decent baseline performance and works on any plant whose dynamics are loosely first- or second-order. Modern automated tuning uses system identification to fit a model of the plant and then chooses gains that achieve a specified closed-loop bandwidth and damping. For each individual plant, the right tuning depends on the noise environment, the disturbance spectrum, and the cost of overshoot vs. settling time. Cookbook gains rarely transfer across robots.

What PID can't do

PID is single-input, single-output (SISO) by construction, which is fine for one motor at a time but inadequate for systems where multiple actuators interact. It cannot directly handle hard constraints (joint limits, torque limits, obstacle avoidance) — it can be made to respect them indirectly through gain choices and saturation, but constraints become guarantees only with the optimisation-based methods of the next section. It is reactive, not predictive: PID tracks past error but cannot anticipate future setpoint changes or disturbances. And it is fundamentally a linear, time-invariant controller, which means systems with strongly non-linear dynamics (legged robots, aerial vehicles in extreme manoeuvres) often need more than gain-scheduled PID can provide.

Despite these limitations, PID is everywhere — inside the joint controllers of nearly every commercial robot, in the autopilots of drones, in the speed governors of cruise control. Modern stacks pair high-level optimisation-based controllers (MPC, in the next section) with low-level PID loops on each actuator. The high-level layer chooses the right setpoints; PID makes sure each joint hits them.

PID Is Underrated

It is fashionable in research robotics to dismiss PID as "just a baseline" — but in practice, well-tuned PID outperforms most fancy methods on the tasks it was designed for, runs at any frequency you want, has no learning data requirements, and is the layer of the stack least likely to surprise you. When something more sophisticated fails in production, the answer is often "fall back to PID until we figure out what went wrong." Modern robotics has not retired PID; it has wrapped fancier things on top of it.

07

Model Predictive Control

Model Predictive Control (MPC) is the most consequential idea in modern robot control. The premise is simple: at every control step, formulate and solve an optimisation problem over a short future horizon, apply only the first action of the resulting plan, then re-formulate and re-solve at the next step. The procedure is sometimes called receding-horizon control because the planning window slides forward as time advances. MPC handles constraints, multi-input multi-output systems, and non-linear dynamics in a unified framework — and it is now standard on autonomous vehicles, quadrotor drones, legged robots, and chemical-process control alike.

The MPC formulation

MPC — discrete-time receding horizon
At time k, given state xk, solve:

minimise over u0, …, uN−1:   ∑i=0…N−1 ℓ(xi, ui) + Φ(xN)
subject to:
  xi+1 = f(xi, ui) (dynamics)
  xi ∈ X, ui ∈ U (state & control limits)
  x0 = xk (initial condition)

Apply u0*, advance one step, re-solve at k+1.
The horizon N is typically 10–50 steps. The state set X encodes obstacle avoidance, joint limits, and any other state constraints; the control set U encodes actuator limits. Solving the optimisation at every control step is the price MPC pays for its capabilities — and the engineering work goes into making that solve fit within the available time budget.

Linear MPC and the QP form

When the dynamics f are linear and the cost is quadratic, MPC reduces to a quadratic program (QP) — a convex optimisation that can be solved very fast. Solvers like OSQP, qpOASES, and HPIPM solve thousands of MPC problems per second on commodity CPUs, which is what makes MPC viable for high-rate control. Linear MPC underpins industrial process control, automotive cruise control, and many commercial robot stacks. When the system is mildly non-linear, linearising around the current state and solving a sequence of QPs (sequential quadratic programming, SQP) extends the approach.
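The receding-horizon mechanics can be sketched for a double integrator. To stay self-contained the sketch drops the X and U constraints, so the QP reduces to linear least squares solvable with numpy; a real stack would keep the constraints and hand the same structure to OSQP or qpOASES. The model, horizon, and weights below are illustrative choices.

```python
import numpy as np

# Double-integrator model: position/velocity under an acceleration input.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])

def mpc_step(x0, N=20, q=1.0, r=0.1):
    """One receding-horizon step: minimise sum q|x_i|^2 + r u_i^2 over the
    horizon and return ONLY the first control -- the defining MPC move."""
    n, m = A.shape[0], B.shape[1]
    # Stack the predictions: [x_1; ...; x_N] = Phi x0 + Gamma U
    Phi = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
    Gamma = np.zeros((N * n, N * m))
    for i in range(N):
        for j in range(i + 1):
            Gamma[i*n:(i+1)*n, j*m:(j+1)*m] = (
                np.linalg.matrix_power(A, i - j) @ B)
    # Unconstrained QP as least squares:
    # min || sqrt(q) (Phi x0 + Gamma U) ||^2 + || sqrt(r) U ||^2
    H = np.vstack([np.sqrt(q) * Gamma, np.sqrt(r) * np.eye(N * m)])
    g = np.concatenate([-np.sqrt(q) * (Phi @ x0), np.zeros(N * m)])
    U = np.linalg.lstsq(H, g, rcond=None)[0]
    return U[:m]                    # apply u0*, re-solve next step

def simulate(x0, steps=100):
    x = np.asarray(x0, float)
    for _ in range(steps):
        x = A @ x + B @ mpc_step(x)  # the receding-horizon loop
    return x
```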

Non-linear MPC

For systems where linearisation is not faithful — quadrotors at high tilt angles, legged robots in flight phase, vehicles near stability limits — the optimisation is genuinely non-linear. Solvers like ACADO, IPOPT, and the differential-dynamic-programming family (DDP, iLQR) handle the non-linear case. iLQR in particular is the workhorse for non-linear MPC in robotics: it linearises around a candidate trajectory, solves an LQR problem on the linearisation, and iterates. The Atlas humanoid's whole-body controller and many quadrotor stacks use iLQR-style MPC.

The real-time problem

The catch with MPC is that you have to solve an optimisation problem inside every control loop — at 100 Hz, you have 10 ms total, of which only a fraction can go to the solver. Several techniques make this feasible: warm-starting (initialising the solver with the previous step's solution), interior-point methods tuned for MPC's banded structure, code generation that produces problem-specific solver code with no dynamic memory allocation (CasADi, acados), and explicit MPC for small problems where the optimal control law can be precomputed offline and just looked up online. Even with all of these, real-time MPC is a constant tension between model fidelity (more accurate dynamics need slower solvers) and rate (faster control loops need simpler problems).

Where MPC shines and where it doesn't

MPC's strengths: it handles state and control constraints natively (no ad hoc saturation logic), it generalises naturally to multi-input multi-output systems, and it can exploit a model of the future (knowing that the road curves ahead, or that a moving obstacle will reach a certain point in two seconds) in a way that PID cannot. The weaknesses are equally direct: it requires a model — and bad models produce confidently wrong control — and it requires the optimisation to converge fast enough at every step. Real systems pair MPC with model identification (online or offline) and with safety fall-backs that revert to simpler control when the optimisation does not return a feasible solution in time.

MPC Eats Planning

The conceptual significance of MPC is that it dissolves the planning-control boundary that motivated the chapter. An MPC controller with a long enough horizon and rich enough constraints is a planner that runs at 100 Hz. The receding-horizon view — re-plan continuously, only ever execute one step — is so general that many production stacks have replaced their plan-then-track architecture with a single MPC layer. The dedicated planning layer survives mostly for problems where the horizon is too long for MPC to handle (city-scale routes, multi-day missions) or where global optimality matters more than reactivity.

08

Whole-Body and Hierarchical Control

The previous sections treated the robot as something tracking a single trajectory in a single space. Real robots — especially manipulators and humanoids — frequently have to do several things at once: keep the end-effector on a target, maintain balance, avoid joint limits, exert a specified force on an object, and stay clear of obstacles. Whole-body control is the family of techniques that lets a robot satisfy multiple objectives simultaneously by exploiting redundancy in its kinematics and dynamics.

Task-space and operational-space control

A 7-DOF arm has more joints than the six numbers needed to specify the end-effector's pose in 3D space. That redundancy is a feature: the same end-effector pose can be reached by an infinite family of joint configurations, and the controller can use that freedom to satisfy secondary objectives. Operational-space control (Khatib, 1987) is the standard formulation. The dynamics of the robot are projected from joint space into the task space (typically 6D Cartesian space at the end-effector), and a control law is designed in that smaller space. The remaining null-space of joint motions — the directions that don't affect the end-effector — is then used for secondary tasks.
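The null-space mechanism reduces to one line of linear algebra. A sketch of resolved-rate control with a null-space term, using a hypothetical 3-link planar arm (two task dimensions, three joints, one redundant degree of freedom):

```python
import numpy as np

def redundancy_resolve(J, x_dot_task, q_dot_secondary):
    """Classic resolved-rate control with a null-space term:

        q_dot = J+ x_dot  +  (I - J+ J) q_dot_0

    The projector (I - J+ J) maps q_dot_0 into joint motions that produce
    zero end-effector velocity, so the secondary objective (posture,
    joint-limit avoidance) can never disturb the primary task."""
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J      # null-space projector
    return J_pinv @ x_dot_task + N @ q_dot_secondary

def planar_jacobian(q, link_lengths=(1.0, 1.0, 1.0)):
    """Position Jacobian of a hypothetical 3-link planar arm."""
    q = np.asarray(q, float)
    cum = np.cumsum(q)                       # absolute link angles
    J = np.zeros((2, 3))
    for i in range(3):
        # joint i moves every link from i outward
        J[0, i] = -sum(link_lengths[j] * np.sin(cum[j]) for j in range(i, 3))
        J[1, i] = sum(link_lengths[j] * np.cos(cum[j]) for j in range(i, 3))
    return J
```

Strict task hierarchies iterate the same projection: each lower-priority task is resolved inside the null space left over by the tasks above it.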

Hierarchical / prioritised stacks

When several tasks compete, prioritisation matters. The classical pattern is a strict hierarchy: priority 1 is satisfied first, priority 2 is satisfied within the null-space of priority 1, priority 3 within the null-space of priorities 1 and 2, and so on. For a humanoid robot, the priority order is typically: maintain balance > avoid joint limits > track end-effector pose > minimise effort. The mathematical machinery (hierarchical inverse kinematics, hierarchical least-squares) is straightforward but computationally non-trivial; modern stacks use SOT (Stack-of-Tasks) and similar libraries to manage the hierarchy in real time.

QP-based whole-body control

The dominant modern formulation poses whole-body control as a single quadratic program at each time step: minimise a weighted sum of task-space tracking errors subject to the robot's dynamics, joint limits, contact constraints, and friction cones. The QP is solved at high rate (often 500 Hz to 1 kHz) and produces joint torques that satisfy all hard constraints while compromising softly between competing objectives. This formulation underpins the controllers of nearly every advanced humanoid (Atlas, Digit, Optimus) and most quadrupeds (Spot, Anymal). It is essentially a single-shot MPC: optimise over a one-step horizon, but with a rich constraint set that captures the whole body's physics.

Force and impedance control

Position control is not always what you want. When a robot makes contact with the world — assembling parts, polishing a surface, shaking hands — controlling the contact force matters more than controlling position perfectly. Force control directly regulates the wrench at the end-effector. Impedance control (Hogan, 1985) is the more nuanced option: it specifies a desired relationship between position and force — effectively making the robot behave like a virtual spring-mass-damper system with parameters chosen for the task. A robot under impedance control can respond compliantly to unexpected contact, which is critical for safe interaction with humans and for assembly tasks where small position errors must not produce large forces.
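A minimal 1-D sketch makes the virtual spring-damper concrete. The mass, gains, and wall location are illustrative, not from any particular robot:

```python
# Impedance control in 1-D: command a force from a virtual spring-damper,
# and watch what happens when the commanded position is inside a wall.
def impedance_force(x, x_dot, x_des, K=200.0, D=30.0):
    """Virtual spring (stiffness K) and damper (D) pulling x toward x_des."""
    return K * (x_des - x) + D * (0.0 - x_dot)

m, dt = 1.0, 1e-3
x, x_dot = 0.0, 0.0
wall = 0.05                         # rigid wall at 5 cm
for _ in range(5000):
    f = impedance_force(x, x_dot, x_des=0.10)   # target is *inside* the wall
    if x >= wall:                   # contact: the wall absorbs the force
        x, x_dot = wall, 0.0
        continue
    x_dot += (f / m) * dt
    x += x_dot * dt

# At rest against the wall, the contact force is just K * position error:
steady_contact_force = impedance_force(wall, 0.0, x_des=0.10)
# 200 N/m * 0.05 m = 10 N -- small, predictable, and tunable via K.
```

This is the safety argument in miniature: a stiff position loop asked to reach 10 cm would ramp its effort without bound against the wall, whereas the impedance law settles at a contact force chosen by the stiffness.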

Modern manipulators ship with built-in impedance control modes (Franka Emika Panda, KUKA iiwa, Universal Robots' force-control add-ons). The impedance abstraction has also been generalised to whole-body level — a humanoid leg in stance has a virtual stiffness from foot to hip, tuned for the gait being executed.

09

Reactive and Local Planners

The planners in Sections 3–5 produce a single global path and assume the world stays put. The world rarely cooperates: pedestrians cross in front of the vehicle, doors open, objects fall, and the global plan becomes infeasible. Reactive and local planners run continuously alongside a slower global planner, modifying the immediate trajectory in response to what the robot is sensing right now.

Dynamic Window Approach (DWA)

For wheeled mobile robots, DWA (Fox, Burgard & Thrun, 1997) is the classical local planner. At each step it considers the set of velocity commands the robot can reach within one control period — the "dynamic window" determined by acceleration limits — and scores each candidate by how well it tracks the global path, avoids obstacles, and reaches the goal. The best-scoring command is executed. DWA is fast, easy to tune, and good enough for most indoor and warehouse robots; the navigation stack of ROS uses a DWA-based local planner by default.
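The core loop can be sketched compactly for a unicycle robot at the origin. The arc model, scoring weights, and grid resolution here are illustrative and far cruder than a production implementation:

```python
import numpy as np

def dwa_step(v, w, goal, obstacles, dt=0.25,
             a_max=0.5, alpha_max=1.0, v_max=1.0, w_max=2.0):
    """One DWA cycle: enumerate velocities reachable within dt, simulate
    each candidate forward, score it, return the best (v, w)."""
    best, best_score = (0.0, 0.0), -np.inf
    # Dynamic window: velocities reachable this control period.
    for v_c in np.linspace(max(0.0, v - a_max * dt),
                           min(v_max, v + a_max * dt), 7):
        for w_c in np.linspace(max(-w_max, w - alpha_max * dt),
                               min(w_max, w + alpha_max * dt), 7):
            th = w_c * dt                       # crude one-arc rollout
            x = v_c * dt * np.cos(th / 2)
            y = v_c * dt * np.sin(th / 2)
            clearance = min((np.hypot(x - ox, y - oy) for ox, oy in obstacles),
                            default=np.inf)
            if clearance < 0.1:
                continue                        # would collide: inadmissible
            heading = -np.hypot(goal[0] - x, goal[1] - y)  # progress to goal
            score = heading + 0.2 * min(clearance, 1.0) + 0.1 * v_c
            if score > best_score:
                best_score, best = score, (v_c, w_c)
    return best

v_cmd, w_cmd = dwa_step(v=0.5, w=0.0, goal=(2.0, 0.0), obstacles=[(0.6, 0.05)])
```

The returned command is guaranteed to lie inside the dynamic window, which is the whole point: whatever the scoring function prefers, the robot is never asked for an acceleration it cannot deliver.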

Model Predictive Path Integral (MPPI)

MPPI (Williams et al., 2017) is the modern stochastic alternative. It samples thousands of random control trajectories from a Gaussian distribution, simulates each forward through the robot's dynamics, scores each by a cost function, and computes the optimal control as a weighted average — heavier weight on lower-cost trajectories. The whole procedure runs in parallel on a GPU, which is what makes it tractable. MPPI handles non-convex costs, complex dynamics, and rich obstacle structure that gradient-based MPC struggles with, and it has become standard for off-road autonomous vehicles and aggressive drone flight.
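A toy MPPI update for a 1-D double integrator shows the sample-score-average pattern, with numpy standing in for the GPU. Sample counts, temperature, and cost weights are illustrative:

```python
import numpy as np

def mppi(x0, u_nominal, n_samples=512, horizon=20, dt=0.05,
         sigma=1.0, lam=1.0, rng=np.random.default_rng(2)):
    """One MPPI iteration: perturb the nominal controls, roll every
    sample forward in a batch, and return a cost-weighted average."""
    noise = sigma * rng.standard_normal((n_samples, horizon))
    u = u_nominal + noise                     # perturbed control sequences
    costs = np.zeros(n_samples)
    pos = np.full(n_samples, x0[0])
    vel = np.full(n_samples, x0[1])
    for t in range(horizon):                  # batched forward rollout
        vel = vel + u[:, t] * dt
        pos = pos + vel * dt
        costs += pos**2 + 0.1 * vel**2 + 0.01 * u[:, t]**2
    w = np.exp(-(costs - costs.min()) / lam)  # softmax-style weights:
    w /= w.sum()                              # lower cost, heavier weight
    return u_nominal + w @ noise              # weighted-average update

u_nom = np.zeros(20)
for _ in range(10):                           # receding-horizon refinement
    u_nom = mppi(x0=(1.0, 0.0), u_nominal=u_nom)
# Starting at position 1.0, the first control should brake toward the
# origin, i.e. come out negative.
```

Note that nothing here required a gradient of the cost: the obstacle and cost terms can be arbitrarily non-convex, which is exactly why MPPI handles terrain and clutter that gradient-based MPC struggles with.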

Potential fields and their cousins

The oldest reactive technique, potential fields (Khatib, 1986), treats the goal as an attractor and obstacles as repulsors, with the robot moving along the resultant force field. It is simple, fast, and essentially impossible to get right — local minima between obstacles trap the robot, and the field gradient becomes pathological in narrow passages. Potential fields are pedagogically important and still occasionally used as a fallback, but no serious system relies on them alone. Variants like navigation functions (Rimon & Koditschek) provide the mathematical machinery for fields without local minima at the cost of much harder construction.
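A minimal sketch with Khatib-style attractive and repulsive gradients (gains and geometry are illustrative) shows both the appeal and the fragility:

```python
import numpy as np

def potential_gradient(p, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Negative gradient of attractive-plus-repulsive potential at p."""
    grad = k_att * (goal - p)                        # attractor: pull to goal
    for obs in obstacles:
        d = np.linalg.norm(p - obs)
        if d < d0:                                   # repulsor: push away
            grad += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (p - obs) / d
    return grad

p = np.array([0.0, 0.0])
goal = np.array([5.0, 0.3])                          # goal slightly off-axis
obstacles = [np.array([2.5, 0.0])]
for _ in range(2000):                                # descend the potential
    p = p + 0.01 * potential_gradient(p, goal, obstacles)
# With the goal slightly off the obstacle's axis, the robot slides past
# and converges. Put the obstacle exactly on the straight line to the
# goal and the attractive and repulsive gradients can cancel, stalling
# the robot in a local minimum -- the classic failure mode.
```

Three lines of maths per obstacle, no planning at all, and it works until the geometry conspires against it; that trade is the whole story of potential fields.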

Putting it together: the planning-control stack

Production mobile-robot navigation looks like this in nearly every deployment: a global planner (A* on a costmap, every 1–10 seconds) produces a long-horizon path; a local planner (DWA or MPPI, at 5–20 Hz) produces a short-horizon trajectory that respects dynamics and reacts to sensed obstacles; a low-level controller (PID on each wheel, at 100+ Hz) tracks the local plan. Each layer abstracts the layer below it: the global planner sees a static map, the local planner sees the current obstacles, the controller sees just velocity commands. The architecture is robust precisely because each layer can fail without taking down the whole stack — the global plan can be temporarily infeasible, the local plan can stall, and the wheel controller will still hold the robot stationary safely.
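The layering can be summarised as a multi-rate loop. Every class, method, and rate below is a placeholder sketch, not a real API; the point is the data flow (each layer reads only the output of the layer above) and the fail-safe default at the bottom:

```python
class NavigationStack:
    """Skeleton of the three-rate stack: global plan -> local plan -> wheels."""
    def __init__(self):
        self.global_path = None        # written at ~0.5 Hz
        self.local_traj = None         # written at ~10 Hz
        self.wheel_cmd = (0.0, 0.0)    # written at 100 Hz

    def global_plan(self, static_map, goal):
        self.global_path = ["waypoints"]          # e.g. A* on a costmap

    def local_plan(self, sensed_obstacles):
        if self.global_path is None:
            self.local_traj = None                # degrade gracefully
        else:
            self.local_traj = ["short trajectory"]  # e.g. DWA / MPPI

    def low_level(self):
        if self.local_traj is None:
            self.wheel_cmd = (0.0, 0.0)           # fail safe: hold still
        else:
            self.wheel_cmd = (0.4, 0.0)           # e.g. PID tracking

stack = NavigationStack()
for tick in range(1000):               # 100 Hz base clock, 10 s of run
    if tick % 200 == 0:
        stack.global_plan(static_map=None, goal=None)
    if tick % 10 == 0:
        stack.local_plan(sensed_obstacles=[])
    stack.low_level()
```

The robustness claim lives in the `None` branches: if an upper layer has nothing valid to offer, each layer below falls back to a safe default rather than propagating garbage downward.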

10

Frontier: Learned Control

The classical stack of this chapter — graph search, sampling-based planning, trajectory optimisation, MPC, PID — has dominated robotics for decades and continues to dominate where reliability matters most. But a parallel line of work has been steadily eating territory that classical planning and control once owned, by training neural-network policies that map directly from observations to actions. The transition is not yet complete, and may never be on safety-critical systems, but no honest survey of motion planning and control in 2026 can ignore the learned-control frontier.

Diffusion policies

Diffusion policies (Chi et al., 2023) treat action prediction as a generative-modelling problem: given an observation, sample an action sequence from a diffusion model trained on demonstrations. The approach handles multi-modal action distributions naturally — when a task can be solved several ways, the model represents all of them and samples one — and produces smooth action trajectories without the discrete-action artefacts of earlier behaviour-cloning approaches. Diffusion policies have become the dominant architecture for learned robot manipulation in 2024–2026, and they sidestep the planning-control split entirely: the policy generates a short trajectory, the robot executes it, and the policy generates the next.

Action chunking and the policy as a planner

A surprising empirical finding from the past few years is that policies that predict chunks of future actions (a sequence of, say, 16 future actions at once) outperform policies that predict one action at a time. The action-chunking transformer architecture (ACT, Zhao et al. 2023, used in ALOHA) anchored this trend; nearly every modern manipulation policy now predicts action sequences. This gives the policy something like a planning horizon — it commits to a short plan, executes it, and re-plans — which mirrors the receding-horizon idea of MPC, but with the optimisation replaced by a learned mapping.

Differentiable physics and learned dynamics

The other frontier is the dynamics model that MPC depends on. Hand-derived dynamics models are fine for rigid-body systems but break down for soft contacts, deformables, granular media, fluids — the regime where most useful robotics work happens. Differentiable simulators (Brax, MuJoCo MJX, Drake) and learned dynamics models (Dreamer, world models, neural physics) provide differentiable approximations of the real dynamics that can be plugged into MPC or used to train policies via gradient-based methods. The result is a hybrid architecture: classical optimisation-based control with a learned model of the world's response.

What this chapter does not cover

The frontier topics here connect to material in other chapters. Behavioural cloning, DAgger, and teleoperation datasets — the data that makes diffusion policies and ACT possible — are the subject of Chapter 03 (Learning from Demonstration & Imitation). The simulation environments that provide the dynamics models and synthetic training data belong to Chapter 04 (Sim-to-Real Transfer). The vision-language-action models that fuse perception, language, and control into a single foundation-scale policy belong to Chapter 05. And the integration of all of these into a complete autonomous-vehicle stack — where the safety, regulatory, and behavioural constraints are at their most demanding — belongs to Chapter 06.

Motion planning and control is the layer where intent meets physics. The classical methods of this chapter remain the load-bearing structure of essentially every working robot in 2026; the learned methods are reshaping that structure in real time. A practitioner who knows both — who can choose between an MPC controller and a diffusion policy based on what the task actually requires — will be doing the most interesting work in robotics for the foreseeable future.

Further Reading