Human-AI Interaction & UX: designing for collaboration with intelligent systems.
Every AI system is, ultimately, a system used by humans. The methodology of the model is one half of the deployment; the design of the interaction is the other. Human-AI Interaction (HAI) is the discipline that studies and designs that interaction — how users perceive AI outputs, how they delegate or oversee tasks, how trust is calibrated, how feedback flows back into the model, and how interfaces shape the cognition of both human and machine. The 2022–2026 wave of generative AI rewrote the interface assumptions of an entire field: chat became the dominant input modality almost overnight; copilots displaced traditional menus across major productivity software; AI agents began acting on behalf of users with limited supervision. Each of these shifts strains the conventional vocabulary of HCI (human-computer interaction). This chapter develops the major patterns of AI-mediated interface design — autocomplete and copilots, conversational agents, interactive ML — and the deeper UX disciplines that underlie them: cognitive load, trust calibration, explainability, feedback collection, and accessibility. The aim is to give practitioners the conceptual machinery for designing AI systems that work with users rather than around them.
Prerequisites & orientation
This chapter sits at the intersection of ML methodology, classical HCI, and applied UX research. No formal HCI background is assumed; the chapter introduces relevant concepts (Fitts's law, working memory, automation bias, dark patterns) as they arise. The material draws on alignment and feedback methodology (Part XV/XVI on RLHF and AI safety), the recommender-systems and search material from earlier in this part, and the agent-systems material of Part XI for the agentic-interface discussion of Section 8.
Two threads run through the chapter. The first is the asymmetry of the human-AI partnership: AI systems are fast but brittle, confident but sometimes wrong, fluent but sometimes hallucinatory, while humans are slow but contextually wise, cautious but capable of expert judgement. The interface mediates how each partner's strengths reach the joint task, and the methodology of the field is the design of that mediation. The second thread is the feedback loop: AI systems learn from user behaviour, which means interface design choices about what to log, how to prompt feedback, and what to optimise for shape the trajectory of the underlying model. The deployment is not separable from the model; the methodology has to handle both layers simultaneously.
Why Human-AI Interaction Is Distinctive
Human-computer interaction has been a discipline since the 1960s. The arrival of probabilistic, generative, and increasingly agentic AI systems has not so much overturned its principles as exposed assumptions that no longer hold: that systems behave deterministically, that user intent maps cleanly to commands, that errors are predictable, and that the interface is the surface rather than the system. AI-mediated interaction violates each of these in different ways, and the methodology of HAI is the response.
Probabilistic outputs
Classical software is deterministic: the same input yields the same output. AI systems are probabilistic — the same prompt may yield different completions, the same image may be classified differently across model versions, the same recommendation engine may surface different items each session. Designers used to deterministic systems often build interfaces around the false expectation of stability, then discover users complain about inconsistency. The methodology of HAI accommodates probabilistic behaviour explicitly: surfacing alternative options, offering regeneration, communicating uncertainty, and resisting the temptation to hide the stochasticity behind a deterministic-looking surface.
The explainability gap
Classical software is, in principle, fully inspectable — its behaviour traces back to source code, every decision reduced to a control-flow path. Modern ML systems are opaque: a 70-billion-parameter language model produces an answer through a chain of attention computations no human can follow, and the chain is rarely the actual reasoning that justifies the output. The gap between what users want explained and what the system can plausibly explain is real, persistent, and shapes the methodology of Section 6's transparency design. The honest pattern is that AI explanations are post-hoc rationalisations — useful as user-facing context, but not the actual computation — and interface design that treats them as ground truth produces the wrong kinds of trust.
The automation paradox
Lisanne Bainbridge's 1983 paper "Ironies of Automation" identified the paradox that has shaped human-factors thinking for forty years and still describes AI deployment in 2026. The more reliable an automated system, the less the human operator needs to attend to it; the less they attend, the worse they perform when the automation eventually fails — at exactly the moments when human intervention is most needed. Modern AI replays this paradox at scale: the more capable a copilot, the less practice the user gets at the underlying task; the better a search ranker, the less the user develops query-formulation skill. The methodology of HAI addresses this by designing for sustained human engagement rather than maximum delegation, with "keep the human in the loop" not as a slogan but as a measurable design constraint.
Feedback loops and the recursive interface
Unlike classical software, AI systems learn from interaction. Every click, dwell, thumbs-up, and ignored suggestion becomes training data. This means interface design choices are, indirectly, model-design choices. A recommender that prompts users for explicit ratings will collect different data than one that infers preferences from implicit signals; a copilot that asks for confirmation before each action will generate different feedback than one that acts autonomously. The methodology of Section 7 develops the feedback layer in detail; the conceptual point is that AI interfaces are recursive — they shape the system that they then have to interface with — and this is unique to ML-driven products.
Speed mismatch
AI systems often operate at superhuman speed (a language model produces hundreds of tokens per second; a vision model classifies images in milliseconds), but human cognition is bounded by perception, working memory, and decision time on the order of seconds. The mismatch creates a design problem: how to pace interaction so that humans can meaningfully engage with the AI's output rather than being overwhelmed or rubber-stamping. Conversely, some AI tasks are slow (multi-step reasoning, tool use, long generations) and the design problem inverts: how to maintain user attention through latency. Both ends of the speed spectrum require dedicated UX patterns.
AI systems are powerful, fast, and probabilistically wrong. Humans are slow, contextually wise, and capable of catching errors the model can't see. The methodology of human-AI interaction is the design of partnership patterns that route work to the right partner at the right time — neither over-trusting the model nor under-using it, neither overwhelming the user nor letting them disengage.
HCI Foundations and the Pre-AI Era
Human-AI interaction inherits substantial conceptual machinery from classical human-computer interaction — a discipline with sixty years of empirical research, formal models, and design principles. Modern AI interfaces extend rather than replace this foundation, and understanding what HCI established before AI is essential for understanding what AI changes.
The eras of HCI
HCI has gone through several broad eras. The command-line interface era (1960s–early 1980s) demanded users learn precise commands; the cognitive cost was high but the model of computation was transparent. The graphical user interface era (Xerox Star 1981, Apple Lisa 1983, Macintosh 1984, Windows from 1985) introduced direct manipulation — drag-and-drop, pointing, menus — and shifted the methodology toward visual affordances and discoverability. The web era (mid-1990s onward) added hyperlinks, page-based navigation, and the design vocabulary of information architecture. The mobile era (iPhone 2007 onward) introduced touch, gestures, location-awareness, and notification-driven attention. The AI era (roughly ChatGPT-onward, late 2022) is the current chapter, and is best understood as an extension rather than a replacement of the patterns that came before.
Foundational principles
Several formal models underlie HCI. Fitts's law (1954) quantifies the time to point at a target as a function of distance and target size — the basis of why important buttons are big and close. Hick's law describes decision time as logarithmic in the number of choices — the basis of why menus should be flat and concise. The Gestalt principles of perception (proximity, similarity, closure, continuity) underlie visual layout. Norman's action cycle (goal → plan → execute → perceive → interpret → evaluate) describes how users approach interactive tasks; the design problem is reducing the gulf between user intent and system response at each step.
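To make the two pointing and decision laws concrete, the sketch below computes movement time under the Shannon formulation of Fitts's law and decision time under Hick's law. The constants a and b are illustrative placeholders rather than fitted values; in practice they come from pointing and menu studies.

```python
import math

def fitts_movement_time(distance: float, width: float,
                        a_ms: float = 100.0, b_ms: float = 150.0) -> float:
    """Shannon formulation of Fitts's law: MT = a + b * log2(D/W + 1).
    a_ms and b_ms are illustrative placeholders, normally fit from data."""
    return a_ms + b_ms * math.log2(distance / width + 1)

def hick_decision_time(n_choices: int, b_ms: float = 150.0) -> float:
    """Hick's law: choice-reaction time grows roughly logarithmically with
    the number of equally likely options, T = b * log2(n + 1)."""
    return b_ms * math.log2(n_choices + 1)

# A large nearby target is faster to acquire than a small distant one,
# and a flat 4-item menu is decided faster than a 16-item one.
print(fitts_movement_time(distance=200, width=60))
print(fitts_movement_time(distance=800, width=20))
print(hick_decision_time(4), hick_decision_time(16))
```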
Heuristics and usability
The most-cited body of practical HCI is Jakob Nielsen's 10 usability heuristics (1994, refined since): visibility of system status, match between system and the real world, user control and freedom, consistency, error prevention, recognition over recall, flexibility, aesthetic minimalism, error recovery, and help-and-documentation. These remain the working vocabulary of design reviews. Don Norman's "The Design of Everyday Things" (1988) extended this to physical objects with the principles of affordances (what an object suggests can be done with it) and signifiers (cues that communicate the affordances). The methodology of modern AI interface design still rests on these foundations.
Mental models
A central HCI concept is the mental model: the user's internal theory of how the system works. Effective interfaces help users build accurate mental models; misleading interfaces produce confused users who blame themselves for the system's design failures. AI systems pose a particular mental-model challenge: the actual system is opaque, the surface is fluent, and users routinely build mental models that overestimate AI capabilities (the chatbot "understands" them) or underestimate them (the model is "just autocomplete"). The methodology of Section 5's trust calibration is, in part, the design of accurate mental models.
What HCI didn't anticipate
Classical HCI was developed for systems with stable behaviour, deterministic responses, and well-defined task structures. Several AI properties strain this foundation. Probabilistic behaviour breaks the consistency heuristic. Generative outputs break the recognition-over-recall pattern (users now choose among AI-produced alternatives, not among predefined options). Conversational interfaces remove the affordance-based discoverability that GUIs depend on (a chat box gives no signifier of what the system can do). Agent-driven interfaces invert the locus of action from user to system. Each of these requires extensions of HCI methodology, and the chapter develops the extensions in turn.
Interface Patterns for AI Systems
Several distinct interface patterns have emerged for AI-mediated software, each with characteristic strengths and failure modes. Recognising the patterns and the design discipline each requires is the working vocabulary of modern HAI.
Autocomplete and inline suggestion
The longest-running AI-interface pattern is autocomplete: as the user types, the system proposes continuations the user can accept with a keystroke. Google search suggestions (from 2008), Gmail's Smart Compose (2018), and the GitHub Copilot family (2021 onward, descended from OpenAI's Codex) are the canonical examples. The pattern is conservative — the user is always in the driver's seat, the AI surfaces options, the user accepts or rejects. It scales gracefully from simple word completion to multi-line code suggestions, and it is among the most-empirically-validated AI interfaces in production.
Copilots
The copilot pattern, established at scale by GitHub Copilot and generalised by Microsoft 365 Copilot, Anthropic's Claude in document and spreadsheet products, and the various coding-copilot competitors, sits a level above autocomplete. The user works in their normal application; the copilot offers contextual assistance — generate a draft, summarise a section, suggest a refactor, explain a region. Crucially, the user remains in control of the document; the copilot acts on selections, regions, or explicit invocation rather than on the whole at once. Empirical adoption metrics across Microsoft, GitHub, and Google have been substantial: GitHub's controlled experiment reported developers completing a benchmark coding task roughly 55% faster with Copilot, with subsequent studies showing more nuanced patterns.
Chat as universal interface
The chat interface — text input, text or multi-modal output, multi-turn — became the dominant AI-interaction pattern almost overnight after ChatGPT's November 2022 launch. By 2026, chat is the default surface for general-purpose AI assistance across consumer, productivity, and increasingly developer applications. The pattern's strengths are immense flexibility (any task expressible in natural language) and conceptual simplicity. Its weaknesses are equally important: lack of discoverability (the user doesn't know what the system can do until they try), poor structure for multi-step or complex tasks, vulnerability to prompt injection and confused intent, and a tendency to produce verbose responses to simple queries.
Structured forms and constrained generation
For tasks where the output structure is known, structured-form interfaces outperform free-form chat. A travel-booking AI that asks "where, when, how many travellers, budget?" through a form and then generates options is more usable than one that requires the user to specify all parameters in a chat prompt. The 2023–2026 wave of "AI-native applications" has substantially re-discovered this — the most successful AI products often combine a structured input layer with a generative output layer rather than putting the entire interaction in chat.
Agentic interfaces
The agentic interface pattern is the frontier as of 2026: AI systems that act on the user's behalf with limited supervision, executing multi-step tasks across applications and the web. Cowork (the platform this compendium runs on), Anthropic's Claude in Chrome, the various OpenAI Operator-style products, and the agent-development frameworks of Part XI represent the early generation. The interface design problem is substantially harder than for chat or copilots: the user's locus of attention shifts from per-token review to high-level oversight, the failure modes are more severe (an agent that takes the wrong action has consequences in the real world), and the interaction patterns for delegation, monitoring, and intervention are still being worked out. Section 8 develops the agentic-interface frontier in detail.
Hybrid and embedded interfaces
Most successful AI products in 2026 use combinations of these patterns rather than a single one. A coding assistant might combine inline autocomplete (continuous), a chat sidebar (interactive Q&A), and an agentic mode (multi-step changes); a productivity copilot combines structured invocation (explicit menu actions) with generative output (drafts, summaries) and chat-mediated revision. The methodology of design is choosing the right pattern for each user task rather than committing to one universal interface, and the 2024–2026 design literature has substantially formalised this multi-pattern approach.
Cognitive Load and Attention
Human cognition is bounded. Working memory holds roughly seven items for tens of seconds; sustained attention degrades after twenty minutes; switching between tasks incurs measurable cost. AI interfaces routinely strain these limits — a fluent chatbot can produce more content per minute than the user can usefully process — and the methodology of Section 4 is the discipline of designing within human cognitive bandwidth rather than against it.
Cognitive load theory
John Sweller's cognitive load theory (developed from the 1980s onward in the educational-psychology context) decomposes mental effort into three categories. Intrinsic load is the inherent complexity of the task. Extraneous load is the additional effort imposed by the interface or presentation — the part designers can reduce. Germane load is the effortful processing that builds learning and skill. The design objective for AI interfaces is to minimise extraneous load (interface friction, unnecessary choice, verbose output) while preserving appropriate germane load (engagement that builds the user's understanding).
Working memory and the magic number
George Miller's 1956 paper "The Magical Number Seven, Plus or Minus Two" established that human working memory holds approximately seven discrete items at once. The number is approximate and depends on the type of items (digits versus chunks of meaning), but the constraint is real and underlies many UX patterns. AI interfaces that produce ten alternative drafts at once exceed users' capacity to evaluate them; chat outputs of two thousand words exceed users' capacity to retain them. The methodology of progressive disclosure — presenting information in layers, with summary first and detail on demand — addresses this directly and is among the most-effective AI-interface patterns.
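A minimal sketch of the progressive-disclosure idea under an assumed, illustrative response schema: show a short summary and a capped set of alternatives up front, and keep everything else behind an explicit expansion.

```python
def progressive_disclosure(summary: str, alternatives: list[str],
                           detail: str, visible_alternatives: int = 3) -> dict:
    """Layer the response: summary first, a handful of alternatives (well under
    the ~7-item working-memory limit), and full detail only on request.
    The dict keys are illustrative, not any particular product's schema."""
    return {
        "summary": summary,                                     # always rendered
        "alternatives": alternatives[:visible_alternatives],    # shown immediately
        "more_alternatives": max(0, len(alternatives) - visible_alternatives),  # behind "show more"
        "detail": detail,                                       # rendered only when expanded
    }
```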
The ironies of automation, revisited
Section 1 introduced Bainbridge's automation paradox; Section 4 develops its cognitive consequences. Highly reliable automation produces vigilance decrement: human operators monitoring rare events (the model occasionally producing wrong output) lose attention over time and miss the moments when intervention matters. Air-traffic control, nuclear plant operation, and increasingly driving (Tesla Autopilot, the various ADAS systems) all show this pattern empirically. AI deployment replays it: code reviewers approving copilot-generated changes without inspection, doctors accepting AI diagnostic suggestions without independent assessment, content moderators rubber-stamping AI-flagged decisions. The methodology of Section 5's trust calibration is, in part, the design of interfaces that resist vigilance decrement.
Attention and notification design
AI systems can produce notifications, prompts, and suggestions continuously. Without careful design, the result is attention residue (lingering attentional cost from interrupted tasks) and interruption fatigue (users tuning out the AI entirely). Gloria Mark's research on workplace interruptions, the various 2010s-era smartphone-notification studies, and the 2024 wave of "AI fatigue" survey research all converge on the same conclusion: more notifications is not better. The methodology of effective AI interfaces invests substantially in notification rationalisation — fewer prompts, more context-aware timing, explicit user control over when AI inserts itself.
Flow and the deep-work problem
Csikszentmihalyi's flow describes deep, absorbed engagement with a challenging task — generally regarded as the highest-quality state for creative work. AI interfaces present a flow problem: continuous suggestion or chat-based interaction tends to fragment attention, while flow requires uninterrupted focus. Cal Newport's "Deep Work" thesis and the various 2023–2026 critiques of AI-mediated knowledge work raise this directly: are we trading short-term productivity gains for long-term flow degradation? The empirical evidence is mixed and rapidly evolving; the design implication is that successful AI interfaces increasingly include "do not disturb" modes, opt-in suggestion, and explicit flow-protection patterns.
Trust, Calibration, and Reliance
Trust in AI systems is the single most-studied dimension of human-AI interaction. The human-factors literature on trust in automation predates the modern AI wave by decades, and the central concepts — over-trust, under-trust, trust calibration, automation bias — remain the working vocabulary of the field.
The trust-calibration problem
The classical formulation comes from Lee & See's 2004 paper "Trust in Automation": the right level of user reliance on an automated system equals the system's actual reliability. Both directions of mismatch produce harm. Over-trust (user reliance exceeds system reliability) leads to automation bias: the user delegates tasks the system cannot handle correctly, and accepts wrong outputs. Under-trust (user reliance falls short of system reliability) leads to disuse: the user ignores or overrides correct AI suggestions, foregoing the benefit. The methodology of trust calibration is the interface design that conveys actual reliability so the user develops appropriate reliance.
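One way to make calibration measurable in a deployed product is to log, for each suggestion, whether the user accepted it and whether it later turned out to be correct. A sketch under that assumption, with ground truth coming from later review or audit rather than the live session:

```python
def reliance_calibration(events: list[tuple[bool, bool]]) -> dict:
    """Each event is (ai_was_correct, user_accepted_suggestion).
    Over-reliance: accepting when the AI was wrong (automation bias).
    Under-reliance: rejecting when the AI was right (disuse)."""
    wrong = [accepted for correct, accepted in events if not correct]
    right = [accepted for correct, accepted in events if correct]
    over = sum(wrong) / len(wrong) if wrong else 0.0
    under = 1 - (sum(right) / len(right)) if right else 0.0
    return {"over_reliance_rate": over, "under_reliance_rate": under}
```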
Automation bias
Numerous empirical studies document automation bias — the tendency for users to favour automated suggestions over their own judgement, even when the suggestions are wrong. The 2024 wave of LLM-based decision-support studies extends earlier human-factors work: physicians presented with AI diagnostic suggestions adjust their own diagnoses toward the AI even when shown that the AI is mistaken; analysts using AI-generated summaries miss errors at higher rates than those reading the underlying documents. The bias is robust, present even with extensive training, and shapes deployment patterns: AI suggestions in safety-critical settings need explicit countermeasures (forced engagement with raw evidence, dissent-elicitation prompts, calibrated confidence display).
The calibration of confidence
Effective trust calibration requires the AI system itself to communicate calibrated confidence — predictions that are accurate, with explicit uncertainty estimates the user can act on. The methodology connects to Part XIII's Bayesian deep-learning material and to the broader literature on calibration metrics (expected calibration error, reliability diagrams). Production AI systems in 2026 are still substantially miscalibrated: language models routinely express high confidence in wrong outputs, recommender systems rarely surface uncertainty, copilots present generated code without flagging its risk profile. The design problem is partly methodological (improving model calibration) and partly UX-level (presenting calibrated confidence in ways users can interpret).
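A minimal sketch of expected calibration error, the metric mentioned above: bin predictions by stated confidence and compare each bin's average confidence with its empirical accuracy, weighting by bin size.

```python
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """ECE over a set of predictions: `confidences` holds stated probabilities
    in [0, 1], `correct` holds 0/1 outcomes. Zero means perfectly calibrated."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.sum() == 0:
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += (mask.sum() / len(confidences)) * gap
    return float(ece)
```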
Earned trust over assumed trust
Trust in AI is not given; it is built through experience. The methodology of effective AI deployment recognises this with progressive-autonomy patterns: the user starts with high oversight, the AI demonstrates reliability over time, and oversight gradually relaxes. Tesla's Autopilot rollout (with substantial regulatory and PR consequences when the trust assumption ran ahead of capability), Anthropic's Claude product roll-outs (with explicit "research preview" framings), and the Cowork agentic-mode deployment all illustrate progressive-autonomy as a deployment pattern. The opposite — full autonomy from the first interaction — produces the trust failures that headline news stories.
Recovering from trust failures
When AI systems fail visibly — wrong answers, hallucinated citations, harmful actions — recovering user trust is harder than building it initially. The empirical pattern (well-documented in product analytics from major AI providers) is that users who experience a single high-confidence AI failure often abandon the system entirely, even when the overall reliability is high. The methodology of failure communication — acknowledging errors clearly, explaining their nature, demonstrating systemic improvement — is critical for sustained product adoption, and the field's track record on this is mixed.
Explainability and Transparency in UX
If trust is what users feel toward AI systems, transparency is the design machinery that gives them grounded reasons for that feeling. The methodology connects to the explainability material of Part XVI on AI safety; this section develops the user-facing layer.
What users actually want explained
Empirical UX research on AI explanations shows that what users want is different from what ML researchers usually produce. Users want to know why this answer rather than another, whether they can trust it for their use case, what would change the answer, and where it comes from. Saliency maps, attention visualisations, and gradient-based explanations — the staple outputs of academic explainability research — are largely useless to non-expert users. The methodology of effective explanation is the layer between technical interpretability and user mental models, and it is mostly UX work rather than ML work.
Source attribution and citation
The most-effective transparency pattern in 2026 is source attribution — pointing the user at the documents, web pages, or training corpus regions that the AI used to construct its answer. The Perplexity, You.com, and Bing Chat patterns established source citation as a chat-interface norm, and Anthropic's Claude added document-grounded citation across its products by 2025. The pattern works because it shifts the user's epistemic check from the AI's confidence to verifiable external evidence, which users are far better at evaluating. Hallucination — fabricated content presented confidently — is the principal failure mode that source attribution addresses.
Confidence display
Beyond pointing at sources, AI interfaces increasingly surface explicit confidence indicators: probability scores on classifications, hedged-language qualifiers in generation ("I'm fairly confident…"), explicit uncertainty regions in extracted facts. The methodology connects to calibration: confidence displays only help if they are accurate. A miscalibrated 90%-confidence display (used for outputs that are right 60% of the time) is worse than no display at all, since it actively misleads. Production deployments invest substantially in calibration before exposing confidence indicators.
Model cards and disclosure
At the product level, transparency includes model cards (Mitchell et al. 2019) — structured documentation of an AI model's intended use, training data, limitations, and evaluated performance. Hugging Face popularised the format; major AI providers (Anthropic, OpenAI, Google) maintain model cards for their production models. Model cards aren't user-facing in the moment-to-moment interaction but serve as the durable record of system properties for journalists, regulators, customers, and downstream developers. The 2024 EU AI Act's transparency obligations make model-card-style disclosure a legal requirement for high-risk AI systems in Europe.
The explanation paradox
A careful methodological caveat: providing explanations does not always help users. Multiple studies (Kaur et al. 2020 on explainable-AI adoption, the 2023 WEIRD-AI studies on explanation effects) document that bad explanations actively mislead users, that explanations can produce false confidence in wrong outputs, and that users often value the appearance of transparency over the substance. The methodology of effective explanation requires resisting the temptation to add explanations to look transparent; the better pattern is calibrated confidence plus source attribution, with explanation reserved for cases where it is grounded in actual model behaviour.
Feedback Collection and the Data Flywheel
AI systems learn from interaction, and the interface is what determines what they learn. The methodology of feedback collection — what to log, how to elicit explicit signals, how to weigh implicit ones — is the bridge between UX and the underlying ML pipeline.
Explicit feedback: thumbs and ratings
The simplest feedback pattern is the thumbs-up/thumbs-down rating on AI outputs. ChatGPT, Claude, Bing Chat, GitHub Copilot, and most production AI products include some variant. The data is easy to collect, easy to interpret, and feeds directly into preference-learning pipelines (RLHF, DPO, the various preference-optimisation methods of Part XV). The downsides are equally important: response rates are low (single-digit percentages of users provide explicit feedback), the signal is binary and lossy, and selection bias is severe (frustrated users are more likely to thumbs-down than satisfied users are to thumbs-up).
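A sketch of what an explicit-feedback record might carry before it enters a preference pipeline; the field names are illustrative rather than any particular product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ExplicitFeedback:
    """Minimal explicit-feedback record; field names are illustrative."""
    conversation_id: str
    message_id: str
    model_version: str               # ratings are only comparable within a version
    rating: int                      # +1 thumbs-up, -1 thumbs-down
    reason_tags: list[str] = field(default_factory=list)  # optional categorical reasons
    free_text: str = ""
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
```

Because response rates are low and skewed, downstream analysis typically normalises against the number of responses shown rather than just counting the ratings received.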
Implicit feedback signals
Far more data comes from implicit signals: did the user accept the autocomplete or reject it, did they edit the AI-generated draft heavily or lightly, did they regenerate the response, did they continue the conversation, did they copy the output, did they return tomorrow. The methodology of implicit-feedback inference is delicate: each signal carries multiple possible interpretations, and naive aggregation can misalign the model with user intent (a user who accepts a copilot suggestion may not have noticed it was wrong; a user who regenerates may have wanted variety rather than improvement). Production preference pipelines combine multiple implicit signals with explicit ones, with substantial care for the inference layer.
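A sketch of the kind of heuristic aggregation this paragraph warns about getting right. The signal names and weights here are illustrative; production systems fit the weights against explicit feedback or audited outcomes rather than hand-tuning them.

```python
def implicit_preference_score(signals: dict) -> float:
    """Combine implicit signals into a rough preference score in [0, 1].
    Weights are illustrative placeholders, not fitted values."""
    weights = {
        "accepted_suggestion": 0.35,
        "copied_output": 0.25,
        "light_edit": 0.15,          # small edits suggest the draft was usable
        "regenerated": -0.30,        # ambiguous: failure or variety-seeking
        "abandoned_session": -0.25,
    }
    score = 0.5 + sum(weights[k] for k, v in signals.items() if v and k in weights)
    return min(1.0, max(0.0, score))
```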
The RLHF data flywheel
Reinforcement learning from human feedback (RLHF) — and the closely related direct preference optimisation (DPO) family — uses human comparisons of AI outputs to fine-tune model behaviour. The methodology depends on a steady stream of preference comparisons, which most major AI providers collect at scale through both explicit ratings and dedicated annotation programs. The "data flywheel" pattern — more users produce more feedback, which improves the model, which attracts more users — has been a central strategic asset of the major AI providers from 2022 onward.
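At the core of the flywheel is the pairwise preference loss used to train a reward model from human comparisons. A minimal PyTorch sketch of the Bradley-Terry objective:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    Minimising it pushes the reward model to score the preferred response
    above the rejected one for each human comparison."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```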
Dark patterns to avoid
Feedback collection has a long history of dark patterns — design choices that elicit data through pressure rather than genuine user signal. Modal dialogs that block interaction until rated, "would you like to help us improve?" prompts that imply social obligation, asymmetric reward designs (badges for positive ratings, no equivalent for negative), and engagement-maximising algorithms that learn to produce content users will respond to rather than content that helps them. The 2024 wave of regulatory attention to dark patterns (the FTC's various enforcement actions, the EU's Digital Services Act provisions) increasingly applies to AI products, and the methodology of ethical feedback collection is part of mainstream UX practice.
Privacy and consent
Feedback collection is also data collection, with corresponding privacy obligations. GDPR's "data minimisation" principle, the various US state privacy laws (CCPA in California, CDPA in Virginia, the rapidly multiplying state-level frameworks), and increasingly the AI Act's data-governance provisions all require a lawful basis (in practice, often explicit user consent) for using AI interaction data to train models. Production AI systems have moved substantially toward opt-in defaults, federated-learning approaches (Part XIII Ch 10) that keep data on-device, and explicit "use my data for training" toggles. The methodology of compliant feedback collection is now a substantial sub-discipline within AI product development.
Conversational Interfaces and Agent UX
Section 3 introduced the chat pattern; this section develops its sophisticated cousins. Multi-turn conversational interfaces and increasingly agentic interfaces — where AI takes action on the user's behalf — are the active frontier of HAI design as of 2026, with the methodology still being worked out.
Multi-turn dialogue
The key UX challenge in multi-turn dialogue is state — the system needs to remember and act on prior context, and the user needs to know what the system remembers. Early chatbots (ELIZA, the various 1990s rule-based systems, Apple Siri's early generations) had little persistent state and struggled with conversational coherence. Modern LLM-based chat handles long contexts well but introduces new failure modes: forgetting recent turns when context is full, conflating instructions across turns, and producing inconsistent persona or stance over long conversations. The methodology of dialogue UX includes explicit context-display (let the user see what the system knows), context-management controls (clear, edit, summarise), and persistent memory features that survive across sessions (ChatGPT's Memory feature, Claude's memory feature, the various persistent-memory product launches of 2024–2026).
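A sketch of one common context-management policy: keep recent turns verbatim and fold older ones into a running summary once the budget is exceeded. The `count_tokens` and `summarise` callables stand in for the tokenizer and a summarisation call and are assumptions of this example.

```python
def manage_context(turns: list[str], token_budget: int,
                   count_tokens, summarise) -> list[str]:
    """Return the turn list to send to the model: if it fits the budget,
    send everything; otherwise compress older turns into a summary and
    keep the most recent turns whole."""
    total = sum(count_tokens(t) for t in turns)
    if total <= token_budget:
        return turns
    keep, older = turns[-6:], turns[:-6]
    return [f"[Summary of earlier conversation] {summarise(older)}"] + keep
```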
Error recovery in conversation
When the AI misunderstands, gets the answer wrong, or proceeds down the wrong path, the interface needs efficient error recovery. The chat pattern's natural recovery mechanism is the next user message ("no, I meant…"), but this scales poorly when the user has invested attention in a wrong direction. The methodology includes: regeneration (try again with the same input), edit-and-resubmit (modify the previous turn rather than appending), branching (explore alternatives without losing the original), and system-acknowledgement patterns where the AI explicitly checks understanding before proceeding on consequential actions. The 2024–2026 product generation has substantially advanced these patterns; the canonical "regenerate" button in ChatGPT and Claude is a small surface element with substantial UX work behind it.
Mode awareness and the locus of action
As AI systems take more autonomous action, users need to know what mode the system is in — passively offering suggestions, actively executing tasks, or somewhere between. Aircraft cockpit design has wrestled with this since the introduction of fly-by-wire systems, and "automation surprise" — the operator not realising the system was in a different mode than expected — remains a contributing factor in incident reports. AI interfaces face the analogous problem: a user who thinks they are chatting with an information assistant may not realise the same system has been delegated email-sending authority. The methodology of mode-aware interface design (explicit visual indicators, action-confirmation prompts on consequential operations, clear undo affordances) is becoming standard for agentic AI products in 2026.
The agentic frontier
The most-active design frontier of 2026 is the agentic interface: AI systems that decompose user goals into multi-step plans, execute tools, observe results, and iterate. Cowork's agent mode (the platform this compendium runs on), Anthropic's Claude in Chrome, OpenAI's Operator family, the various AutoGPT-descendant frameworks, and the agent-development tooling of Part XI all sit in this space. The interface design problems are substantial: how to convey the agent's plan, how to surface intermediate steps without overwhelming the user, how to ask permission for consequential actions without breaking the agent's coherence, how to provide an "abort" affordance, how to bound the agent's autonomy in user-meaningful ways. The methodology is rapidly evolving, and best practices in 2026 will be different from best practices in 2028.
Transparency in agent behaviour
A specific design problem worth flagging: agentic AI raises the explainability bar substantially. A chatbot that gave a wrong answer can be corrected in the next turn; an agent that took a wrong action may have already sent the email, transferred the funds, or modified the file. The methodology of action transparency — showing the agent's plan before execution, surfacing each tool call as it happens, providing detailed audit logs — is essential for agentic deployment, and is a substantial part of why current-generation agent products operate within sandboxes (browser-only, file-folder-scoped, dedicated workspaces) rather than across the user's entire digital life.
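A minimal sketch of the action-transparency pattern: gate consequential tool calls behind explicit confirmation and write every call, approved or not, to an append-only audit log. The `confirm` and `run_tool` callables, and the set of consequential tools, are placeholders for this illustration.

```python
import json
from datetime import datetime, timezone

CONSEQUENTIAL = {"send_email", "transfer_funds", "delete_file", "submit_form"}

def execute_tool_call(tool: str, args: dict, confirm, run_tool,
                      audit_path: str = "audit.log"):
    """Ask the user before consequential actions; log every call either way."""
    approved = True
    if tool in CONSEQUENTIAL:
        approved = confirm(f"The agent wants to run {tool} with {args}. Allow?")
    result = run_tool(tool, args) if approved else None
    with open(audit_path, "a") as f:
        f.write(json.dumps({
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "args": args,
            "approved": approved,
        }) + "\n")
    return result
```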
Accessibility and Inclusive AI Design
AI interfaces have substantial accessibility implications, both as enablers (assistive technology that genuinely helps) and as barriers (interfaces that exclude users with disabilities, language differences, or atypical interaction patterns). The methodology of inclusive AI design is a substantial sub-discipline that this section surveys.
AI as assistive technology
AI has been one of the most-impactful enablers of accessibility in software's history. Screen readers have used speech synthesis for decades, but the 2020s wave of high-quality TTS (Tortoise, ElevenLabs, the various open-source successors) has substantially improved the experience. Live captioning using ASR (automatic speech recognition) has become near-real-time and high-accuracy across major communication platforms, transforming accessibility for deaf and hard-of-hearing users. Image description via vision-language models (Microsoft Seeing AI, the Be My Eyes / OpenAI partnership) lets blind users get descriptions of arbitrary photos and surroundings. Switch-control and gaze-based interfaces increasingly use ML to interpret limited input modalities, expanding access for users with motor impairments. The methodology of these systems is mainline AI; the impact on access is enormous.
Accessibility failures in modern AI interfaces
The same AI wave has also produced substantial accessibility regressions. Chat-only interfaces are often poorly compatible with screen readers. Auto-generated UI elements may not have proper ARIA labels. Dynamic content updates can confuse assistive technology that expects stable DOM. Voice interfaces exclude users with speech disorders or non-standard accents. The methodology of inclusive AI design requires explicit accessibility testing — automated tools (axe, WAVE, Lighthouse), user studies with disabled users, and the WCAG 2.2 standards as the working baseline — and many AI products in 2026 fall short.
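As one concrete example of what the automated layer checks, a sketch of the WCAG contrast-ratio computation (the 4.5:1 AA threshold for normal text). This is one criterion among many, and it does not substitute for testing with assistive-technology users.

```python
def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG relative luminance of an sRGB colour (components 0-255)."""
    def channel(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio; 4.5:1 is the AA threshold for normal text."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)),
                             reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(contrast_ratio((200, 200, 200), (255, 255, 255)))  # pale grey on white: ~1.7, fails AA
print(contrast_ratio((60, 60, 60), (255, 255, 255)))     # dark grey on white: ~11, passes
```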
Language and cultural equity
Most production AI is heavily English-centric. The training data is overwhelmingly English (LLaMA's training was 90%+ English even after multilingual efforts), the evaluation benchmarks are English-language, and the safety tuning is done with English-speaking annotators. The result is substantial quality gaps: Spanish, Hindi, Arabic, and other major languages get worse output; small-population languages get little support at all. The 2023–2026 wave of multilingual models (the various open-weight multilingual variants, the BLOOM project's 46-language coverage, the Mistral and Falcon multilingual offerings) addresses this partially, but the gap remains substantial. The methodology of multilingual AI is also a methodology of cultural localisation: norms, examples, and assumptions baked into English-language training don't transfer cleanly across cultures, and inclusive design treats language as inseparable from culture.
Neurodiversity and atypical interaction patterns
Mainstream AI products optimise for median users, which leaves neurodivergent users (autistic, ADHD, dyslexic, the various other patterns) frequently underserved. The methodology of neurodivergent-inclusive design includes: customisable verbosity (some users want concise outputs, others want detail), adjustable interaction pace (some users want rapid response, others need more time to read and respond), support for non-linear conversation patterns (some users approach problems through tangential exploration), and resistance to over-prescriptive interface flows. The 2024–2026 wave of customisable AI personas and instruction-following customisation (ChatGPT's custom instructions, Claude's response-style controls, the various local-LLM customisation patterns) has substantially improved this layer.
Universal design as methodology
The conceptual frame for inclusive AI design is universal design: designing systems to work well for the widest reasonable range of users, rather than designing for a default user and adapting at the margins. The methodology connects to Section 3's interface patterns (which patterns work across diverse users), Section 5's trust calibration (which calibration messages are interpretable across user groups), and Section 6's transparency (which explanations work for users with different cognitive styles). Universal-design principles are not free — they require explicit prioritisation and budget — but the empirical evidence is that products designed with accessibility from the start are also more usable for users without disabilities, and the upfront cost is substantially lower than retrofitting accessibility later.
Evaluation, Ethics, and the Frontier
The previous sections developed the methodology of HAI; this final section turns to the evaluation of that methodology, the ethical dimensions of AI-mediated interaction, and the open frontiers of the field.
UX research methods for AI
Classical UX research methods — usability testing, heuristic evaluation, contextual inquiry, longitudinal diary studies — apply to AI interfaces but require methodological adaptation. Usability testing for AI products has to handle non-deterministic outputs (the same task may produce different AI responses across sessions), which complicates standard task-completion-rate metrics. Heuristic evaluation needs AI-specific heuristics (Microsoft's "Guidelines for Human-AI Interaction" from Amershi et al. 2019 are the most-cited starting point). A/B testing faces the standard recommender-systems issue (Part XIV Ch 01) of feedback loops that confound measurement. The 2020s wave of AI-product research has produced substantial methodological literature; the core empirical disciplines remain valuable but require adaptation.
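One practical adaptation for non-deterministic outputs is to run each participant/task pair several times and report the completion rate with an uncertainty interval. A minimal bootstrap sketch, ignoring the per-participant clustering a full analysis would add:

```python
import random

def task_success_ci(outcomes: list[int], n_boot: int = 2000, seed: int = 0):
    """Task-completion rate with a 95% bootstrap confidence interval.
    `outcomes` holds one 0/1 result per trial (each task attempted several
    times because the AI's responses vary across sessions)."""
    rng = random.Random(seed)
    rate = sum(outcomes) / len(outcomes)
    boots = sorted(
        sum(rng.choices(outcomes, k=len(outcomes))) / len(outcomes)
        for _ in range(n_boot)
    )
    return rate, (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)])
```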
Manipulation, persuasion, and dark patterns
AI systems can be designed to manipulate as well as inform. Recommender systems can maximise engagement at the cost of user wellbeing (the classical critique of social media); chatbots can use persuasive language to nudge user behaviour (the various 2024 studies of LLM persuasiveness in political and commercial contexts); agentic systems can take actions the user did not consciously authorise. The methodology of ethical AI design involves explicit boundary-setting against manipulation: design audits that test for dark patterns, alignment with the user's actual interests rather than measured engagement, transparency about system objectives, and increasingly regulatory frameworks (the EU AI Act's prohibition on manipulative AI, the various FTC enforcement actions on deceptive AI marketing). The boundary between persuasion (acceptable) and manipulation (not) is contested, but the discipline of taking the question seriously is part of mature HAI practice.
Companionship, attachment, and the social-AI question
A specific frontier worth flagging: AI systems that present as companions or relationship partners. Replika, Character.AI, the various 2024–2026 AI-companion startups, and increasingly mainline chat products operate in this space. The empirical evidence is mixed: some users report substantial benefit (reduced loneliness, emotional support, practice for difficult conversations); others show signs of unhealthy attachment, displaced human relationships, and emotional manipulation. The recent wave of regulatory attention (the Italian regulator's action against Replika, the various lawsuits on AI-induced harm to minors, the FDA's preliminary attention to AI-based mental-health products) suggests this frontier will be substantially more regulated by 2028. The methodology of responsible companionship-AI design is in early development, with the major tensions between user autonomy, vulnerable-user protection, and commercial pressure not yet resolved.
Long-term cognitive effects
The longest-running open question in HAI is the effect of sustained AI use on human cognition. Calculator-and-arithmetic studies from the 1970s, GPS-and-spatial-memory studies from the 2010s, and increasingly the 2024–2026 wave of LLM-and-writing studies (the MIT studies on student writing, the various cognitive-offloading studies) all show patterns of skill atrophy when humans systematically delegate cognitive tasks to tools. The patterns are not uniformly negative — humans freed from arithmetic drudgery built more sophisticated mathematics — but they are real, and the design implications are unsettled. Should AI interfaces actively resist over-delegation? Should they include "skill-building" modes that preserve the underlying capability? The empirical evidence is too early to answer, and the methodology is an active research frontier.
What this chapter does not cover
Several adjacent areas are out of scope. The substantial HCI literature on multi-modal interaction (haptics, AR/VR, brain-computer interfaces) intersects HAI but is its own discipline. The branding-and-emotional-design literature (how products feel) is closely related but conventionally treated through marketing rather than HCI. Game design — which has its own deep methodology of player engagement, progression, and reward systems — increasingly informs AI-product design but is its own substantial field. And the philosophical literature on AI consciousness, moral status, and the proper regard humans should hold for AI systems is essential context for several of the design questions of this section but is its own substantial enquiry that the chapter touches only briefly. The methodology developed here is the practical UX discipline of building AI products that work for users; the deeper questions of what AI is and how humans should relate to it are taken up elsewhere in the compendium.
Further reading
Foundational papers and references for human-AI interaction. Norman's Design of Everyday Things, Nielsen's heuristics, the Microsoft Guidelines for Human-AI Interaction, and Lee & See on trust form the right starting kit; the agent-UX literature is rapidly evolving as of 2026.
- The Design of Everyday Things. The foundational text of modern user-centred design. Introduces affordances, signifiers, mental models, and the action cycle that underlie all subsequent HCI methodology. Norman's framing of design failures as system failures rather than user failures remains the philosophical foundation of the field. The reference for design principles.
- 10 Usability Heuristics for User Interface Design. The most-cited practical heuristics in HCI: visibility of system status, match with the real world, user control and freedom, consistency, error prevention, recognition over recall, flexibility, aesthetic minimalism, error recovery, and help-and-documentation. The working vocabulary of usability reviews. The reference for usability heuristics.
- Guidelines for Human-AI Interaction. Microsoft Research's 18 guidelines synthesising two decades of HAI research into actionable design principles. The most-cited starting point for AI-product design heuristics, and the substrate of much subsequent industry practice. The natural reading after Norman/Nielsen for the AI-specific layer. The reference for AI-specific HCI heuristics.
- Trust in Automation: Designing for Appropriate Reliance. The foundational synthesis on trust calibration in automated systems. Establishes the framework of over-trust, under-trust, and appropriate reliance that remains the working vocabulary of HAI trust research. Predates the modern AI wave by nearly two decades but transfers directly. The right reference for the trust-and-reliance methodology of Section 5. The reference for trust calibration.
- Ironies of Automation. The forty-year-old paper whose insights still describe modern AI deployment. The more reliable the automation, the less the human attends; the less they attend, the worse they perform when intervention matters. The original treatment of vigilance decrement and the human-skill-atrophy problem that Section 4 returns to. The reference for the automation paradox.
- Cognitive Load Theory. The foundational cognitive-psychology framework for designing within human working-memory limits. Distinguishes intrinsic, extraneous, and germane load — the conceptual vocabulary that Section 4 develops for AI interfaces. The right reading for understanding why progressive disclosure, summary-first layouts, and bounded-output design work. The reference for cognitive load.
- Model Cards for Model Reporting. The paper that established the model-card format as a standard for transparent AI documentation. Structured documentation of intended use, limitations, training data, and evaluation results. The substrate of subsequent industry practice (Hugging Face's model-card adoption, Anthropic and OpenAI's model documentation, the EU AI Act's transparency obligations). The reference for AI transparency documentation.
- Deep Reinforcement Learning from Human Preferences. The foundational RLHF paper. Introduces the methodology of learning reward models from pairwise human comparisons and using them to fine-tune AI behaviour. The substrate of every modern preference-learning pipeline (InstructGPT, Claude's constitutional methods, the various RLHF/DPO descendants), and the natural reading for understanding the data-flywheel methodology of Section 7. The reference for RLHF.
- Web Content Accessibility Guidelines (WCAG) 2.2. The current authoritative standard for web accessibility, organised around four principles: perceivable, operable, understandable, robust. The reference baseline for accessibility testing of AI interfaces, with WCAG 2.2 incorporating recent additions for cognitive disabilities. The methodology of Section 9's inclusive design work rests on WCAG as its measurable baseline. The reference for accessibility standards.
- Building effective agents. Anthropic's design-research summary on building agentic AI systems. Covers prompt-chaining, routing, parallelisation, orchestrator-workers, and evaluator-optimiser patterns, with practical guidance on when to add agentic complexity and when not to. The natural reading for the agentic-frontier material of Section 8 — and the closest thing the rapidly-evolving 2026 literature has to a canonical reference. The reference for agent UX design.