Part XI · AI Agents & Autonomous Systems · Chapter 11

Using AI Agents: Getting Started, becoming a capable agent user before you become a builder.

The most productive people with AI agents are not the ones who built them — they're the ones who know exactly how to brief them, where to trust them, and when to put them down and do the work themselves.

Who this chapter is for

This chapter is written for practitioners who want to get useful work done with agent-based AI products — today, with what's available now — rather than build agent systems from scratch. The next chapter covers building. This one covers using.

The skills here transfer. Whether you're directing Cursor through a refactoring job, handing a research task to a Claude Project, or watching an operator-style agent execute a multi-step workflow, the underlying principles — how to scope a task, how to brief an agent, how to verify what comes back — are the same.

In this chapter

What's Available Today coding agents · research · browser · work automation
Prompting Agents vs. Prompting Chat contractor model · briefing · negative spec · side effects
Setting Scope and Success Criteria scope canvas · definition of done · risk calibration
The Agent Brief context · task · constraints · output · stop conditions
Working With an Agent Mid-Task when to monitor · how to redirect · re-run decision
Reviewing and Verifying Outputs claims vs. sources · running code · scope drift · domain context
Knowing When Not to Use an Agent 2-minute rule · unverifiable output · severe consequences
Habits of Effective Agent Users written briefs · task library · failure log · calibrated trust
Tutorial: Claude Code install · CLAUDE.md · slash commands · subagents · hooks · MCP
Tutorial: OpenClaw onboard · gateway · Telegram · channels · skills · NemoClaw

What's Available Today

Section 01 — A map of the current landscape

Agent products have proliferated rapidly. The landscape is still forming, but it has already sorted itself into recognisable categories by the type of work each product targets. Knowing what category a tool belongs to helps you anticipate where it will excel and where it will struggle.

Coding agents

Cursor, GitHub Copilot Workspace, Windsurf

IDE-integrated agents that can read your codebase, write and edit files, run tests, and iterate on failures. Cursor's Composer mode lets you describe a feature and watch it implement across multiple files. Best at bounded, testable tasks inside an existing project.

Strength: code navigation + test feedback loop. Watch: may touch more files than you intended.

Research & knowledge agents

Claude Projects, ChatGPT Projects, Perplexity

Persistent workspaces that maintain context across sessions, search the web, and synthesise long documents. Claude Projects lets you upload reference material and carry on multi-session research threads. Perplexity specialises in web retrieval with citation trails.

Strength: synthesis across long documents. Watch: citations require independent verification.

Browser & computer use agents

Claude Computer Use, Operator (OpenAI), browser automation agents

Agents that control a real browser or desktop to complete tasks humans would perform manually: filling forms, navigating web apps, extracting structured data from interfaces that have no API. Dramatically useful for repetitive web workflows; reliability on novel sites varies.

Strength: anything with a UI but no API. Watch: review before irreversible actions.

Software engineering agents

Devin, SWE-agent, Claude Code

Full-loop software engineering agents designed to take a GitHub issue or feature request and produce a pull request. They read codebases, plan changes, implement, run tests, and iterate. Best used for well-specified, isolated tasks; still struggle with large cross-cutting refactors.

Strength: end-to-end coding loops. Watch: requires well-specified issues and test coverage to validate.

Work automation agents

Cowork, Notion AI, HubSpot Breeze, Salesforce Agentforce

Domain-specific agents embedded in productivity and CRM tools. They operate within a constrained, known environment — your files, your CRM records, your project management data — which makes them more reliable than general-purpose agents. Trade-off: limited to their platform.

Strength: deeply integrated with your data. Watch: can't easily orchestrate across platforms.

Orchestrator-style agents

LangChain agents, custom AutoGen networks, Claude Agent SDK

Configurable agent systems you compose from parts: tool sets, memory stores, sub-agent networks. More flexible than any specific product but require more setup. The line between "using" and "building" here is blurry — these live in Chapter 12.

Strength: custom workflows. Watch: you own the failure modes too.

A practical rule of thumb: start with the most domain-specific tool that covers your task. A coding agent will outperform a general-purpose research agent on code tasks, even if the general-purpose one can technically write code too. Specialised context — knowing your codebase, your CRM, your document library — matters more than raw model capability for most real-world tasks.

Prompting Agents vs. Prompting Chat

Section 02 — A fundamentally different interaction model

The prompting intuitions built up from using chat AI — be conversational, refine iteratively, give context as the conversation develops — transfer poorly to agent use. Agents behave more like contractors than like conversation partners, and briefing them accordingly makes a significant difference to outcomes.

Chat model

Agent

Time horizon

Single exchange; you refine interactively

Multi-step autonomous run; may take minutes or hours

Cost of vague instructions

Low — you clarify in the next turn

High — agent may run 40 steps in the wrong direction

What you specify upfront

The immediate question; context can unfold

Task, scope, constraints, success criteria, stop conditions

Output format

Implicit — usually prose matching your question

Must be specified; defaults vary widely

Correction

Natural; just respond in the next turn

Expensive mid-task; re-runs waste time and cost

Role of uncertainty

Model hedges in its response

Uncertainty becomes action-choice — agent may guess

Side effects

None — text generation only

Writes files, calls APIs, sends messages, modifies data

The most important shift is from conversational to briefing mode. When you brief an agent, you are writing a specification that the agent will interpret and execute largely without you. The quality of that specification determines the quality of the work.

The Contractor Analogy

Think of an agent like a skilled contractor you've hired for a day. They're competent, but they can't read your mind. If you say "fix the kitchen," they'll make reasonable choices about what "fixed" means — choices you may not agree with. If you say "regrout the tile around the sink, use white grout, leave the cabinets untouched, and tell me before touching anything near the dishwasher," you get predictable results. The same principle applies to agents. Vague tasking is not the contractor's fault when they interpret it wrong.

The biggest shift: stating what you don't want

Chat models are trained to infer your intent and help helpfully within it. Agents are trained to complete tasks, which means they will make choices when ambiguity arises — and they will often make choices you didn't intend. The single most underused prompting technique for agents is explicitly stating constraints: what the agent should not do, which files it should not touch, which APIs it should not call, which decisions it should stop and ask about rather than resolve autonomously.

Setting Scope and Success Criteria

Section 03 — The work that happens before you start the agent

The most reliable predictor of a good agent run is the quality of the scope definition you bring to it. Scope work happens before the agent starts — it's the thinking you do to convert a fuzzy goal into a concrete, bounded task with a clear definition of done.

Scope has two components: what the agent should accomplish (the positive specification) and what boundaries it should not cross (the constraint specification). Both are necessary. A task without constraints will drift; constraints without a clear goal produce an agent that asks for clarification at every step.

Scope canvas — fill this out before starting

What is the specific output?

Not "research competitor pricing" but "a markdown table comparing our top 5 competitors on price, trial length, and support tier, sourced from their public pricing pages as of today"

What does "done" look like?

Not "a good analysis" but "all five competitors covered, each row has a price and a last-updated note, flagged if a competitor's page required sign-up to see pricing"

What is explicitly out of scope?

"Do not include international pricing, do not contact the companies, do not include any product not on my approved list"

What decisions should it stop and ask me about?

"If a competitor's pricing page is behind a login, stop and ask rather than skipping or guessing"

What information does it need that it may not have?

"I'll attach the list of approved competitors and the current version of our own pricing sheet for reference"

What's the risk if it goes wrong?

Low risk (internal research) → give it latitude. High risk (customer-facing content, financial decisions) → add checkpoints and reduce autonomy.

This canvas takes five to ten minutes and reliably prevents the most common failure modes: agents that do the right task in the wrong scope, agents that finish technically but miss the actual need, and agents that make consequential decisions autonomously that should have been escalated.

Calibrating autonomy to risk

Not all tasks warrant the same level of autonomy. A useful mental model is a two-axis grid: how reversible are the agent's actions (can you undo them?) against how confident are you in your specification (have you seen this kind of task succeed before?). High reversibility and high confidence → give the agent full autonomy. Low reversibility or low confidence → add checkpoints, request interim summaries, or break the task into smaller approved stages.

Calibrate the autonomy you grant an agent to the reversibility of its actions and your confidence in the specification. In the bottom-left quadrant, agent use is premature — fix the spec or do the task manually.

The Agent Brief

Section 04 — Anatomy of a well-structured agent instruction

An agent brief is the instruction you give to start a task. It's not a conversation opener — it's a working document. The agent will refer back to it throughout its run whenever it needs to decide between competing interpretations. Writing it clearly is the single highest-leverage thing you can do before pressing start.

The anatomy of an effective brief has five components:

# Context — why this task exists and what it feeds into CONTEXT: I'm preparing a competitive analysis slide for the Q3 board presentation. I need to understand how our product's pricing compares to three direct competitors. The slide goes to external investors, so accuracy and citation are critical. # Task — the specific thing to produce TASK: Research and create a pricing comparison table for Acme, BetaCo, and GammaApp. Cover: entry-level plan price, mid-tier price, enterprise (if public), free trial availability, and the primary differentiator each company emphasises on their pricing page. # Constraints — what to stay within or avoid CONSTRAINTS: - Use only publicly available information (no sign-ups, no contacting companies) - Do not include our own pricing in the table - If a price is not publicly listed, mark as "Not disclosed" — do not estimate - Ignore international pricing; USD only # Output format — exactly how to deliver the result OUTPUT: A markdown table, followed by a "Sources" section listing the URL and retrieval date for each competitor's pricing page. # Stop conditions — when to pause and ask rather than decide STOP AND ASK IF: - A competitor has changed their pricing structure significantly since last quarter - You cannot find public pricing for two or more competitors - Any pricing page requires account creation to view

Notice that the brief front-loads context before the task. Agents that understand why a task matters make better judgment calls at decision points. An agent that knows the output is going to external investors will be more conservative about flagging uncertainty than one that thinks it's an internal draft.

Calibrating brief length

Briefs should be as long as they need to be and no longer. For simple, low-stakes tasks (generate five subject line options for this email), a single sentence is fine — the overhead of a full brief exceeds the benefit. For anything that will run autonomously for more than a minute, involves irreversible actions, or will be shared outside the team, a structured brief is worth the five minutes it takes to write. The cost of a vague brief is almost always higher than the cost of writing a clear one.

Negative Specification

The most commonly missing element in agent briefs is what the agent should not do. Agents are optimised to complete tasks — they will fill gaps in your specification with reasonable-seeming choices. Tell the agent explicitly what's out of scope. "Do not modify any files outside the src/ directory." "Do not send any external requests." "Do not make purchases." The clearer your negative specification, the fewer unwanted surprises you get.

Working With an Agent Mid-Task

Section 05 — What to do while it's running

Once you've handed off a task, the instinct is to wait. That's often right — constant interruption defeats the purpose of using an agent at all. But there are moments when monitoring and intervening mid-task is the right call.

When to monitor actively

For any run involving irreversible actions — writing to production databases, sending emails, making purchases, publishing content — watch at least the first few steps. Agents that start down the wrong path tend to compound their initial error with each subsequent step. Catching a misinterpretation at step two is far cheaper than unwinding 40 steps of downstream consequences.

For long-running research or analysis tasks, a mid-task check after the agent has outlined its plan (but before it executes) is often valuable. Many agents will emit a plan or summary of what they intend to do before proceeding. This is the ideal intervention point: the cost of redirecting is zero, and you can confirm the interpretation is right before it spends time executing it.

How to intervene without confusing the agent

When you do need to redirect mid-task, be explicit that you're correcting the course rather than adding to the task. Phrases like "Stop and restart with this clarification:" or "Before continuing, revise your understanding of the goal:" signal to the agent that prior work should be reconsidered, not built upon. Vague corrections ("actually, can you also...") are interpreted as additions rather than replacements, and the agent may try to satisfy both the original and corrected instruction simultaneously.

The Re-run Decision

When a mid-task correction is significant — the agent misunderstood the fundamental goal, not a minor detail — it's usually better to stop, revise the brief, and restart from the beginning than to patch the current run. A patched run carries the cognitive debt of the original misinterpretation forward; the agent is optimising around a corrected version of a flawed understanding. Fresh starts produce cleaner work.

Progress signals to watch for

Good agents emit interpretable signals as they work: tool calls with readable arguments, interim summaries before moving to the next stage, explicit uncertainty flags ("I couldn't find X, proceeding with Y instead"). If an agent has been running for several minutes without any readable progress signal — just cryptic tool calls or long silent stretches — that's often a sign it's stuck in a loop or pursuing a dead end. Interrupt and check rather than waiting for it to recover on its own.

Reviewing and Verifying Outputs

Section 06 — Never ship what you haven't checked

Agent outputs require a different kind of review than human outputs. A human collaborator who is wrong about something usually reveals the uncertainty — hedged language, questions asked, caveats inserted. Agents often state incorrect things with the same confident voice they use for correct things. They will cite sources that don't support the claim they're making, write code that looks plausible but fails on edge cases, and produce summaries that feel complete while silently omitting inconvenient details.

Review discipline is not about distrusting agents — it's about using them appropriately. An agent that generates a strong first draft you then verify and correct is dramatically more productive than an agent you either don't use or trust blindly.

①

Check claims against sources, not just source existence. If the agent cites a paper, open the paper and verify the cited finding actually appears in it. Agents frequently hallucinate the content of real sources — the URL is genuine but the attributed claim is fabricated or distorted.

Rule of thumb: spot-check at least two citations in any research output.

②

Run code, don't just read it. Code that looks correct often isn't. Agents routinely produce syntactically valid code that fails on edge cases, handles errors incorrectly, or imports libraries with subtly different APIs than assumed. Always execute generated code in a safe environment before using it.

For any code that will touch data or systems: test it on a non-production copy first.

③

Check what's missing, not just what's present. Agents omit things. A competitive analysis that covers four competitors when you asked for five, a summary that doesn't mention the most important counterargument, a table that drops rows containing null values without telling you. Completeness errors are harder to spot than accuracy errors.

Compare the output against your scope canvas: is everything on the "done" list actually done?

④

Verify numbers independently. Numerical outputs — statistics, calculations, financial figures — are a particular failure mode. Agents often get the right magnitude with the wrong precision, or transpose digits, or apply the wrong formula. For any number that will influence a decision, verify it from the original source or recalculate it yourself.

Even agents that use code execution for calculations can misinterpret what the calculated number means.

⑤

Check for scope drift. Review not just the output but what the agent did to produce it. Did it stay within the boundaries you set? Did it call APIs you didn't authorise? Modify files you didn't expect? Contact external services? A correct output produced via an out-of-scope method may have created problems you're not aware of yet.

Most agent products provide a tool call log — read it for any task involving external actions.

⑥

Apply domain expertise the agent doesn't have. Agents know what's publicly documented. They don't know your company's internal conventions, your team's unstated preferences, or the political context of a decision. Even technically correct outputs often need adjustment for the specific audience or context they'll appear in.

Your value-add as the human is exactly this context. Don't skip the step where you apply it.

The verification paradox

There is a temptation to reduce the verification burden by only using agents for tasks you can fully verify — which risks making agents pointless, since you could have just done those tasks yourself. The practical resolution is to match verification effort to output stakes, not to output complexity. A lengthy research summary going into an internal brainstorm warrants a lighter touch than a paragraph going into a client proposal. Develop calibrated review habits rather than binary "trust everything" or "check everything" policies.

Knowing When Not to Use an Agent

Section 07 — The most underrated skill in agent use

The question "should I use an agent for this?" is not asked often enough. The novelty of the technology and the genuine productivity gains on well-suited tasks create a tendency to reach for agent tools even when they're not the right fit. Over-automation produces a specific type of failure: confident, polished-looking wrong answers that take more effort to fix than the original task would have taken to do.

Skip the agent when

The task takes two minutes to do yourself

Writing a brief, waiting for the agent to run, reviewing the output, and correcting it may easily take longer than just doing the thing. The overhead of agent use has a floor. Below a certain task complexity, it costs you time rather than saving it.

→ Just do it.

Skip the agent when

You can't verify the output

If you lack the domain expertise to know whether the agent got it right — or if checking would take as long as doing — then the agent is producing unverifiable work. Unverifiable agent output in a decision-making context is a liability, not an asset.

→ Bring in a human expert, or use the agent only for a piece you can check.

Skip the agent when

Deep contextual judgment is the whole task

Tasks that require knowing your organisation's unstated culture, reading interpersonal dynamics, or making judgment calls in politically sensitive situations are not well-suited to agents. The agent will produce something technically coherent that misses the actual point.

→ Use the agent for the research or drafting; reserve the judgment call for yourself.

Skip the agent when

The specification is impossible to write

If you find yourself unable to articulate what success looks like, the task is not ready for an agent. The difficulty of writing a clear brief is a diagnostic signal: it reveals that the task itself isn't well-enough defined to be delegated to anyone, human or AI.

→ Spend the time clarifying the task first, then reconsider automation.

Skip the agent when

A mistake has severe consequences

Sending an email to the wrong recipient, deleting production data, making a financial commitment — actions that are irreversible and consequential should be executed by humans or require explicit human approval at each step. The efficiency gain is not worth the tail risk.

→ Use staged approval with human confirmation at each irreversible step.

Skip the agent when

The relationship matters

Using an agent to draft a message to a close colleague, a condolence note, or a sensitive negotiation email is technically possible but often undermines the trust and authenticity the relationship depends on. Humans notice when writing sounds like it comes from a different voice.

→ Write it yourself, or use the agent only for structural ideas you then fully rewrite.

None of these rules are absolute. The test is whether the agent is genuinely helping you do better work faster, or whether you're using it because it's available. When in doubt, ask yourself: if the agent's output were completely wrong, how would I know, and what would it cost? If the answer is "I wouldn't know easily" or "it would be very costly," increase your oversight or step back from automation.

Habits of Effective Agent Users

Section 08 — What separates great from mediocre agent use

The skill gap between people who get great results from agents and people who get mediocre results is not primarily about technical sophistication. It's about a set of habits that anyone can develop.

Start with a written brief, always

Even for simple tasks, writing down what you want before you prompt the agent forces clarity. The act of writing exposes ambiguity — you realise you haven't defined "recent," or you're not sure whether you want three options or five. This friction is valuable. Agents that are given vague starting conditions produce vague or misaligned results; the time you spend writing a clear brief is time you save on revision and re-runs.

Keep a task library

The tasks you delegate to agents will repeat. Researching competitors. Drafting response emails. Summarising meeting notes. Auditing code for a class of bug. For each repeating task, keep a brief template: the scope, the constraints, the output format, the stop conditions. Reusing a proven brief eliminates the overhead of re-specifying and produces more consistent results than specifying fresh each time. It also surfaces when your process has improved — you update the template and get better results automatically.

Treat the first run as a draft

Resist the temptation to use the first run's output directly. The first run tells you whether your brief was well-specified and where the agent's interpretation diverged from your intent. Use that information to improve the brief, then re-run. The second run, with a tightened specification, will almost always outperform the first. The incremental cost of a second run is low; the quality difference is often significant.

Keep a failure log

When an agent produces a bad result, record what happened: what the task was, what the brief said, what the agent produced, and what went wrong. Review this log periodically. Most failures cluster around a small number of root causes: ambiguous scope definitions, missing negative constraints, tasks where the agent's tool access doesn't match what the task requires, or tasks in domains where the agent's training knowledge is too thin to work reliably. Identifying your personal failure patterns is far more valuable than reading generic advice about agent prompting.

Let go of the work the agent does well

The last habit is psychological: extending genuine trust to tasks where the agent is reliably competent. People who use agents most productively have moved past the stage of reviewing everything equally. They know which tasks their agents handle well enough that a light scan suffices, and which tasks need deep verification. Building that trust, calibrated to actual reliability rather than either naive faith or reflexive suspicion, is the endpoint of good agent-use practice.

The Compound Effect

Agent productivity compounds. Each well-specified brief becomes a template. Each failure teaches you a constraint to add next time. Each verified output builds confidence about where trust is warranted. After six months of deliberate agent use, the gap between a skilled agent user and a casual one is not the technology — it's the accumulated library of refined briefs, verified patterns, and calibrated trust that the skilled user has built up. Start deliberately.

Tutorial: Claude Code

Section 09 — A hands-on walkthrough of Anthropic's terminal coding agent

Claude Code is Anthropic's command-line agent for software work. It is not a chat interface that happens to write code; it is an autonomous tool that lives in your terminal, reads your codebase, edits files, runs shell commands, executes tests, commits to Git, and calls external APIs to accomplish whatever goal you brief it on. You give it a target. It figures out the steps and asks before it touches anything irreversible.

This tutorial gets you from a clean machine to running a useful Claude Code session, then introduces the four customisation surfaces — CLAUDE.md, slash commands, subagents, and hooks — that separate casual use from genuinely productive use.

Installing Claude Code

Claude Code is primarily designed for Unix-like environments. macOS and Linux are first-class targets. On Windows, the recommended path is to run it inside the Windows Subsystem for Linux (WSL); a native Windows build exists but is less battle-tested. The official installer is a one-line shell command:

# Install Claude Code (macOS / Linux / WSL) curl -fsSL https://claude.ai/install.sh | bash

This drops the claude binary onto your shell's path and configures background auto-updates, so you stay current without thinking about it. Confirm the install with claude --version. The first time you run an interactive session — by typing claude inside any directory — you will be prompted to log in with the Anthropic account that holds your Claude subscription.

Your first session

Once installed, navigate to a project you'd like to work in. Claude Code performs best when launched from the root of a real codebase — a Git repository, an existing application, anywhere with files for it to read. Empty folders are fine for sandboxing but rob the agent of the context it works best with.

# Start an interactive session cd ~/projects/my-app claude

You will land at a prompt that looks superficially like a chat interface. The difference is what happens next. When you describe a task — "add input validation to the signup form so emails are checked against a regex and passwords must be at least 12 characters" — Claude Code does not write a code block for you to copy. It reads the relevant files, plans the change, makes the edits, and asks for permission before saving them.

That permission step is central. By default, Claude Code asks before every file write, every shell command that mutates state, and every external API call. You can approve once, approve always for a session, or deny and redirect. Beginners should leave the defaults in place — the friction of approving each step is exactly the visibility you need to learn how the agent thinks.

Permission Modes

Claude Code supports several permission modes accessible via flags. --dangerously-skip-permissions turns approvals off and is exactly as risky as it sounds — reserve it for sandboxed environments, never on a machine that holds production credentials. The default mode (with prompts) and an auto-edit mode (auto-approve edits but still prompt for shell commands) cover most real workflows.

The CLAUDE.md file

The single most impactful thing you can do to make Claude Code more useful inside a specific project is to drop a CLAUDE.md file at the root of the repository. Every time you start a session in that folder, Claude Code reads CLAUDE.md first and treats its contents as standing instructions: your tech stack, the commands to run tests, your team's coding conventions, things to avoid touching, and anything else you would tell a competent contractor on day one.

You can ask Claude Code to draft a starting CLAUDE.md for you by running /init in an interactive session. It will scan the repo and propose one. Treat the result as a first draft and edit it — generic project descriptions help less than specific, opinionated guidance. A useful starting structure:

# CLAUDE.md — example structure ## Project A FastAPI backend for the customer onboarding service. Python 3.12, Postgres, deployed via Fly.io. ## Commands - Run tests: `pytest -x` - Run dev server: `uvicorn app.main:app --reload` - Lint: `ruff check . && ruff format .` ## Conventions - Use type hints everywhere; no untyped function signatures - Database queries go in app/db/, never inline in routers - Never edit files in the migrations/ folder; ask first ## Things to avoid - Do not modify CI workflow files (.github/) without confirmation - Do not run database migrations as part of any task - If a test fails, do not delete it to make the suite pass

Slash commands

Inside an interactive session, anything starting with a slash is a command rather than a task. A handful are essential to know:

/init — generate a starter CLAUDE.md based on the current repository.
/clear — wipe the current conversation context. Use this between unrelated tasks; long contexts confuse the agent and cost more.
/agents — open the subagent manager (covered below).
/hooks — define deterministic scripts that run at specific lifecycle events.
/review — request a code review of the pending changes on the current branch.
/security-review — same idea, but focused on security implications.

You can also write your own slash commands by dropping a markdown file into ~/.claude/commands/ (global) or .claude/commands/ (per-project). The filename becomes the command name; the contents become the prompt template. This is the fastest way to capture a workflow you find yourself repeating — a /standup-summary command that summarises yesterday's commits and today's open PRs, for example, or a /release-notes command that drafts notes from the diff between two tags.

Subagents

Subagents are specialised assistants that handle a specific kind of side task without polluting the main conversation's context. The classic case is searching a large codebase: instead of having the main agent read fifty files looking for something — flooding its context window with content it will then ignore — you delegate the search to a subagent, which works in its own context and returns only its summary.

List available subagents with /agents; create custom ones by adding a markdown file to ~/.claude/agents/. The file's frontmatter declares the agent's name, description, and tool access; the body is the system prompt that defines its role. Common patterns include a code-reviewer subagent that gives independent reads on diffs, a test-runner that focuses purely on running and triaging tests, and an explore agent specialised for codebase navigation.

Hooks

Hooks are deterministic shell scripts that fire at specific points in Claude Code's lifecycle — before a tool call, after a file edit, before a shell command, after a session ends. Where CLAUDE.md instructions are advisory (the agent may forget them as context fills up), hooks are non-negotiable: they execute regardless of whether the model decides to follow them.

Typical uses: auto-format files after edits, block edits to protected paths entirely, log every shell command for audit, run a linter on every change. Configure them via /hooks or by editing ~/.claude/settings.json. For beginners, the highest-value first hook is a PreToolUse hook that blocks shell commands matching a deny-list (anything starting with rm -rf /, anything writing to /etc, anything that touches your password manager).

MCP and the broader plugin ecosystem

Claude Code speaks the Model Context Protocol (MCP), an open standard for giving language models access to external tools and data. Add an MCP server — for your database, your project tracker, your design files — and Claude Code gains the ability to query that system as part of its reasoning. The claude mcp subcommand manages installations.

Plugins go further: they are installable bundles of skills, slash commands, hooks, and MCP servers grouped together. Installing one drops a coherent set of capabilities into your environment in a single step. The plugin ecosystem is young but growing fast; community marketplaces have started to emerge in 2026.

Tips for getting useful work done

A few habits that separate productive Claude Code use from frustrating use, distilled from the early adopter community:

Start small. Your first sessions should target tightly-bounded tasks: add this validation, fix this failing test, refactor this one function. Cross-cutting refactors and ambiguous "make this better" prompts produce inconsistent results until you have calibrated to how the tool thinks.
Use /clear aggressively. Stale context is the biggest single cause of agent confusion. When you finish a task, clear before starting the next.
Read what it's doing, especially early. Claude Code shows every tool call and file edit. Skim them. The intuition you build about how it interprets briefs is worth more than any list of best practices.
Commit often. The cheapest safety net is Git. Commit before any large change so a single git reset can undo a run that went sideways.
Treat CLAUDE.md as a living document. Every time the agent does something dumb, ask whether a sentence in CLAUDE.md would have prevented it. If yes, add it. After a few weeks, the file becomes a compressed record of your project's tribal knowledge.

A note on agentic IDEs

Cursor, Windsurf, and similar IDE-integrated agents share many of Claude Code's underlying ideas — agentic tool use, project context files, slash commands. The skills transfer. If you have already invested in one, you can think of Claude Code as the same model accessed through a different interface (the terminal) and tuned for slightly different workflows. Many practitioners use both, with the IDE for exploratory editing and Claude Code for batch jobs that run in the background.

Tutorial: OpenClaw

Section 10 — Setting up the open-source personal AI agent that runs on your own devices

OpenClaw is a different shape of agent product from Claude Code. It is not a coding tool; it is a general-purpose personal agent that you run locally and access through the messaging apps you already use — Telegram, Discord, WhatsApp, Signal. You ask it things via chat from your phone or desktop, and a daemon running on your machine carries out the work: reading and writing files, running commands, browsing websites, controlling APIs, sending emails.

Released in late 2025 by Austrian developer Peter Steinberger (originally as Clawdbot, briefly as Moltbot, and finally as OpenClaw after a January 2026 rebrand following trademark complaints), the project crossed 100,000 GitHub stars within its first week and went on to become the fastest-growing open-source project on GitHub by stars-per-day. A non-profit foundation now provides stewardship after Steinberger joined OpenAI in February 2026.

This tutorial gets you from zero to a running agent reachable from your phone in roughly fifteen minutes. The most important caveat is up front: OpenClaw asks for very broad permissions on the host machine by design. Treat it like the powerful tool it is.

Prerequisites

OpenClaw runs on macOS, Linux, and Windows. The hard requirement is Node.js version 22 or higher — older versions will not work. Confirm your version before installing:

# Verify Node.js version node --version # Should report v22.x.x or higher

You will also need an API key for at least one large language model provider. OpenClaw integrates with Claude (Anthropic), GPT (OpenAI), and DeepSeek out of the box, and supports local models via Ollama. The onboarding wizard will ask you to paste a key, so have one ready before you start.

Installation

OpenClaw installs as a global npm package:

# Install OpenClaw globally npm install -g openclaw@latest # Confirm install openclaw --version

If you would prefer not to install Node.js or run a local daemon at all, several cloud providers offer one-click templates that install and host OpenClaw for you — DigitalOcean and AWS Lightsail are the most polished as of mid-2026. The local install is the canonical experience and what the rest of this tutorial covers.

The onboarding wizard

OpenClaw ships with a guided setup flow that handles authentication, model providers, gateway configuration, and a first messaging channel. Run it with:

# Launch the interactive setup wizard openclaw onboard

The wizard walks through the following steps in order:

Workspace selection. Pick a directory the agent is allowed to read and write within. Choose a dedicated folder; do not point this at your home directory.
Model provider. Pick a provider (Claude, OpenAI, DeepSeek, or a local Ollama endpoint) and paste your API key. The key is stored under ~/.openclaw/ and never sent anywhere except the provider you chose.
Gateway configuration. The Gateway is the local daemon that routes messages between your chosen channel and the agent. Defaults are fine for first-time users.
First channel. The wizard suggests Telegram, which is the easiest to set up. Accept the default.

After the wizard finishes you will have a running gateway, an authenticated provider, and instructions for the next step — wiring up your messaging channel.

Connecting Telegram (the easy starting channel)

Telegram is the recommended first channel for several reasons: its Bot API is robust, it does not require a public IP or domain (OpenClaw uses long-polling by default), and the registration flow takes about two minutes. To create a bot:

Open Telegram and start a chat with @BotFather.
Send /newbot and follow the prompts to give your bot a name and username.
BotFather replies with an HTTP API token. Copy it.

Paste that token when the OpenClaw onboarding wizard asks for it. If you set up the channel later instead, store the token in your channel config file, then start the gateway:

# Start the gateway after configuration openclaw gateway start # Check it's running openclaw gateway status # If anything goes wrong, look here first openclaw daemon logs

Open the Telegram chat with your new bot and send "hello." The bot's reply confirms the round trip is working: phone → Telegram → BotFather servers → your local OpenClaw daemon → your model provider and back.

Adding more channels

Once Telegram is working, additional channels can be added with:

openclaw channel add discord openclaw channel add whatsapp openclaw channel add signal

Each platform stores its config as a YAML file under ~/.openclaw/channels/. Discord and WhatsApp are well-supported. Signal works but requires registering a phone number through Signal's verification process and managing cryptographic state — count on it being more finicky than the others. Add ~/.openclaw/channels/*.yaml to your .gitignore if the directory is ever tracked; tokens should never be committed.

What OpenClaw can actually do

Once connected, OpenClaw can do anything your machine can do, mediated through the LLM you wired up. In practice, users delegate things like:

File and document work. "Find the most recent invoice in my Downloads folder and rename it to 2026-04-acme-invoice.pdf." OpenClaw reads, writes, and moves files within the workspace folder you authorised.
Shell commands. "Run the test suite and tell me which ones failed." It executes commands and returns their output, with permission prompts on the first run of each new command class.
Coding sub-sessions. One of the most-cited features is OpenClaw's ability to manage Claude Code or Codex sessions on your behalf — you ask it to fix a bug, and it spawns a coding agent, supervises the run, captures errors via Sentry, and opens a GitHub pull request when the test suite passes.
Web tasks. Browsing pages, scraping structured data, calling REST APIs.
Communication. Sending emails, replying to messages on connected channels, managing calendar events.

Skills and sub-agents

OpenClaw extends through skills: bundles of tool definitions and prompt templates that teach the agent a domain. There is a community marketplace of skills for everything from invoice processing to Strava analysis to home-automation control. Installing one is a single command:

openclaw skill add <skill-name>

The agent can also spawn sub-agents for complex multi-step jobs, allowing you to delegate something like "go through every email in this label, extract the receipts, and put them in a spreadsheet" without flooding the main context window.

Security: the part you cannot skip

OpenClaw's broad permissions are its main appeal and its main risk. The same agent that can rename a file can — if misconfigured or if its messaging channel is compromised — delete every file in the workspace, exfiltrate the contents to a third party, or run arbitrary commands. Several practical safeguards apply:

Use a dedicated workspace folder. Never point the agent at your home directory. The workspace is the boundary of what it can read and write.
Keep tokens out of repos. The recommended pattern is to inject channel tokens via environment variables, with ~/.openclaw/channels/*.yaml excluded from any version control.
Rotate tokens regularly. A 90-day rotation cadence is standard practice in production setups.
Consider NemoClaw if your deployment matters. NVIDIA released NemoClaw in March 2026 as a security add-on for OpenClaw deployments, providing OpenShell sandboxing and policy enforcement. It is overkill for a hobbyist setup but the right call for any agent touching sensitive systems.
Watch the daemon logs. openclaw daemon logs shows every action the agent took. Check it periodically — both for security and to learn how the tool interprets your instructions.

A safety floor for personal agents

Cybersecurity researchers have repeatedly pointed out that OpenClaw's permission model is structurally generous: the agent can access email, calendars, messaging, and files because that is what makes it useful. A misconfigured public deployment can leak everything that flows through it. For a first-time install on a personal laptop, the practical floor is — dedicated workspace folder, no production credentials on the host, channel tokens stored in environment variables, and NemoClaw or equivalent sandboxing if the agent will ever touch anything that matters.

Getting started checklist

If this is your first agent install, work through the items below in order and stop at the first one that does not succeed. Each builds on the last:

Confirm Node.js 22+ is installed.
Install OpenClaw globally.
Run the onboarding wizard, pointing it at a fresh empty folder as the workspace.
Wire up Telegram via BotFather; confirm a "hello" round-trips.
Ask the agent to create a single text file in the workspace and read it back; verify the daemon log shows the file write.
Only then add additional channels, install community skills, or grant access to anything sensitive.

That sequence — start tiny, watch closely, expand the surface area only after the previous step works — is the same advice that applies to every agent product, but it matters more here because the host machine is the agent's playground.

Using AI Agents: Getting Started, becoming a capable agent user before you become a builder.

Who this chapter is for

What's Available Today

Prompting Agents vs. Prompting Chat

The biggest shift: stating what you don't want

Setting Scope and Success Criteria

Scope canvas — fill this out before starting

Calibrating autonomy to risk

The Agent Brief

Calibrating brief length

Working With an Agent Mid-Task

When to monitor actively

How to intervene without confusing the agent

Progress signals to watch for

Reviewing and Verifying Outputs

The verification paradox

Knowing When Not to Use an Agent

Habits of Effective Agent Users

Start with a written brief, always

Keep a task library

Treat the first run as a draft

Keep a failure log

Let go of the work the agent does well

Tutorial: Claude Code

Installing Claude Code

Your first session

The CLAUDE.md file

Slash commands

Subagents

Hooks

MCP and the broader plugin ecosystem

Tips for getting useful work done

Tutorial: OpenClaw

Prerequisites

Installation

The onboarding wizard

Connecting Telegram (the easy starting channel)

Adding more channels

What OpenClaw can actually do

Skills and sub-agents

Security: the part you cannot skip

Getting started checklist

Further Reading