Using AI Agents: Getting Started. Becoming a capable agent user before you become a builder.
The most productive people with AI agents are not the ones who built them — they're the ones who know exactly how to brief them, where to trust them, and when to put them down and do the work themselves.
Who this chapter is for
This chapter is written for practitioners who want to get useful work done with agent-based AI products — today, with what's available now — rather than build agent systems from scratch. The next chapter covers building. This one covers using.
The skills here transfer. Whether you're directing Cursor through a refactoring job, handing a research task to a Claude Project, or watching an operator-style agent execute a multi-step workflow, the underlying principles — how to scope a task, how to brief an agent, how to verify what comes back — are the same.
What's Available Today
Agent products have proliferated rapidly. The landscape is still forming, but it has already sorted itself into recognisable categories by the type of work each product targets. Knowing what category a tool belongs to helps you anticipate where it will excel and where it will struggle.
IDE-integrated agents that can read your codebase, write and edit files, run tests, and iterate on failures. Cursor's Composer mode lets you describe a feature and watch the agent implement it across multiple files. Best at bounded, testable tasks inside an existing project.
Strength: code navigation + test feedback loop. Watch: may touch more files than you intended.
Persistent workspaces that maintain context across sessions, search the web, and synthesise long documents. Claude Projects lets you upload reference material and carry on multi-session research threads. Perplexity specialises in web retrieval with citation trails.
Strength: synthesis across long documents. Watch: citations require independent verification.
Agents that control a real browser or desktop to complete tasks humans would perform manually: filling forms, navigating web apps, extracting structured data from interfaces that have no API. Dramatically useful for repetitive web workflows; reliability on novel sites varies.
Strength: anything with a UI but no API. Watch: review before irreversible actions.
Full-loop software engineering agents designed to take a GitHub issue or feature request and produce a pull request. They read codebases, plan changes, implement, run tests, and iterate. Best used for well-specified, isolated tasks; still struggle with large cross-cutting refactors.
Strength: end-to-end coding loops. Watch: requires well-specified issues and test coverage to validate.
Domain-specific agents embedded in productivity and CRM tools. They operate within a constrained, known environment — your files, your CRM records, your project management data — which makes them more reliable than general-purpose agents. Trade-off: limited to their platform.
Strength: deeply integrated with your data. Watch: can't easily orchestrate across platforms.
Configurable agent systems you compose from parts: tool sets, memory stores, sub-agent networks. More flexible than any specific product but require more setup. The line between "using" and "building" here is blurry — these live in Chapter 12.
Strength: custom workflows. Watch: you own the failure modes too.
A practical rule of thumb: start with the most domain-specific tool that covers your task. A coding agent will outperform a general-purpose research agent on code tasks, even if the general-purpose one can technically write code too. Specialised context — knowing your codebase, your CRM, your document library — matters more than raw model capability for most real-world tasks.
Prompting Agents vs. Prompting Chat
The prompting intuitions built up from using chat AI — be conversational, refine iteratively, give context as the conversation develops — transfer poorly to agent use. Agents behave more like contractors than like conversation partners, and briefing them accordingly makes a significant difference to outcomes.
The most important shift is from conversational to briefing mode. When you brief an agent, you are writing a specification that the agent will interpret and execute largely without you. The quality of that specification determines the quality of the work.
Think of an agent like a skilled contractor you've hired for a day. They're competent, but they can't read your mind. If you say "fix the kitchen," they'll make reasonable choices about what "fixed" means — choices you may not agree with. If you say "regrout the tile around the sink, use white grout, leave the cabinets untouched, and tell me before touching anything near the dishwasher," you get predictable results. The same principle applies to agents. Vague tasking is not the contractor's fault when they interpret it wrong.
The biggest shift: stating what you don't want
Chat models are trained to infer your intent and assist within it. Agents are trained to complete tasks, which means they will make choices when ambiguity arises — and they will often make choices you didn't intend. The single most underused prompting technique for agents is explicitly stating constraints: what the agent should not do, which files it should not touch, which APIs it should not call, which decisions it should stop and ask about rather than resolve autonomously.
Setting Scope and Success Criteria
The most reliable predictor of a good agent run is the quality of the scope definition you bring to it. Scope work happens before the agent starts — it's the thinking you do to convert a fuzzy goal into a concrete, bounded task with a clear definition of done.
Scope has two components: what the agent should accomplish (the positive specification) and what boundaries it should not cross (the constraint specification). Both are necessary. A task without constraints will drift; constraints without a clear goal produce an agent that asks for clarification at every step.
Scope canvas — fill this out before starting
This canvas takes five to ten minutes and reliably prevents the most common failure modes: agents that do the right task in the wrong scope, agents that finish technically but miss the actual need, and agents that make consequential decisions autonomously that should have been escalated.
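There is no single canonical format; a minimal sketch of such a canvas, with illustrative field names, might look like this:

```text
GOAL          What does "done" look like? One or two sentences.
IN SCOPE      The files, systems, or questions the agent should work on.
OUT OF SCOPE  Things the agent must not touch or decide.
OUTPUT        Format and destination of the deliverable.
ESCALATE      Decisions the agent should stop and ask about.
```

Filling in the OUT OF SCOPE and ESCALATE rows usually takes the longest — and prevents the most damage.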
Calibrating autonomy to risk
Not all tasks warrant the same level of autonomy. A useful mental model is a two-axis grid: how reversible are the agent's actions (can you undo them?) against how confident are you in your specification (have you seen this kind of task succeed before?). High reversibility and high confidence → give the agent full autonomy. Low reversibility or low confidence → add checkpoints, request interim summaries, or break the task into smaller approved stages.
The Agent Brief
An agent brief is the instruction you give to start a task. It's not a conversation opener — it's a working document. The agent will refer back to it throughout its run whenever it needs to decide between competing interpretations. Writing it clearly is the single highest-leverage thing you can do before pressing start.
The anatomy of an effective brief has five components:
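One plausible shape, consistent with the discussion in this chapter (the labels are illustrative, not a fixed standard):

```text
1. CONTEXT          Why the task matters and who the output is for.
2. TASK             The concrete goal and the definition of done.
3. CONSTRAINTS      What is out of scope; what must not be touched.
4. OUTPUT FORMAT    Required structure, length, and destination.
5. STOP CONDITIONS  When to pause and ask rather than decide alone.
```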
Notice that the brief front-loads context before the task. Agents that understand why a task matters make better judgment calls at decision points. An agent that knows the output is going to external investors will be more careful, and more likely to flag uncertainty, than one that thinks it's an internal draft.
Calibrating brief length
Briefs should be as long as they need to be and no longer. For simple, low-stakes tasks (generate five subject line options for this email), a single sentence is fine — the overhead of a full brief exceeds the benefit. For anything that will run autonomously for more than a minute, involves irreversible actions, or will be shared outside the team, a structured brief is worth the five minutes it takes to write. The cost of a vague brief is almost always higher than the cost of writing a clear one.
The most commonly missing element in agent briefs is what the agent should not do. Agents are optimised to complete tasks — they will fill gaps in your specification with reasonable-seeming choices. Tell the agent explicitly what's out of scope. "Do not modify any files outside the src/ directory." "Do not send any external requests." "Do not make purchases." The clearer your negative specification, the fewer unwanted surprises you get.
Working With an Agent Mid-Task
Once you've handed off a task, the instinct is to wait. That's often right — constant interruption defeats the purpose of using an agent at all. But there are moments when monitoring and intervening mid-task is the right call.
When to monitor actively
For any run involving irreversible actions — writing to production databases, sending emails, making purchases, publishing content — watch at least the first few steps. Agents that start down the wrong path tend to compound their initial error with each subsequent step. Catching a misinterpretation at step two is far cheaper than unwinding 40 steps of downstream consequences.
For long-running research or analysis tasks, a mid-task check after the agent has outlined its plan (but before it executes) is often valuable. Many agents will emit a plan or summary of what they intend to do before proceeding. This is the ideal intervention point: the cost of redirecting is zero, and you can confirm the interpretation is right before it spends time executing it.
How to intervene without confusing the agent
When you do need to redirect mid-task, be explicit that you're correcting the course rather than adding to the task. Phrases like "Stop and restart with this clarification:" or "Before continuing, revise your understanding of the goal:" signal to the agent that prior work should be reconsidered, not built upon. Vague corrections ("actually, can you also...") are interpreted as additions rather than replacements, and the agent may try to satisfy both the original and corrected instruction simultaneously.
When a mid-task correction is significant — the agent misunderstood the fundamental goal, not a minor detail — it's usually better to stop, revise the brief, and restart from the beginning than to patch the current run. A patched run carries the cognitive debt of the original misinterpretation forward; the agent is optimising around a corrected version of a flawed understanding. Fresh starts produce cleaner work.
Progress signals to watch for
Good agents emit interpretable signals as they work: tool calls with readable arguments, interim summaries before moving to the next stage, explicit uncertainty flags ("I couldn't find X, proceeding with Y instead"). If an agent has been running for several minutes without any readable progress signal — just cryptic tool calls or long silent stretches — that's often a sign it's stuck in a loop or pursuing a dead end. Interrupt and check rather than waiting for it to recover on its own.
Reviewing and Verifying Outputs
Agent outputs require a different kind of review than human outputs. A human collaborator who is wrong about something usually reveals the uncertainty — hedged language, questions asked, caveats inserted. Agents often state incorrect things with the same confident voice they use for correct things. They will cite sources that don't support the claim they're making, write code that looks plausible but fails on edge cases, and produce summaries that feel complete while silently omitting inconvenient details.
Review discipline is not about distrusting agents — it's about using them appropriately. An agent that generates a strong first draft you then verify and correct is dramatically more productive than an agent you either don't use or trust blindly.
The verification paradox
There is a temptation to reduce the verification burden by only using agents for tasks you can fully verify — which risks making agents pointless, since you could have just done those tasks yourself. The practical resolution is to match verification effort to output stakes, not to output complexity. A lengthy research summary going into an internal brainstorm warrants a lighter touch than a paragraph going into a client proposal. Develop calibrated review habits rather than binary "trust everything" or "check everything" policies.
Knowing When Not to Use an Agent
The question "should I use an agent for this?" is not asked often enough. The novelty of the technology and the genuine productivity gains on well-suited tasks create a tendency to reach for agent tools even when they're not the right fit. Over-automation produces a specific type of failure: confident, polished-looking wrong answers that take more effort to fix than the original task would have taken to do.
Writing a brief, waiting for the agent to run, reviewing the output, and correcting it may easily take longer than just doing the thing. The overhead of agent use has a floor. Below a certain task complexity, it costs you time rather than saving it.
→ Just do it.
If you lack the domain expertise to know whether the agent got it right — or if checking would take as long as doing — then the agent is producing unverifiable work. Unverifiable agent output in a decision-making context is a liability, not an asset.
→ Bring in a human expert, or use the agent only for a piece you can check.
Tasks that require knowing your organisation's unstated culture, reading interpersonal dynamics, or making judgment calls in politically sensitive situations are not well-suited to agents. The agent will produce something technically coherent that misses the actual point.
→ Use the agent for the research or drafting; reserve the judgment call for yourself.
If you find yourself unable to articulate what success looks like, the task is not ready for an agent. The difficulty of writing a clear brief is a diagnostic signal: it reveals that the task itself isn't well-enough defined to be delegated to anyone, human or AI.
→ Spend the time clarifying the task first, then reconsider automation.
Sending an email to the wrong recipient, deleting production data, making a financial commitment — actions that are irreversible and consequential should be executed by humans or require explicit human approval at each step. The efficiency gain is not worth the tail risk.
→ Use staged approval with human confirmation at each irreversible step.
Using an agent to draft a message to a close colleague, a condolence note, or a sensitive negotiation email is technically possible but often undermines the trust and authenticity the relationship depends on. Humans notice when writing sounds like it comes from a different voice.
→ Write it yourself, or use the agent only for structural ideas you then fully rewrite.
None of these rules are absolute. The test is whether the agent is genuinely helping you do better work faster, or whether you're using it because it's available. When in doubt, ask yourself: if the agent's output were completely wrong, how would I know, and what would it cost? If the answer is "I wouldn't know easily" or "it would be very costly," increase your oversight or step back from automation.
Habits of Effective Agent Users
The skill gap between people who get great results from agents and people who get mediocre results is not primarily about technical sophistication. It's about a set of habits that anyone can develop.
Start with a written brief, always
Even for simple tasks, writing down what you want before you prompt the agent forces clarity. The act of writing exposes ambiguity — you realise you haven't defined "recent," or you're not sure whether you want three options or five. This friction is valuable. Agents that are given vague starting conditions produce vague or misaligned results; the time you spend writing a clear brief is time you save on revision and re-runs.
Keep a task library
The tasks you delegate to agents will repeat. Researching competitors. Drafting response emails. Summarising meeting notes. Auditing code for a class of bug. For each repeating task, keep a brief template: the scope, the constraints, the output format, the stop conditions. Reusing a proven brief eliminates the overhead of re-specifying and produces more consistent results than specifying fresh each time. It also surfaces when your process has improved — you update the template and get better results automatically.
Treat the first run as a draft
Resist the temptation to use the first run's output directly. The first run tells you whether your brief was well-specified and where the agent's interpretation diverged from your intent. Use that information to improve the brief, then re-run. The second run, with a tightened specification, will almost always outperform the first. The incremental cost of a second run is low; the quality difference is often significant.
Keep a failure log
When an agent produces a bad result, record what happened: what the task was, what the brief said, what the agent produced, and what went wrong. Review this log periodically. Most failures cluster around a small number of root causes: ambiguous scope definitions, missing negative constraints, tasks where the agent's tool access doesn't match what the task requires, or tasks in domains where the agent's training knowledge is too thin to work reliably. Identifying your personal failure patterns is far more valuable than reading generic advice about agent prompting.
Let go of the work the agent does well
The last habit is psychological: extending genuine trust to tasks where the agent is reliably competent. People who use agents most productively have moved past the stage of reviewing everything equally. They know which tasks their agents handle well enough that a light scan suffices, and which tasks need deep verification. Building that trust, calibrated to actual reliability rather than either naive faith or reflexive suspicion, is the endpoint of good agent-use practice.
Agent productivity compounds. Each well-specified brief becomes a template. Each failure teaches you a constraint to add next time. Each verified output builds confidence about where trust is warranted. After six months of deliberate agent use, the gap between a skilled agent user and a casual one is not the technology — it's the accumulated library of refined briefs, verified patterns, and calibrated trust that the skilled user has built up. Start deliberately.
Tutorial: Claude Code
Claude Code is Anthropic's command-line agent for software work. It is not a chat interface that happens to write code; it is an autonomous tool that lives in your terminal, reads your codebase, edits files, runs shell commands, executes tests, commits to Git, and calls external APIs to accomplish whatever goal you brief it on. You give it a target. It figures out the steps and asks before it touches anything irreversible.
This tutorial gets you from a clean machine to running a useful Claude Code session, then introduces the four customisation surfaces — CLAUDE.md, slash commands, subagents, and hooks — that separate casual use from genuinely productive use.
Installing Claude Code
Claude Code is primarily designed for Unix-like environments. macOS and Linux are first-class targets. On Windows, the recommended path is to run it inside the Windows Subsystem for Linux (WSL); a native Windows build exists but is less battle-tested. The official installer is a one-line shell command:
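At the time of writing the installer looks like the first line below; verify the URL against Anthropic's current documentation before piping anything into your shell. The npm route is an alternative if you already have Node.js:

```shell
# Official installer (check Anthropic's docs for the current URL)
curl -fsSL https://claude.ai/install.sh | bash

# Alternative: install globally via npm
npm install -g @anthropic-ai/claude-code
```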
This drops the claude binary onto your shell's path and configures background auto-updates, so you stay current without thinking about it. Confirm the install with claude --version. The first time you run an interactive session — by typing claude inside any directory — you will be prompted to log in with the Anthropic account that holds your Claude subscription.
Your first session
Once installed, navigate to a project you'd like to work in. Claude Code performs best when launched from the root of a real codebase — a Git repository, an existing application, anywhere with files for it to read. Empty folders are fine for sandboxing but rob the agent of the context it works best with.
You will land at a prompt that looks superficially like a chat interface. The difference is what happens next. When you describe a task — "add input validation to the signup form so emails are checked against a regex and passwords must be at least 12 characters" — Claude Code does not write a code block for you to copy. It reads the relevant files, plans the change, makes the edits, and asks for permission before saving them.
That permission step is central. By default, Claude Code asks before every file write, every shell command that mutates state, and every external API call. You can approve once, approve always for a session, or deny and redirect. Beginners should leave the defaults in place — the friction of approving each step is exactly the visibility you need to learn how the agent thinks.
Claude Code supports several permission modes accessible via flags. --dangerously-skip-permissions turns approvals off and is exactly as risky as it sounds — reserve it for sandboxed environments, never on a machine that holds production credentials. The default mode (with prompts) and an auto-edit mode (auto-approve edits but still prompt for shell commands) cover most real workflows.
The CLAUDE.md file
The single most impactful thing you can do to make Claude Code more useful inside a specific project is to drop a CLAUDE.md file at the root of the repository. Every time you start a session in that folder, Claude Code reads CLAUDE.md first and treats its contents as standing instructions: your tech stack, the commands to run tests, your team's coding conventions, things to avoid touching, and anything else you would tell a competent contractor on day one.
You can ask Claude Code to draft a starting CLAUDE.md for you by running /init in an interactive session. It will scan the repo and propose one. Treat the result as a first draft and edit it — generic project descriptions help less than specific, opinionated guidance. A useful starting structure:
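The structure below is a sketch; the commands, conventions, and paths are placeholders to replace with your project's own:

```markdown
# Project overview
One paragraph: what this codebase is and who uses it.

## Commands
- Run tests: `npm test`
- Lint: `npm run lint`

## Conventions
- TypeScript strict mode; avoid `any`.
- Small, focused commits with imperative messages.

## Do not touch
- `migrations/` (append-only)
- Anything under `vendor/`
```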
Slash commands
Inside an interactive session, anything starting with a slash is a command rather than a task. A handful are essential to know:
- `/init` — generate a starter `CLAUDE.md` based on the current repository.
- `/clear` — wipe the current conversation context. Use this between unrelated tasks; long contexts confuse the agent and cost more.
- `/agents` — open the subagent manager (covered below).
- `/hooks` — define deterministic scripts that run at specific lifecycle events.
- `/review` — request a code review of the pending changes on the current branch.
- `/security-review` — same idea, but focused on security implications.
You can also write your own slash commands by dropping a markdown file into ~/.claude/commands/ (global) or .claude/commands/ (per-project). The filename becomes the command name; the contents become the prompt template. This is the fastest way to capture a workflow you find yourself repeating — a /standup-summary command that summarises yesterday's commits and today's open PRs, for example, or a /release-notes command that drafts notes from the diff between two tags.
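As an example, the hypothetical `/standup-summary` command mentioned above could be captured as a file like this (filename and contents are illustrative):

```markdown
<!-- .claude/commands/standup-summary.md -->
Summarise yesterday's commits on the current branch and list any open
pull requests I should mention in standup. Keep it under ten bullet
points, grouped by area of the codebase.
```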
Subagents
Subagents are specialised assistants that handle a specific kind of side task without polluting the main conversation's context. The classic case is searching a large codebase: instead of having the main agent read fifty files looking for something — flooding its context window with content it will then ignore — you delegate the search to a subagent, which works in its own context and returns only its summary.
List available subagents with /agents; create custom ones by adding a markdown file to ~/.claude/agents/. The file's frontmatter declares the agent's name, description, and tool access; the body is the system prompt that defines its role. Common patterns include a code-reviewer subagent that gives independent reads on diffs, a test-runner that focuses purely on running and triaging tests, and an explore agent specialised for codebase navigation.
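A minimal sketch of a code-reviewer subagent file follows; the frontmatter fields match the description above, but verify the exact schema against the current Claude Code documentation:

```markdown
<!-- ~/.claude/agents/code-reviewer.md -->
---
name: code-reviewer
description: Reviews diffs for correctness, style, and security issues.
tools: Read, Grep, Glob
---
You are a code reviewer. Given a diff, identify bugs, style violations,
and security concerns. Report findings as a prioritised list. Do not
edit any files yourself.
```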
Hooks
Hooks are deterministic shell scripts that fire at specific points in Claude Code's lifecycle — before a tool call, after a file edit, before a shell command, after a session ends. Where CLAUDE.md instructions are advisory (the agent may forget them as context fills up), hooks are non-negotiable: they execute regardless of whether the model decides to follow them.
Typical uses: auto-format files after edits, block edits to protected paths entirely, log every shell command for audit, run a linter on every change. Configure them via /hooks or by editing ~/.claude/settings.json. For beginners, the highest-value first hook is a PreToolUse hook that blocks shell commands matching a deny-list (anything starting with rm -rf /, anything writing to /etc, anything that touches your password manager).
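A sketch of such a deny-list hook in `~/.claude/settings.json` is below. The JSON shape follows Claude Code's hooks configuration as of this writing — treat it as something to verify against the current docs. The referenced script (a path you would create yourself) inspects the proposed command and exits with a blocking status if it matches the deny-list:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/deny-dangerous.sh"
          }
        ]
      }
    ]
  }
}
```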
MCP and the broader plugin ecosystem
Claude Code speaks the Model Context Protocol (MCP), an open standard for giving language models access to external tools and data. Add an MCP server — for your database, your project tracker, your design files — and Claude Code gains the ability to query that system as part of its reasoning. The claude mcp subcommand manages installations.
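A couple of representative invocations — the server package name here is illustrative, and flags may differ between versions:

```shell
# List the MCP servers currently configured
claude mcp list

# Add a server (package name and connection string are examples)
claude mcp add my-postgres -- npx -y @modelcontextprotocol/server-postgres \
  "postgresql://localhost/mydb"
```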
Plugins go further: they are installable bundles of skills, slash commands, hooks, and MCP servers grouped together. Installing one drops a coherent set of capabilities into your environment in a single step. The plugin ecosystem is young but growing fast; community marketplaces have started to emerge in 2026.
Tips for getting useful work done
A few habits that separate productive Claude Code use from frustrating use, distilled from the early adopter community:
- Start small. Your first sessions should target tightly-bounded tasks: add this validation, fix this failing test, refactor this one function. Cross-cutting refactors and ambiguous "make this better" prompts produce inconsistent results until you have calibrated to how the tool thinks.
- Use `/clear` aggressively. Stale context is the biggest single cause of agent confusion. When you finish a task, clear before starting the next.
- Read what it's doing, especially early. Claude Code shows every tool call and file edit. Skim them. The intuition you build about how it interprets briefs is worth more than any list of best practices.
- Commit often. The cheapest safety net is Git. Commit before any large change so a single `git reset` can undo a run that went sideways.
- Treat `CLAUDE.md` as a living document. Every time the agent does something dumb, ask whether a sentence in `CLAUDE.md` would have prevented it. If yes, add it. After a few weeks, the file becomes a compressed record of your project's tribal knowledge.
Cursor, Windsurf, and similar IDE-integrated agents share many of Claude Code's underlying ideas — agentic tool use, project context files, slash commands. The skills transfer. If you have already invested in one, you can think of Claude Code as the same model accessed through a different interface (the terminal) and tuned for slightly different workflows. Many practitioners use both, with the IDE for exploratory editing and Claude Code for batch jobs that run in the background.
Tutorial: OpenClaw
OpenClaw is a different shape of agent product from Claude Code. It is not a coding tool; it is a general-purpose personal agent that you run locally and access through the messaging apps you already use — Telegram, Discord, WhatsApp, Signal. You ask it things via chat from your phone or desktop, and a daemon running on your machine carries out the work: reading and writing files, running commands, browsing websites, controlling APIs, sending emails.
Released in late 2025 by Austrian developer Peter Steinberger (originally as Clawdbot, briefly as Moltbot, and finally as OpenClaw after a January 2026 rebrand following trademark complaints), the project crossed 100,000 GitHub stars within its first week and went on to become the fastest-growing open-source project on GitHub by stars-per-day. A non-profit foundation now provides stewardship after Steinberger joined OpenAI in February 2026.
This tutorial gets you from zero to a running agent reachable from your phone in roughly fifteen minutes. The most important caveat is up front: OpenClaw asks for very broad permissions on the host machine by design. Treat it like the powerful tool it is.
Prerequisites
OpenClaw runs on macOS, Linux, and Windows. The hard requirement is Node.js version 22 or higher — older versions will not work. Confirm your version before installing:
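```shell
node --version   # must print v22.0.0 or higher
```

If the version is too old, upgrade Node.js (for example via your platform's package manager or a version manager such as nvm) before continuing.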
You will also need an API key for at least one large language model provider. OpenClaw integrates with Claude (Anthropic), GPT (OpenAI), and DeepSeek out of the box, and supports local models via Ollama. The onboarding wizard will ask you to paste a key, so have one ready before you start.
Installation
OpenClaw installs as a global npm package:
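Assuming the package is published on npm under the name `openclaw`:

```shell
npm install -g openclaw
```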
If you would prefer not to install Node.js or run a local daemon at all, several cloud providers offer one-click templates that install and host OpenClaw for you — DigitalOcean and AWS Lightsail are the most polished as of mid-2026. The local install is the canonical experience and what the rest of this tutorial covers.
The onboarding wizard
OpenClaw ships with a guided setup flow that handles authentication, model providers, gateway configuration, and a first messaging channel. Run it with:
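The subcommand below reflects the documented onboarding flow at the time of writing; check the project README if it has changed:

```shell
openclaw onboard
```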
The wizard walks through the following steps in order:
- Workspace selection. Pick a directory the agent is allowed to read and write within. Choose a dedicated folder; do not point this at your home directory.
- Model provider. Pick a provider (Claude, OpenAI, DeepSeek, or a local Ollama endpoint) and paste your API key. The key is stored under `~/.openclaw/` and never sent anywhere except the provider you chose.
- Gateway configuration. The Gateway is the local daemon that routes messages between your chosen channel and the agent. Defaults are fine for first-time users.
- First channel. The wizard suggests Telegram, which is the easiest to set up. Accept the default.
After the wizard finishes you will have a running gateway, an authenticated provider, and instructions for the next step — wiring up your messaging channel.
Connecting Telegram (the easy starting channel)
Telegram is the recommended first channel for several reasons: its Bot API is robust, it does not require a public IP or domain (OpenClaw uses long-polling by default), and the registration flow takes about two minutes. To create a bot:
- Open Telegram and start a chat with `@BotFather`.
- Send `/newbot` and follow the prompts to give your bot a name and username.
- BotFather replies with an HTTP API token. Copy it.
Paste that token when the OpenClaw onboarding wizard asks for it. If you set up the channel later instead, store the token in your channel config file, then start the gateway:
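Sketched here assuming a `gateway` subcommand — run `openclaw --help` to confirm the exact name in your installed version:

```shell
openclaw gateway
```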
Open the Telegram chat with your new bot and send "hello." The bot's reply confirms the round trip is working: phone → Telegram's servers → your local OpenClaw daemon → your model provider and back.
Adding more channels
Once Telegram is working, additional channels can be added with:
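Assuming a `channels add` subcommand (illustrative — confirm against `openclaw --help`):

```shell
openclaw channels add discord
```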
Each platform stores its config as a YAML file under ~/.openclaw/channels/. Discord and WhatsApp are well-supported. Signal works but requires registering a phone number through Signal's verification process and managing cryptographic state — count on it being more finicky than the others. Add ~/.openclaw/channels/*.yaml to your .gitignore if the directory is ever tracked; tokens should never be committed.
What OpenClaw can actually do
Once connected, OpenClaw can do anything your machine can do, mediated through the LLM you wired up. In practice, users delegate things like:
- File and document work. "Find the most recent invoice in my Downloads folder and rename it to `2026-04-acme-invoice.pdf`." OpenClaw reads, writes, and moves files within the workspace folder you authorised.
- Shell commands. "Run the test suite and tell me which ones failed." It executes commands and returns their output, with permission prompts on the first run of each new command class.
- Coding sub-sessions. One of the most-cited features is OpenClaw's ability to manage Claude Code or Codex sessions on your behalf — you ask it to fix a bug, and it spawns a coding agent, supervises the run, captures errors via Sentry, and opens a GitHub pull request when the test suite passes.
- Web tasks. Browsing pages, scraping structured data, calling REST APIs.
- Communication. Sending emails, replying to messages on connected channels, managing calendar events.
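To make the first bullet concrete, the invoice request reduces to an operation you could do yourself in a few lines of shell; the agent's value is performing it from a chat message. The sketch below uses a throwaway workspace and backdated timestamps rather than a real Downloads folder:

```shell
# Mechanical equivalent of "rename the most recent invoice": find the
# newest PDF by modification time and move it to the requested name.
ws=/tmp/openclaw-rename-demo
mkdir -p "$ws/Downloads"
touch -t 202603010900 "$ws/Downloads/inv-march.pdf"   # older file
touch -t 202604010900 "$ws/Downloads/inv-april.pdf"   # newer file
latest=$(ls -t "$ws/Downloads"/*.pdf | head -n 1)      # ls -t: newest first
mv "$latest" "$ws/2026-04-acme-invoice.pdf"
```

Note that everything stays inside the workspace boundary, which is exactly the property the security section below depends on.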
Skills and sub-agents
OpenClaw extends through skills: bundles of tool definitions and prompt templates that teach the agent a domain. There is a community marketplace of skills for everything from invoice processing to Strava analysis to home-automation control, and installing one is a single CLI command.
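The on-disk shape of an installed skill is not specified here, but given the description above (tool definitions plus prompt templates) a bundle plausibly unpacks to something like the following. Every name in this sketch is illustrative, not taken from OpenClaw:

```shell
# Speculative layout for an installed skill bundle; all names invented.
skill=/tmp/openclaw-skill-demo/invoice-processing
mkdir -p "$skill"
cat > "$skill/tools.yaml" <<'EOF'
tools:
  - name: extract_invoice_fields
    description: Pull vendor, date, and total out of a PDF invoice
EOF
cat > "$skill/prompt.md" <<'EOF'
You are an invoice-processing assistant. Run extract_invoice_fields on each
attachment and return one row per invoice.
EOF
```

The split matters: tool definitions tell the model what it can call, while the prompt template tells it when and how to call it.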
The agent can also spawn sub-agents for complex multi-step jobs, allowing you to delegate something like "go through every email in this label, extract the receipts, and put them in a spreadsheet" without flooding the main context window.
Security: the part you cannot skip
OpenClaw's broad permissions are its main appeal and its main risk. The same agent that can rename a file can — if misconfigured or if its messaging channel is compromised — delete every file in the workspace, exfiltrate the contents to a third party, or run arbitrary commands. Several practical safeguards apply:
- Use a dedicated workspace folder. Never point the agent at your home directory. The workspace is the boundary of what it can read and write.
- Keep tokens out of repos. The recommended pattern is to inject channel tokens via environment variables, with ~/.openclaw/channels/*.yaml excluded from any version control.
- Rotate tokens regularly. A 90-day rotation cadence is standard practice in production setups.
- Consider NemoClaw if your deployment matters. NVIDIA released NemoClaw in March 2026 as a security add-on for OpenClaw deployments, providing OpenShell sandboxing and policy enforcement. It is overkill for a hobbyist setup but the right call for any agent touching sensitive systems.
- Watch the daemon logs. openclaw daemon logs shows every action the agent took. Check it periodically — both for security and to learn how the tool interprets your instructions.
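A periodic audit of those logs is worth scripting. The real format is whatever openclaw daemon logs emits; the sample below is invented purely to show the kind of grep-based check that turns "check it periodically" into a habit:

```shell
# Invented log format for illustration; adapt the patterns to the real output.
mkdir -p /tmp/openclaw-audit
cat > /tmp/openclaw-audit/daemon.log <<'EOF'
2026-04-02T09:14:11Z action=file.write path=workspace/notes.txt
2026-04-02T09:14:12Z action=shell.exec cmd="npm test"
2026-04-02T09:15:03Z action=file.write path=workspace/report.md
EOF
# How many files did the agent write since the last check?
writes=$(grep -c 'action=file.write' /tmp/openclaw-audit/daemon.log)
echo "file writes: $writes"
# Surface every shell execution for manual review:
grep 'action=shell.exec' /tmp/openclaw-audit/daemon.log
```

Anything in the shell-exec list you do not remember asking for is a prompt-injection or misinterpretation flag worth chasing down.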
Cybersecurity researchers have repeatedly pointed out that OpenClaw's permission model is structurally generous: the agent can access email, calendars, messaging, and files because that is what makes it useful. A misconfigured public deployment can leak everything that flows through it. For a first-time install on a personal laptop, the practical floor is: a dedicated workspace folder, no production credentials on the host, channel tokens stored in environment variables, and NemoClaw or equivalent sandboxing if the agent will ever touch anything that matters.
Getting started checklist
If this is your first agent install, work through the items below in order and stop at the first one that does not succeed. Each builds on the last:
- Confirm Node.js 22+ is installed.
- Install OpenClaw globally.
- Run the onboarding wizard, pointing it at a fresh empty folder as the workspace.
- Wire up Telegram via BotFather; confirm a "hello" round-trips.
- Ask the agent to create a single text file in the workspace and read it back; verify the daemon log shows the file write.
- Only then add additional channels, install community skills, or grant access to anything sensitive.
That sequence — start tiny, watch closely, expand the surface area only after the previous step works — is the same advice that applies to every agent product, but it matters more here because the host machine is the agent's playground.
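The first checklist item is easy to automate. The sketch below parses node --version style output and compares the major version against the required minimum; the hardcoded version string stands in for the live command so the logic is visible on its own:

```shell
# Preflight for checklist item 1: is Node.js 22 or newer installed?
required=22
version="v22.11.0"            # in practice: version=$(node --version)
major=${version#v}            # strip the leading "v"
major=${major%%.*}            # keep only the major component
if [ "$major" -ge "$required" ]; then
  echo "Node.js $version is new enough"
else
  echo "Need Node.js ${required}+, found $version" >&2
fi
```

Wiring this into a preflight script means the "stop at the first item that does not succeed" rule fails fast instead of surfacing as a confusing install error later.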
Further Reading
- Anthropic's Claude Usage Documentation. Official documentation covering Claude's agentic features, Projects, tool use, and best practices for directing Claude as an agent. Updated continuously as the product evolves. The most current source for Claude-specific agent use patterns.
- Prompt Engineering for Developers. Anthropic's official prompt engineering guide, covering techniques for specifying tasks, adding context, using examples, and structuring instructions for agentic use. The clearest single reference for effective Claude prompting across both chat and agentic contexts.
- Devin: Cognition's AI Software Engineer. The product introduction and case studies for Devin, the first commercially available full-loop software engineering agent. Reading how Cognition frames effective use reveals what this class of tool is designed for and what remains outside its reliable range. Essential for understanding what "software engineering agent" means in practice.
- SWE-bench Results & Practitioner Notes. The SWE-bench leaderboard with agent submission descriptions. Reading the methodology notes from top-ranked submissions reveals the prompting and scaffolding techniques that practitioners have found to work — and the failure modes they had to engineer around. Practitioner knowledge about coding-agent prompting, written by people who pushed the frontier.
- Getting the Most Out of Cursor. Cursor's own guidance on using its agentic features effectively, including how to scope tasks for Composer, when to use chat vs. Composer vs. inline edit, and how to review agent-generated code changes. The most practical guide to the most widely used coding agent.
- Claude Code Documentation. Official documentation covering installation, the CLAUDE.md file, slash commands, subagents, hooks, and MCP integration. The quickstart and the subagent guide are the two pages worth bookmarking on day one. The single source of truth as Claude Code evolves; updated continuously.
- awesome-claude-code. A curated list of skills, hooks, slash commands, subagent orchestrators, applications, and plugins for Claude Code, maintained by the community. Browsing the repo is the fastest way to see how power users have configured the tool and to lift patterns that solve your own problems. The clearest map of the Claude Code ecosystem.
- OpenClaw Documentation. Official OpenClaw documentation covering installation, the onboarding wizard, channel configuration for Telegram, Discord, WhatsApp, and Signal, plus the gateway and skills systems. The Telegram setup page is the most polished and a sensible first read after this chapter. The canonical reference for an evolving open-source project.
- OpenClaw on GitHub. The source repository, including the README, releases, and issue tracker. Browsing recent issues is a fast way to see what's currently breaking, what edge cases real users are hitting, and where the project is heading. Useful ground truth on a fast-moving project that is changing faster than any guide can keep up with.
- Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw. NVIDIA's official guide to running OpenClaw with the NemoClaw security add-on, including OpenShell sandboxing, policy enforcement, and the threat model the combination is designed to address. Essential reading before granting an OpenClaw instance any meaningful permissions. The clearest single source on the security envelope around personal AI agents in 2026.