E2E Testing with Claude Code
A simple guide to setting up a testing skill to test applications using AI agents
Background
We all know that moment just before release: the feature is actually finished – now someone just needs to test it thoroughly.
In practice, testing often feels like a rather unfair version of the Pareto principle: the first 80% of a feature goes quickly, while the last 20% suddenly consists of clicking through the browser, checking edge cases, and hunting bugs.
A dedicated QA team? Outside of enterprise environments, that’s often just wishful thinking. Smaller teams and start-ups in particular have neither the budget nor the people for it. That is where I mostly work, and exactly where I run into this problem constantly.
One evening, while scrolling through yet another round of the LinkedIn "AI replaces all developers" apocalypse, it finally occurred to me to replace my non-existent testing colleague with AI.
So I decided to build my imaginary QA colleague, or rather, have Claude Code play that role.
Concept
The idea: Claude should behave like a real tester. Open the browser, click on things, try out flows, and ideally find errors that I wouldn't even notice myself anymore. So it's about testing the application as a whole (End-to-End).
For this, I ultimately need only three things:
- Claude Code (requires a paid subscription with Anthropic)
- Playwright CLI, so that Claude Code can simulate a browser from the terminal
- A Claude Code Skill that triggers the testing of specific "Test Suites" via a sub-agent and understands their specifications
The Testing Flow
Installation
Node.js and npm must be installed. npm ships with Node.js, so installing Node.js covers both.
Claude Code
Claude Code is Anthropic's command-line tool for Claude. From the terminal you can enter prompts, start agents, and so on. The installation steps for each platform can be found on the Anthropic website.
Note: A paid subscription model at Anthropic is required to use Claude Code.
Playwright CLI
For the browser part, I use Playwright – Microsoft's tool for automated testing of websites and apps. It offers a CLI that Claude Code can use directly.
The CLI can be installed using the following commands:
npm install -g playwright
npx playwright install
I'm deliberately using the Playwright CLI here instead of MCP – simply because, in my experience, token consumption is significantly lower.
Finally, Claude Code needs the appropriate skill to use the Playwright CLI.
npx skills add https://github.com/microsoft/playwright-cli --skill playwright-cli
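Before moving on, it can help to confirm that everything is actually reachable from the terminal. The following is only a minimal sketch: the binary names are assumptions (`node` and `npm` from Node.js, `claude` for Claude Code, and `playwright-cli` as the agent invokes it later in this article), and `missing_tools` is a hypothetical helper, not part of any of these tools.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed binary names; adjust them if your installation differs.
REQUIRED_TOOLS = ("node", "npm", "claude", "playwright-cli")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default the check uses `shutil.which` from the standard library.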
Skills
A skill encapsulates recurring instructions and gives Claude Code the ability to use them explicitly or as needed.
I built a testing skill so that I can run my "Test Suites" on demand with the command /test <test-name>.
The skill can be created directly by Claude Code (provide the appropriate prompt) or created manually.
Skills are stored in the project under .claude/skills/<skill-name>/SKILL.md. This markdown file contains the instructions that are passed to Claude Code as part of the skill.
Sub-Agents
Sub-agents are specialized assistants that each handle one kind of task. A Claude Code session can create sub-agents on instruction or as needed to solve these specific tasks (e.g., testing, bug reporting).
In our setup, the actual testing is performed by a specialized testing sub-agent.
Claude Code can create this sub-agent for you, or you can create it manually using the /agents command.
Test Skill & Agent
Skill
In .claude/skills/test/SKILL.md:
---
name: test
description: Does real application testing. Reads a natural-language spec, interacts with the applications via the Playwright CLI, produces a findings file, waits for the user to triage it, then optionally hands off to Asana. Use when the user wants to test a flow in one of the apps.
argument-hint: <test-name>
disable-model-invocation: false
---
# Testing workflow orchestrator
Drive the full loop: test → triage → Asana.
## Step 1: Resolve the spec path
`$ARGUMENTS` is a test name (e.g. `customer-creation`): resolve as `.claude/tests/$ARGUMENTS/spec.md`
The resolved test name is used throughout the rest of the flow.
## Step 2: Run the tests
Invoke the `testing-agent` subagent, passing only the resolved spec path and test name.
**Do not pre-write or include test scripts in the prompt** — the subagent derives and runs its own
Playwright CLI scripts per requirement. The subagent will:
- Read the spec and credentials independently
- Write and execute small Playwright CLI scripts via Bash to interactively verify each requirement
- Take screenshots only when a bug or suspicious behaviour is found, saved to the findings folder
- Write findings to `.claude/tests/<test-name>/findings/<today>.md` using the bug-report skill format
If the subagent reports a setup error (staging unreachable, login broken),
stop and relay the error to the user — do not continue to triage.
## Final report
Summarize the run for the user and highlight the findings
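To make Step 1 concrete, here is a rough sketch of the path resolution the skill asks Claude to perform, following the `.claude/tests/<test-name>/` layout used in this article. Claude does this from the instructions alone; the code is only an illustration. The function name `resolve_paths` and the name validation are hypothetical additions, not part of the skill.

```python
import re
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

TESTS_ROOT = PurePosixPath(".claude/tests")  # layout taken from the article


def resolve_paths(test_name: str, today: Optional[date] = None) -> dict[str, str]:
    """Map a test name to its spec file and today's findings file."""
    # Hypothetical safety check: allow only simple names such as "customer-creation".
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", test_name):
        raise ValueError(f"invalid test name: {test_name!r}")
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    suite = TESTS_ROOT / test_name
    return {
        "spec": str(suite / "spec.md"),
        "findings": str(suite / "findings" / f"{day}.md"),
    }
```

Rejecting names that contain slashes or dots is a design choice of this sketch: it keeps a mistyped argument from resolving to a path outside the tests folder.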
Agent
In .claude/agents/testing-agent.md:
---
name: testing-agent
description: Tests web applications by interacting with them via the playwright CLI and checking provided requirements
tools: Read, Write, Edit, Bash, Glob, Grep
---
You are the testing agent for this project. Your job: understand a natural-language
test spec and interact with the related applications to verify that defined requirements are implemented
and behave as expected.
**Your input**: A spec file at `.claude/tests/<test-name>/spec.md` — labeled requirements
(A1, A2, …) describing how the flow should behave.
Use the credentials from `.env` to authenticate: `TEST_USERNAME`, `TEST_PASSWORD`.
## Workflow
1. **Read the spec.** Open `.claude/tests/<test-name>/spec.md`. Identify
each requirement by its ID (A1, A2, …). Get an understanding of which applications (Admin, B2B, Web) you will need to access.
2. **Conduct the test using `playwright-cli` via Bash.**
3. **Collect artifacts.**
When you find a bug or suspicious behaviour, run `playwright-cli screenshot .claude/tests/<test-name>/findings/<id>.png` before moving on.
4. **Write findings** to
`.claude/tests/<test-name>/findings/<YYYY-MM-DD>.md` — a markdown table with
columns `ID | Requirement | Severity | Summary`.
5. **Report back** with a short summary: how many requirements tested, how many findings, and the findings file path.
In our example, the agent accesses the .env file to retrieve login credentials for our app. I strongly recommend not storing these in the agent itself.
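Since a missing or empty credential only shows up once the agent is already mid-run, a small pre-check can save a wasted session. The sketch below assumes a plain `KEY=VALUE` file with the two variable names from the agent definition. `parse_env` and `missing_credentials` are hypothetical helpers, and the parser is deliberately simpler than a full dotenv implementation (no multi-line values, no variable expansion).

```python
REQUIRED_KEYS = ("TEST_USERNAME", "TEST_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so "value" and value are treated alike.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_credentials(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

In practice you would pass the contents of `.env` to `parse_env` and abort the run if `missing_credentials` returns anything.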
A Test Suite
A test suite combines related tests. For our testing agent, we consolidate all requirements for a specific feature in <test-name>/spec.md and provide Claude with further background information if necessary.
Example: .claude/tests/user-registration/spec.md
# User Registration
## Scope
End-to-end user registration in our staging environment
* URL: https://staging.i-am-a-dummy.com
## Requirements
- **A1** Login with valid Credentials: Use your dummy credentials to log in, confirm that it works
- **A2** Register with existing email address: When trying to register a new user with an existing email address (like the one you log in with) the message "Email already exists" should be shown
## Notes for the agent
- use the dummy credentials from `.env` (`TEST_USERNAME`, `TEST_PASSWORD`) to conduct this test
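The agent reads this spec as natural language, so no parser is needed for the workflow itself. Still, the labeled-bullet convention (`- **A1** Title: description`) is regular enough to check mechanically, for example to catch duplicate or missing IDs before a run. A minimal sketch, assuming exactly that bullet format; `parse_requirements` is a hypothetical helper:

```python
import re

# Matches bullets such as "- **A1** Title: description".
REQUIREMENT = re.compile(r"^\s*[-*]\s+\*\*([A-Z]\d+)\*\*\s+(.+)$")


def parse_requirements(spec_text: str) -> list[tuple[str, str, str]]:
    """Return (id, title, description) for every labeled requirement bullet."""
    found = []
    for line in spec_text.splitlines():
        match = REQUIREMENT.match(line)
        if not match:
            continue
        req_id, rest = match.groups()
        # Everything before the first colon is the title, the rest the description.
        title, _, description = rest.partition(":")
        found.append((req_id, title.strip(), description.strip()))
    return found
```

Unlabeled bullets, such as the URL line or the notes for the agent, are ignored.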
Final Folder Structure
.
├── .claude/
│ ├── agents/
│ │ └── testing-agent.md
│ ├── skills/
│ │ └── test/
│ │ └── SKILL.md
│ └── tests/
│ └── user-registration/
│ ├── spec.md
│ └── findings/
│ └── 2026-05-26.md
├── .env
└── ...
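If you prefer to create this layout in one go rather than by hand, something like the following would do it. This is a sketch under the assumption that the folder structure above is what you want; `scaffold_suite` is a hypothetical helper that only creates empty placeholder files and never overwrites existing ones.

```python
from pathlib import Path


def scaffold_suite(project_root: Path, test_name: str) -> list[Path]:
    """Create the directories and empty files for one test suite, if missing."""
    claude = project_root / ".claude"
    files = [
        claude / "agents" / "testing-agent.md",
        claude / "skills" / "test" / "SKILL.md",
        claude / "tests" / test_name / "spec.md",
    ]
    # The findings folder starts out empty; the agent fills it per run.
    (claude / "tests" / test_name / "findings").mkdir(parents=True, exist_ok=True)
    created = []
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold_suite(Path("."), "user-registration")` from the project root returns the list of files it newly created, and an empty list on a second call.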
Running a Test Suite
Once everything is set up, a test suite can be executed with the following command within a Claude Code session:
/test user-registration
Afterward:
- Claude activates the test skill at .claude/skills/test/SKILL.md
- This skill reads .claude/tests/user-registration/spec.md
- And creates a new test agent that tests our target app against the formulated requirements via Playwright CLI.
A result in .claude/tests/user-registration/findings/2026-05-26.md could look like this:
| ID | Requirement | Severity | Summary |
|----|-------------|----------|---------|
| A1 | Login with valid credentials | PASSED | User can log in successfully. |
| A2 | Register with existing email | BUG | System shows a generic 500 error instead of "Email already exists". |
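Because the findings file is a plain markdown table with fixed columns, it is easy to post-process, for instance to fail a CI step whenever a run contains a `BUG` row. The following is a minimal sketch that assumes the `ID | Requirement | Severity | Summary` layout shown above and that no cell contains a `|` character. Both function names are hypothetical.

```python
from collections import Counter


def parse_findings(table: str) -> list[dict[str, str]]:
    """Turn a markdown table with a header row into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |----| separator
    return [dict(zip(header, row)) for row in body]


def severity_counts(findings: list[dict[str, str]]) -> Counter:
    """Count findings per value of the Severity column."""
    return Counter(row["Severity"] for row in findings)
```

For the example table, `severity_counts` reports one `PASSED` and one `BUG` entry.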
Outlook
Does it replace human testing? Of course not. Does it save me hours of mindless clicking and find real bugs along the way? Surprisingly often, yes.
The testing skill shows how AI can simulate human testing and take E2E tests to a new level.
With a model like Opus in particular, the results can be impressive. With large context windows, Claude Code not only tests the requirements but also provides deeper feedback in some areas, for example, about the usefulness of input fields, buttons, or wording. As a rule, it also checks the browser console and points out error messages (which human testers usually do not do).
On the other hand, the whole thing is of course very token-intensive (and therefore expensive). Whenever a third-party tool is involved, significantly more tokens are consumed than for simple prompt-only queries. By avoiding an MCP integration, we save a lot of tokens here (feel free to try the Playwright MCP and compare the consumption), but even so this testing solution is probably better suited to enterprise plans.
One remedy could be to have the agent generate regular Playwright tests from its interactive run so that later runs do not need the model, but this in turn undermines the purpose of the test agent.
Hi, I'm Nils!
