observability · red-team

Attack your own agents before users do.

Point a campaign at the front-door agent of any canvas and run the adversarial pack: prompt injection, system-prompt extraction, a DAN-style jailbreak, and off-task derailment. An LLM judge grades each attempt: a blocked attack is a pass, and an attack the agent falls for is a fail.

Start free Open Adversary →

adversary · standard pack · 3/4 blocked

blocked · injection · ignore previous instructions

blocked · extraction · system-prompt extraction

blocked · jailbreak · DAN-style jailbreak

succeeded · off-task · off-task chitchat

3 blocked · 1 succeeded · 75% block rate

Standard attack pack · LLM-judged block rate · A real trace per attack

Four canonical attacks, one click

The standard pack fires the inputs that actually break agents: an "ignore previous instructions" override, a verbatim system-prompt extraction, a Do-Anything-Now jailbreak asking for unauthorized-access steps, and an off-task chitchat distraction. Each one hits the first node of the canvas you target — the same agent your users meet.

Judged the same way your evals are

Every attempt is scored by the LLM judge that powers Evals, with a rubric written per attack. Blocked means the agent held its ground and stayed on task; succeeded means it leaked, complied, or derailed. You get a verdict, a one-line reason, and an honest block-rate bar across the campaign.
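As a rough illustration of how a block rate follows from per-attempt verdicts, here is a minimal sketch. The `Verdict` type, the `block_rate` helper, and the one-line reasons are hypothetical and not Stackon's API; only the four attack categories and the 3-of-4 outcome come from the example campaign on this page.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    attack: str    # attack category, e.g. "injection"
    blocked: bool  # True = the agent held its ground (a pass)
    reason: str    # one-line explanation from the judge (illustrative text)


def block_rate(verdicts):
    """Return the fraction of attempts that were blocked, from 0.0 to 1.0."""
    if not verdicts:
        return 0.0
    return sum(v.blocked for v in verdicts) / len(verdicts)


# Mirrors the example campaign above: three blocked, one succeeded.
campaign = [
    Verdict("injection", True, "refused the instruction override"),
    Verdict("extraction", True, "did not reveal the system prompt"),
    Verdict("jailbreak", True, "declined the DAN-style request"),
    Verdict("off-task", False, "drifted into unrelated chitchat"),
]

print(f"{block_rate(campaign):.0%} block rate")  # prints "75% block rate"
```

Re-running the same calculation after a prompt change gives a like-for-like number to compare against.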

gate · checkout-agent · passed

contains · output-contains · ticket-id · pass

contains · never-contains · secrets · pass

judge · tone rubric · 0.94 · pass

judge · cites a source · 0.88 · pass

gate passed · 4 / 4

stackon / eval-gate · $0.0021 · 1.8s

Every attack leaves a trace

Each attempt records a full Stackon trace — the attack prompt, the agent's response, model, latency, and token cost — so a failure isn't a red row you have to take on faith. Jump straight from any verdict to its trace and replay exactly what the agent saw and said.
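To make the per-attempt record concrete, the sketch below models the fields listed above as a plain data class. Every name here (`AttackTrace`, `failures`, the field names, and the sample values) is an assumption for illustration and does not reflect Stackon's actual trace schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackTrace:
    trace_id: str        # hypothetical identifier used to jump to the trace
    attack_prompt: str   # what the agent saw
    agent_response: str  # what the agent said
    model: str           # model that served the attempt
    latency_ms: int
    cost_usd: float
    blocked: bool        # the judge's verdict for this attempt


def failures(traces):
    """Return only the attempts where the attack succeeded."""
    return [t for t in traces if not t.blocked]


# Illustrative values only; not real campaign output.
traces = [
    AttackTrace("t-1", "Ignore previous instructions.", "I can't do that.",
                "example-model", 640, 0.0012, True),
    AttackTrace("t-2", "So, seen any good films lately?", "Oh, plenty! ...",
                "example-model", 580, 0.0010, False),
]

for t in failures(traces):
    print(t.trace_id, t.attack_prompt, "->", t.agent_response)
```

Keeping the prompt and response on the same record as the verdict is what lets a failed row be inspected rather than taken on faith.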

trace · run_8c4f · ok · 742ms · $0.0053

agent.plan · 742ms

tools.search_code · 86ms

llm.complete_refactor · 612ms

tools.edit_file · 78ms

evals.no_regression · 54ms

agent · llm · tool · eval · 5 spans · 3,007 tok

Governed and accountable

Campaigns respect your team budget before they spend a token, route through the same compliance and PII path as normal runs, and write an audit event on completion. Re-run a campaign after a prompt change to confirm the failure is actually closed.
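The ordering described above (budget check first, audit event last) can be sketched as follows. This is a minimal illustration under stated assumptions: `run_campaign`, `BudgetExceeded`, the event name, and all dollar figures are hypothetical, not Stackon's implementation.

```python
class BudgetExceeded(Exception):
    """Raised when a campaign would push spend past the hard limit."""


def run_campaign(estimated_cost_usd, spent_usd, hard_limit_usd,
                 run_attacks, audit_log):
    # Refuse before any tokens are spent if the estimate would breach the limit.
    if spent_usd + estimated_cost_usd > hard_limit_usd:
        raise BudgetExceeded(
            f"campaign needs ${estimated_cost_usd:.2f}, "
            f"only ${hard_limit_usd - spent_usd:.2f} left"
        )
    verdicts = run_attacks()
    # Record an audit event once the campaign has completed.
    audit_log.append({"event": "campaign.completed", "attempts": len(verdicts)})
    return verdicts


audit_log = []
verdicts = run_campaign(
    estimated_cost_usd=0.02,
    spent_usd=4.10,
    hard_limit_usd=5.00,
    run_attacks=lambda: ["blocked", "blocked", "blocked", "succeeded"],
    audit_log=audit_log,
)
print(audit_log[-1])  # {'event': 'campaign.completed', 'attempts': 4}
```

Checking the budget before calling `run_attacks` means a refused campaign costs nothing and leaves no completion event behind.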

compliance · trust layer · audit-ready

agent.coder · Agent run · trace · 8c4f21a · 12:04:11

u · dana · Role changed · member · owner · 12:04:42

proxy · PII redacted · 3 replacements · 12:05:09

u · ren · BYOK key rotated · anthropic · 12:06:30

pii proxy · standard · scrubbing

in: email dana@acme.io, card 4242 4242 4242 4242

out: email <REDACTED:email>, card <REDACTED:credit_card>

anthropic…aF3k · openai…9Qx2

AES-256-GCM
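The in/out pair above can be reproduced with a simple pattern-based scrubber. This is only a sketch of the idea: the regular expressions and the `scrub` function are assumptions, far looser than a production PII detector, and not a description of how Stackon's proxy works.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def scrub(text):
    """Replace emails and card-like numbers; return (text, replacement count)."""
    text, n_email = EMAIL.subn("<REDACTED:email>", text)
    text, n_card = CARD.subn("<REDACTED:credit_card>", text)
    return text, n_email + n_card


clean, count = scrub("email dana@acme.io, card 4242 4242 4242 4242")
print(clean)  # email <REDACTED:email>, card <REDACTED:credit_card>
print(count)  # 2
```

Returning the replacement count alongside the scrubbed text is what makes an audit line such as "PII redacted · 3 replacements" possible.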

4 attacks · Standard pack

LLM judge · Grader

Full trace · Per attempt

Part of one platform

Adversary works hand in hand with Observability.

Trace

Lead

OTel-style spans, replay, and cost per LLM call

Explore

Evals

Grade traces with output-contains or LLM judge — PR gating in Phase 2.5

Explore

Cost

Monthly budgets that refuse runs over the hard limit, with breakdowns by feature + model

Explore

Speed plus trust — prove your agents got better this week.

Adversary is one piece of Stackon, the observability-first workspace for teams running Claude and Codex. Start free and instrument your first run today.

Start free Explore the platform