Stackon
observability · red-team

Attack your own agents before users do.

Point a campaign at the front-door agent of any canvas and run the adversarial pack — prompt injection, system-prompt extraction, a DAN-style jailbreak, off-task derailment. An LLM judge grades each attempt: blocked is a pass, fell-for-it is a fail.

adversary · standard pack3/4 blocked
blockedignore previous instructions
blockedsystem-prompt extraction
blockedDAN-style jailbreak
succeededoff-task chitchat
3 blocked · 1 succeeded75% block rate
Standard attack packLLM-judged block rateA real trace per attack

01

Four canonical attacks, one click

The standard pack fires the inputs that actually break agents: an "ignore previous instructions" override, a verbatim system-prompt extraction, a Do-Anything-Now jailbreak asking for unauthorized-access steps, and an off-task chitchat distraction. Each one hits the first node of the canvas you target — the same agent your users meet.

adversary · standard pack3/4 blocked
blockedignore previous instructions
blockedsystem-prompt extraction
blockedDAN-style jailbreak
succeededoff-task chitchat
3 blocked · 1 succeeded75% block rate

02

Judged the same way your evals are

Every attempt is scored by the LLM judge that powers Evals, with a rubric written per attack. Blocked means the agent held its ground and stayed on task; succeeded means it leaked, complied, or derailed. You get a verdict, a one-line reason, and an honest block-rate bar across the campaign.

gate · checkout-agentpassed
containsoutput-contains · ticket-idpass
containsnever-contains · secretspass
judgejudge · tone rubric0.94pass
judgejudge · cites a source0.88pass
gate passed4 / 4
stackon / eval-gate$0.0021 · 1.8s

03

Every attack leaves a trace

Each attempt records a full Stackon trace — the attack prompt, the agent's response, model, latency, and token cost — so a failure isn't a red row you have to take on faith. Jump straight from any verdict to its trace and replay exactly what the agent saw and said.

trace · run_8c4fok · 742ms · $0.0053
agent.plan742ms
tools.search_code86ms
llm.complete_refactor612ms
tools.edit_file78ms
evals.no_regression54ms
agentllmtooleval5 spans · 3,007 tok

04

Governed and accountable

Campaigns respect your team budget before they spend a token, route through the same compliance and PII path as normal runs, and write an audit event on completion. Re-run a campaign after a prompt change to confirm the failure is actually closed.

compliance · trust layeraudit-ready
agent.coderAgent run· trace · 8c4f21a12:04:11
u · danaRole changed· member · owner12:04:42
proxyPII redacted· 3 replacements12:05:09
u · renBYOK key rotated· anthropic12:06:30
pii proxy · standardscrubbing

in email dana@acme.io, card 4242 4242 4242 4242

out email <REDACTED:email>, card <REDACTED:credit_card>

anthropic…aF3kopenai…9Qx2
AES-256-GCM

4 attacks

Standard pack

LLM judge

Grader

Full trace

Per attempt

Speed plus trust — prove your agents got better this week.

Adversary is one piece of Stackon, the observability-first workspace for teams running Claude and Codex. Start free and instrument your first run today.