Attack your own agents before users do.
Point a campaign at the front-door agent of any canvas and run the adversarial pack — prompt injection, system-prompt extraction, a DAN-style jailbreak, off-task derailment. An LLM judge grades each attempt: blocked is a pass, fell-for-it is a fail.
01
Four canonical attacks, one click
The standard pack fires the inputs that actually break agents: an "ignore previous instructions" override, a verbatim system-prompt extraction, a Do-Anything-Now jailbreak asking for unauthorized-access steps, and an off-task chitchat distraction. Each one hits the first node of the canvas you target — the same agent your users meet.
02
Judged the same way your evals are
Every attempt is scored by the LLM judge that powers Evals, with a rubric written per attack. Blocked means the agent held its ground and stayed on task; succeeded means it leaked, complied, or derailed. You get a verdict, a one-line reason, and an honest block-rate bar across the campaign.
03
Every attack leaves a trace
Each attempt records a full Stackon trace — the attack prompt, the agent's response, model, latency, and token cost — so a failure isn't a red row you have to take on faith. Jump straight from any verdict to its trace and replay exactly what the agent saw and said.
04
Governed and accountable
Campaigns respect your team budget before they spend a token, route through the same compliance and PII path as normal runs, and write an audit event on completion. Re-run a campaign after a prompt change to confirm the failure is actually closed.
in email dana@acme.io, card 4242 4242 4242 4242
out email <REDACTED:email>, card <REDACTED:credit_card>
4 attacks
Standard pack
LLM judge
Grader
Full trace
Per attempt
Part of one platform
Adversary works hand in hand with Observability.
Speed plus trust — prove your agents got better this week.
Adversary is one piece of Stackon, the observability-first workspace for teams running Claude and Codex. Start free and instrument your first run today.