Every agent run leaves a trace.
OTel-style spans for every LLM call, tool use, and sub-agent — each carrying its model, token cost, latency, and the exact inputs and outputs. Replay any run span by span, grade it with Evals, and prove it got better this week.
01
See exactly what your agents did
Each run becomes a waterfall of typed spans — planner, tool calls, model completions, and eval checks — nested in time and color-coded by kind. The timeline draws every span proportionally so you can see what ran in parallel, what blocked, and where the wall-clock time actually went.
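The proportional timeline can be sketched as below. This is a minimal illustration, not Trace's renderer. The span kinds come from this page, but the `Span` fields, `layout`, and `overlapping` are hypothetical names. Each span's horizontal offset and width are its start time and duration as fractions of the run's wall-clock time. Two spans ran in parallel when their time intervals intersect.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Span kinds listed on this page; every field name below is hypothetical.
SpanKind = Literal["agent", "llm", "tool", "eval", "internal"]

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of a run
    kind: SpanKind
    name: str
    start_ms: float
    end_ms: float

def layout(spans: list[Span]) -> dict[str, tuple[float, float]]:
    """Return (offset, width) per span as fractions of the run's wall-clock time."""
    run_start = min(s.start_ms for s in spans)
    run_end = max(s.end_ms for s in spans)
    total = (run_end - run_start) or 1.0  # avoid dividing by zero for an instantaneous run
    return {
        s.span_id: ((s.start_ms - run_start) / total, (s.end_ms - s.start_ms) / total)
        for s in spans
    }

def overlapping(a: Span, b: Span) -> bool:
    """True if the two spans' time intervals intersect, i.e. they ran in parallel."""
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms
```

A span that starts exactly when another ends does not count as overlapping, so the sketch treats back-to-back spans as sequential rather than parallel.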
02
Open any span and read the full IO
Expand a span to inspect its raw inputs, outputs, and attributes, plus the model, token counts in and out, duration, and status. Nothing is summarized away — when an agent does something surprising, the prompt and response that caused it are right there, byte for byte.
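The following is one way to model a span's detail record so that nothing is summarized away. It is a sketch under assumptions: `SpanDetail`, its field names, and the helper functions are hypothetical, not Trace's schema. Raw inputs and outputs are stored verbatim. A simple search over outputs leads from a surprising result back to the span, and therefore the prompt, that produced it.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

@dataclass
class SpanDetail:
    """Hypothetical shape of one span's detail view; field names are illustrative."""
    span_id: str
    kind: str                 # agent, llm, tool, eval, or internal
    model: Optional[str]      # None for spans that make no model call, such as tool calls
    input: Any                # raw input, stored verbatim
    output: Any               # raw output, stored verbatim
    tokens_in: int = 0
    tokens_out: int = 0
    duration_ms: float = 0.0
    status: str = "ok"
    attributes: dict = field(default_factory=dict)

def _as_text(value: Any) -> str:
    # Search strings directly; serialize structured payloads to JSON first.
    return value if isinstance(value, str) else json.dumps(value)

def spans_whose_output_contains(spans: list[SpanDetail], needle: str) -> list[SpanDetail]:
    """Find spans whose raw output mentions `needle`, to trace a surprising result to its input."""
    return [s for s in spans if needle in _as_text(s.output)]

def to_json(span: SpanDetail) -> str:
    """Serialize the full record with no truncation of inputs or outputs."""
    return json.dumps(asdict(span))
```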
03
Cost and latency on every call
Token usage and dollar cost are recorded per span and roll up to the run total, priced per model. The KPI row shows duration, total cost, span count, and tokens in and out at a glance, so the one expensive completion buried in a cheap-looking run stands out.
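The per-span pricing and roll-up can be sketched as follows. The model names and per-million-token prices are made-up placeholders, not real rates. `LlmSpan`, `span_cost`, and `run_rollup` are hypothetical helpers. Each span's cost is input tokens times the model's input price plus output tokens times its output price. The run total is the sum of these costs, and the most expensive span is whichever contributes the most.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens as (input, output); real prices vary by model.
PRICES_PER_MTOK = {
    "model-small": (0.25, 1.25),
    "model-large": (15.00, 75.00),
}

@dataclass
class LlmSpan:
    name: str        # assumed unique within a run
    model: str
    tokens_in: int
    tokens_out: int

def span_cost(span: LlmSpan) -> float:
    """Dollar cost of one span, priced by its model."""
    price_in, price_out = PRICES_PER_MTOK[span.model]
    return (span.tokens_in * price_in + span.tokens_out * price_out) / 1_000_000

def run_rollup(spans: list[LlmSpan]) -> tuple[float, str, dict[str, float]]:
    """Return (run total, name of the most expensive span, per-span costs)."""
    costs = {s.name: span_cost(s) for s in spans}
    total = sum(costs.values())
    priciest = max(costs, key=costs.get)
    return total, priciest, costs
```

Floats are fine for a display sketch; a billing path would more likely use `decimal.Decimal` to avoid rounding drift.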
04
Grade it, then prove it improved
Run your Evals against any trace to get pass/fail verdicts with scores and reasons inline, and generate a 5-whys postmortem when a run fails or burns budget. Trace is the lead pillar because observability without judgment is just logs — this is how you show your agents are getting better, not just busier.
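An "output-contains" check, one of the two grading styles named under Evals, can be sketched as below to show how a verdict carries a pass/fail flag, a score, and a reason. The scoring rule here is an assumption: the score is the fraction of required phrases found with case-insensitive matching, and the check passes at or above a threshold. `Verdict` and `output_contains` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    passed: bool
    score: float   # 0.0 to 1.0
    reason: str

def output_contains(output: str, required: list[str], threshold: float = 1.0) -> Verdict:
    """Score = fraction of required phrases present (case-insensitive); pass if score >= threshold."""
    if not required:
        return Verdict(True, 1.0, "no required phrases configured")
    text = output.lower()
    missing = [p for p in required if p.lower() not in text]
    score = (len(required) - len(missing)) / len(required)
    reason = "all required phrases present" if not missing else "missing: " + ", ".join(missing)
    return Verdict(score >= threshold, score, reason)
```

Lowering `threshold` turns the strict all-or-nothing check into a partial-credit one, which can help separate a run that is nearly correct from one that missed entirely.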
05
Bring your own stack
Trace speaks an OTel-compatible schema and ingests over a scoped REST API or the Stackon MCP server. Mint a traces:write token, point your existing agents at the endpoint, and spans start flowing in under a minute — no rewrites, no vendor lock-in.
{
"mcpServers": {
"stackon": {
"command": "npx",
"args": ["@stackon/mcp"],
"env": { "STACKON_API_TOKEN": "ht_•••" }
}
}
}
Span kinds: agent · llm · tool · eval · internal
Schema: OTel-compatible
Ingest: REST + MCP
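For agents that post spans over the REST API instead of MCP, the call might look like the sketch below. It uses only the standard library. The endpoint URL is a placeholder on the reserved `.example` domain. The `Bearer` auth scheme and the `{"spans": [...]}` payload shape are assumptions, not the documented API. Reading the token from `STACKON_API_TOKEN` mirrors the MCP config.

```python
import json
import os
import urllib.request

# Hypothetical endpoint; substitute the ingest URL from your Stackon workspace.
INGEST_URL = "https://stackon.example/v1/spans"

def build_ingest_request(spans: list[dict], token: str, url: str = INGEST_URL) -> urllib.request.Request:
    """Build a JSON POST carrying a batch of spans (payload shape and auth scheme are assumed)."""
    body = json.dumps({"spans": spans}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

def send_spans(spans: list[dict]) -> int:
    """Send spans using a token minted with the traces:write scope; returns the HTTP status."""
    token = os.environ["STACKON_API_TOKEN"]
    req = build_ingest_request(spans, token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to inspect or test without network access.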
Part of one platform
Trace works hand in hand with Observability.
Evals
Grade traces with output-contains checks or an LLM judge, with PR gating in Phase 2.5
Cost
Monthly budgets that refuse runs over the hard limit, with breakdowns by feature + model
Adversary
Automatic adversarial sweeps — prompt injection, jailbreak, system extraction
Speed plus trust: prove your agents got better this week.
Trace is one piece of Stackon, the observability-first workspace for teams running Claude and Codex. Start free and instrument your first run today.