Stackon
observability · lead pillar

Every agent run leaves a trace.

OTel-style spans for every LLM call, tool use, and sub-agent — each carrying its model, token cost, latency, and the exact inputs and outputs. Replay any run span by span, grade it with Evals, and prove it got better this week.

trace · run_8c4fok · 742ms · $0.0053
agent.plan · 742ms
tools.search_code · 86ms
llm.complete_refactor · 612ms
tools.edit_file · 78ms
evals.no_regression · 54ms
agent · llm · tool · eval
5 spans · 3,007 tok
Span-level token cost · Step-by-step replay · OTel-compatible ingest

01

See exactly what your agents did

Each run becomes a waterfall of typed spans — planner, tool calls, model completions, and eval checks — nested in time and color-coded by kind. The timeline draws every span proportionally so you can see what ran in parallel, what blocked, and where the wall-clock time actually went.
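As a rough illustration of the proportional timeline described above, the sketch below scales each span's bar to its share of the run's wall-clock time. The span names and durations are taken from the sample trace; the `Span` class, the `waterfall` helper, and every start offset are assumptions made for illustration, since the page shows durations only.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    kind: str         # "agent", "llm", "tool", or "eval"
    start_ms: int     # offset from run start (assumed; the page shows only durations)
    duration_ms: int


def waterfall(spans, width=50):
    """Return one text row per span, with bars scaled to the run's wall-clock time."""
    total = max(s.start_ms + s.duration_ms for s in spans)
    rows = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        # Position and length are both proportional to the total run time.
        offset = min(width - 1, round(s.start_ms / total * width))
        length = max(1, min(width - offset, round(s.duration_ms / total * width)))
        bar = " " * offset + "#" * length
        rows.append(f"{s.name:<24}{bar:<{width}} {s.duration_ms}ms")
    return rows


# Durations come from the sample trace; start offsets are illustrative guesses.
SPANS = [
    Span("agent.plan", "agent", 0, 742),
    Span("tools.search_code", "tool", 0, 86),
    Span("llm.complete_refactor", "llm", 52, 612),
    Span("tools.edit_file", "tool", 664, 78),
    Span("evals.no_regression", "eval", 688, 54),
]

if __name__ == "__main__":
    print("\n".join(waterfall(SPANS)))
```

Bars that overlap horizontally ran in parallel, and the longest bar under the root span is where most of the wall-clock time went.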

02

Open any span and read the full IO

Expand a span to inspect its raw inputs, outputs, and attributes, plus the model, token counts in and out, duration, and status. Nothing is summarized away — when an agent does something surprising, the prompt and response that caused it are right there, byte for byte.

knowledge · retrieval · pgvector
query · policy on rate limiting third-party API calls? · top 6
0.83 · docs/adrs/004-rate-limiting.md

Third-party calls are capped at 60 rpm per token; retries use exponential backoff with a 3-attempt budget…

0.71 · rfc/auth-refresh.md

Refresh tokens rotate on every use; the prior token stays valid for a 30s grace window to absorb races…

0.58 · runbooks/oncall.md

On a 429 storm, shed load at the proxy before paging — the budget breach webhook fires automatically…

3 chunks → # Relevant team context · cited by source

03

Cost and latency on every call

Token usage and dollar cost are recorded per span and roll up to the run total, priced per model. The KPI row breaks out duration, total cost, span count, and tokens in and out at a glance, so the one expensive completion inside an otherwise cheap-looking run stands out.
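The per-span roll-up can be pictured with a small sketch like the one below. It is not Stackon's pricing logic: the model names, the per-million-token prices, the token counts, the `llm.plan` span, and the `LlmSpan`, `span_cost`, and `run_summary` names are all hypothetical, chosen only to show how per-model pricing surfaces the costly span.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {
    "small-model": (0.50, 1.50),
    "large-model": (3.00, 15.00),
}


@dataclass
class LlmSpan:
    name: str
    model: str
    tokens_in: int
    tokens_out: int


def span_cost(span: LlmSpan) -> float:
    """Price one span using the rate card for its model."""
    price_in, price_out = PRICES[span.model]
    return (span.tokens_in * price_in + span.tokens_out * price_out) / 1_000_000


def run_summary(spans: list[LlmSpan]) -> dict:
    """Roll span-level costs and token counts up to run totals."""
    costs = {s.name: span_cost(s) for s in spans}
    return {
        "total_cost": sum(costs.values()),
        "tokens_in": sum(s.tokens_in for s in spans),
        "tokens_out": sum(s.tokens_out for s in spans),
        "most_expensive": max(costs, key=costs.get),
    }


# Illustrative token counts, not taken from the sample trace.
RUN = [
    LlmSpan("llm.plan", "small-model", 1200, 300),
    LlmSpan("llm.complete_refactor", "large-model", 900, 600),
]

if __name__ == "__main__":
    print(run_summary(RUN))
```

Keeping cost on the span rather than only on the run is what makes the `most_expensive` lookup possible.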

cost · this month · 84% of budget

$842 / $1,000

near cap
84% spent · soft @ $750

by feature

canvas · $184
evals · $93
agent runner · $28
adversary · $7
last 30 days · $842 total

04

Grade it, then prove it improved

Run your Evals against any trace to get pass/fail verdicts with scores and reasons inline, and generate a 5-whys postmortem when a run fails or burns budget. Trace is the lead pillar because observability without judgment is just logs — this is how you show your agents are getting better, not just busier.
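A gate of this shape can be sketched in a few lines. This is an assumption-laden illustration rather than the Evals implementation: the `Verdict` record, the helper names, the 0.8 judge threshold, the sample output string, and the forbidden-string list are all invented here, and the judge scores are passed in instead of being produced by a model-graded rubric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    check: str
    passed: bool
    score: Optional[float]
    reason: str


def output_contains(output: str, needle: str) -> Verdict:
    ok = needle in output
    return Verdict(f"output-contains · {needle}", ok, None,
                   "found" if ok else f"missing {needle!r}")


def never_contains(output: str, forbidden: list[str]) -> Verdict:
    hits = [f for f in forbidden if f in output]
    return Verdict("never-contains", not hits, None,
                   "clean" if not hits else f"found {hits}")


def judge(name: str, score: float, threshold: float = 0.8) -> Verdict:
    # The score would come from a model-graded rubric; here it is passed in.
    ok = score >= threshold
    return Verdict(f"judge · {name}", ok, score,
                   f"{score:.2f} vs threshold {threshold:.2f}")


def gate(verdicts: list[Verdict]) -> tuple[bool, str]:
    """The gate passes only if every check passes."""
    passed = sum(v.passed for v in verdicts)
    return passed == len(verdicts), f"{passed} / {len(verdicts)}"


# Hypothetical agent output and checks, loosely modeled on the gate widget.
OUTPUT = "Refund issued for TCK-1042, per the refunds policy page."
VERDICTS = [
    output_contains(OUTPUT, "TCK-1042"),
    never_contains(OUTPUT, ["sk-live-", "password"]),
    judge("tone rubric", 0.94),
    judge("cites a source", 0.88),
]

if __name__ == "__main__":
    for v in VERDICTS:
        print("pass" if v.passed else "fail", v.check, v.reason)
    print(gate(VERDICTS))
```

Each verdict carries its own reason, so a failed gate points at the specific check that failed.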

gate · checkout-agent · passed
contains: output-contains · ticket-id · pass
contains: never-contains · secrets · pass
judge: judge · tone rubric · 0.94 · pass
judge: judge · cites a source · 0.88 · pass
gate passed · 4 / 4
stackon / eval-gate · $0.0021 · 1.8s

05

Bring your own stack

Trace speaks an OTel-compatible schema and ingests over a scoped REST API or the Stackon MCP server. Mint a traces:write token, point your existing agents at the endpoint, and spans start flowing in under a minute — no rewrites, no vendor lock-in.

mcp.json · stackon · 6 tools · drop-in
{
  "mcpServers": {
    "stackon": {
      "command": "npx",
      "args": ["@stackon/mcp"],
      "env": { "STACKON_API_TOKEN": "ht_•••" }
    }
  }
}
record_run · trace
start_trace · trace
add_span · span
end_trace · trace
list_recent_traces · read
run_evals · eval
speaks to: Claude Code · Cursor · Windsurf · Codex · Zed
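For the REST path, a request might be assembled roughly as follows. Treat everything specific here as a placeholder: the endpoint URL, the payload field names, and the Bearer authorization scheme are assumptions, not documented Stackon API details. Only the `STACKON_API_TOKEN` variable name and the span names come from this page. The sketch builds the request without sending it.

```python
import json
import os
import urllib.request

# Hypothetical endpoint; substitute the ingest URL from your Stackon workspace.
INGEST_URL = "https://stackon.example/v1/traces"


def build_ingest_request(trace: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one trace as JSON."""
    body = json.dumps(trace).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            # Bearer auth is an assumption; the token needs the traces:write scope.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Payload shape is illustrative; field names are not taken from Stackon docs.
TRACE = {
    "name": "checkout-agent",
    "spans": [
        {"name": "agent.plan", "kind": "agent", "duration_ms": 742},
        {"name": "tools.search_code", "kind": "tool", "duration_ms": 86},
    ],
}

if __name__ == "__main__":
    token = os.environ.get("STACKON_API_TOKEN", "")
    req = build_ingest_request(TRACE, token)
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # To send it: urllib.request.urlopen(req)  (needs a real endpoint and token)
```

Reading the token from the environment keeps it out of source control, matching how the MCP config above passes it through `env`.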

Span kinds: agent · llm · tool · eval · internal

Schema: OTel-compatible

Ingest: REST + MCP

Speed plus trust — prove your agents got better this week.

Trace is one piece of Stackon, the observability-first workspace for teams running Claude and Codex. Start free and instrument your first run today.