observability · lead pillar

Every agent run leaves a trace.

OTel-style spans for every LLM call, tool use, and sub-agent — each carrying its model, token cost, latency, and the exact inputs and outputs. Replay any run span by span, grade it with Evals, and prove it got better this week.


trace · run_8c4fok · 742ms · $0.0053

agent.plan · 742ms

tools.search_code · 86ms

llm.complete_refactor · 612ms

tools.edit_file · 78ms

evals.no_regression · 54ms

Legend: agent, llm, tool, eval · 5 spans · 3,007 tok

Span-level token cost · Step-by-step replay · OTel-compatible ingest

See exactly what your agents did

Each run becomes a waterfall of typed spans — planner, tool calls, model completions, and eval checks — nested in time and color-coded by kind. The timeline draws every span proportionally so you can see what ran in parallel, what blocked, and where the wall-clock time actually went.
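To make the waterfall idea concrete, here is a minimal sketch of typed spans laid out on a shared timeline. The span names and durations come from the sample trace on this page; the `Span` class, the start offsets, and both helper functions are assumptions made up for illustration, not Trace's actual schema or renderer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    name: str
    kind: str         # "agent", "llm", "tool", or "eval"
    start_ms: int     # offset from run start (hypothetical values)
    duration_ms: int

    @property
    def end_ms(self) -> int:
        return self.start_ms + self.duration_ms


def overlapping_pairs(spans):
    """Return (earlier, later) name pairs whose time intervals overlap."""
    ordered = sorted(spans, key=lambda s: s.start_ms)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start_ms < a.end_ms:
                pairs.append((a.name, b.name))
    return pairs


def render_waterfall(spans, width=50):
    """Draw each span as a bar proportional to its share of wall-clock time."""
    total = max(s.end_ms for s in spans)
    lines = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        left = round(s.start_ms / total * width)
        bar = max(1, round(s.duration_ms / total * width))
        lines.append(f"{s.name:<24}{' ' * left}{'#' * bar} {s.duration_ms}ms")
    return lines


# Child spans of agent.plan. Durations match the sample trace;
# start offsets are invented so the run ends at 742 ms.
children = [
    Span("tools.search_code", "tool", 0, 86),
    Span("llm.complete_refactor", "llm", 50, 612),
    Span("tools.edit_file", "tool", 662, 78),
    Span("evals.no_regression", "eval", 688, 54),
]

for line in render_waterfall(children):
    print(line)
print(overlapping_pairs(children))
```

The child durations in the sample add up to 830 ms against a 742 ms run, so some spans must have overlapped. Under the assumed offsets, the overlap check reports which ones.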

Open any span and read the full IO

Expand a span to inspect its raw inputs, outputs, and attributes, plus the model, token counts in and out, duration, and status. Nothing is summarized away — when an agent does something surprising, the prompt and response that caused it are right there, byte for byte.

knowledge · retrieval · pgvector

query: policy on rate limiting third-party API calls? · top 6

0.83 · docs/adrs/004-rate-limiting.md

Third-party calls are capped at 60 rpm per token; retries use exponential backoff with a 3-attempt budget…

0.71 · rfc/auth-refresh.md

Refresh tokens rotate on every use; the prior token stays valid for a 30s grace window to absorb races…

0.58 · runbooks/oncall.md

On a 429 storm, shed load at the proxy before paging — the budget breach webhook fires automatically…

3 chunks → # Relevant team context · cited by source

Cost and latency on every call

Token usage and dollar cost are recorded per span and roll up to the run total, priced per model. The KPI row breaks out duration, total cost, span count, and tokens in and out at a glance — so the one expensive completion hiding in a cheap-looking run has nowhere to hide.
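The per-span roll-up can be sketched in a few lines. This is an illustration under stated assumptions: the model names, the price table, and the token counts are hypothetical, and real pricing depends on your provider.

```python
from decimal import Decimal

# Hypothetical USD prices per 1M tokens, keyed by model name.
PRICES = {
    "small-model": {"in": Decimal("0.50"), "out": Decimal("1.50")},
    "large-model": {"in": Decimal("3.00"), "out": Decimal("15.00")},
}
MILLION = Decimal(1_000_000)


def span_cost(model, tokens_in, tokens_out):
    """Price one span from its token counts and its model's rates."""
    rate = PRICES[model]
    return (rate["in"] * tokens_in + rate["out"] * tokens_out) / MILLION


# Illustrative spans; the token counts are invented.
spans = [
    {"name": "llm.plan", "model": "small-model",
     "tokens_in": 1200, "tokens_out": 300},
    {"name": "llm.complete_refactor", "model": "large-model",
     "tokens_in": 900, "tokens_out": 600},
]

costs = {
    s["name"]: span_cost(s["model"], s["tokens_in"], s["tokens_out"])
    for s in spans
}
run_total = sum(costs.values(), Decimal(0))   # roll up to the run total
priciest = max(costs, key=costs.get)          # the span to look at first

print(f"run total: ${run_total}")
print(f"most expensive span: {priciest} (${costs[priciest]})")
```

`Decimal` keeps sub-cent amounts exact, which matters when many small span costs are added into one run total.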

cost · this month · 84% of budget

$842 / $1,000

near cap

84% spent · soft @ $750

by feature

canvas · $184

evals · $93

agent runner · $28

adversary · $7

last 30 days · $842 total

Grade it, then prove it improved

Run your Evals against any trace to get pass/fail verdicts with scores and reasons inline, and generate a 5-whys postmortem when a run fails or burns budget. Trace is the lead pillar because observability without judgment is just logs — this is how you show your agents are getting better, not just busier.
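As a rough picture of how a gate combines checks into one verdict, here is a minimal sketch. The check names and the two judge scores echo the sample gate on this page; the `Verdict` type, the function signatures, the 0.8 threshold, and the sample output string are all assumptions for illustration. In practice a judge score would come from a model-graded rubric rather than a literal number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    check: str
    passed: bool
    score: Optional[float] = None
    reason: str = ""


def output_contains(output, needle):
    ok = needle in output
    return Verdict("output-contains", ok,
                   reason=f"{needle!r} {'found' if ok else 'missing'}")


def never_contains(output, forbidden):
    hits = [f for f in forbidden if f in output]
    return Verdict("never-contains", not hits,
                   reason=f"forbidden matches: {hits}")


def judge(name, score, threshold=0.8):
    # The score is supplied by the caller; the threshold is hypothetical.
    return Verdict(f"judge:{name}", score >= threshold, score=score,
                   reason=f"{score:.2f} vs threshold {threshold:.2f}")


def gate(verdicts):
    """The gate passes only if every check passes."""
    return all(v.passed for v in verdicts)


output = "Refund issued, see ticket-id TCK-1042 (source: docs/refunds.md)."
verdicts = [
    output_contains(output, "ticket-id"),
    never_contains(output, ["sk_live_", "BEGIN PRIVATE KEY"]),
    judge("tone rubric", 0.94),
    judge("cites a source", 0.88),
]

for v in verdicts:
    print(f"{v.check:<24}{'pass' if v.passed else 'fail'}  {v.reason}")
passed = sum(v.passed for v in verdicts)
print(f"gate {'passed' if gate(verdicts) else 'failed'} {passed} / {len(verdicts)}")
```

Keeping a reason on every verdict is what lets a failed gate explain itself inline instead of returning only a boolean.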

gate · checkout-agent · passed

[contains] output-contains · ticket-id · pass

[contains] never-contains · secrets · pass

[judge] judge · tone rubric · 0.94 · pass

[judge] judge · cites a source · 0.88 · pass

gate passed · 4 / 4

stackon / eval-gate · $0.0021 · 1.8s

Bring your own stack

Trace speaks an OTel-compatible schema and ingests over a scoped REST API or the Stackon MCP server. Mint a traces:write token, point your existing agents at the endpoint, and spans start flowing in under a minute — no rewrites, no vendor lock-in.

mcp.json · stackon · 6 tools · drop-in

{
  "mcpServers": {
    "stackon": {
      "command": "npx",
      "args": ["@stackon/mcp"],
      "env": { "STACKON_API_TOKEN": "ht_•••" }
    }
  }
}
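For the REST path, the sketch below shows the general shape of posting a span with a scoped token. Everything specific here is an assumption: the URL is a placeholder, the payload field names are invented, and the `Bearer` header is a common convention rather than something this page confirms. Check the API reference for the real endpoint and schema before using it.

```python
import json
import os
import urllib.request

# Placeholder endpoint; the real ingest URL is not given on this page.
INGEST_URL = "https://stackon.example.invalid/v1/traces"


def build_ingest_request(spans, token):
    """Build (but do not send) a POST request carrying a batch of spans."""
    body = json.dumps({"spans": spans}).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Hypothetical span payload; field names are illustrative only.
span = {
    "name": "tools.search_code",
    "kind": "tool",
    "duration_ms": 86,
    "attributes": {"query": "rate limiting"},
}

# Reusing the STACKON_API_TOKEN variable from mcp.json is an assumption.
token = os.environ.get("STACKON_API_TOKEN", "ht_placeholder")
req = build_ingest_request([span], token)
print(req.get_method(), req.full_url)

# Sending is left out because the URL is a placeholder:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```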