Product · Analytics

Know what your AI engineers actually did.

Running fleets of AI agents without observability is reckless. CodeCourier Analytics ships the dashboard - every run traced, every token costed, every regression caught before it lands in main.

codecourier · analytics · live
Runs · 24h: 1,284 (+12.4%)
Runs today: 1,284 (+12%)
PRs shipped: 172 (+24)
Token spend: $842 (−3.1%)
Avg run · p95
14d activity

The problem

Agents fail silently. Then your bill explodes.

Silent failures

An agent retries, the checker passes, the PR ships - but the diff is wrong. Without traces you find out from a customer.

Runaway token spend

One bad prompt loop can multiply cost by 40×. Cost-per-PR catches it before the invoice does.

Persona regressions

You tune a persona. Acceptance goes from 78% to 41% overnight. Per-persona verdict scores surface it in minutes, not weeks.

Dashboard

Every fleet metric, in one panel.

Four lenses on the same fleet - overview, runs, cost, quality. Filter by persona, repo, workflow or time. Export anything.

Fleet at a glance

Run volume, throughput, cost trend and p95 latency - all on one canvas.

Runs today: 1,284
PRs shipped: 172
Token spend · 24h: $842
Avg run · p95: 47s
Runs · hourly (last 18h)

What we track

Six layers of signal, one source of truth.

Latency, spend, verdicts, errors, persona quality, delivery health - every signal a senior engineer would page on.

Run latency

End-to-end and step-level p50/p95/p99. Per-persona, per-workflow, per-region - no aggregation tricks.
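
For readers who want to reproduce these numbers from exported durations, here is a minimal sketch of a nearest-rank percentile. The function name, the nearest-rank method and the sample durations are assumptions for illustration; the page does not say how CodeCourier computes its percentiles.

```typescript
// Nearest-rank percentile over run durations in milliseconds.
// Assumption: nearest-rank is used here for simplicity; other methods interpolate.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical sample of ten run durations.
const durationsMs = [1200, 3800, 12400, 8900, 2100, 950, 4700, 3300, 15000, 2600];
const p50 = percentile(durationsMs, 50);
const p95 = percentile(durationsMs, 95);
const p99 = percentile(durationsMs, 99);
console.log({ p50, p95, p99 });
```

With only ten samples, p95 and p99 both land on the slowest run, which is why tail percentiles are worth reading only over a reasonably large window.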

Token + $ spend

Input, output, cached and reasoning tokens - costed per model, attributed to a persona, a repo and a team.
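
As a rough sketch of what costing per token class can look like: multiply each class by its per-million rate and sum. The type names and the rates below are hypothetical placeholders, not CodeCourier's schema or any provider's price list.

```typescript
// Token counts for one run, split by class.
type TokenUsage = { input: number; output: number; cached: number; reasoning: number };
// USD per one million tokens, per class. Values are made up for the example.
type RatesPerMillion = TokenUsage;

function costUsd(usage: TokenUsage, rates: RatesPerMillion): number {
  const classes: (keyof TokenUsage)[] = ["input", "output", "cached", "reasoning"];
  return classes.reduce((sum, k) => sum + (usage[k] / 1_000_000) * rates[k], 0);
}

const exampleRates: RatesPerMillion = { input: 3, output: 15, cached: 0.3, reasoning: 15 };
const exampleUsage: TokenUsage = { input: 200_000, output: 20_000, cached: 1_000_000, reasoning: 10_000 };

// 0.60 + 0.30 + 0.30 + 0.15, so roughly 1.35 USD (subject to float rounding).
const runCost = costUsd(exampleUsage, exampleRates);
console.log(runCost.toFixed(2));
```

Attribution to a persona, repo or team is then a group-by over these per-run costs.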

Step-level verdicts

Every checker, reviewer and policy gate emits a verdict. Roll up to pass-rate or drill into the failing diff.

Error fingerprints

Stack-aware error grouping. The same flaky tool call across 400 runs collapses to one fingerprint with a count and a trend.
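
One common way to build such a fingerprint is to strip the volatile parts of the message, combine it with the top stack frame and hash the result. The sketch below assumes that approach; the `AgentError` shape, the normalisation rules and the 12-character truncation are illustrative choices, not the product's algorithm.

```typescript
import { createHash } from "node:crypto";

// Hypothetical error shape: a message plus the top stack frame.
type AgentError = { message: string; topFrame: string };

// Replace values that change between runs so repeats normalise to one string.
function normalise(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/\d+/g, "<n>");
}

function fingerprint(err: AgentError): string {
  return createHash("sha256")
    .update(`${err.topFrame}|${normalise(err.message)}`)
    .digest("hex")
    .slice(0, 12); // short id for display; keep the full digest if collisions matter
}

// Count occurrences per fingerprint.
function groupByFingerprint(errors: AgentError[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const err of errors) {
    const key = fingerprint(err);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const sampleErrors: AgentError[] = [
  { message: "Timeout after 3000 ms calling fetch_repo (attempt 2)", topFrame: "tools/fetchRepo.ts:fetchRepo" },
  { message: "Timeout after 5000 ms calling fetch_repo (attempt 4)", topFrame: "tools/fetchRepo.ts:fetchRepo" },
  { message: "Patch rejected: 3 hunks failed", topFrame: "steps/applyPatch.ts:apply" },
];
const groups = groupByFingerprint(sampleErrors);
console.log(groups.size); // two distinct fingerprints for three errors
```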

Persona performance

Acceptance rate, mean review cycles, escape rate to production. Compare two persona versions on the same workload.

Webhook delivery

Per-endpoint success, latency and retry count. Alerts fire before your CI pipeline notices a webhook is dead.

Drill down

Every run, fully traced.

Click a run. See the agent's plan, every tool call, every model exchange, every file diff and every checker verdict - with timing.

Step-level traces unfold like a stack trace. Each step exposes the agent messages, the model called, the prompt and response tokens, and any artifacts written to disk. A waterfall view shows which step held the run up - usually it is the one waiting on a flaky third party.

Replay is one click. Re-run a single step against a newer persona without rerunning the whole workflow.

run_7f3a91 · Atlas-v2.5
26.3s · $0.18
  • Plan · success · 1.2s
  • Read repo · success · 3.8s
  • Apply patch · warn · 12.4s
    view messages · 18 turns
  • Run checker · success · 8.9s
Waterfall: Plan → Read repo → Apply patch → Run checker

By persona: Atlas-v2.5 $500 · Helios-v1.3 $360 · Mercury-v0.9 $240
By workflow: Ship-PR $440 · Triage $280 · Refactor-Sprint $170
By repository: core/api $460 · web/dash $320 · infra/terraform $140

Cost intelligence

FinOps for agent fleets.

Attribute every cent - by persona, by workflow, by repository, by team. Chargebacks to the right cost center, monthly, automatically.

Set budgets per persona or per repo. Soft alerts at 60%, hard caps at 95%. No surprise invoices, no apologetic Slack threads.
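
The two thresholds reduce to a small state check. This is a sketch under the assumption that both thresholds are inclusive and measured as spend divided by budget; the function and state names are hypothetical.

```typescript
type BudgetState = "ok" | "soft_alert" | "hard_cap";

// soft and hard default to the 60% and 95% thresholds described above.
function budgetState(spentUsd: number, budgetUsd: number, soft = 0.6, hard = 0.95): BudgetState {
  if (budgetUsd <= 0) throw new Error("budget must be positive");
  const used = spentUsd / budgetUsd;
  if (used >= hard) return "hard_cap"; // stop scheduling new runs
  if (used >= soft) return "soft_alert"; // notify, keep running
  return "ok";
}

console.log(budgetState(500, 1000)); // ok
console.log(budgetState(700, 1000)); // soft_alert
console.log(budgetState(960, 1000)); // hard_cap
```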

Mix-of-models reporting included. Move a workflow from Opus to Haiku and watch the savings show up in the next hour.

Audit & compliance

Insights your auditor will actually trust.

Every run, every prompt, every model exchange - written to an append-only store with cryptographic chaining.

Per-run audit log

Full prompt, response, tool calls and diffs - retained on the plan you choose, immutably.

Immutable trace store

Append-only, hash-chained, exportable. Tamper-evident by construction, not by policy.
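
To make "tamper-evident by construction" concrete, here is a minimal hash-chain sketch: each entry's hash covers its payload and the previous entry's hash, so editing any record breaks every hash after it. The entry layout, the SHA-256 choice and the all-zero genesis value are assumptions for illustration, not CodeCourier's storage format.

```typescript
import { createHash } from "node:crypto";

type ChainedEntry = { index: number; payload: string; prevHash: string; hash: string };

const GENESIS = "0".repeat(64); // placeholder "previous hash" for the first entry

function entryHash(index: number, payload: string, prevHash: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${payload}`).digest("hex");
}

// Returns a new log with the payload appended; existing entries are never modified.
function append(log: readonly ChainedEntry[], payload: string): ChainedEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const index = log.length;
  return [...log, { index, payload, prevHash, hash: entryHash(index, payload, prevHash) }];
}

// Recomputes every hash and checks each link to the previous entry.
function verify(log: readonly ChainedEntry[]): boolean {
  return log.every((e, i) => {
    const expectedPrev = i === 0 ? GENESIS : log[i - 1].hash;
    return e.index === i && e.prevHash === expectedPrev && e.hash === entryHash(e.index, e.payload, e.prevHash);
  });
}

let auditLog: ChainedEntry[] = [];
for (const event of ["run started", "patch applied", "checker passed"]) {
  auditLog = append(auditLog, event);
}
console.log(verify(auditLog)); // intact chain verifies

// Simulate tampering with the middle record.
const tampered = auditLog.map((e) => (e.index === 1 ? { ...e, payload: "patch skipped" } : e));
console.log(verify(tampered)); // edited chain fails verification
```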

RBAC on insights

Scope analytics by team, repo or persona. Engineers see their fleet, finance sees the bill.

Export to SIEM

Stream events to Splunk, Datadog or any S3-compatible bucket. Your incident timeline stays in your stack.

PII-aware redaction

Configurable redaction at ingest. Secrets, tokens and customer PII are masked before the row hits the warehouse.
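
A pattern-based pass is one simple way to picture redaction at ingest. The rules below are illustrative assumptions (three regexes and a `[REDACTED:label]` placeholder), not the product's rule set; real deployments usually need more patterns and allow-lists.

```typescript
// Hypothetical rule set: each rule masks one kind of sensitive value.
const REDACTION_RULES: { label: string; pattern: RegExp }[] = [
  { label: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "bearer", pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { label: "aws_key", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
];

// Apply every rule in order before the text is stored.
function redact(text: string): string {
  return REDACTION_RULES.reduce(
    (out, { label, pattern }) => out.replace(pattern, `[REDACTED:${label}]`),
    text,
  );
}

const rawPrompt = "contact jane@example.com with Bearer abc.def-123";
console.log(redact(rawPrompt));
// → contact [REDACTED:email] with [REDACTED:bearer]
```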

SOC 2 ready

Logs map to the standard control IDs. SOC 2 Type II reports include the controls your auditor expects.

Export

Pipe insights into the stack you already use.

Datadog
Grafana
Snowflake
BigQuery
S3
Webhook
Analytics API

Query the warehouse with one SDK.

Same TypeScript SDK, typed responses. Build internal dashboards, reconcile billing, drive runbooks - without scraping the UI.

analytics/cost-per-persona.ts
// Cost per persona, last 7d, grouped; fetches the first page of up to 100 rows
import { CodeCourier } from "@codecourier/sdk";

const cc = new CodeCourier({ apiKey: process.env.CC_KEY! });

const page = await cc.analytics.runs.list({
  filter: {
    from:    "-7d",
    status:  ["success", "warn"],
    workspace: "ws_prod",
  },
  groupBy: "persona",
  metrics: ["costUsd", "tokensIn", "tokensOut"],
  limit:   100,
});

page.data.forEach((r) => 
  console.log(r.persona, "$" + r.costUsd.toFixed(2))
);
// → Atlas-v2.5 $487.31
// → Helios-v1.3 $312.04 …

Run this against a workspace key. Group by persona, filter by status, paginate cursor-first.
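
The snippet above reads a single page. Walking every page with a cursor might look like the sketch below. The page shape (`data` plus an optional `nextCursor`) and the helper names are assumptions, since this page does not show the SDK's pagination fields; an in-memory stub stands in for the API so the sketch runs without a key.

```typescript
// Assumed page shape: rows plus an optional cursor for the next page.
type Page<T> = { data: T[]; nextCursor?: string };
type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

// Yields rows one at a time and requests the next page only when needed.
async function* paginate<T>(fetchPage: FetchPage<T>): AsyncGenerator<T> {
  let cursor: string | undefined = undefined;
  do {
    const page: Page<T> = await fetchPage(cursor);
    for (const row of page.data) yield row;
    cursor = page.nextCursor;
  } while (cursor);
}

// Convenience wrapper that drains the generator into an array.
async function collectAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const rows: T[] = [];
  for await (const row of paginate(fetchPage)) rows.push(row);
  return rows;
}

// In-memory stand-in for the API: two pages linked by a cursor.
type CostRow = { persona: string; costUsd: number };
const fakePages: Record<string, Page<CostRow>> = {
  start: { data: [{ persona: "Atlas-v2.5", costUsd: 487.31 }], nextCursor: "p2" },
  p2: { data: [{ persona: "Helios-v1.3", costUsd: 312.04 }] },
};
const fakeFetch: FetchPage<CostRow> = async (cursor) => fakePages[cursor ?? "start"];

collectAll(fakeFetch).then((rows) => {
  for (const r of rows) console.log(r.persona, "$" + r.costUsd.toFixed(2));
});
```

In real use, `fakeFetch` would be replaced by a function that passes the cursor to the SDK call and maps its response into this shape.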

Playbooks

Three teams already running on it.

FinOps for AI teams

Finance pulls a per-team cost breakdown weekly. Engineers see budgets next to each persona - overspend is caught in the editor, not the invoice.

Read the playbook

Regression alerts after persona changes

Every persona update kicks off a shadow eval. If acceptance rate drops by more than the threshold, the rollout is held and the on-call is paged.
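
The hold-or-proceed decision can be sketched as a comparison of two acceptance rates. The 5-point default threshold and all names here are assumptions; the page only says "the threshold".

```typescript
// Outcome of an eval run: how many proposed changes were accepted.
type EvalResult = { accepted: number; total: number };

function acceptanceRate(r: EvalResult): number {
  if (r.total <= 0) throw new Error("eval has no samples");
  return r.accepted / r.total;
}

// Hold the rollout when acceptance drops by more than maxDropPoints percentage points.
function rolloutDecision(
  baseline: EvalResult,
  candidate: EvalResult,
  maxDropPoints = 5, // hypothetical default
): "proceed" | "hold" {
  const dropPoints = (acceptanceRate(baseline) - acceptanceRate(candidate)) * 100;
  return dropPoints > maxDropPoints ? "hold" : "proceed";
}

// The 78% to 41% regression from earlier on this page would be held.
console.log(rolloutDecision({ accepted: 78, total: 100 }, { accepted: 41, total: 100 })); // hold
// A 2-point dip stays under the assumed threshold.
console.log(rolloutDecision({ accepted: 78, total: 100 }, { accepted: 76, total: 100 })); // proceed
```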

Read the playbook

Compliance for regulated codebases

Banking and healthcare teams use immutable run logs to evidence what touched which file. SOC 2 and GDPR controls map one-to-one.

Talk to compliance

"We went from ‘the agents seem to be working’ to a chargeback per team and a verdict score per persona - in one afternoon. Cost-per-PR is now a board metric."
Petra Lindqvist · Director of Platform · Helio Labs

FAQ

Questions a platform lead actually asks.

How long is data retained?
Free and Team plans retain 30 days of traces and 13 months of aggregated metrics. Enterprise picks the retention window per data class - typically 12 months for traces, 7 years for audit logs. Everything older is dropped or moved to your bucket, your choice.
Can I query the warehouse directly?
Yes. Enterprise plans ship a read-only Snowflake or BigQuery share, refreshed every five minutes. Free and Team plans use the Analytics API with cursor pagination. Either way you write the query once and own the dashboard.
Does cost include third-party model spend?
Yes. We meter input, output, cached and reasoning tokens per provider and convert to USD using your contracted rates. Bring-your-own-key customers see the same numbers - we just do not bill on top of them.
Real-time or batch?
Runs land in the dashboard within two seconds of completing. Cost rollups settle within one minute. Aggregations older than 24 hours are batched hourly. If you need second-resolution alerting, the Analytics API streams every event over Server-Sent Events.
What is the export format?
JSON Lines for traces, CSV or Parquet for metric rollups. Webhook payloads are versioned JSON with a stable schema. Every export carries a workspace id, a run id and a cryptographic checksum - drop it into your warehouse without a transform layer.
See the fleet, fix the fleet

Wire your first dashboard in under five minutes.

Free for 14 days · no credit card

Hire your first AI engineer.
Ship by lunchtime.

5 minutes to onboard. First PR within an hour. Cancel anytime.