Know what your AI engineers actually did.

Running fleets of AI agents without observability is reckless. CodeCourier Analytics ships the dashboard - every run traced, every token costed, every regression caught before it lands in main.

Start free · Read the docs

Dashboard preview (codecourier · analytics, live):

Runs · 24h: 1,284 (+12.4%) · PRs shipped: 172 (+24) · Token spend: $842 (−3.1%) · Avg run · p95 · 14d activity

The problem

Agents fail silently. Then your bill explodes.

Silent failures

An agent retries, the checker passes, the PR ships - but the diff is wrong. Without traces you find out from a customer.

Runaway token spend

One bad prompt loop can multiply cost by 40×. Tracking cost-per-PR catches it before the invoice does.
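A minimal sketch of how a cost-per-PR check could flag a runaway loop. The `DailyCost` shape, the helper names and the 3× threshold are illustrative assumptions, not part of the CodeCourier SDK.

```typescript
// Hypothetical daily rollup; the field names are assumptions for illustration.
interface DailyCost {
  day: string;
  costUsd: number;
  prsShipped: number;
}

// Cost per shipped PR. Spend with zero PRs counts as infinitely expensive.
function costPerPr(d: DailyCost): number {
  if (d.prsShipped === 0) return d.costUsd > 0 ? Infinity : 0;
  return d.costUsd / d.prsShipped;
}

// Median of a non-empty list of finite numbers.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True when the latest day costs more than `factor` times the median of earlier days.
function isCostSpike(history: DailyCost[], factor = 3): boolean {
  const baseline = history.slice(0, -1).map(costPerPr).filter(Number.isFinite);
  if (baseline.length === 0) return false;
  return costPerPr(history[history.length - 1]) > factor * median(baseline);
}
```

Comparing against a median rather than a mean keeps one earlier bad day from hiding the next one.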

Persona regressions

You tune a persona. Acceptance goes from 78% to 41% overnight. Per-persona verdict scores surface it in minutes, not weeks.

Dashboard

Every fleet metric, in one panel.

Four lenses on the same fleet - overview, runs, cost, quality. Filter by persona, repo, workflow or time. Export anything.

Fleet at a glance

Run volume, throughput, cost trend and p95 latency - all on one canvas.

Runs today: 1,284 · PRs shipped: 172 · Token spend · 24h: $842 · Avg run · p95: 47s

Chart: runs · hourly, last 18h

What we track

Six layers of signal, one source of truth.

Latency, spend, verdicts, errors, persona quality, delivery health - every signal a senior engineer would page on.

Run latency

End-to-end and step-level p50/p95/p99. Per-persona, per-workflow, per-region - no aggregation tricks.
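For readers who want to reproduce these numbers from raw durations, here is a small sketch using the nearest-rank percentile method. The method choice and the function names are assumptions; the source does not say how CodeCourier computes its percentiles.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Summarise one group of run durations (for example, one persona in one region).
function latencySummary(samples: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Computing the summary separately per persona, workflow or region avoids averaging percentiles across groups, which would distort the tail.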

Token + $ spend

Input, output, cached and reasoning tokens - costed per model, attributed to a persona, a repo and a team.

Step-level verdicts

Every checker, reviewer and policy gate emits a verdict. Roll up to pass-rate or drill into the failing diff.

Error fingerprints

Stack-aware error grouping. The same flaky tool call across 400 runs collapses to one fingerprint with a count and a trend.
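One way to picture fingerprinting: strip the volatile parts of an error message, then hash what is left. The sketch below is a simplified, message-only illustration (no stack awareness); the `AgentError` shape, the normalisation rules and the FNV-1a hash are assumptions, not CodeCourier's actual grouping logic.

```typescript
// Hypothetical error record; field names are assumptions for illustration.
interface AgentError {
  tool: string;
  message: string;
}

// Replace UUIDs and any token containing a digit (ids, durations, ports) with placeholders.
function normalizeError(message: string): string {
  return message
    .replace(/\b[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\b/gi, "<uuid>")
    .replace(/\b\w*\d\w*\b/g, "<v>")
    .replace(/\s+/g, " ")
    .trim();
}

// 32-bit FNV-1a over UTF-16 code units; enough for a demo, not collision-proof.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function fingerprint(e: AgentError): string {
  return fnv1a(`${e.tool}|${normalizeError(e.message)}`);
}

// Collapse errors into one entry per fingerprint, with a count and a sample message.
function groupErrors(errors: AgentError[]): Map<string, { count: number; sample: string }> {
  const groups = new Map<string, { count: number; sample: string }>();
  for (const e of errors) {
    const key = fingerprint(e);
    const group = groups.get(key);
    if (group) group.count += 1;
    else groups.set(key, { count: 1, sample: e.message });
  }
  return groups;
}
```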

Persona performance

Acceptance rate, mean review cycles, escape rate to production. Compare two persona versions on the same workload.

Webhook delivery

Per-endpoint success, latency and retry count. Alerts fire before your CI pipeline notices a webhook is dead.

Drill down

Every run, fully traced.

Click a run. See the agent's plan, every tool call, every model exchange, every file diff and every checker verdict - with timing.

Step-level traces unfold like a stack trace. Each step exposes the agent messages, the model called, the prompt and response tokens, and any artifacts written to disk. A waterfall view shows which step held the run up - usually it is the one waiting on a flaky third-party service.

Replay is one click. Re-run a single step against a newer persona without rerunning the whole workflow.
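As a rough illustration of what the waterfall answers, the sketch below finds the step that held a run up. The `TraceStep` shape is an assumption made for this example; the step names and timings are the ones shown for run_7f3a91.

```typescript
// Hypothetical trace step; the field names are assumptions for illustration.
interface TraceStep {
  name: string;
  status: "success" | "warn" | "error";
  seconds: number;
}

// Timings taken from the run_7f3a91 example on this page.
const exampleTrace: TraceStep[] = [
  { name: "Plan", status: "success", seconds: 1.2 },
  { name: "Read repo", status: "success", seconds: 3.8 },
  { name: "Apply patch", status: "warn", seconds: 12.4 },
  { name: "Run checker", status: "success", seconds: 8.9 },
];

function totalSeconds(steps: TraceStep[]): number {
  return steps.reduce((sum, s) => sum + s.seconds, 0);
}

// The step with the longest duration; ties resolve to the earliest step.
function slowestStep(steps: TraceStep[]): TraceStep {
  if (steps.length === 0) throw new Error("empty trace");
  return steps.reduce((slow, s) => (s.seconds > slow.seconds ? s : slow));
}

// Fraction of the whole run spent in one step, between 0 and 1.
function shareOfRun(step: TraceStep, steps: TraceStep[]): number {
  return step.seconds / totalSeconds(steps);
}
```

Here `slowestStep(exampleTrace)` is the "Apply patch" step, which accounts for a little under half of the run.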

run_7f3a91 · Atlas-v2.5 · 26.3s · $0.18

Plan · success · 1.2s
Read repo · success · 3.8s
Apply patch · warn · 12.4s (view messages · 18 turns)
Run checker · success · 8.9s

Waterfall: Plan · Read repo · Apply patch · Run checker

By persona: Atlas-v2.5 $500 · Helios-v1.3 $360 · Mercury-v0.9 $240

By workflow: Ship-PR $440 · Triage $280 · Refactor-Sprint $170

By repository: core/api $460 · web/dash $320 · infra/terraform $140

Cost intelligence

FinOps for agent fleets.

Attribute every cent - by persona, by workflow, by repository, by team. Chargebacks to the right cost center, monthly, automatically.

Set budgets per persona or per repo. Soft alerts at 60%, hard caps at 95%. No surprise invoices, no apologetic Slack threads.
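The threshold logic can be sketched in a few lines. The 60% and 95% defaults come from the text above; the `budgetState` function and its return values are hypothetical names chosen for this example.

```typescript
type BudgetState = "ok" | "soft_alert" | "hard_cap";

// Classify spend against a budget. Thresholds are fractions of the budget.
function budgetState(
  spentUsd: number,
  budgetUsd: number,
  softThreshold = 0.6,
  hardThreshold = 0.95,
): BudgetState {
  if (budgetUsd <= 0) throw new Error("budget must be positive");
  const used = spentUsd / budgetUsd;
  if (used >= hardThreshold) return "hard_cap";
  if (used >= softThreshold) return "soft_alert";
  return "ok";
}
```

With a $1,000 budget, this sketch stays "ok" below $600, raises a soft alert from $600, and reports the hard cap from $950.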

Mix-of-models reporting included. Move a workflow from Opus to Haiku and watch the savings show up in the next hour.

Set alert thresholds

Audit & compliance

Insights your auditor will actually trust.

Every run, every prompt, every model exchange - written to an append-only store with cryptographic chaining.

Per-run audit log

Full prompt, response, tool calls and diffs - retained on the plan you choose, immutably.

Immutable trace store

Append-only, hash-chained, exportable. Tamper-evident by construction, not by policy.
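To show why hash chaining is tamper-evident, here is a minimal sketch of an append-only log where each entry's hash covers the previous entry's hash. It uses SHA-256 from Node's built-in `crypto` module; the entry layout and function names are assumptions, not CodeCourier's storage format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical log entry; the layout is an assumption for illustration.
interface ChainedEntry {
  payload: string;
  prevHash: string;
  hash: string;
}

// Placeholder "previous hash" for the first entry in a chain.
const GENESIS_HASH = "0".repeat(64);

function entryHash(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(payload).digest("hex");
}

// Returns a new log with the entry appended; existing entries are never modified.
function appendEntry(log: readonly ChainedEntry[], payload: string): ChainedEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS_HASH;
  return [...log, { payload, prevHash, hash: entryHash(prevHash, payload) }];
}

// Recomputes every hash from the start; any edited, removed or reordered entry breaks the chain.
function verifyChain(log: readonly ChainedEntry[]): boolean {
  let prev = GENESIS_HASH;
  for (const entry of log) {
    if (entry.prevHash !== prev) return false;
    if (entry.hash !== entryHash(prev, entry.payload)) return false;
    prev = entry.hash;
  }
  return true;
}
```

Because each hash depends on everything before it, changing one old entry would require recomputing every later hash, which a verifier holding the latest hash would notice.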

RBAC on insights

Scope analytics by team, repo or persona. Engineers see their fleet, finance sees the bill.

Export to SIEM

Stream events to Splunk, Datadog or any S3-compatible bucket. Your incident timeline stays in your stack.

PII-aware redaction

Configurable redaction at ingest. Secrets, tokens and customer PII are masked before the row hits the warehouse.
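Redaction at ingest can be pictured as a list of named patterns applied before a row is stored. The rules below are a deliberately small, illustrative set (email addresses, bearer tokens, AWS-style access key ids); the rule format and the `redact` function are assumptions, and a real deployment would need a much broader rule set.

```typescript
// Hypothetical rule format; names and patterns are assumptions for illustration.
interface RedactionRule {
  name: string;
  pattern: RegExp; // must carry the global flag so every occurrence is masked
}

const defaultRules: RedactionRule[] = [
  { name: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { name: "bearer", pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { name: "aws_key", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
];

// Apply each rule in order, replacing matches with a labelled mask.
function redact(text: string, rules: RedactionRule[] = defaultRules): string {
  return rules.reduce(
    (masked, rule) => masked.replace(rule.pattern, `[REDACTED:${rule.name}]`),
    text,
  );
}
```

Labelling the mask with the rule name keeps traces debuggable: a reader can see that a token was present without seeing the token.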

SOC 2 ready

Logs map to the standard control IDs. SOC 2 Type II reports include the controls your auditor expects.

Export

Pipe insights into the stack you already use.

Datadog

Grafana

Snowflake

BigQuery

Webhook

Analytics API

Query the warehouse with one SDK.

Same TypeScript SDK, typed responses. Build internal dashboards, reconcile billing, drive runbooks - without scraping the UI.

Analytics API · Monitoring guide · Usage tracking

analytics/cost-per-persona.ts

// Cost per persona over the last 7 days, grouped; fetches the first page (up to 100 rows)
import { CodeCourier } from "@codecourier/sdk";

// The API key is read from the environment; the `!` asserts that it is set
const cc = new CodeCourier({ apiKey: process.env.CC_KEY! });

const page = await cc.analytics.runs.list({
  filter: {
    from:      "-7d",
    status:    ["success", "warn"],
    workspace: "ws_prod",
  },
  groupBy: "persona",
  metrics: ["costUsd", "tokensIn", "tokensOut"],
  limit:   100,
});

// Print one line per persona
for (const r of page.data) {
  console.log(r.persona, "$" + r.costUsd.toFixed(2));
}
// → Atlas-v2.5 $487.31
// → Helios-v1.3 $312.04 …

Run this against a workspace key. Group by persona, filter by status, paginate cursor-first.
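Cursor pagination usually means passing the cursor from one page into the request for the next until none is returned. The helper below is a generic sketch of that loop; the `Page` shape and the `nextCursor` field name are assumptions, since the page does not show how the SDK exposes its cursor.

```typescript
// Hypothetical page shape; `nextCursor` is an assumed field name.
interface Page<T> {
  data: T[];
  nextCursor?: string;
}

type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

// Follow cursors until the API stops returning one, with a safety limit on page count.
async function collectAll<T>(fetchPage: FetchPage<T>, maxPages = 50): Promise<T[]> {
  const rows: T[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    rows.push(...page.data);
    if (!page.nextCursor) return rows;
    cursor = page.nextCursor;
  }
  throw new Error(`stopped after ${maxPages} pages without reaching the end`);
}
```

The `fetchPage` callback is where a real SDK call would go; the page limit guards against a cursor that never terminates.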

Playbooks

Three teams already running on it.

FinOps for AI teams

Finance pulls a per-team cost breakdown weekly. Engineers see budgets next to each persona - overspend is caught in the editor, not the invoice.

Read the playbook

Regression alerts after persona changes

Every persona update kicks off a shadow eval. If acceptance rate drops by more than the threshold, the rollout is held and the on-call is paged.

Read the playbook
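The gating rule in this playbook can be sketched as a comparison of acceptance rates between the current persona and the candidate. The `EvalResult` shape, the function names and the 5-point default threshold are assumptions for illustration.

```typescript
// Hypothetical shadow-eval result; field names are assumptions for illustration.
interface EvalResult {
  accepted: number;
  total: number;
}

type RolloutDecision =
  | { action: "promote" }
  | { action: "hold"; dropPoints: number };

function acceptanceRate(result: EvalResult): number {
  if (result.total <= 0) throw new Error("eval has no runs");
  return result.accepted / result.total;
}

// Hold the rollout when acceptance falls by more than `maxDropPoints` percentage points.
function gateRollout(
  baseline: EvalResult,
  candidate: EvalResult,
  maxDropPoints = 5,
): RolloutDecision {
  const dropPoints = (acceptanceRate(baseline) - acceptanceRate(candidate)) * 100;
  return dropPoints > maxDropPoints ? { action: "hold", dropPoints } : { action: "promote" };
}
```

A drop from 78% to 41% acceptance, the example used earlier on this page, is a fall of about 37 points and would be held; paging the on-call would hang off the "hold" branch.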

Compliance for regulated codebases

Banking and healthcare teams use immutable run logs to evidence what touched which file. SOC 2 and GDPR controls map one-to-one.

Talk to compliance

We went from ‘the agents seem to be working’ to a chargeback per team and a verdict score per persona - in one afternoon. Cost-per-PR is now a board metric.

Petra Lindqvist · Director of Platform · Helio Labs

FAQ

Questions a platform lead actually asks.

How long is data retained?

Free and Team plans retain 30 days of traces and 13 months of aggregated metrics. Enterprise picks the retention window per data class - typically 12 months for traces, 7 years for audit logs. Everything older is dropped or moved to your bucket, your choice.

Can I query the warehouse directly?

Yes. Enterprise plans ship a read-only Snowflake or BigQuery share, refreshed every five minutes. Free and Team plans use the Analytics API with cursor pagination. Either way you write the query once and own the dashboard.

Does cost include third-party model spend?

Yes. We meter input, output, cached and reasoning tokens per provider and convert to USD using your contracted rates. Bring-your-own-key customers see the same numbers - we just do not bill on top of them.

Real-time or batch?

Runs land in the dashboard within two seconds of completing. Cost rollups settle within one minute. Aggregations older than 24 hours are batched hourly. If you need second-resolution alerting, the Analytics API streams every event over Server-Sent Events.

What is the export format?

JSON Lines for traces, CSV or Parquet for metric rollups. Webhook payloads are versioned JSON with a stable schema. Every export carries a workspace id, a run id and a cryptographic checksum - drop it into your warehouse without a transform layer.

See the fleet, fix the fleet

Wire your first dashboard in under five minutes.

Start free · Read the docs

Free for 14 days · no credit card

Hire your first AI engineer.
Ship by lunchtime.

5 minutes to onboard. First PR within an hour. Cancel anytime.

Deploy your first agent · Book a 20-min demo
