Know what your AI engineers actually did.
Running fleets of AI agents without observability is reckless. CodeCourier Analytics ships the dashboard - every run traced, every token costed, every regression caught before it lands in main.
Agents fail silently. Then your bill explodes.
Silent failures
An agent retries, the checker passes, the PR ships - but the diff is wrong. Without traces you find out from a customer.
Runaway token spend
One bad prompt loop multiplies cost by 40×. Cost-per-PR is the only metric that catches it before the invoice does.
Persona regressions
You tune a persona. Acceptance goes from 78% to 41% overnight. Per-persona verdict scores surface it in minutes, not weeks.
Every fleet metric, in one panel.
Four lenses on the same fleet - overview, runs, cost, quality. Filter by persona, repo, workflow or time. Export anything.
Fleet at a glance
Run volume, throughput, cost trend and p95 latency - all on one canvas.
Six layers of signal, one source of truth.
Latency, spend, verdicts, errors, persona quality, delivery health - every signal a senior engineer would page on.
Run latency
End-to-end and step-level p50/p95/p99. Per-persona, per-workflow, per-region - no aggregation tricks.
Token + $ spend
Input, output, cached and reasoning tokens - costed per model, attributed to a persona, a repo and a team.
Step-level verdicts
Every checker, reviewer and policy gate emits a verdict. Roll up to pass-rate or drill into the failing diff.
Error fingerprints
Stack-aware error grouping. The same flaky tool call across 400 runs collapses to one fingerprint with a count and a trend.
Persona performance
Acceptance rate, mean review cycles, escape rate to production. Compare two persona versions on the same workload.
Webhook delivery
Per-endpoint success, latency and retry count. Alerts fire before your CI pipeline notices a webhook is dead.
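To make the error-fingerprint idea above concrete, here is a minimal sketch of how many noisy errors can collapse into one group. It is a simplification, not CodeCourier's implementation: it normalizes the error message rather than the stack, and the `ErrorEvent` shape, `normalize` and `groupByFingerprint` are hypothetical names chosen for illustration.

```typescript
// Hypothetical error event shape; field names are assumptions for this sketch.
interface ErrorEvent {
  runId: string;
  tool: string;
  message: string;
}

// Strip volatile details (hex addresses, numbers) so that errors differing
// only in ids, durations or attempt counts normalize to the same text.
function normalize(message: string): string {
  return message
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/\d+/g, "<n>")
    .trim()
    .toLowerCase();
}

// A fingerprint here is just the tool name plus the normalized message.
function fingerprint(e: ErrorEvent): string {
  return `${e.tool}|${normalize(e.message)}`;
}

// Count how many events fall under each fingerprint.
function groupByFingerprint(events: ErrorEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const key = fingerprint(e);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

A production grouper would likely key on stack frames as well, but the shape is the same: normalize, derive a key, count per key.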
Every run, fully traced.
Click a run. See the agent's plan, every tool call, every model exchange, every file diff and every checker verdict - with timing.
Step-level traces unfold like a stack trace. Each step exposes the agent messages, the model called, the prompt and response tokens, and any artifacts written to disk. A waterfall view shows which step held the run up - usually the one waiting on a flaky third-party service.
Replay is one click. Re-run a single step against a newer persona without rerunning the whole workflow.
- Plan · success · 1.2s
- Read repo · success · 3.8s
- Apply patch · warn · 12.4s · view messages · 18 turns
- Run checker · success · 8.9s
FinOps for agent fleets.
Attribute every cent - by persona, by workflow, by repository, by team. Chargebacks to the right cost center, monthly, automatically.
Set budgets per persona or per repo. Soft alerts at 60%, hard caps at 95%. No surprise invoices, no apologetic Slack threads.
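As a rough illustration of the soft-alert and hard-cap thresholds above, the check could look something like the sketch below. The function name, the state labels and the choice to treat thresholds as inclusive are all assumptions made for this example, not documented product behavior.

```typescript
// Budget states; labels are hypothetical.
type BudgetState = "ok" | "soft_alert" | "hard_cap";

// Thresholds taken from the copy above: soft alert at 60%, hard cap at 95%.
const SOFT_ALERT_RATIO = 0.6;
const HARD_CAP_RATIO = 0.95;

// Classify current spend against a budget. Thresholds are treated as
// inclusive here, which is an assumption of this sketch.
function budgetState(spentUsd: number, budgetUsd: number): BudgetState {
  if (budgetUsd <= 0) throw new RangeError("budgetUsd must be positive");
  const ratio = spentUsd / budgetUsd;
  if (ratio >= HARD_CAP_RATIO) return "hard_cap";
  if (ratio >= SOFT_ALERT_RATIO) return "soft_alert";
  return "ok";
}
```

For a persona with a $100 budget, $59 spent would still be "ok", $60 would trip the soft alert, and $95 would hit the hard cap.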
Mix-of-models reporting included. Move a workflow from Opus to Haiku and watch the savings show up in the next hour.
Insights your auditor will actually trust.
Every run, every prompt, every model exchange - written to an append-only store with cryptographic chaining.
Per-run audit log
Full prompt, response, tool calls and diffs - retained immutably for as long as your plan specifies.
Immutable trace store
Append-only, hash-chained, exportable. Tamper-evident by construction, not by policy.
RBAC on insights
Scope analytics by team, repo or persona. Engineers see their fleet, finance sees the bill.
Export to SIEM
Stream events to Splunk, Datadog or any S3-compatible bucket. Your incident timeline stays in your stack.
PII-aware redaction
Configurable redaction at ingest. Secrets, tokens and customer PII are masked before the row hits the warehouse.
SOC 2 ready
Logs map to the standard control IDs. SOC 2 Type II reports include the controls your auditor expects.
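For readers wondering what "append-only, hash-chained" means in practice, here is a minimal sketch of the general technique using Node's built-in `crypto` module. It illustrates hash chaining in general, not CodeCourier's storage format; `ChainedRecord`, `append` and `verify` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// One log entry: its payload, the previous entry's hash, and its own hash.
interface ChainedRecord {
  payload: string;
  prevHash: string;
  hash: string;
}

// Placeholder "previous hash" for the first record in the chain.
const GENESIS = "0".repeat(64);

// Each hash covers the previous hash and the payload, so editing any
// earlier record invalidates every record after it.
function hashRecord(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(payload).digest("hex");
}

// Return a new log with one record appended; existing records are untouched.
function append(log: readonly ChainedRecord[], payload: string): ChainedRecord[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...log, { payload, prevHash, hash: hashRecord(prevHash, payload) }];
}

// Walk the chain from the start and recompute every hash.
function verify(log: readonly ChainedRecord[]): boolean {
  let prev = GENESIS;
  for (const rec of log) {
    if (rec.prevHash !== prev || rec.hash !== hashRecord(prev, rec.payload)) return false;
    prev = rec.hash;
  }
  return true;
}
```

Changing the payload of any stored record makes `verify` return `false`, which is what makes the log tamper-evident rather than merely access-controlled.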
Pipe insights into the stack you already use.
Query the warehouse with one SDK.
Same TypeScript SDK, typed responses. Build internal dashboards, reconcile billing, drive runbooks - without scraping the UI.
// Cost-per-persona, last 7d, grouped, paginated
import { CodeCourier } from "@codecourier/sdk";

const cc = new CodeCourier({ apiKey: process.env.CC_KEY! });

const page = await cc.analytics.runs.list({
  filter: {
    from: "-7d",
    status: ["success", "warn"],
    workspace: "ws_prod",
  },
  groupBy: "persona",
  metrics: ["costUsd", "tokensIn", "tokensOut"],
  limit: 100,
});

page.data.forEach((r) =>
  console.log(r.persona, "$" + r.costUsd.toFixed(2))
);
// → Atlas-v2.5 $487.31
// → Helios-v1.3 $312.04 …
Run this against a workspace key. Group by persona, filter by status, paginate cursor-first.
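The sample above fetches a single page. Walking every page cursor-first could be sketched as below. The `Page` shape and its `nextCursor` field are assumptions for illustration, since this page does not show how the SDK exposes cursors, so the helper takes a generic `fetchPage` callback instead of calling the SDK directly.

```typescript
// Hypothetical page shape: `nextCursor` is an assumed field name.
interface Page<T> {
  data: T[];
  nextCursor?: string;
}

// Any function that returns one page for a given cursor (undefined = first page).
type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

// Yield every row across all pages, following cursors until none is returned.
async function* paginate<T>(fetchPage: FetchPage<T>): AsyncGenerator<T> {
  let cursor: string | undefined = undefined;
  do {
    const page: Page<T> = await fetchPage(cursor);
    yield* page.data;
    cursor = page.nextCursor;
  } while (cursor !== undefined);
}
```

In practice `fetchPage` would wrap the `cc.analytics.runs.list` call and pass the cursor through using whatever parameter the SDK reference specifies; rows can then be consumed with `for await (const row of paginate(fetchPage))`.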
Three teams already running on it.
FinOps for AI teams
Finance pulls a per-team cost breakdown weekly. Engineers see budgets next to each persona - overspend is caught in the editor, not the invoice.
Read the playbook
Regression alerts after persona changes
Every persona update kicks off a shadow eval. If acceptance rate drops by more than the threshold, the rollout is held and the on-call is paged.
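The hold decision described here boils down to comparing two acceptance rates. A minimal sketch, assuming the threshold is an absolute drop in acceptance rate (the page does not say whether it is absolute or relative) and using hypothetical names throughout:

```typescript
// Outcome of an eval run: how many PRs were accepted out of how many attempted.
interface EvalResult {
  accepted: number;
  total: number;
}

function acceptanceRate(r: EvalResult): number {
  if (r.total <= 0) throw new RangeError("total must be positive");
  return r.accepted / r.total;
}

// Hold the rollout when the candidate persona's acceptance rate falls more
// than `maxDrop` (absolute, on a 0..1 scale) below the baseline's.
function shouldHoldRollout(
  baseline: EvalResult,
  candidate: EvalResult,
  maxDrop: number,
): boolean {
  return acceptanceRate(baseline) - acceptanceRate(candidate) > maxDrop;
}
```

With a baseline at 78% and a candidate at 41%, any reasonable threshold such as `0.05` holds the rollout; a candidate at 76% would pass.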
Read the playbook
Compliance for regulated codebases
Banking and healthcare teams use immutable run logs to evidence what touched which file. SOC 2 and GDPR controls map one-to-one.
Talk to compliance
We went from ‘the agents seem to be working’ to a chargeback per team and a verdict score per persona - in one afternoon. Cost-per-PR is now a board metric.
Questions a platform lead actually asks.
How long is data retained?
Can I query the warehouse directly?
Does cost include third-party model spend?
Real-time or batch?
What is the export format?
Hire your first AI engineer.
Ship by lunchtime.
5 minutes to onboard. First PR within an hour. Cancel anytime.