Monitoring
Monitor workflow runs, quality scores, CI checks, sprint chains, sandbox health, and system performance in CodeCourier with real-time dashboards and notifications.
CodeCourier provides comprehensive monitoring tools for tracking workflow execution, sandbox health, and system-wide performance. All monitoring data is powered by Convex reactive queries, which means the dashboard updates in real time without polling. This guide covers the monitoring surfaces, the data they expose, and how to use them effectively.
Run Status Tracking
Run List Dashboard
The Runs page is the primary monitoring surface. It shows a paginated list of all runs for the current project, ordered by creation time (newest first). Each row displays:
- Status badge -- Color-coded indicator showing pending (gray), running (blue), completed (green), failed (red), or cancelled (amber).
- Run name -- Auto-generated or user-provided name.
- Source -- Where the run originated: workflow, sprint, sandbox, or merge agent.
- Prompt preview -- First few lines of the task description.
- Timing -- Creation time and duration (if completed).
- PR status -- Whether a pull request was created, merged, or failed.
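The status-to-color pairs in the list above can be captured in a small lookup table. This is only an illustrative sketch: the names `RunStatus`, `STATUS_BADGE_COLOR`, and `badgeColor` are hypothetical and are not taken from the CodeCourier codebase.

```typescript
// Hypothetical names; only the status/color pairs come from the list above.
type RunStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

const STATUS_BADGE_COLOR: Record<RunStatus, string> = {
  pending: "gray",
  running: "blue",
  completed: "green",
  failed: "red",
  cancelled: "amber",
};

// Look up the badge color for a run status.
function badgeColor(status: RunStatus): string {
  return STATUS_BADGE_COLOR[status];
}
```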
Run Detail View
Clicking a run opens its detail page with comprehensive execution information:
- Step timeline -- A visual representation of every step in the pipeline, showing the role (designer, checker, evaluator, etc.), the CLI tool and model used, the status, and the duration. Steps are displayed in execution order with iteration numbers.
- Sandbox output -- Each step's sandbox terminal output, including AI agent responses, tool use events, and error messages. Output updates in real time while the step is running.
- Checker verdicts -- For checker steps, the pass/fail verdict and feedback text are displayed inline in the timeline.
- Quality scores -- For evaluator steps, the five-dimension quality score breakdown and composite score are displayed inline. A threshold indicator shows whether the composite score meets the configured threshold.
- CI check status -- The aggregate CI status (passing, failing, pending) and individual check results with links to GitHub.
- Configuration -- The run's sandbox config, prompt, reference images, and metadata.
- Error details -- If the run failed, the error message and the step where the failure occurred.
Real-Time Updates
Quality Score Monitoring
Quality scores provide a quantitative view of how well each workflow run meets defined quality criteria. They are produced by Evaluator steps and surfaced at two levels:
- Run-level -- The qualityScore field on the run record holds the composite score (0–100) aggregated across all evaluator steps in the pipeline. This is visible in the Runs list as a score badge, enabling at-a-glance quality comparison across runs.
- Step-level -- Individual evaluator run step records carry the full qualityScores breakdown across all five dimensions.
Interpreting Quality Dimensions
qualityScores: {
  correctness: number,      // 0-100: Does the implementation meet requirements?
  typeSafety: number,       // 0-100: Are TypeScript types correct and non-coercive?
  codeStyle: number,        // 0-100: Does code follow project conventions?
  testCoverage: number,     // 0-100: Are changes covered by meaningful tests?
  completeness: number,     // 0-100: Is the implementation fully finished, not stubbed?
  composite: number,        // 0-100: Weighted average of all five dimensions
  thresholdResult: boolean, // True if composite >= configured threshold
}
Each dimension is scored independently from 0 (does not meet criteria) to 100 (fully meets criteria). The composite score is a weighted average; you can configure the weights on the evaluator persona to emphasize the dimensions most important to your project. The thresholdResult boolean is the most actionable signal: a false value means the implementation did not meet your quality bar and may warrant further iteration.
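To make the weighted average concrete, here is a minimal sketch of how a composite score and threshold result could be derived from the five dimensions. The dimension names come from the qualityScores record; the weights shape and the function names `compositeScore` and `meetsThreshold` are assumptions for illustration, not CodeCourier's actual evaluator code.

```typescript
// Dimension names mirror the qualityScores record; everything else is hypothetical.
type Dimension =
  | "correctness"
  | "typeSafety"
  | "codeStyle"
  | "testCoverage"
  | "completeness";

type DimensionValues = Record<Dimension, number>;

const DIMENSIONS: Dimension[] = [
  "correctness",
  "typeSafety",
  "codeStyle",
  "testCoverage",
  "completeness",
];

// Weighted average: sum(score * weight) / sum(weight).
function compositeScore(scores: DimensionValues, weights: DimensionValues): number {
  const totalWeight = DIMENSIONS.reduce((sum, d) => sum + weights[d], 0);
  if (totalWeight <= 0) {
    throw new Error("weights must sum to a positive number");
  }
  const weighted = DIMENSIONS.reduce((sum, d) => sum + scores[d] * weights[d], 0);
  return weighted / totalWeight;
}

// Mirrors the documented rule: true if composite >= configured threshold.
function meetsThreshold(composite: number, threshold: number): boolean {
  return composite >= threshold;
}
```

With equal weights this reduces to a plain mean. Raising the weight on one dimension (for example correctness) pulls the composite toward that dimension's score.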
Quality Trends
The Workflow Analytics section shows quality score trends over time for a given workflow. Tracking the composite score across runs reveals whether your pipeline is consistently producing high-quality output or exhibiting degrading quality over time (which often signals that persona instructions need refinement or the workflow needs an additional improvement pass).
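One simple way to spot degrading quality is to compare the mean composite score of the most recent runs with the mean of the runs just before them. The sketch below is an assumption about how such a check might look, not the logic behind the Workflow Analytics charts; `compositeTrend` and its parameters are hypothetical names.

```typescript
// Arithmetic mean of a non-empty list of numbers.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Returns the change in mean composite score between the most recent `window`
// runs and the `window` runs before them. A negative value suggests degrading
// quality. Returns null when there are fewer than 2 * window runs.
function compositeTrend(compositesOldestFirst: number[], window: number): number | null {
  if (window <= 0 || compositesOldestFirst.length < 2 * window) {
    return null;
  }
  const recent = compositesOldestFirst.slice(-window);
  const previous = compositesOldestFirst.slice(-2 * window, -window);
  return mean(recent) - mean(previous);
}
```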
Quality Score Baselines
CI Checks Monitoring
After a run creates a pull request, CodeCourier monitors the CI check status for that PR and surfaces it in the monitoring UI.
CI Status in the Run List
The Runs list displays a CI status indicator alongside the PR status for each run. The aggregate status ("passing", "failing", or "pending") is shown as a colored badge. Runs with prStatus = "blocked_on_ci" are highlighted to indicate that the PR cannot be merged until CI resolves.
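This guide does not state how the aggregate status is derived from the individual checks. A plausible sketch, under loudly stated assumptions: per-check status strings are taken to be "success", "failure", or anything else for still-running checks, any failure makes the aggregate "failing", and an empty check list counts as "pending". The check shape mirrors the ciChecks record shown in this guide; `aggregateCiStatus` is a hypothetical name.

```typescript
type AggregateCiStatus = "passing" | "failing" | "pending";

interface CiCheck {
  name: string;
  status: string; // Assumed values: "success", "failure", or other (not finished)
  url: string;
}

// Assumed precedence: failing > pending > passing.
function aggregateCiStatus(checks: CiCheck[]): AggregateCiStatus {
  if (checks.some((check) => check.status === "failure")) {
    return "failing";
  }
  if (checks.length === 0 || checks.some((check) => check.status !== "success")) {
    return "pending";
  }
  return "passing";
}
```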
Individual Check Details
The run detail view lists each individual CI check with its name, status, and a direct link to the check run on GitHub. This allows you to navigate directly from a failing CodeCourier run to the specific CI job output, reducing the time spent diagnosing failures.
ciChecks: {
  status: "passing" | "failing" | "pending",
  checks: Array<{
    name: string,   // e.g., "Build", "Unit Tests", "Lint", "E2E Tests"
    status: string, // Per-check status from GitHub
    url: string,    // Direct link to the check run
  }>,
  checkedAt: number, // Timestamp of last GitHub poll
}
Sprint Chain Monitoring
Sprint chains appear in the monitoring surfaces with additional context compared to individual runs.
Sprint Chain in the Runs List
Individual sprint runs appear in the Runs list with source sprint. The run name includes the sprint number (e.g., "Sprint 2 of 5 - Feature X") so you can track each phase at a glance. You can filter the Runs list by source to show only sprint-originated runs.
Sprint Chain Detail View
The sprint chain detail view provides a consolidated view of the entire chain:
- Chain status -- The overall chain state (pending, running, completed, failed, or cancelled).
- Sprint progress -- Current sprint index and total sprint count (e.g., "3 of 5 sprints completed").
- Per-sprint PR URLs -- The sprintPrUrls array displayed as a clickable list, one entry per sprint. Completed sprints show their PR URL; future sprints show as pending.
- Sprint timeline -- A timeline of sprint executions with start and end times, enabling duration comparison across sprints.
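The progress label and the per-sprint PR list can both be derived from the sprintPrUrls array. The sketch below assumes the array has one slot per sprint and that a sprint without a PR yet is stored as null; that representation, and the name `sprintProgressLabel`, are assumptions rather than documented behavior.

```typescript
// Assumption: one slot per sprint, with null for sprints that have no PR yet.
function sprintProgressLabel(sprintPrUrls: Array<string | null>): string {
  const completed = sprintPrUrls.filter((url) => url !== null).length;
  return `${completed} of ${sprintPrUrls.length} sprints completed`;
}
```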
Sprint Failure Isolation
If a sprint fails, use the resumeFromSprint capability to restart the chain from the failed sprint without re-running earlier phases.
Trigger.dev Run Status
In addition to Convex-level tracking, CodeCourier links each run to its corresponding Trigger.dev task execution. The triggerRunId field on the run record provides traceability to the Trigger.dev dashboard where you can inspect:
- Task queue position and scheduling.
- Execution logs at the infrastructure level.
- Retry history (if the task was retried).
- Resource consumption and timing.
This two-level tracking (Convex + Trigger.dev) ensures you can debug both application-level issues (wrong prompt, checker feedback) and infrastructure-level issues (timeout, OOM, network failure).
Sandbox Monitoring
Active Sandbox Counter
The project dashboard displays a counter of active sandboxes -- the number of sandboxes currently in the running state. This counter is denormalized in the projectCounters table and updates in real time as sandboxes start and stop.
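A denormalized counter like this is typically adjusted whenever a sandbox enters or leaves the running state, rather than recomputed by scanning all sandboxes. The following sketch shows that idea in isolation; the `ProjectCounters` shape, the `activeSandboxes` field name, and `applySandboxTransition` are hypothetical, and the real projectCounters table may differ.

```typescript
// Hypothetical shape; the real projectCounters table may hold more fields.
interface ProjectCounters {
  activeSandboxes: number;
}

// Returns updated counters for a sandbox status transition.
// `from` is null when the sandbox has just been created.
function applySandboxTransition(
  counters: ProjectCounters,
  from: string | null,
  to: string,
): ProjectCounters {
  const wasRunning = from === "running";
  const isRunning = to === "running";
  if (wasRunning === isRunning) {
    return counters; // No change in the number of running sandboxes.
  }
  const delta = isRunning ? 1 : -1;
  // Clamp at zero so a duplicate stop event cannot make the counter negative.
  return { ...counters, activeSandboxes: Math.max(0, counters.activeSandboxes + delta) };
}
```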
Sandbox List
The Sandboxes page shows all sandboxes for the project (excluding those created by workflows and issue sessions). Each entry shows the sandbox status, configuration, creation time, and linked PR information.
Streaming Terminal
For individual sandboxes, the streaming terminal component provides real-time output from the AI agent. The terminal renders:
- Assistant messages with formatted text.
- Tool use events (file writes, command execution).
- User messages sent interactively.
- Status indicators for streaming, completed, and error states.
Project-Level Metrics
Project Counters
CodeCourier maintains denormalized counters for each project:
- Total sandboxes -- Lifetime count of sandboxes created.
- Active sandboxes -- Currently running sandboxes.
- Total runs -- Lifetime count of workflow runs.
- Completed runs -- Successfully finished runs.
- Failed runs -- Runs that ended in failure.
- Total workflows -- Number of workflow blueprints.
- Total members -- Team members in the project.
- Pending invitations -- Unaccepted member invitations.
These counters are displayed on the project overview page and update reactively.
Daily Statistics
The dailyStats table tracks per-day metrics:
- Sandboxes created.
- Runs created, completed, and failed.
- Total iterations across all runs.
- Workflows created.
This data powers historical charts and trend analysis in the project dashboard.
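As an illustration of per-day bucketing, the sketch below groups run events into daily counts keyed by ISO date. It assumes UTC day boundaries and an event shape (`RunEvent`) invented for this example; the actual dailyStats schema and its time zone handling are not described in this guide.

```typescript
// Hypothetical shapes for illustration only.
interface DailyRunStats {
  runsCreated: number;
  runsCompleted: number;
  runsFailed: number;
}

interface RunEvent {
  type: "created" | "completed" | "failed";
  timestampMs: number;
}

// Assumption: days are bucketed in UTC, giving keys like "2026-03-15".
function utcDateKey(timestampMs: number): string {
  return new Date(timestampMs).toISOString().slice(0, 10);
}

function bucketRunEvents(events: RunEvent[]): Map<string, DailyRunStats> {
  const byDay = new Map<string, DailyRunStats>();
  for (const event of events) {
    const key = utcDateKey(event.timestampMs);
    const stats = byDay.get(key) ?? { runsCreated: 0, runsCompleted: 0, runsFailed: 0 };
    if (event.type === "created") {
      stats.runsCreated += 1;
    } else if (event.type === "completed") {
      stats.runsCompleted += 1;
    } else {
      stats.runsFailed += 1;
    }
    byDay.set(key, stats);
  }
  return byDay;
}
```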
Usage and Cost Tracking
Every sandbox session and workflow step generates usage records in the usageRecords table. These records provide detailed cost visibility:
{
  service: "anthropic",       // or "openai", "openrouter", "e2b", "trigger_dev"
  date: "2026-03-15",         // ISO date
  quantity: 15000,            // tokens consumed
  unit: "output_tokens",      // what was measured
  costUsd: 0.45,              // calculated cost
  toolId: "claude",           // CLI tool used
  modelId: "claude-opus-4-6", // specific model
  stepType: "designer",       // step role
  inputTokens: 12000,         // detailed token breakdown
  outputTokens: 15000,
  durationMs: 45000,          // step duration
}
Usage records are linked to specific runs, sandboxes, chains, and issue sessions. This allows you to track costs at every level -- from individual steps to entire work chains.
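To illustrate cost roll-ups, the sketch below sums costUsd over a list of usage records, grouped by either service or stepType. The field names come from the usage record example in this guide; the trimmed `UsageRecord` interface and the helper `totalCostBy` are hypothetical.

```typescript
// Trimmed to the fields this example needs; names match the usage record above.
interface UsageRecord {
  service: string;
  stepType: string;
  costUsd: number;
}

// Sums costUsd per distinct value of the chosen grouping field.
function totalCostBy(
  records: UsageRecord[],
  key: "service" | "stepType",
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const record of records) {
    const group = record[key];
    totals[group] = (totals[group] ?? 0) + record.costUsd;
  }
  return totals;
}
```

The same grouping approach would extend to run, sandbox, or chain identifiers if those fields were included in the record shape.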
Notifications
CodeCourier sends notifications for important events. The notifications table stores per-user notifications with the following types:
- run_completed -- A workflow run finished successfully.
- run_failed -- A workflow run failed.
- pr_created -- A pull request was created.
- pr_merged -- A pull request was merged.
- pr_failed -- Pull request creation failed.
- member_joined -- A new team member joined the project.
- workflow_completed -- All steps in a workflow completed.
- sprint_completed -- A sprint chain completed.
- sprint_failed -- A sprint chain failed.
Notifications are displayed in the dashboard and can be marked as read or dismissed. They are indexed by project, user, and read status for efficient querying.
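The project, user, and read-status dimensions correspond to the most common lookup: unread notifications for one user in one project. The sketch below shows that lookup as an in-memory filter. The type names come from the list above, but the record shape (`AppNotification`) and field names are assumptions, and a real implementation would query the index instead of filtering an array.

```typescript
type NotificationType =
  | "run_completed"
  | "run_failed"
  | "pr_created"
  | "pr_merged"
  | "pr_failed"
  | "member_joined"
  | "workflow_completed"
  | "sprint_completed"
  | "sprint_failed";

// Hypothetical record shape; field names are assumptions.
interface AppNotification {
  projectId: string;
  userId: string;
  type: NotificationType;
  read: boolean;
}

// Unread notifications for one user within one project.
function unreadFor(
  notifications: AppNotification[],
  projectId: string,
  userId: string,
): AppNotification[] {
  return notifications.filter(
    (n) => n.projectId === projectId && n.userId === userId && !n.read,
  );
}
```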
Error Monitoring
Error Surfaces
Errors are captured at multiple levels:
- Sandbox errors -- The error field on sandbox records stores E2B provisioning failures and agent crashes.
- Run errors -- The error field on run records stores pipeline-level failures.
- Step errors -- The error field on run step records stores step-specific failures.
- PR errors -- The prError field on sandbox and run records stores pull request creation failures.
- Learning extraction errors -- The learningExtractionError field on sandbox records stores learning extraction failures.
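When diagnosing a failed run, it can help to gather the run-level, PR, and step-level errors into one list. The sketch below assumes the step records have already been loaded alongside the run. The `error` and `prError` field names come from the list above, while the record shapes and `collectRunErrors` are hypothetical.

```typescript
// Hypothetical shapes; only the error and prError field names are documented.
interface StepRecord {
  role: string;
  error?: string;
}

interface RunRecord {
  error?: string;
  prError?: string;
  steps: StepRecord[]; // Assumption: steps are pre-loaded in execution order.
}

// Flattens all error fields into labeled, human-readable messages.
function collectRunErrors(run: RunRecord): string[] {
  const messages: string[] = [];
  if (run.error) {
    messages.push(`run: ${run.error}`);
  }
  if (run.prError) {
    messages.push(`pr: ${run.prError}`);
  }
  run.steps.forEach((step, index) => {
    if (step.error) {
      messages.push(`step ${index + 1} (${step.role}): ${step.error}`);
    }
  });
  return messages;
}
```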
Common Error Patterns
- API key misconfiguration -- Missing or invalid keys for E2B, Anthropic, or GitHub. Check Project Settings.
- Template not found -- The specified E2B template does not exist. Verify template ID in workflow config.
- Timeout exceeded -- The sandbox ran longer than the configured timeout. Increase the timeout or simplify the task.
- Rate limiting -- The AI provider rate-limited requests. Wait and retry, or switch to a different model.
- Git push failure -- The sandbox could not push to the remote. Check that the GitHub token has write access.
Monitoring Best Practices
Workflow Analytics
CodeCourier provides analytics queries for workflow performance. The workflowAnalytics module exposes metrics like:
- Runs per workflow (how often each blueprint is used).
- Success rate (completed vs. failed runs).
- Average iteration count (how many loops before passing).
- Average duration (time from start to completion).
These metrics help you identify which workflows are effective and which need tuning -- whether that means adjusting persona instructions, changing models, or restructuring the pipeline.
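The metrics listed above are straightforward aggregates over a workflow's finished runs. As a rough sketch of the arithmetic only, with a `RunSummary` shape and `summarizeRuns` helper invented for this example (the workflowAnalytics module's real queries and return types are not shown in this guide):

```typescript
// Hypothetical input and output shapes.
interface RunSummary {
  status: "completed" | "failed";
  iterations: number;
  durationMs: number;
}

interface WorkflowMetrics {
  runCount: number;
  successRate: number; // completed / (completed + failed), in the range 0..1
  averageIterations: number;
  averageDurationMs: number;
}

// Returns null when there are no finished runs to summarize.
function summarizeRuns(runs: RunSummary[]): WorkflowMetrics | null {
  if (runs.length === 0) {
    return null;
  }
  const completed = runs.filter((run) => run.status === "completed").length;
  const totalIterations = runs.reduce((sum, run) => sum + run.iterations, 0);
  const totalDurationMs = runs.reduce((sum, run) => sum + run.durationMs, 0);
  return {
    runCount: runs.length,
    successRate: completed / runs.length,
    averageIterations: totalIterations / runs.length,
    averageDurationMs: totalDurationMs / runs.length,
  };
}
```

A low success rate combined with a high average iteration count would point toward persona or checker tuning, while a long average duration with few iterations might suggest a slower model or a heavier pipeline.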