Monitoring

Monitor workflow runs, quality scores, CI checks, sprint chains, sandbox health, and system performance in CodeCourier with real-time dashboards and notifications.


CodeCourier provides comprehensive monitoring tools for tracking workflow execution, sandbox health, and system-wide performance. All monitoring data is powered by Convex reactive queries, which means the dashboard updates in real time without polling. This guide covers the monitoring surfaces, the data they expose, and how to use them effectively.

Run Status Tracking

Run List Dashboard

The Runs page is the primary monitoring surface. It shows a paginated list of all runs for the current project, ordered by creation time (newest first). Each row displays:

  • Status badge -- Color-coded indicator showing pending (gray), running (blue), completed (green), failed (red), or cancelled (amber).
  • Run name -- Auto-generated or user-provided name.
  • Source -- Where the run originated: workflow, sprint, sandbox, or merge agent.
  • Prompt preview -- First few lines of the task description.
  • Timing -- Creation time and duration (if completed).
  • PR status -- Whether a pull request was created, merged, or failed.

Run Detail View

Clicking a run opens its detail page with comprehensive execution information:

  • Step timeline - A visual representation of every step in the pipeline, showing the role (designer, checker, evaluator, etc.), the CLI tool and model used, the status, and the duration. Steps are displayed in execution order with iteration numbers.
  • Sandbox output - Each step's sandbox terminal output, including AI agent responses, tool use events, and error messages. Output updates in real time while the step is running.
  • Checker verdicts - For checker steps, the pass/fail verdict and feedback text are displayed inline in the timeline.
  • Quality scores - For evaluator steps, the five-dimension quality score breakdown and composite score are displayed inline. A threshold indicator shows whether the composite score meets the configured threshold.
  • CI check status - The aggregate CI status (passing, failing, pending) and individual check results with links to GitHub.
  • Configuration - The run's sandbox config, prompt, reference images, and metadata.
  • Error details - If the run failed, the error message and the step where the failure occurred.

Real-Time Updates

All monitoring views use Convex reactive queries. When a step completes, a verdict is recorded, or a sandbox status changes, the UI reflects the update immediately. No manual refresh is needed.

Quality Score Monitoring

Quality scores provide a quantitative view of how well each workflow run meets defined quality criteria. They are produced by Evaluator steps and surfaced at two levels:

  • Run-level - The qualityScore field on the run record holds the composite score (0–100) aggregated across all evaluator steps in the pipeline. This is visible in the Runs list as a score badge, enabling at-a-glance quality comparison across runs.
  • Step-level - Individual evaluator run step records carry the full qualityScores breakdown across all five dimensions.

Interpreting Quality Dimensions

Quality score dimensions
qualityScores: {
  correctness: number,       // 0-100: Does the implementation meet requirements?
  typeSafety: number,        // 0-100: Are TypeScript types correct and non-coercive?
  codeStyle: number,         // 0-100: Does code follow project conventions?
  testCoverage: number,      // 0-100: Are changes covered by meaningful tests?
  completeness: number,      // 0-100: Is the implementation fully finished, not stubbed?
  composite: number,         // 0-100: Weighted average of all five dimensions
  thresholdResult: boolean,  // True if composite >= configured threshold
}

Each dimension is scored independently from 0 (does not meet criteria) to 100 (fully meets criteria). The composite score is a weighted average - you can configure the weights on the evaluator persona to emphasize the dimensions most important to your project. The thresholdResult boolean is the most actionable signal: a false value means the implementation did not meet your quality bar and may warrant further iteration.
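The composite and threshold check described above amount to a weighted average followed by a comparison. Below is a minimal sketch that normalizes by the sum of the weights so the result stays within 0–100. The weight values and the threshold of 75 are hypothetical examples, because the real values come from the evaluator persona configuration, and CodeCourier's exact rounding behavior is not documented here.

```typescript
// Illustrative sketch: the weights and threshold below are hypothetical;
// the real values are configured on the evaluator persona.
type Dimension = "correctness" | "typeSafety" | "codeStyle" | "testCoverage" | "completeness";
type DimensionScores = Record<Dimension, number>; // each 0-100
type Weights = Record<Dimension, number>;

const DIMENSIONS: Dimension[] = ["correctness", "typeSafety", "codeStyle", "testCoverage", "completeness"];

// Weighted average, normalized by the total weight so the result stays in 0-100.
function compositeScore(scores: DimensionScores, weights: Weights): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const d of DIMENSIONS) {
    weighted += scores[d] * weights[d];
    totalWeight += weights[d];
  }
  if (totalWeight === 0) throw new Error("at least one weight must be positive");
  return weighted / totalWeight;
}

const scores: DimensionScores = {
  correctness: 90, typeSafety: 80, codeStyle: 70, testCoverage: 60, completeness: 100,
};
// Hypothetical weighting that counts correctness twice as heavily.
const weights: Weights = {
  correctness: 2, typeSafety: 1, codeStyle: 1, testCoverage: 1, completeness: 1,
};

const composite = compositeScore(scores, weights);
const threshold = 75; // hypothetical configured threshold
const thresholdResult = composite >= threshold;
console.log(composite.toFixed(1), thresholdResult); // 81.7 true
```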

Quality Trends

The Workflow Analytics section shows quality score trends over time for a given workflow. Tracking the composite score across runs reveals whether your pipeline is consistently producing high-quality output or exhibiting degrading quality over time (which often signals that persona instructions need refinement or the workflow needs an additional improvement pass).

Quality Score Baselines

Establish a baseline quality score for each workflow after the first few runs. Flag runs that fall significantly below the baseline (e.g., more than 10 points below) for manual review. Sudden drops often indicate a prompt issue, a changed dependency, or a regression in the codebase that the evaluator is detecting correctly.
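The baseline heuristic above is straightforward to automate. This sketch takes the mean of the first three runs as the baseline and flags any later run that scores more than 10 points below it. Both the three-run window and the `ScoredRun` shape are illustrative assumptions, not product settings.

```typescript
// Sketch of the baseline heuristic; window size and margin are example values.
interface ScoredRun {
  id: string;
  qualityScore: number; // run-level composite, 0-100
}

// Mean composite score of the first `windowSize` runs (runs are oldest first).
function computeBaseline(runs: ScoredRun[], windowSize: number): number {
  const initial = runs.slice(0, windowSize);
  if (initial.length === 0) throw new Error("need at least one run for a baseline");
  return initial.reduce((sum, r) => sum + r.qualityScore, 0) / initial.length;
}

// Runs after the baseline window that score more than `margin` points below it.
function flagForReview(runs: ScoredRun[], windowSize = 3, margin = 10): ScoredRun[] {
  const baseline = computeBaseline(runs, windowSize);
  return runs.slice(windowSize).filter((r) => r.qualityScore < baseline - margin);
}

const runHistory: ScoredRun[] = [
  { id: "r1", qualityScore: 84 },
  { id: "r2", qualityScore: 88 },
  { id: "r3", qualityScore: 86 },
  { id: "r4", qualityScore: 85 },
  { id: "r5", qualityScore: 71 },
];

console.log(flagForReview(runHistory).map((r) => r.id)); // [ 'r5' ]
```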

CI Checks Monitoring

After a run creates a pull request, CodeCourier monitors the CI check status for that PR and surfaces it in the monitoring UI.

CI Status in the Run List

The Runs list displays a CI status indicator alongside the PR status for each run. The aggregate status ("passing", "failing", or "pending") is shown as a colored badge. Runs with prStatus = "blocked_on_ci" are highlighted to indicate that the PR cannot be merged until CI resolves.

Individual Check Details

The run detail view lists each individual CI check with its name, status, and a direct link to the check run on GitHub. This allows you to navigate directly from a failing CodeCourier run to the specific CI job output, reducing the time spent diagnosing failures.

CI checks structure
ciChecks: {
  status: "passing" | "failing" | "pending",
  checks: Array<{
    name: string,      // e.g., "Build", "Unit Tests", "Lint", "E2E Tests"
    status: string,    // Per-check status from GitHub
    url: string,       // Direct link to the check run
  }>,
  checkedAt: number,   // Timestamp of last GitHub poll
}
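The aggregate status can be derived from the individual checks in the structure above. This is a hedged sketch. It assumes each per-check `status` holds a GitHub check-run conclusion such as `success` or `failure`, and it uses one plausible rule: any failure means "failing", and anything not yet known to pass means "pending". CodeCourier's actual mapping may differ. The GitHub URLs are placeholders.

```typescript
// Assumption: per-check status values are GitHub check-run conclusions,
// and unfinished checks report something else (e.g. "in_progress").
type AggregateStatus = "passing" | "failing" | "pending";

interface CiCheck {
  name: string;
  status: string;
  url: string;
}

const FAILED = new Set(["failure", "timed_out", "cancelled", "action_required"]);
const PASSED = new Set(["success", "neutral", "skipped"]);

function aggregateCiStatus(checks: CiCheck[]): AggregateStatus {
  if (checks.some((c) => FAILED.has(c.status))) return "failing";
  // No checks reported yet, or any check not in a known passing state: still pending.
  if (checks.length === 0 || checks.some((c) => !PASSED.has(c.status))) return "pending";
  return "passing";
}

// "Name: url" lines for jumping straight from a run to the failing CI jobs.
function failingCheckLinks(checks: CiCheck[]): string[] {
  return checks.filter((c) => FAILED.has(c.status)).map((c) => `${c.name}: ${c.url}`);
}

const checks: CiCheck[] = [
  { name: "Build", status: "success", url: "https://github.com/example/repo/runs/1" },
  { name: "Unit Tests", status: "failure", url: "https://github.com/example/repo/runs/2" },
  { name: "Lint", status: "in_progress", url: "https://github.com/example/repo/runs/3" },
];

console.log(aggregateCiStatus(checks)); // failing
console.log(failingCheckLinks(checks));
```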

Sprint Chain Monitoring

Sprint chains appear in the monitoring surfaces with additional context compared to individual runs.

Sprint Chain in the Runs List

Individual sprint runs appear in the Runs list with the source "sprint". The run name includes the sprint number (e.g., "Sprint 2 of 5 - Feature X") so you can track each phase at a glance. You can filter the Runs list by source to show only sprint-originated runs.

Sprint Chain Detail View

The sprint chain detail view provides a consolidated view of the entire chain:

  • Chain status - The overall chain state (pending, running, completed, failed, or cancelled).
  • Sprint progress - Current sprint index and total sprint count (e.g., "3 of 5 sprints completed").
  • Per-sprint PR URLs - The sprintPrUrls array displayed as a clickable list, one entry per sprint. Completed sprints show their PR URL; future sprints show as pending.
  • Sprint timeline - A timeline of sprint executions with start and end times, enabling duration comparison across sprints.
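A sketch of how the progress label and per-sprint PR list above could be rendered. Apart from `sprintPrUrls`, the `SprintChainView` fields are hypothetical. The sketch also assumes `sprintPrUrls` holds one URL for each sprint that has produced a PR so far. The PR URLs are placeholders.

```typescript
// Hypothetical chain shape: only sprintPrUrls is named in this guide.
interface SprintChainView {
  totalSprints: number;
  completedSprints: number;
  sprintPrUrls: string[]; // assumed: one URL per sprint that has produced a PR
}

function progressLabel(chain: SprintChainView): string {
  return `${chain.completedSprints} of ${chain.totalSprints} sprints completed`;
}

// One line per sprint: the PR URL if one exists, otherwise "pending".
function prList(chain: SprintChainView): string[] {
  const lines: string[] = [];
  for (let i = 0; i < chain.totalSprints; i++) {
    const url = i < chain.sprintPrUrls.length ? chain.sprintPrUrls[i] : "pending";
    lines.push(`Sprint ${i + 1}: ${url}`);
  }
  return lines;
}

const chain: SprintChainView = {
  totalSprints: 5,
  completedSprints: 3,
  sprintPrUrls: [
    "https://github.com/example/repo/pull/11",
    "https://github.com/example/repo/pull/12",
    "https://github.com/example/repo/pull/13",
  ],
};

console.log(progressLabel(chain)); // 3 of 5 sprints completed
console.log(prList(chain).join("\n"));
```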

Sprint Failure Isolation

If a sprint chain fails, the chain detail view highlights which sprint failed and links to that sprint’s run detail page. You can diagnose the failure, fix the underlying issue, and use the resumeFromSprint capability to restart the chain from the failed sprint without re-running earlier phases.
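The resume logic above amounts to finding the first failed sprint and re-running only that sprint and the ones after it. This sketch is illustrative only. This guide does not show how `resumeFromSprint` is invoked or whether it is 0- or 1-based. The sketch uses 1-based sprint numbers to match labels like "Sprint 2 of 5".

```typescript
// Illustrative only: 1-based sprint numbers; the real resumeFromSprint
// invocation and indexing are not shown in this guide.
type SprintStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

// 1-based number of the first failed sprint, or null if none failed.
function firstFailedSprint(statuses: SprintStatus[]): number | null {
  const idx = statuses.indexOf("failed");
  return idx === -1 ? null : idx + 1;
}

// Sprints that run again when resuming; earlier completed sprints are skipped.
function sprintsToRun(total: number, resumeFromSprint: number): number[] {
  const result: number[] = [];
  for (let n = resumeFromSprint; n <= total; n++) result.push(n);
  return result;
}

const statuses: SprintStatus[] = ["completed", "completed", "failed", "pending", "pending"];
const resumeAt = firstFailedSprint(statuses);
if (resumeAt !== null) {
  console.log(`Resume from sprint ${resumeAt}:`, sprintsToRun(statuses.length, resumeAt));
  // Resume from sprint 3: [ 3, 4, 5 ]
}
```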

Trigger.dev Run Status

In addition to Convex-level tracking, CodeCourier links each run to its corresponding Trigger.dev task execution. The triggerRunId field on the run record provides traceability to the Trigger.dev dashboard where you can inspect:

  • Task queue position and scheduling.
  • Execution logs at the infrastructure level.
  • Retry history (if the task was retried).
  • Resource consumption and timing.

This two-level tracking (Convex + Trigger.dev) ensures you can debug both application-level issues (wrong prompt, checker feedback) and infrastructure-level issues (timeout, OOM, network failure).

Sandbox Monitoring

Active Sandbox Counter

The project dashboard displays a counter of active sandboxes -- the number of sandboxes currently in the running state. This counter is denormalized in the projectCounters table and updates in real time as sandboxes start and stop.
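A denormalized counter avoids re-counting every sandbox on each read. The trade-off is that every state transition has to adjust it. Below is a minimal sketch of that update rule. A plain in-memory object stands in for a `projectCounters` document, and the sandbox state names other than "running" are assumptions. The real mutation code is not shown in this guide.

```typescript
// In-memory stand-in for a projectCounters document (hypothetical shape).
interface ProjectCounters {
  activeSandboxes: number;
}

// Adjust the counter only when a sandbox enters or leaves the "running" state.
function applySandboxTransition(
  counters: ProjectCounters,
  from: string | null, // null when the sandbox is newly created
  to: string,
): void {
  const wasRunning = from === "running";
  const isRunning = to === "running";
  if (!wasRunning && isRunning) counters.activeSandboxes += 1;
  if (wasRunning && !isRunning) counters.activeSandboxes -= 1;
}

const counters: ProjectCounters = { activeSandboxes: 0 };
applySandboxTransition(counters, null, "pending");
applySandboxTransition(counters, "pending", "running");
applySandboxTransition(counters, null, "running");
applySandboxTransition(counters, "running", "completed");
console.log(counters.activeSandboxes); // 1
```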

Sandbox List

The Sandboxes page shows all sandboxes for the project (excluding those created by workflows and issue sessions). Each entry shows the sandbox status, configuration, creation time, and linked PR information.

Streaming Terminal

For individual sandboxes, the streaming terminal component provides real-time output from the AI agent. The terminal renders:

  • Assistant messages with formatted text.
  • Tool use events (file writes, command execution).
  • User messages sent interactively.
  • Status indicators for streaming, completed, and error states.
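The four kinds of terminal output listed above map naturally onto a TypeScript discriminated union. The sketch below is hypothetical. The event field names and the `write_file` tool name are assumptions, and the real streaming terminal component may use a different message format.

```typescript
// Hypothetical event shape mirroring the four output kinds listed above.
type TerminalEvent =
  | { kind: "assistant"; text: string }
  | { kind: "tool_use"; tool: string; detail: string }
  | { kind: "user"; text: string }
  | { kind: "status"; state: "streaming" | "completed" | "error"; message?: string };

function renderEvent(event: TerminalEvent): string {
  switch (event.kind) {
    case "assistant":
      return event.text;
    case "tool_use":
      return `[${event.tool}] ${event.detail}`;
    case "user":
      return `> ${event.text}`;
    case "status":
      return event.message ? `(${event.state}: ${event.message})` : `(${event.state})`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind breaks the build here.
      const unreachable: never = event;
      return String(unreachable);
    }
  }
}

const stream: TerminalEvent[] = [
  { kind: "user", text: "Add a dark mode toggle" },
  { kind: "assistant", text: "I'll update the settings page." },
  { kind: "tool_use", tool: "write_file", detail: "src/settings.tsx" },
  { kind: "status", state: "completed" },
];
console.log(stream.map(renderEvent).join("\n"));
```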

Project-Level Metrics

Project Counters

CodeCourier maintains denormalized counters for each project:

  • Total sandboxes -- Lifetime count of sandboxes created.
  • Active sandboxes -- Currently running sandboxes.
  • Total runs -- Lifetime count of workflow runs.
  • Completed runs -- Successfully finished runs.
  • Failed runs -- Runs that ended in failure.
  • Total workflows -- Number of workflow blueprints.
  • Total members -- Team members in the project.
  • Pending invitations -- Unaccepted member invitations.

These counters are displayed on the project overview page and update reactively.

Daily Statistics

The dailyStats table tracks per-day metrics:

  • Sandboxes created.
  • Runs created, completed, and failed.
  • Total iterations across all runs.
  • Workflows created.

This data powers historical charts and trend analysis in the project dashboard.
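Per-day metrics like these come from bucketing events by ISO date. In this sketch, only the metric names are taken from the list above. The `RunEvent` and `DailyRow` shapes are hypothetical. Counting iterations when a run finishes is also an illustrative choice.

```typescript
// Hypothetical shapes; only the metric names come from the list above.
interface RunEvent {
  date: string; // ISO date, e.g. "2026-03-15"
  kind: "created" | "completed" | "failed";
  iterations?: number; // recorded when the run finishes (illustrative choice)
}

interface DailyRow {
  runsCreated: number;
  runsCompleted: number;
  runsFailed: number;
  totalIterations: number;
}

function rollUpDaily(events: RunEvent[]): Map<string, DailyRow> {
  const byDay = new Map<string, DailyRow>();
  for (const e of events) {
    let row = byDay.get(e.date);
    if (!row) {
      row = { runsCreated: 0, runsCompleted: 0, runsFailed: 0, totalIterations: 0 };
      byDay.set(e.date, row);
    }
    if (e.kind === "created") row.runsCreated += 1;
    if (e.kind === "completed") row.runsCompleted += 1;
    if (e.kind === "failed") row.runsFailed += 1;
    row.totalIterations += e.iterations ?? 0;
  }
  return byDay;
}

const events: RunEvent[] = [
  { date: "2026-03-15", kind: "created" },
  { date: "2026-03-15", kind: "created" },
  { date: "2026-03-15", kind: "completed", iterations: 3 },
  { date: "2026-03-16", kind: "failed", iterations: 5 },
];
const daily = rollUpDaily(events);
console.log(daily.get("2026-03-15"));
```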

Usage and Cost Tracking

Every sandbox session and workflow step generates usage records in the usageRecords table. These records provide detailed cost visibility:

Usage record fields
{
  service: "anthropic",     // or "openai", "openrouter", "e2b", "trigger_dev"
  date: "2026-03-15",       // ISO date
  quantity: 15000,          // tokens consumed
  unit: "output_tokens",    // what was measured
  costUsd: 0.45,            // calculated cost
  toolId: "claude",         // CLI tool used
  modelId: "claude-opus-4-6", // specific model
  stepType: "designer",     // step role
  inputTokens: 12000,       // detailed token breakdown
  outputTokens: 15000,
  durationMs: 45000,        // step duration
}

Usage records are linked to specific runs, sandboxes, chains, and issue sessions. This allows you to track costs at every level -- from individual steps to entire work chains.
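Rolling costs up to a given level is a group-and-sum over the usage records. This sketch assumes the run and chain links are stored as `runId` and `chainId`. Those names are hypothetical, as is the reduced record shape. Summing dollar amounts as floating-point numbers is acceptable for a dashboard, but round the totals before displaying them.

```typescript
// Hypothetical subset of a usage record; runId and chainId are assumed link names.
interface UsageRecord {
  service: string;
  costUsd: number;
  runId?: string;
  chainId?: string;
}

type RollupKey = "runId" | "chainId" | "service";

// Sum costUsd per distinct value of `key`, skipping records not linked at that level.
function totalCostBy(records: UsageRecord[], key: RollupKey): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const k = r[key];
    if (k === undefined) continue;
    totals.set(k, (totals.get(k) ?? 0) + r.costUsd);
  }
  return totals;
}

const records: UsageRecord[] = [
  { service: "anthropic", costUsd: 0.5, runId: "run_1", chainId: "chain_a" },
  { service: "e2b", costUsd: 0.25, runId: "run_1", chainId: "chain_a" },
  { service: "anthropic", costUsd: 1.25, runId: "run_2", chainId: "chain_a" },
  { service: "anthropic", costUsd: 0.5 }, // not linked to a run or chain
];

console.log(totalCostBy(records, "runId").get("run_1")); // 0.75
```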

Notifications

CodeCourier sends notifications for important events. The notifications table stores per-user notifications with the following types:

  • run_completed -- A workflow run finished successfully.
  • run_failed -- A workflow run failed.
  • pr_created -- A pull request was created.
  • pr_merged -- A pull request was merged.
  • pr_failed -- Pull request creation failed.
  • member_joined -- A new team member joined the project.
  • workflow_completed -- All steps in a workflow completed.
  • sprint_completed -- A sprint chain completed.
  • sprint_failed -- A sprint chain failed.

Notifications are displayed in the dashboard and can be marked as read or dismissed. They are indexed by project, user, and read status for efficient querying.
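The type union below copies the documented notification types. The `NotificationRecord` shape is a hypothetical sketch. The filter selects by project, user, and read status, the same keys the indexes cover. In a real Convex query an index would serve this lookup instead of an in-memory filter.

```typescript
// Notification types as documented; the record shape is hypothetical.
type NotificationType =
  | "run_completed"
  | "run_failed"
  | "pr_created"
  | "pr_merged"
  | "pr_failed"
  | "member_joined"
  | "workflow_completed"
  | "sprint_completed"
  | "sprint_failed";

interface NotificationRecord {
  projectId: string;
  userId: string;
  type: NotificationType;
  read: boolean;
}

// In-memory version of the project/user/read-status selection.
function unreadFor(all: NotificationRecord[], projectId: string, userId: string): NotificationRecord[] {
  return all.filter((n) => n.projectId === projectId && n.userId === userId && !n.read);
}

function countByType(list: NotificationRecord[]): Partial<Record<NotificationType, number>> {
  const counts: Partial<Record<NotificationType, number>> = {};
  for (const n of list) counts[n.type] = (counts[n.type] ?? 0) + 1;
  return counts;
}

const inbox: NotificationRecord[] = [
  { projectId: "p1", userId: "u1", type: "run_failed", read: false },
  { projectId: "p1", userId: "u1", type: "run_failed", read: false },
  { projectId: "p1", userId: "u1", type: "pr_merged", read: true },
  { projectId: "p1", userId: "u2", type: "sprint_failed", read: false },
];

console.log(countByType(unreadFor(inbox, "p1", "u1"))); // { run_failed: 2 }
```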

Error Monitoring

Error Surfaces

Errors are captured at multiple levels:

  • Sandbox errors -- The error field on sandbox records stores E2B provisioning failures and agent crashes.
  • Run errors -- The error field on run records stores pipeline-level failures.
  • Step errors -- The error field on run step records stores step-specific failures.
  • PR errors -- The prError field on sandbox and run records stores pull request creation failures.
  • Learning extraction errors -- The learningExtractionError field on sandbox records stores learning extraction failures.

Common Error Patterns

  • API key misconfiguration -- Missing or invalid keys for E2B, Anthropic, or GitHub. Check Project Settings.
  • Template not found -- The specified E2B template does not exist. Verify template ID in workflow config.
  • Timeout exceeded -- The sandbox ran longer than the configured timeout. Increase the timeout or simplify the task.
  • Rate limiting -- The AI provider rate-limited requests. Wait and retry, or switch to a different model.
  • Git push failure -- The sandbox could not push to the remote. Check that the GitHub token has write access.

Monitoring Best Practices

Check the run list periodically for failed runs. A pattern of failures often indicates a configuration issue (wrong API key, insufficient timeout, or a template problem) rather than individual task failures. Fix the root cause in project settings rather than retrying runs.
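One way to tell a configuration problem from one-off failures is to normalize failed runs' error messages and count how often each message recurs. The normalization rules and the sample error strings in this sketch are illustrative assumptions. They are not CodeCourier's actual error formats.

```typescript
// Illustrative normalization rules; sample error strings are hypothetical.
interface FailedRun {
  id: string;
  error: string;
}

function normalizeError(message: string): string {
  return message
    .toLowerCase()
    .replace(/[0-9a-f]{8,}/g, "<id>") // collapse long hex identifiers
    .replace(/\d+/g, "<n>") // collapse numbers such as timeouts or counts
    .trim();
}

// Normalized messages seen at least `minCount` times, most frequent first.
function recurringErrors(runs: FailedRun[], minCount = 2): [string, number][] {
  const counts = new Map<string, number>();
  for (const r of runs) {
    const key = normalizeError(r.error);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1]);
}

const failures: FailedRun[] = [
  { id: "r1", error: "Sandbox timed out after 600 s" },
  { id: "r2", error: "Sandbox timed out after 900 s" },
  { id: "r3", error: "Git push failed: permission denied" },
];

console.log(recurringErrors(failures)); // [ [ 'sandbox timed out after <n> s', 2 ] ]
```

A recurring normalized message, like the repeated timeout here, points to a setting to fix once in project or workflow configuration rather than runs to retry.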

Workflow Analytics

CodeCourier provides analytics queries for workflow performance. The workflowAnalytics module exposes metrics like:

  • Runs per workflow (how often each blueprint is used).
  • Success rate (completed vs. failed runs).
  • Average iteration count (how many loops before passing).
  • Average duration (time from start to completion).

These metrics help you identify which workflows are effective and which need tuning -- whether that means adjusting persona instructions, changing models, or restructuring the pipeline.
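The four metrics listed above can be computed from run records in a single grouping pass. This sketch uses a hypothetical `RunSummary` shape. The workflowAnalytics module's real inputs and return types are not documented here. It computes success rate over finished runs only (completed plus failed), and it reports 0 when a workflow has no finished runs yet.

```typescript
// Hypothetical input shape; workflowAnalytics' real types are not documented here.
interface RunSummary {
  workflowId: string;
  status: "completed" | "failed" | "running" | "pending" | "cancelled";
  iterations: number;
  startedAt: number; // ms
  finishedAt?: number; // ms, set once the run ends
}

interface WorkflowMetrics {
  runs: number;
  successRate: number; // completed / (completed + failed), 0-1
  avgIterations: number;
  avgDurationMs: number;
}

const mean = (xs: number[]): number => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0);

function workflowMetrics(runs: RunSummary[]): Map<string, WorkflowMetrics> {
  const groups = new Map<string, RunSummary[]>();
  for (const r of runs) {
    const g = groups.get(r.workflowId) ?? [];
    g.push(r);
    groups.set(r.workflowId, g);
  }
  const result = new Map<string, WorkflowMetrics>();
  groups.forEach((g, id) => {
    const completed = g.filter((r) => r.status === "completed").length;
    const failed = g.filter((r) => r.status === "failed").length;
    const durations: number[] = [];
    for (const r of g) if (r.finishedAt !== undefined) durations.push(r.finishedAt - r.startedAt);
    result.set(id, {
      runs: g.length,
      successRate: completed + failed > 0 ? completed / (completed + failed) : 0,
      avgIterations: mean(g.map((r) => r.iterations)),
      avgDurationMs: mean(durations),
    });
  });
  return result;
}

const runLog: RunSummary[] = [
  { workflowId: "wf-a", status: "completed", iterations: 2, startedAt: 0, finishedAt: 60000 },
  { workflowId: "wf-a", status: "failed", iterations: 4, startedAt: 0, finishedAt: 120000 },
  { workflowId: "wf-b", status: "running", iterations: 1, startedAt: 0 },
];
const metrics = workflowMetrics(runLog);
console.log(metrics.get("wf-a"));
```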