Core Concepts

Understand the fundamental building blocks of CodeCourier: projects, sandboxes, workflows, personas, contexts, assets, sprint chains, recurring tasks, quality scoring, and learnings.

15 min read
concepts, architecture, data-model

CodeCourier is built around a small set of interconnected concepts. Understanding how they relate to each other is the key to using the platform effectively. This page explains each concept, its role in the system, and how it connects to everything else.

Projects

A project is the top-level organizational unit in CodeCourier. Every other resource - sandboxes, workflows, runs, personas, contexts, assets, plans, issues, learnings, recurring tasks, sprint chains, team members, and API keys - belongs to exactly one project.

Each project has a unique slug that appears in all URLs (e.g., /p/my-app/dashboard), an owner, and optional metadata such as a GitHub repository URL and a project logo. Projects can be linked to a GitHub repository, which enables automatic branch creation and pull request generation from workflow runs.

Project settings

Project settings control default behaviors across the entire project:

  • Sandbox system prompt - A default instruction appended to every sandbox session in the project
  • CLAUDE.md content - Markdown that is written into the sandbox as a CLAUDE.md file for CLI tools that support it
  • Session-specific configurations - Each session type (learning, merging, issue, answering, evaluating, judging) has its own dedicated configuration page where you can bind a context, select skills, commands, and scripts, and set other session-level defaults
  • Environment variables - Key-value pairs injected into sandbox environments, with support for marking sensitive values as secrets
  • Git configuration - Custom commit author name and email for agent-generated commits
  • Learning and merging settings - Configure which AI model and template to use for learning extraction and branch merging

Team roles

Projects support three roles:

  • Owner - Full access, including project deletion and member management
  • Admin - Can manage settings, members, and all resources within the project
  • Member - Can create and manage their own resources (sandboxes, runs, workflows) within the project

Members are invited by email and must accept the invitation before gaining access. Each member can configure their own API keys that are used for sandbox provisioning.

Sandboxes

A sandbox is an isolated Linux virtual machine provisioned through E2B. Sandboxes are the execution environment where AI coding agents run. Each sandbox is a complete Linux system with its own filesystem, network stack, and process space.

Sandbox configuration

Every sandbox is created with a configuration object that defines:

  • Template ID - Determines the base image and pre-installed tools. Different templates support different CLI tools (Claude Code, OpenCode, Codex, Pi, etc.)
  • Timeout - How long the sandbox can run before automatic termination, from 1 minute to 4 hours
  • Memory - Allocated RAM from 256 MB to 8 GB
  • CPU count - Number of CPU cores from 1 to 8
  • Model overrides - Specify different AI models for designer and checker steps
  • Thinking effort - Per-model thinking effort levels (e.g., high, medium, low) that control how much reasoning the model does before responding
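
The ranges above can be checked before a sandbox is requested. The following is a minimal sketch, not CodeCourier's actual schema: the field names (templateId, timeoutMinutes, memoryMb, cpuCount), the template ID value, and the reading of 8 GB as 8192 MB are all assumptions made for illustration.

```typescript
// Hypothetical shape; field names are illustrative, not CodeCourier's real schema.
interface SandboxConfig {
  templateId: string;
  timeoutMinutes: number; // documented range: 1 minute to 4 hours
  memoryMb: number;       // documented range: 256 MB to 8 GB (assumed 8192 MB)
  cpuCount: number;       // documented range: 1 to 8 cores
}

// Returns a list of problems; an empty list means every value is within range.
function validateSandboxConfig(config: SandboxConfig): string[] {
  const errors: string[] = [];
  if (config.templateId.length === 0) {
    errors.push("templateId is required");
  }
  if (config.timeoutMinutes < 1 || config.timeoutMinutes > 240) {
    errors.push("timeout must be between 1 minute and 4 hours");
  }
  if (config.memoryMb < 256 || config.memoryMb > 8192) {
    errors.push("memory must be between 256 MB and 8 GB");
  }
  const cpuIsWhole = Math.floor(config.cpuCount) === config.cpuCount;
  if (!cpuIsWhole || config.cpuCount < 1 || config.cpuCount > 8) {
    errors.push("cpuCount must be a whole number from 1 to 8");
  }
  return errors;
}

const okConfig: SandboxConfig = {
  templateId: "example-template", // placeholder value
  timeoutMinutes: 30,
  memoryMb: 2048,
  cpuCount: 2,
};
const badConfig: SandboxConfig = {
  templateId: "example-template",
  timeoutMinutes: 600, // over the 4-hour limit
  memoryMb: 128,       // under the 256 MB minimum
  cpuCount: 2,
};

console.log(validateSandboxConfig(okConfig));
console.log(validateSandboxConfig(badConfig));
```

Returning a list of messages rather than throwing on the first problem lets a settings form show every out-of-range field at once.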

Sandbox lifecycle

Sandboxes progress through these states:

  • Creating - The E2B API is provisioning the VM
  • Running - The sandbox is active and the agent is executing
  • Paused - The sandbox has been suspended but can be resumed
  • Killed - The sandbox has been terminated by timeout, user action, or completion
  • Error - The sandbox encountered a fatal error during provisioning or execution
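
One way to reason about these states is as a small state machine. The sketch below is an assumption-heavy illustration: the lowercase identifiers and the transition table are inferred from the state descriptions above and are not a documented CodeCourier contract.

```typescript
type SandboxState = "creating" | "running" | "paused" | "killed" | "error";

// Assumed transitions, inferred from the descriptions above:
// a paused sandbox can resume, while killed and error have no way out.
const nextStates: Record<SandboxState, SandboxState[]> = {
  creating: ["running", "error"],
  running: ["paused", "killed", "error"],
  paused: ["running", "killed"],
  killed: [],
  error: [],
};

function canTransition(from: SandboxState, to: SandboxState): boolean {
  return nextStates[from].indexOf(to) !== -1;
}

// A state is terminal when no further transition is allowed from it.
function isTerminal(state: SandboxState): boolean {
  return nextStates[state].length === 0;
}

console.log(canTransition("paused", "running"));
console.log(isTerminal("killed"));
```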

Sandbox messages

Every message exchanged between the platform and the AI agent inside a sandbox is stored as a sandboxMessage record with the role (user or assistant), content, optional stream log, and timestamps. Messages are indexed by sandbox and timestamp for efficient retrieval and real-time streaming.

Workflows

A workflow is a reusable blueprint that defines how AI agents should process a task. Workflows specify the type of pipeline, default sandbox configuration, and step-level instructions.

Workflow types

Single Designer

The simplest type. A single agent receives the prompt and executes it in one pass. Best for straightforward tasks where review is not needed.

Designer & Checker

The most common type. A designer agent writes code based on the prompt, then a checker agent reviews the output against configurable instructions. If the checker rejects (its verdict has pass: false), the designer receives the feedback and iterates. This loop continues until the checker approves or the maximum iteration count is reached.
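
The iterate-until-approved loop can be sketched as follows. Only the pass flag on the verdict comes from the description above; the feedback field, the function signatures, and the stub agents are hypothetical stand-ins for real agent invocations.

```typescript
// Hypothetical types: only `pass` is taken from the documentation.
interface CheckerVerdict {
  pass: boolean;
  feedback: string;
}
type Designer = (prompt: string, feedback: string | null) => string;
type Checker = (output: string) => CheckerVerdict;

interface LoopResult {
  output: string;
  approved: boolean;
  iterations: number;
}

function runDesignerCheckerLoop(
  prompt: string,
  designer: Designer,
  checker: Checker,
  maxIterations: number
): LoopResult {
  let feedback: string | null = null;
  let output = "";
  for (let i = 1; i <= maxIterations; i++) {
    output = designer(prompt, feedback);
    const verdict = checker(output);
    if (verdict.pass) {
      return { output: output, approved: true, iterations: i };
    }
    feedback = verdict.feedback; // handed to the next designer pass
  }
  // Iteration budget exhausted without approval.
  return { output: output, approved: false, iterations: maxIterations };
}

// Stub agents: the designer revises its output once it has received feedback.
const stubDesigner: Designer = (prompt, feedback) =>
  feedback === null ? "draft for: " + prompt : "revised for: " + prompt;
const stubChecker: Checker = (output) =>
  output.indexOf("revised") === 0
    ? { pass: true, feedback: "" }
    : { pass: false, feedback: "add tests" };

const loopResult = runDesignerCheckerLoop("add login form", stubDesigner, stubChecker, 3);
console.log(loopResult.approved, loopResult.iterations);
```

With these stubs the first pass is rejected and the second is approved, so the loop stops after two iterations even though three were allowed.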

Custom Pipeline

Define an arbitrary sequence of step types: designer, checker, optimizer, prompter, investigator, deep-dive, evaluator, or judge. Each step can optionally specify its own CLI tool, model, thinking effort, and instructions. Steps can be grouped into loops with configurable maximum iterations.

Persona Pipeline

Chain together named personas into a sequential pipeline. Each step references a persona by ID and inherits that persona’s full configuration. This is the most flexible type, allowing you to compose complex multi-agent workflows from reusable persona definitions.

Runs

A run is a single execution of a workflow (or a standalone sandbox session). Runs store the prompt, configuration, status, GitHub branch information, PR details, cost tracking, quality score, CI check status, and references to the workflow and project they belong to.

Each run contains one or more run steps. A run step represents a single agent invocation - one designer pass, one checker review, one evaluator assessment, etc. Steps track their role (designer, checker, optimizer, prompter, investigator, deep-dive, evaluator, judge, or merge_agent), status, duration, the sandbox used, checker verdicts, and individual quality scores.

Run sources

Runs can originate from different sources: a workflow run triggered manually, an issue run from a work chain, a sprint_chain run, a direct sandbox session, a merge_agent run that merges branches, or a recurring_task run dispatched on schedule. The source is tracked on each run for analytics and filtering.

Personas

A persona is a reusable AI agent configuration scoped to a project. Personas let you define standardized agent behaviors that can be referenced across workflows and pipeline steps.

Each persona includes:

  • Type - The role the persona fills. The full set of persona types is:
    • designer - Implements features and fixes
    • checker - Reviews and validates output
    • optimizer - Refactors and improves existing code
    • prompter - Generates or refines prompts for downstream agents
    • investigator - Explores codebases and researches before making changes
    • planner - Produces structured plans and issue lists
    • deep-dive - Performs in-depth analysis with extended reasoning
    • reviewer - Focuses on code review, readability, and maintainability
    • custom - Unconstrained type for specialized use cases
  • CLI tool - Which CLI client to use in the sandbox (identified by a tool ID string)
  • Model - Which AI model to use (e.g., claude-opus-4-6, claude-sonnet-4-6)
  • Thinking effort - How much reasoning the model should apply
  • Instructions - Custom system-level instructions that shape the agent’s behavior
  • Skills, commands, and scripts - The asset sets activated for this persona in the sandbox
  • Learnings - Whether to include compiled project learnings in the agent’s context

Personas are enabled or disabled at the project level. Disabled personas do not appear in workflow configuration or persona pipeline step pickers.
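
As an illustration, a persona record and the enabled-only filter used by step pickers might look like the sketch below. The field names, the string type for thinking effort, and the "example-cli" tool ID are assumptions; the persona types and the model name are taken from the list above.

```typescript
type PersonaType =
  | "designer" | "checker" | "optimizer" | "prompter" | "investigator"
  | "planner" | "deep-dive" | "reviewer" | "custom";

// Hypothetical record shape; field names are illustrative.
interface Persona {
  id: string;
  type: PersonaType;
  cliToolId: string;
  model: string;
  thinkingEffort: string;
  instructions: string;
  includeLearnings: boolean;
  enabled: boolean;
}

// Disabled personas are hidden from workflow configuration and step pickers.
function selectablePersonas(personas: Persona[]): Persona[] {
  return personas.filter((p) => p.enabled);
}

const personas: Persona[] = [
  {
    id: "p1", type: "designer", cliToolId: "example-cli", model: "claude-sonnet-4-6",
    thinkingEffort: "medium", instructions: "Implement features with tests.",
    includeLearnings: true, enabled: true,
  },
  {
    id: "p2", type: "reviewer", cliToolId: "example-cli", model: "claude-opus-4-6",
    thinkingEffort: "high", instructions: "Review for readability.",
    includeLearnings: false, enabled: false,
  },
];

console.log(selectablePersonas(personas).map((p) => p.id));
```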

Contexts

Contexts are reusable, versioned documents that define the system prompt and CLAUDE.md content injected into agent sessions. Unlike the global sandbox system prompt (which applies to every sandbox in the project), contexts are scoped to specific session types - giving you precise control over what instructions each class of agent receives.

Session type scoping

Each context document is bound to one of the following session types:

  • learning - Sessions that extract and review learnings from completed runs
  • merging - Sessions where a merge agent integrates branches
  • issue - Issue discovery and scanning sessions
  • answering - Answering sessions that resolve questions produced by issue sessions before implementation
  • evaluating - Evaluator agent sessions that score run output quality
  • judging - Judge agent sessions that compare outputs across parallel branches

Context versioning

Every context document is versioned. When you update a context, the previous version is preserved so you can track how your instructions evolved and revert if a change degrades agent performance. The active version of a context is the one injected into new sessions of that type. Context documents are accessible at /p/[id]/context within your project.

Context vs. global system prompt

The global sandbox system prompt (in General settings) is injected into every sandbox regardless of session type. Contexts layer on top of this and are specific to a session type. Use the global prompt for project-wide rules and contexts for session-specific instructions.
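
The layering can be pictured as a simple composition step. This is a sketch under stated assumptions: the record shapes, the active flag, and the blank-line separator are invented for illustration, and the real injection order and format may differ.

```typescript
type SessionType = "learning" | "merging" | "issue" | "answering" | "evaluating" | "judging";

// Hypothetical shapes for a versioned context document.
interface ContextVersion {
  version: number;
  systemPrompt: string;
  active: boolean;
}
interface ContextDoc {
  sessionType: SessionType;
  versions: ContextVersion[];
}

// The global prompt always applies; the active context version for the
// matching session type is layered on top of it.
function buildSystemPrompt(
  globalPrompt: string,
  contexts: ContextDoc[],
  sessionType: SessionType
): string {
  const parts: string[] = [globalPrompt];
  for (const ctx of contexts) {
    if (ctx.sessionType !== sessionType) continue;
    const active = ctx.versions.filter((v) => v.active)[0];
    if (active !== undefined) parts.push(active.systemPrompt);
  }
  return parts.join("\n\n");
}

const contexts: ContextDoc[] = [
  {
    sessionType: "merging",
    versions: [
      { version: 1, systemPrompt: "Prefer rebase.", active: false },
      { version: 2, systemPrompt: "Prefer merge commits.", active: true },
    ],
  },
];

console.log(buildSystemPrompt("Use TypeScript strict mode.", contexts, "merging"));
console.log(buildSystemPrompt("Use TypeScript strict mode.", contexts, "issue"));
```

A merging session receives both the global prompt and version 2 of the merging context, while an issue session, which has no context bound here, receives the global prompt alone.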

Assets: Skills, Commands, and Scripts

Assets are independently versioned, publishable packages that extend agent capabilities within the sandbox. Assets are created and managed at the project level and can be selectively attached to individual personas or to specific session types via Session Configurations.

Skills

Skills are domain-specific knowledge packages. Unlike a single instruction file, a skill is composed of multiple files - for example, a Convex skill might include a patterns reference document, a code snippets file, and an architectural guidelines file. All files in a skill are written to the sandbox filesystem at session start so the agent can read, reference, and build upon them.

Skills are versioned independently. Publishing a new skill version does not affect currently running sessions - only new sessions pick up the updated files.

Commands

Commands are shell command aliases available to agents inside the sandbox. A command has a name, a shell expression, and an optional description. For example, a run-tests command might expand to npx vitest run --reporter=verbose, giving agents a stable, project-specific alias regardless of how your test runner is configured. Commands are injected into the sandbox environment when the session starts.
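
To make the alias idea concrete, the sketch below renders a command record as a POSIX shell alias line. How CodeCourier actually injects commands is not described here, so the alias format, the record shape, and the quoting helper are assumptions.

```typescript
// Hypothetical record shape matching the three fields described above.
interface Command {
  name: string;
  expression: string;
  description?: string;
}

// Wrap a value in single quotes for POSIX shells, escaping embedded
// single quotes as '\'' so the expression survives intact.
function shellQuote(value: string): string {
  return "'" + value.split("'").join("'\\''") + "'";
}

function toAliasLine(command: Command): string {
  return "alias " + command.name + "=" + shellQuote(command.expression);
}

const runTests: Command = {
  name: "run-tests",
  expression: "npx vitest run --reporter=verbose",
};

console.log(toAliasLine(runTests));
// prints: alias run-tests='npx vitest run --reporter=verbose'
```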

Scripts

Scripts are executable shell or Python scripts that can be run at specific lifecycle points in a workflow - before the agent starts, after the agent completes, or on demand. Scripts are useful for pre-seeding the sandbox with dynamic data, validating output after completion, or running post-processing steps that are too procedural for an agent.

Asset selection per persona and session type

Skills, commands, and scripts can be attached at two levels:

  • Per persona - When a persona is used in a pipeline step, its configured assets are injected into the sandbox for that step only
  • Per session type - The Session Configuration page for each session type (issue, learning, merging, answering, evaluating, judging) lets you select which assets are available for all sessions of that type

Issues and Answering Sessions

Issue sessions let you scan a codebase for problems and improvement opportunities. An issue session creates a sandbox that analyzes the repository and generates a structured list of issues, each with a title, description, priority (low, medium, high, critical), and a suggested prompt for resolution.

Answering Sessions

When an issue session produces questions or unresolved assumptions - for example, “Should this use the existing cache layer or bypass it?” - an Answering Session can be initiated to let the AI agent (or a human collaborator) resolve those questions before implementation begins. The answering session receives the list of open questions and produces structured answers that are then passed forward into the implementation run. This prevents agents from making incorrect assumptions about ambiguous requirements.

Individual issues can be executed by creating a run with the suggested prompt, or grouped into work chains that process multiple issues sequentially against the same branch.

Sprint Chains

Sprint chains are a batch orchestration mechanism for executing multiple workflow runs across a series of planned sprints. They are distinct from work chains (which chain issue fixes on a single branch) in that each sprint in a sprint chain is an independent unit of work with its own branch and pull request.

Sprint chain structure

A sprint chain tracks:

  • Sprint range - The total number of sprints defined in the chain (e.g., “sprints 1 through 8”)
  • Current sprint index - Which sprint is currently executing or last completed
  • Per-sprint PR tracking - Each sprint records its own pull request URL, branch name, and PR status independently
  • Sprint prompts - Each sprint in the chain can have its own prompt or inherit from the chain’s default

Sprint chains are well-suited for executing a roadmap of features where each feature needs independent review and merging. The chain advances through sprints automatically or on manual approval, depending on your configuration.
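
The prompt inheritance and the automatic-versus-approval advance can be sketched like this. The record shapes, the requireApproval flag, the PR status values, and the zero-based index are assumptions made for the example.

```typescript
// Hypothetical shapes; only the tracked items themselves come from the list above.
interface SprintRecord {
  prompt?: string;
  branchName?: string;
  prUrl?: string;
  prStatus?: string;
}
interface SprintChain {
  defaultPrompt: string;
  currentSprintIndex: number; // assumed zero-based
  requireApproval: boolean;
  sprints: SprintRecord[];
}

// A sprint uses its own prompt when it has one, otherwise the chain default.
function promptForSprint(chain: SprintChain, index: number): string {
  const own = chain.sprints[index].prompt;
  return own !== undefined ? own : chain.defaultPrompt;
}

// Moves to the next sprint unless the chain is finished or approval is missing.
function advance(chain: SprintChain, approved: boolean): SprintChain {
  const isLast = chain.currentSprintIndex >= chain.sprints.length - 1;
  if (isLast) return chain;
  if (chain.requireApproval && !approved) return chain;
  return { ...chain, currentSprintIndex: chain.currentSprintIndex + 1 };
}

const chain: SprintChain = {
  defaultPrompt: "Implement the next roadmap item.",
  currentSprintIndex: 0,
  requireApproval: true,
  sprints: [{ prompt: "Add billing page." }, {}],
};

console.log(promptForSprint(chain, 1));
console.log(advance(chain, false).currentSprintIndex);
console.log(advance(chain, true).currentSprintIndex);
```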

Recurring Tasks

Recurring tasks let you schedule any CodeCourier workflow to execute automatically on a repeating cadence. This is useful for nightly builds, weekly code quality audits, daily dependency checks, or any workflow your team wants to automate without manual triggering.

Frequency options

The supported frequency values are:

  • daily - Runs every day at the configured hour and minute
  • every_other_day - Runs on alternating days
  • weekly - Runs once per week on the configured day
  • biweekly - Runs once every two weeks
  • monthly - Runs once per month on the configured day

Timezone and scheduling

Recurring tasks store a timezone, hour, and minute of execution. The platform uses these to compute the next run time (nextRunAt) and dispatches the run automatically when the scheduled time arrives. All scheduled times are stored in UTC internally but can be configured and displayed in any IANA timezone.
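
A simplified view of how nextRunAt could be derived from the previous run is sketched below. It works purely on UTC instants and deliberately leaves out the IANA timezone conversion, so it ignores daylight-saving shifts; the function names and the interval mapping are assumptions, not the platform's scheduler.

```typescript
type Frequency = "daily" | "every_other_day" | "weekly" | "biweekly" | "monthly";

// Fixed-length cadences map to a number of days; monthly is calendar-based.
function daysBetweenRuns(frequency: Frequency): number | null {
  if (frequency === "daily") return 1;
  if (frequency === "every_other_day") return 2;
  if (frequency === "weekly") return 7;
  if (frequency === "biweekly") return 14;
  return null; // monthly
}

// Computes the next run from the previous one, in UTC only.
function computeNextRunAt(previousRunAt: Date, frequency: Frequency): Date {
  const next = new Date(previousRunAt.getTime());
  const days = daysBetweenRuns(frequency);
  if (days === null) {
    // Note: a day that does not exist in the next month (e.g. the 31st)
    // rolls over into the following month with this simple approach.
    next.setUTCMonth(next.getUTCMonth() + 1);
  } else {
    next.setUTCDate(next.getUTCDate() + days);
  }
  return next;
}

const previousRun = new Date(Date.UTC(2025, 0, 6, 9, 30)); // 2025-01-06 09:30 UTC
console.log(computeNextRunAt(previousRun, "weekly").toISOString());
console.log(computeNextRunAt(previousRun, "monthly").toISOString());
```

A production scheduler would convert the configured hour and minute from the task's timezone to UTC for each occurrence, since a fixed UTC offset drifts by an hour across daylight-saving changes.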

Recurring task runs

Each execution of a recurring task creates a standard run record with its source set to recurring_task. This means all recurring runs appear in the Runs section alongside manually triggered runs and are subject to the same quality scoring, PR creation, and learning extraction workflows.

Evaluator and Judge Roles

CodeCourier includes two specialized agent roles for quality assessment that go beyond the standard checker pattern:

Evaluator

An evaluator agent scores the output quality of a run or run step across the six quality dimensions (see Quality Scoring below). Evaluators are useful in long pipelines where you want a dedicated quality-assessment pass after implementation and before final review. Evaluator output is structured - it produces numeric scores and a summary, not a binary pass/fail verdict.

Judge

A judge agent compares the outputs of parallel branches or multiple run attempts and selects the best one. Judges are useful when you run the same prompt against multiple configurations simultaneously (different models, different personas, different skill sets) and want an objective arbiter to decide which result to advance. The judge receives all candidate outputs and produces a structured comparison with a winner and rationale.

Quality Scoring

CodeCourier tracks output quality at the run step level through a structured quality score object. This gives teams an objective, consistent signal for how well each agent invocation performed - beyond simply “did it complete?”

Score dimensions

Each run step quality score includes six dimensions, each rated on a 0–100 scale:

  • Correctness - Does the output correctly implement the requirements?
  • Type safety - Are TypeScript types correct with no implicit any or type errors?
  • Code style - Does the code follow project conventions and style guidelines?
  • Test coverage - Are tests present, meaningful, and covering the changed code?
  • Completeness - Did the agent address all parts of the prompt?
  • Composite - A weighted aggregate of the five dimensions above

Individual run step scores are aggregated into an overall qualityScore on the run record. This enables filtering and sorting runs by quality in analytics views, and lets you identify which workflow configurations consistently produce higher-quality output.
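
The composite and run-level aggregation can be illustrated with a short calculation. The actual weights and the aggregation method are not documented here, so the sketch assumes equal weights by default and a plain mean across steps; the field names are also illustrative.

```typescript
// The five scored dimensions, each on a 0-100 scale.
interface DimensionScores {
  correctness: number;
  typeSafety: number;
  codeStyle: number;
  testCoverage: number;
  completeness: number;
}

// Assumed default: every dimension counts equally.
const equalWeights: DimensionScores = {
  correctness: 1, typeSafety: 1, codeStyle: 1, testCoverage: 1, completeness: 1,
};

// Weighted average of the five dimensions.
function compositeScore(scores: DimensionScores, weights: DimensionScores): number {
  const keys: (keyof DimensionScores)[] = [
    "correctness", "typeSafety", "codeStyle", "testCoverage", "completeness",
  ];
  let weighted = 0;
  let totalWeight = 0;
  for (const key of keys) {
    weighted += scores[key] * weights[key];
    totalWeight += weights[key];
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Assumed run-level aggregation: the mean of the step composites.
function runQualityScore(stepComposites: number[]): number {
  if (stepComposites.length === 0) return 0;
  const sum = stepComposites.reduce((acc, s) => acc + s, 0);
  return sum / stepComposites.length;
}

const stepScores: DimensionScores = {
  correctness: 90, typeSafety: 80, codeStyle: 70, testCoverage: 60, completeness: 100,
};

console.log(compositeScore(stepScores, equalWeights)); // 80
console.log(runQualityScore([80, 90]));                // 85
```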

CI Checks

Runs track CI check status through a ciChecks object that is updated as your CI pipeline processes agent-generated code. The object contains:

  • status - Overall CI status: pending, running, passing, failing, or skipped
  • checks - An array of individual check results, each with a name, status, and optional details URL
  • checkedAt - Timestamp of the most recent CI status poll

CI check data appears on the run detail page so you can assess code quality without switching to your CI provider’s interface.
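
A typed view of the ciChecks object, with one possible way to roll individual checks up into the overall status, is sketched below. The detailsUrl field name, the epoch-millisecond timestamp, and the precedence rule (failing over running over pending over passing) are assumptions; the platform may derive the overall status differently.

```typescript
type CiStatus = "pending" | "running" | "passing" | "failing" | "skipped";

interface CiCheck {
  name: string;
  status: CiStatus;
  detailsUrl?: string; // assumed field name for the optional details URL
}
interface CiChecks {
  status: CiStatus;
  checks: CiCheck[];
  checkedAt: number; // assumed to be epoch milliseconds
}

// Assumed roll-up rule: any failure wins, then in-progress work, then success.
function rollUpStatus(checks: CiCheck[]): CiStatus {
  const has = (s: CiStatus): boolean => checks.some((c) => c.status === s);
  if (checks.length === 0) return "pending";
  if (has("failing")) return "failing";
  if (has("running")) return "running";
  if (has("pending")) return "pending";
  if (has("passing")) return "passing";
  return "skipped";
}

const checks: CiCheck[] = [
  { name: "lint", status: "passing" },
  { name: "unit-tests", status: "failing" },
  { name: "e2e", status: "skipped" },
];
const ciChecks: CiChecks = {
  status: rollUpStatus(checks),
  checks: checks,
  checkedAt: Date.UTC(2025, 0, 6, 9, 30),
};

console.log(ciChecks.status);
```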

Learnings

Learnings are the knowledge management system in CodeCourier. Every time an AI agent makes a mistake, discovers a pattern, or encounters a project-specific requirement, that knowledge can be captured as a structured learning record.

Learning structure

Each learning contains:

  • Description - What was learned
  • Trigger - What situation triggers this learning
  • Correct behavior - What the agent should do when the trigger occurs
  • Severity - Critical, important, or minor
  • Category - Preference, pattern, gotcha, tool, or architecture
  • Confidence - A numeric score indicating how reliable this learning is
  • Source - Whether it was created by an agent during a run or extracted from a session after the fact

Learning lifecycle

Every learning starts as pending and is then either approved or rejected during review, giving three possible states:

  1. Pending - Newly created, awaiting human review
  2. Approved - Verified by a team member and included in future sessions
  3. Rejected - Dismissed as incorrect or not useful

Learning versions

Approved learnings are compiled into versioned markdown documents called learning versions. Each version is scoped to a project and role type, contains the compiled markdown, and references which individual learning records are included. When a new sandbox is provisioned, the active learning version is injected into the agent’s context automatically.
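
Compilation can be pictured as filtering to approved records and rendering them as markdown. The sketch below is illustrative only: the field names, the bullet format, and the return shape are assumptions, not the format CodeCourier actually produces.

```typescript
type ReviewStatus = "pending" | "approved" | "rejected";

// Hypothetical record shape based on the learning structure described earlier.
interface Learning {
  id: string;
  description: string;
  trigger: string;
  correctBehavior: string;
  severity: "critical" | "important" | "minor";
  status: ReviewStatus;
}

interface CompiledVersion {
  markdown: string;
  learningIds: string[];
}

// Only approved learnings are included in a compiled version.
function compileLearnings(learnings: Learning[]): CompiledVersion {
  const approved = learnings.filter((l) => l.status === "approved");
  const lines = approved.map(
    (l) => "- [" + l.severity + "] When " + l.trigger + ": " + l.correctBehavior
  );
  return { markdown: lines.join("\n"), learningIds: approved.map((l) => l.id) };
}

const learnings: Learning[] = [
  {
    id: "l1", description: "Schema changes need a migration.",
    trigger: "editing the schema", correctBehavior: "add a migration file",
    severity: "critical", status: "approved",
  },
  {
    id: "l2", description: "Unverified naming preference.",
    trigger: "naming files", correctBehavior: "use kebab-case",
    severity: "minor", status: "pending",
  },
];

const compiled = compileLearnings(learnings);
console.log(compiled.markdown);
```

Recording learningIds alongside the markdown mirrors the point above that each version references the individual learning records it includes.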

Continuous improvement

The learning system creates a feedback loop: agents produce work, learnings are extracted from that work, approved learnings improve future agent behavior, and the cycle repeats. Over time, your project accumulates institutional knowledge that makes every subsequent run more effective.

The real-time data model

CodeCourier uses Convex as its database and backend runtime. This means every query is a reactive subscription: when data changes on the server, every connected client that is reading that data updates instantly, with no polling or manual refresh required.

This architecture has important implications for the CodeCourier experience:

  • Run status updates (pending to running to completed) appear immediately across all team members’ browsers
  • Sandbox messages stream in real time as the agent produces them
  • Quality scores and CI check status update live as evaluators complete their assessments and CI pipelines report back
  • New learnings, workflow changes, and team member additions propagate instantly
  • Dashboard counters and analytics update live without page refreshes

All business logic - authentication, authorization, validation, and data mutations - runs in Convex server functions. The frontend never talks to the database directly; it calls Convex queries (which subscribe reactively) and mutations (which modify data through validated server-side functions).
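
The reactive-subscription idea can be illustrated with a toy in-memory value. This is not Convex's API or implementation, just a minimal sketch of the pattern in which subscribers receive the current value and every later change without polling; the class and method names are invented for the example.

```typescript
type Listener<T> = (value: T) => void;

// Toy reactive value: a stand-in for a server-side query result.
class ReactiveValue<T> {
  private value: T;
  private listeners: Listener<T>[] = [];

  constructor(initial: T) {
    this.value = initial;
  }

  // Delivers the current value immediately, then every later update.
  // Returns a function that cancels the subscription.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    listener(this.value);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Plays the role of a mutation: changes the data and notifies subscribers.
  set(next: T): void {
    this.value = next;
    for (const listener of this.listeners) {
      listener(next);
    }
  }
}

const runStatus = new ReactiveValue<string>("pending");
const seen: string[] = [];
const unsubscribe = runStatus.subscribe((status) => {
  seen.push(status);
});

runStatus.set("running");
runStatus.set("completed");
unsubscribe();
runStatus.set("ignored after unsubscribe");

console.log(seen); // ["pending", "running", "completed"]
```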

How concepts connect

Here is how the major concepts relate to each other:

  • A project contains workflows, personas, contexts, assets, plans, issues, learnings, recurring tasks, sprint chains, and team members
  • A workflow references a project and defines a blueprint for runs
  • A persona belongs to a project and carries its own set of assets (skills, commands, scripts) and context binding; it can be referenced by persona pipeline workflow steps
  • A context belongs to a project, is scoped to a session type, and is injected automatically into sandboxes of that type
  • Assets (skills, commands, scripts) belong to a project and are selected per persona and per session type
  • A run belongs to a project, optionally references a workflow, creates one or more sandboxes, and tracks quality scores and CI check status
  • A sandbox belongs to a run (or exists standalone) and contains messages
  • An issue session belongs to a project and produces issues that can trigger runs, work chains, or answering sessions
  • A sprint chain belongs to a project and orchestrates a series of runs across planned sprints, each with its own branch and PR
  • A recurring task belongs to a project and dispatches runs on a configurable schedule
  • Learnings are extracted from sandboxes, reviewed by team members, compiled into versions, and injected into future sandboxes

Next steps