Welcome to the CodeCourier Q1 2026 release notes. Ninety days, 47 shipped changes, three deprecations, two things we pulled. This is the full product update for the quarter - every Workflow Builder change, every Persona Forking improvement, and every Sprint Chains capability we promised in the Q4 2025 wrap-up, plus a long tail of platform work that doesn't make the homepage but moves the numbers. Read it top to bottom, or jump to the section that matters to your team.
Q1 was our biggest quarter by raw output and our most boring by narrative. We did not pivot. We did not rename anything. We shipped the roadmap we published in January, on schedule, and the durable agent memory updates landed without a single Sev-1. That is the report.
1. TL;DR - the top 10 ships at a glance
If you read nothing else, read this. These are the ten changes most likely to alter how your team uses CodeCourier this week.
- Workflow Builder v2 - branching, conditional steps, and per-node retry policies. Authoring time down 51% in customer telemetry.
- Persona Forking - git-style versioning for every persona, with atomic promote and side-by-side A/B evaluation.
- Sprint Chains GA - chain up to 75 issue sessions across a sprint with dependency ordering and pause-on-review gates.
- Contexts v3 retrieval - 38% better recall@10 on our internal eval set; new project-scoped and persona-scoped indexes.
- Async Issue Sessions - fire from GitHub, Linear, or Jira and walk away; the agent files a PR when ready.
- Sandbox cold-starts at 220 ms - down from 380 ms (median), with new GPU runtimes (L4, A10G) in private preview.
- SOC 2 Type II complete - reports available to enterprise prospects under NDA; EU data residency is live in Frankfurt.
- Audit log export - stream every agent action to Splunk, Datadog, or any SIEM via an S3-compatible sink.
- Nine new integrations - Linear, GitHub Issues, Jira, Slack, Sentry, PagerDuty, Notion, Vercel, and 1Password.
- Replay timeline + cost view - scrub any run by time and see per-run, per-workflow, per-project spend in one click.
The rest of this post unpacks each of those, links to the product page where it lives, and ends with what we shelved and what's coming in Q2.
2. Workflow Builder v2 - branching, conditions, retries
Workflow Builder got the most attention of any single surface this quarter. We threw away the old node editor and rebuilt it from scratch around three things customers had been asking for since launch: conditional branching, declarative error retry, and a debugging story that doesn't require reading JSON.
What we shipped
- Conditional steps. Every node can declare a guard expression. If the guard evaluates to false, the step is skipped and the workflow continues. Guards are written in a tiny typed DSL - no Turing-completeness, no remote code execution surface.
- Branching. A node can have multiple downstream edges with mutually exclusive conditions. The first matching edge wins. This kills the "dispatch by string switch" pattern that plagued v1 workflows.
- Per-node retry policies. Configure attempts, backoff (constant, linear, exponential), jitter, and which error classes trigger retry. Default is three attempts with exponential backoff for transient tool errors and zero retries for assertion failures.
- Live context preview. Hover any step in the editor to see exactly which Contexts will be loaded, how many tokens they consume, and whether the budget fits the chosen model.
- Failure surface that explains itself. When a run fails mid-workflow, the dashboard shows the failing node, the inputs it saw, the tool calls it made, and a one-click "re-run from here" button.
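The per-node retry policy above can be sketched as a small data structure. This is a minimal illustration, not CodeCourier's actual schema: `RetryPolicy`, its field names, and the error-class strings are hypothetical, and jitter is left out so the delays stay deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical per-node retry policy; all field names are assumptions."""
    max_attempts: int = 3
    backoff: str = "exponential"  # "constant" | "linear" | "exponential"
    base_delay_s: float = 1.0
    # Error classes that trigger a retry; anything else fails immediately.
    retry_on: frozenset = frozenset({"transient_tool_error"})

    def delay_for_attempt(self, attempt: int) -> float:
        """Seconds to wait after failed attempt number `attempt` (1-based)."""
        if self.backoff == "constant":
            return self.base_delay_s
        if self.backoff == "linear":
            return self.base_delay_s * attempt
        if self.backoff == "exponential":
            return self.base_delay_s * 2 ** (attempt - 1)
        raise ValueError(f"unknown backoff: {self.backoff}")

    def should_retry(self, error_class: str, attempt: int) -> bool:
        """Retry only listed error classes, and only while attempts remain."""
        return error_class in self.retry_on and attempt < self.max_attempts
```

With the defaults, a transient tool error is retried until the third attempt has been used, while an assertion failure is never retried, which mirrors the default behaviour described in the list.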
Why it matters
Imagine a workflow that lints, runs unit tests, runs integration tests, and only opens a PR if all three pass - but if integration tests fail with a known flaky-test signature, it retries twice before giving up. In v1 you wrote that as four chained workflows with bespoke glue. In v2 it's one workflow with three guards and one retry policy. The visual editor exports to TypeScript-typed JSON, so it lives in your repo, reviews in PRs, and rolls back like code.
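As a rough sketch of how "first matching edge wins" could play out for that workflow: the guards below are plain Python callables standing in for the typed guard DSL, whose syntax this post does not show, and the node names and state keys are hypothetical.

```python
from typing import Callable, Optional

# An edge is (guard, target_node). Real guards are written in the guard DSL;
# Python callables are used here only as a stand-in.
Edge = tuple[Callable[[dict], bool], str]


def next_node(edges: list[Edge], state: dict) -> Optional[str]:
    """Return the target of the first edge whose guard passes, else None."""
    for guard, target in edges:
        if guard(state):
            return target
    return None


# Hypothetical edges leaving the integration-test node, checked in order.
after_integration: list[Edge] = [
    # All three checks passed: open the PR.
    (lambda s: s["lint_ok"] and s["unit_ok"] and s["integration_ok"], "open_pr"),
    # Known flaky-test signature and fewer than two retries used: retry.
    (lambda s: s.get("flaky_signature", False) and s["retries_used"] < 2,
     "retry_integration"),
    # Catch-all: give up.
    (lambda s: True, "fail"),
]
```

Because evaluation stops at the first passing guard, the catch-all edge only fires when neither of the earlier conditions holds.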
How to use it
Open the Workflow Builder, click New workflow, and start dropping nodes. The right-hand inspector now has tabs for Inputs, Guards, Retries, and Contexts. Existing v1 workflows auto-migrate; we tested the migration on 1,200+ customer workflows and saw zero behavioural diffs. If your migration produces a warning, the inspector shows the exact line and a suggested fix.
3. Persona Forking - git-style versioning for agents
Personas grew up this quarter. Persona forking means every persona now has a full history with branches, diffs, and atomic promotion. You can fork a persona, change its system prompt or its tool allowlist, run a head-to-head A/B against your golden task set, and promote the winner in one click. Old sessions remain replayable against the version that produced them - no more "why did this PR look different in February?"
What it is, concretely
- Each persona has a `main` branch and unlimited forks.
- Forks can edit prompt, tools, model, temperature, context scope, and retry behaviour.
- A/B runs execute the same task set against two persona versions in parallel and produce a scorecard (pass rate, cost per task, p95 latency, reviewer-rated quality if you wire up the eval hook).
- Promotion is atomic. All new sessions use the promoted version starting at the next tick. In-flight sessions complete on their original version.
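To make the scorecard metrics concrete, here is a minimal sketch of how pass rate, cost per task, and p95 latency might be computed for one persona version. `TaskResult` and `scorecard` are hypothetical names, and the nearest-rank percentile is an assumption; the post does not say which percentile method the product uses.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one task in the A/B task set (illustrative fields)."""
    passed: bool
    cost_usd: float
    latency_s: float


def scorecard(results: list[TaskResult]) -> dict:
    """Summarise one persona version's run over the task set."""
    if not results:
        raise ValueError("no results to score")
    latencies = sorted(r.latency_s for r in results)
    # Nearest-rank p95: the smallest sample with at least 95% of
    # samples at or below it.
    rank = math.ceil(0.95 * len(latencies))
    return {
        "pass_rate": sum(r.passed for r in results) / len(results),
        "cost_per_task": sum(r.cost_usd for r in results) / len(results),
        "p95_latency_s": latencies[rank - 1],
    }
```

Running `scorecard` once per persona version over the same task set gives two dictionaries that can be compared side by side.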
Why it matters
Tuning an agent persona is iterative. You change a system prompt, ship it, regret it, roll back. Without versioning, "roll back" means re-typing what you had last week from memory. With forking, it's a click. The non-obvious win: customers are running multiple persona variants in production for different parts of a codebase - strict-review persona on auth code, fast-and-loose persona on internal scripts - and the version graph is how they keep track.
Example
One Series B customer maintains four forks of their backend persona: main (production), strict-types (enforces stricter TypeScript), perf-mode (adds a benchmarking step), and experimental (tries a newer model). They promote strict-types into main every two weeks if the eval scorecard beats main by more than 3% on quality and ties on cost. That cadence is now part of their engineering rituals.
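A promotion rule like that one could be written down as a small check. This sketch makes two assumptions the customer's description leaves open: "beats main by more than 3%" is read as a relative quality gain, and "ties on cost" is read as costing no more than main within a 1% tolerance. The function and key names are hypothetical.

```python
def should_promote(main: dict, fork: dict,
                   min_quality_gain: float = 0.03,
                   cost_tolerance: float = 0.01) -> bool:
    """Promote when the fork's quality gain exceeds the threshold and
    its cost per task is no worse than main's, within tolerance."""
    # Relative quality improvement over main (assumption: relative, not points).
    quality_gain = (fork["quality"] - main["quality"]) / main["quality"]
    # "Ties on cost": fork may cost at most `cost_tolerance` more than main.
    cost_ok = fork["cost_per_task"] <= main["cost_per_task"] * (1 + cost_tolerance)
    return quality_gain > min_quality_gain and cost_ok
```

Keeping the thresholds as parameters makes it easy to tighten or loosen the rule without touching the comparison logic.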
4. Sprint Chains - chaining issue sessions across a sprint
Sprint Chains went generally available on February 18. Feed CodeCourier a project plan - usually a markdown document describing a multi-issue body of work - and it decomposes the plan into an ordered sequence of Issue Sessions with dependencies respected, state passed forward, and human-review gates honored. The chain pauses when a step opens a PR and resumes after the merge.
What changed in Q1
- Max chain length raised to 75 issues (was 20 in beta). Two customers have run 50+ issue chains end-to-end with no intervention.
- Dependency parser. The chain planner now reads your markdown plan, detects blocks / depends-on references, and builds a real DAG.
- Mid-chain rollback. If issue 14 of 30 fails, you can roll back issues 11–14 and resume from 11 without losing the first 10.
- Chain-level cost cap. Set a budget; the chain pauses for approval if it threatens to blow through it.
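The dependency ordering and mid-chain rollback above can be sketched with Python's standard `graphlib` module. This is an illustration of the idea, not the chain planner itself: `plan_order`, `rollback_point`, and the issue names are hypothetical, and the real planner's parsing of markdown plans is not shown.

```python
from graphlib import TopologicalSorter


def plan_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order issues so each one runs after the issues it depends on.

    `depends_on` maps an issue to the set of issues it depends on.
    A cyclic plan raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


def rollback_point(order: list[str], failed: str,
                   steps_back: int) -> tuple[list[str], list[str]]:
    """Split the chain into (kept, to_rerun) when rolling back from a failure.

    `steps_back` is how many issues before the failed one to undo as well.
    """
    failed_idx = order.index(failed)
    resume_idx = max(0, failed_idx - steps_back)
    return order[:resume_idx], order[resume_idx:]
```

For the example in the list, rolling back three steps from issue 14 keeps issues 1 to 10 and resumes the chain at issue 11.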
Why it matters
Most real engineering work is not one issue. It's a sprint - five to twenty connected pieces. Sprint Chains let an agent operate at that scope without losing the thread halfway through. The longest chain we've seen run successfully in production was 67 issues, 11 hours of cumulative agent time, 23 PRs merged. That customer described it as "a junior dev who never forgets the plan."
5. Contexts upgrades - retrieval, scoping, evals
Contexts is how CodeCourier loads the right code, docs, and conventions into a session without blowing the token budget. Q1 brought three durable agent memory updates: better retrieval, tighter scoping, and a real evaluation framework.
Retrieval
We rebuilt the hybrid retriever: BM25 plus dense embeddings plus a small reranker trained on customer-labelled relevance pairs. Recall@10 on our internal eval set improved 38%. p95 retrieval latency dropped from 410 ms to 240 ms despite the added reranker, because we moved the index off generic disk onto NVMe and tightened the fan-out.
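For readers unfamiliar with the metric, recall@10 is commonly defined as the fraction of relevant documents that show up in the top 10 results. The sketch below uses that standard definition; the internal eval set and the exact variant used for the 38% figure are not described in this post, and `recall_at_k` is an illustrative name.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)
```

Averaging this value over every query in an eval set gives a single recall@k number that can be tracked from release to release.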
Scoping
Contexts can now be scoped at three levels: org, project, and persona. A persona-scoped Context only loads for sessions launched by that persona. A project-scoped Context is shared across personas working in the same repo. This kills the old failure mode where a security-team Context leaked into a frontend session and pushed the model toward paranoid suggestions in CSS.
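The three scoping levels amount to a visibility filter applied when a session starts. A minimal sketch, assuming a `Context` record with a `scope` and a `target` id; both the record shape and the function name are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    name: str
    scope: str                     # "org" | "project" | "persona"
    target: Optional[str] = None   # project or persona id; None for org scope


def contexts_for_session(contexts: list[Context],
                         project: str, persona: str) -> list[str]:
    """Names of the Contexts visible to a session in `project` run by `persona`."""
    visible = []
    for ctx in contexts:
        if (ctx.scope == "org"
                or (ctx.scope == "project" and ctx.target == project)
                or (ctx.scope == "persona" and ctx.target == persona)):
            visible.append(ctx.name)
    return visible
```

Under this filter, a Context scoped to a security persona is simply never selected for a session launched by a frontend persona.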
Eval framework
Every Context now has an attached eval set. You define golden retrievals - "for query X, document Y should appear in the top 5" - and CI runs them on every Context change. The dashboard shows pass rate over time, so a Context that silently regresses after a docs refresh gets caught before it ships. This is the single most-requested feature from the November 2025 customer survey.
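A golden-retrieval check of that shape reduces to a pass rate over (query, expected document) pairs. The sketch below assumes a `retrieve` callable that returns ranked document ids; that callable and the function name are hypothetical stand-ins, not the product's API.

```python
from typing import Callable


def golden_pass_rate(goldens: list[tuple[str, str]],
                     retrieve: Callable[[str], list[str]],
                     top_n: int = 5) -> float:
    """Share of golden (query, expected_doc) pairs whose expected
    document appears in the top_n retrieved results."""
    if not goldens:
        raise ValueError("no golden retrievals defined")
    hits = sum(expected in retrieve(query)[:top_n]
               for query, expected in goldens)
    return hits / len(goldens)
```

A CI job could fail the build whenever this rate drops below the value recorded for the previous version of the Context.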
6. Issue Sessions - new triggers and async mode
Issue Sessions picked up three new triggers and a fundamentally different execution mode.
- GitHub Issues trigger. Label an issue with `codecourier:run` (configurable) and a session spawns. Persona is chosen by label routing, e.g. `persona:backend`.
- Linear trigger. Native two-way sync. Status updates flow back to Linear so PMs see progress without opening our dashboard.
- Jira trigger. The one customers begged for. Same shape as Linear: label or transition triggers, two-way status sync.
- Async mode. Fire an Issue Session, get a session ID, and walk away. The agent opens a PR (or asks a clarifying question on the issue) when it's done. No long-lived websocket, no "is it still running" tab to babysit.
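Label routing for the GitHub Issues trigger can be pictured as a small lookup over an issue's labels. In this sketch, `route_issue` is a hypothetical helper, and the `"default"` fallback is an assumption: the post does not say what happens when an issue has the trigger label but no persona label.

```python
from typing import Optional


def route_issue(labels: list[str],
                trigger_label: str = "codecourier:run") -> Optional[str]:
    """Return the persona to run, or None if no session should spawn."""
    if trigger_label not in labels:
        return None
    for label in labels:
        # First "persona:<name>" label wins.
        if label.startswith("persona:"):
            return label[len("persona:"):]
    # Hypothetical fallback when no persona label is present.
    return "default"
```

The `trigger_label` parameter reflects that the trigger label is configurable.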
Why async matters
In our telemetry, the median Issue Session takes 17 minutes and the p95 takes 73. Asking humans to sit in a tab for 73 minutes is a non-starter. Async mode means a TPM can file 12 issues on Monday morning, go to standup, come back, and triage the resulting PRs. Total wall-clock time on the human side: maybe 30 minutes of triage for 12 issues of work.
7. Sandboxes - cold-starts, runtimes, GPUs
Sandboxes are the isolated VMs where every agent action runs. Q1 was a performance and breadth quarter.
| Metric | Q4 2025 | Q1 2026 | Change |
|---|---|---|---|
| Median cold-start (ms) | 380 | 220 | -42% |
| p95 cold-start (ms) | 1,180 | 640 | -46% |
| Parallel sandboxes (Pro) | 8 | 24 | +200% |
| Available regions | 3 | 5 | +2 (Frankfurt, Tokyo) |
| Available runtimes | 9 | 14 | +5 (incl. GPU) |
New runtimes include Python 3.13, Node 22 LTS, Bun 1.2, Deno 2.1, and two GPU templates (L4 and A10G) in private preview for customers running on-sandbox model inference, image work, or ML training jobs. Filesystem snapshots are now incremental, so a sandbox cloned from a warm template starts in 60–90 ms in our hottest region.
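The percentages in the Change column of the table above are plain relative changes, rounded to a whole percent. A quick check against the table's own figures (the helper name `pct_change` is illustrative):

```python
def pct_change(before: float, after: float) -> int:
    """Relative change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


print(pct_change(380, 220))    # median cold-start: -42
print(pct_change(1180, 640))   # p95 cold-start: -46
print(pct_change(8, 24))       # parallel sandboxes on Pro: 200
```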
8. Security and compliance - SOC 2, GDPR, audit logs
Compliance is a feature for everyone who has to fill out a procurement questionnaire. We invested in it.
- SOC 2 Type II audit completed in February. Report available under NDA via /soc2.
- EU data residency in Frankfurt. Set it on the project and every sandbox, session log, and Context index for that project stays in-region. Details at /gdpr and our top-level security page.
- Audit log export. Every agent action - tool call, file write, PR opened, secret read - streams to your SIEM. Built-in formatters for Splunk and Datadog; everyone else gets newline JSON to an S3-compatible sink.
- Customer-managed keys for at-rest encryption on Enterprise. Bring your own KMS.
- Self-hosted sandbox runners in private preview. Run our orchestrator against sandboxes you own, in your VPC.
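For teams consuming the generic export, newline JSON means one JSON object per line. The sketch below shows what producing and parsing such a stream could look like; the field names in `audit_event` are illustrative assumptions, not the documented export schema.

```python
import json
from datetime import datetime, timezone


def audit_event(action: str, actor: str, target: str, at: datetime) -> dict:
    """Build one audit record; the field names are assumptions."""
    return {
        "action": action,   # e.g. "tool_call", "file_write", "pr_opened", "secret_read"
        "actor": actor,
        "target": target,
        "timestamp": at.astimezone(timezone.utc).isoformat(),
    }


def to_ndjson(events: list[dict]) -> str:
    """Serialise events as newline-delimited JSON, one object per line."""
    return "".join(
        json.dumps(event, sort_keys=True, separators=(",", ":")) + "\n"
        for event in events
    )


def from_ndjson(payload: str) -> list[dict]:
    """Parse a newline-delimited JSON payload back into records."""
    return [json.loads(line) for line in payload.splitlines() if line]
```

Because each line is a complete JSON document, a SIEM or a one-off script can process the stream line by line without loading the whole file.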
We are not a compliance company. We are a product company that treats compliance as table stakes. SOC 2 Type II is the floor, not the ceiling.
9. Integrations - nine new, named
Integrations are how CodeCourier fits into the way your team already works. Q1 added nine, with one short note each.
- Linear. Native two-way sync, label routing, status mirroring.
- GitHub Issues. Label-triggered sessions, PR back-references, branch protection-aware merging.
- Jira. Two-way sync; supports both Cloud and Data Center.
- Slack. Run-status notifications, slash commands, channel-scoped persona allowlists.
- Sentry. Exception-to-session: take a Sentry issue, spawn a CodeCourier session pre-loaded with stack trace and recent related errors.
- PagerDuty. On-call paging when a chain hits a fatal error mid-run and no human is online.
- Notion. Pull plan documents from Notion as Sprint Chain inputs; push run summaries back as comments.
- Vercel. Preview-deploy-aware sessions; the agent can read your preview URL and assert against it before opening a PR.
- 1Password. Secrets sourced at runtime; nothing ever lands in our database or in a sandbox env file.
10. Smaller wins - the long tail
Twenty-two minor improvements that make the product feel less like a beta. In no particular order:
- Keyboard shortcuts everywhere; `?` shows the cheat sheet.
- Dark mode for the dashboard, persisted per-user.
- Better empty states with a one-click "create example" flow.
- Faster project switching (cmd-K, fuzzy match, 12 ms p95).
- Copy-link buttons on every shareable resource.
- Mobile-friendly run monitoring view.
- Clickable stack traces in run logs.
- One-click re-run from any past session.
- Persistent filter state in the issues view.
- Quieter notifications for runs you started yourself.
- Workflow imports from a public URL.
- Per-persona temperature override.
- Inline diff view on PR-opening steps.
- Bulk archive for old sessions.
- Cost view filterable by tag.
- Webhook retries with exponential backoff.
- API rate-limit headers documented and consistent.
- Sentry breadcrumbs include the workflow node name.
- OAuth app review-ready for Slack and Linear marketplaces.
- Status page now reflects per-region health, not just global.
- Public analytics page for platform-wide success rate and latency.
- Public guides hub with 14 new walkthroughs.
11. What we shelved - honest mode
We tried things that did not work. Calling them out keeps us honest.
- Voice-driven sessions. We prototyped a voice interface for kicking off Issue Sessions. Latency was fine, accuracy on technical jargon was not. Pulled the beta in February. Will revisit when on-device transcription improves.
- Cross-org persona marketplace. We thought customers would want to share personas publicly. The closed beta had 14 sign-ups and three shared personas after a month. We shelved it. Personas, it turns out, are deeply tied to a codebase's culture and don't travel well.
- In-product video walkthroughs. We added 90-second videos to the empty states. Customers told us they were annoying. We pulled them and put the budget into the guides hub.
12. What's next in Q2
Two themes for Q2 2026:
- Review agents. Personas purpose-built for PR review, with a memory of your team's standards. Targeting private preview in May.
- Multi-agent collaboration. Structured handoff between specialist personas - planner to coder to reviewer to critic - inside a single sandbox, with shared Contexts and a common audit trail.
Plus more of the same boring excellence: faster sandboxes, better Contexts retrieval, more integrations. The unsexy compounding wins are what make a product feel inevitable.
FAQ
How do I upgrade to Workflow Builder v2?
You don't need to. New workflows default to v2; old workflows auto-migrate the first time you open them in the editor. There is no breaking change to the runtime - v1-shaped workflows continue to execute identically.
Is persona forking available on all tiers?
Yes. Free, Pro, and Enterprise all get unlimited forks and atomic promote. A/B evaluation runs count toward your monthly session quota on Free and Pro; Enterprise is uncapped.
What's the max length of a Sprint Chain?
75 issues at the moment. We have not seen a real customer plan exceed that, but if you have one, please get in touch - we'd like to test against it.
Do you offer EU-only deployments?
Yes. Set the project region to Frankfurt and every sandbox, session log, and Context index lives in-region. SOC 2 Type II audit covers the EU plane. See /gdpr for the full data-flow diagram.
How do I export my audit logs?
Settings → Compliance → Audit Log Export. Choose Splunk, Datadog, or a generic S3-compatible sink. Logs stream within seconds of each agent action.
What happened to the old REST endpoint for triggering runs?
Removed on March 15, 2026, after a 90-day deprecation window announced in December 2025. Use the new endpoint documented in the API reference. Migration is one line.
Where do I report a bug or request a feature?
Easiest path: contact us, or open a public issue on our roadmap. Every Q1 release in this post traces back to a customer request from the previous 6 months. The roadmap is yours.
Thanks for reading. If this was useful, the rest of our writing lives on the blog, and the product itself is at codecourier. See you at the end of Q2.