If you let an AI agent execute code on shared infrastructure, you have already lost. That is the uncomfortable conclusion we reached after a year of running production AI workloads at CodeCourier. The model is not malicious; the inputs are. Prompt injection, supply-chain poisoning, runaway processes, and quietly corrupted caches will find every seam in a non-isolated environment. The only credible answer is an AI agent sandbox: a fresh, hardware-isolated Linux machine that boots in under a second, accepts the agent's instructions, and is shredded the moment the run ends.
This post is the technical pillar we have been promising. It covers the threat model, the comparison between containers, microVMs, and full VMs, how we provision E2B-backed Firecracker microVMs in sub-second time, how we keep filesystems and networks tight, how secrets never persist, how we record every action for audit, and the seven concrete lessons we learned the hard way. If you are building agents that touch code, this is the substrate question that decides whether you can sleep on Friday night.
Why sandboxes are non-negotiable for AI engineers
An AI agent sandbox is an ephemeral, isolated execution environment - typically an ephemeral VM or microVM - that an autonomous agent uses to run code, commands, and tools without access to host resources, other tenants, or persistent state. It is the difference between an agent that drafts a pull request and an agent that ships a CVE into your supply chain.
Three properties make a sandbox a sandbox, not just a container with good intentions:
- Hardware isolation. The boundary is enforced by the hypervisor, not by namespaces on a kernel shared with the host. Breaking out means escaping a hypervisor rather than a container, which is a far harder target.
- Ephemerality. The disk, RAM, and network state vanish when the run ends. No caches, no leftover sockets, no temp files with bearer tokens.
- Sub-second provisioning. If the cold start is slow, engineers route around it, sharing sandboxes between runs. The instant that happens, isolation is gone.
Skip any of the three and you do not have a sandbox. You have a container with a marketing budget.
The threat model - what goes wrong without isolation
Every team I talk to underestimates the attack surface of an autonomous agent. The model is non-deterministic, the inputs are attacker-controlled, and the tools are powerful by design. Here is the threat catalog we built our architecture against.
1. Prompt injection containment failure
An issue ticket, a README, a comment in a dependency - any text the agent ingests is an instruction it might follow. We have seen agents convinced to git push to an attacker-controlled remote, exfiltrate .env files via a curl one-liner, or install a package whose post-install script writes to ~/.ssh/authorized_keys. A sandbox does not stop the prompt injection. It confines the blast radius to a disk that will be destroyed in fifteen minutes.
2. Supply-chain poisoning
Every npm install runs arbitrary code. Every pip install too. A typosquatted package, a compromised maintainer, a malicious post-install hook - your agent will gleefully execute all of them. Without a sandbox, the host gets the backdoor. With a sandbox, the throwaway VM does, and nobody cares.
3. Exfiltration via DNS, HTTP, or git
A determined exfiltration channel is one curl -d @secrets.txt away. On a shared host, that traffic blends in with legitimate traffic. In a sandbox with a deny-by-default egress allowlist, the request fails and we log it.
4. Runaway processes and resource exhaustion
An agent retrying a flaky test in a while true loop will eat every CPU cycle it can reach. Without per-VM quotas, one bad run takes down the host and everyone on it. With them, the VM hits its own ceiling: the loop can only saturate the vCPUs assigned to that VM, a runaway allocation is killed by the guest kernel's OOM killer, and the rest of the platform never notices.
5. Cross-tenant data leakage
If two customers share a process, a kernel, or even a filesystem, you have built a side channel. Speculative execution attacks, /proc traversal, shared temp directories - the history of multi-tenant security is the history of finding new versions of this bug. A microVM with its own kernel and its own block device removes most of the category; speculative-execution side channels are the exception and still need host-level mitigations.
Container vs microVM vs full VM - the comparison that matters
I keep getting asked why we did not just use Docker. Here is the table I send back.
| Property | Container (Docker) | microVM (Firecracker / E2B) | Full VM (KVM, EC2) |
|---|---|---|---|
| Isolation boundary | Kernel namespaces, cgroups | Hypervisor, dedicated kernel | Hypervisor, dedicated kernel |
| Cold start | 100 ms - 2 s | 150 ms - 400 ms | 20 s - 90 s |
| Memory overhead | ~5 MB | ~5 MB | 100 - 500 MB |
| Density per host | Hundreds | Thousands | Tens |
| Kernel exploit resistance | Low - shared kernel | High - separate kernel per VM | High - separate kernel per VM |
| Filesystem isolation | Layered, often shared layers | Block device per VM | Block device per VM |
| Fit for untrusted code | Risky | Designed for it | Yes, but slow |
Containers win on tooling and ecosystem. Full VMs win nothing for our use case - they are too slow to provision per request. Firecracker microVMs hit the only spot on the curve that matters for AI agents: hardware-grade isolation at container-grade startup. That is the technical reason E2B exists and the reason we bet on it.
How CodeCourier provisions sandboxes in under a second
The architecture is built around four primitives: a pre-warmed pool, snapshot restoration, template images, and a thin orchestration layer between our control plane and the E2B API. Here is the path a sandbox takes from request to ready.
- Request lands at the sandbox manager. The workflow runner asks for a sandbox with a template (e.g. node20-postgres), a region, a CPU/RAM shape, and an egress policy.
- Pool lookup. We keep a warm pool of paused microVMs per template per region. If a match exists, we skip the boot entirely.
- Snapshot restore. A matched VM is resumed from a memory snapshot in roughly 120 - 180 ms. The kernel, the package manager, and the language runtime are already in RAM.
- Cold path. If the pool is empty, we boot a fresh microVM from the base template image. Firecracker itself reports the VM ready in 150 - 250 ms; the rest of the cold-path time is our orchestration overhead.
- Inject secrets and policy. Short-lived per-step credentials are handed to the agent via a memory-only init channel. Egress rules are applied at the host network namespace before the agent runs its first command.
- Hand the handle to the workflow. The orchestrator returns an opaque sandbox ID. The agent starts working.
End-to-end median, measured over the last 30 days of production: 520 ms. p99: 1.4 s. Cold path (no pool hit): 1.8 s p50. That is the number that lets the user-facing dashboard say "starting sandbox" and not show a spinner.
A representative snippet from our orchestrator (TypeScript, simplified):
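```typescript
import { Sandbox } from 'e2b';

// Internal helpers, declared here only so the sketch type-checks.
declare const pool: {
  tryClaim(template: string, region: string): Promise<Sandbox | undefined>;
};
declare function applyEgressPolicy(sandboxId: string, allowlist: string[]): Promise<void>;
declare function injectSecrets(sandboxId: string, secrets: Record<string, string>): Promise<void>;

async function provision(opts: {
  template: string;
  region: 'us-east' | 'eu-west';
  egressAllowlist: string[];
  stepSecrets: Record<string, string>;
}): Promise<Sandbox> {
  // Prefer a pre-warmed VM from the pool; otherwise take the cold path.
  const warm = await pool.tryClaim(opts.template, opts.region);
  const sbx = warm ?? await Sandbox.create(opts.template, {
    timeoutMs: 15 * 60 * 1000, // 15-minute sandbox timeout
    metadata: { region: opts.region }, // tagged as metadata; placement assumed to be handled by our control plane
  });
  // Lock down egress before the agent runs its first command.
  await applyEgressPolicy(sbx.sandboxId, opts.egressAllowlist);
  await injectSecrets(sbx.sandboxId, opts.stepSecrets); // memory-only
  return sbx;
}
```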
The full machinery - pool sizing, eviction, snapshot generations, region failover - lives behind that surface. See how we run sandboxes for the product-side view.
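To make the pool lookup step concrete, here is a minimal in-memory sketch of a pool keyed by template and region. It is an illustration, not our production pool: the `WarmPool` class and its `release` and `size` methods are hypothetical, `tryClaim` is synchronous here (unlike the async call in the snippet above), and eviction, snapshot generations, and cross-process locking are deliberately left out.

```typescript
// Hypothetical in-memory warm pool, keyed by template and region.
type Region = 'us-east' | 'eu-west';

class WarmPool<T> {
  private slots = new Map<string, T[]>();

  private key(template: string, region: Region): string {
    return `${template}@${region}`;
  }

  // Add a paused, clean VM (e.g. a freshly restored snapshot) to the pool.
  release(template: string, region: Region, vm: T): void {
    const k = this.key(template, region);
    const list = this.slots.get(k) ?? [];
    list.push(vm);
    this.slots.set(k, list);
  }

  // Claim a warm VM if one matches; undefined means "take the cold path".
  tryClaim(template: string, region: Region): T | undefined {
    return this.slots.get(this.key(template, region))?.pop();
  }

  size(template: string, region: Region): number {
    return this.slots.get(this.key(template, region))?.length ?? 0;
  }
}

// Usage
const pool = new WarmPool<string>();
pool.release('node20-postgres', 'us-east', 'vm-1');
console.log(pool.tryClaim('node20-postgres', 'eu-west')); // undefined: wrong region, cold path
console.log(pool.tryClaim('node20-postgres', 'us-east')); // vm-1: warm hit
```

Keying on template and region together is what makes a miss cheap to detect: a request either finds an exact match or falls through to the cold boot immediately.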
Filesystem and network isolation in practice
Isolation is a property you have to enforce at every layer or you have none. We treat the filesystem and the network as the two primary blast doors.
Filesystem. Every sandbox gets its own block device backed by a copy of the template image. There is no overlay shared with the host. The agent has root inside the VM and exactly nothing outside it. Writes do not persist across runs unless the workflow explicitly pushes artifacts to object storage or to git. When the run ends, the block device is wiped before the slot is returned to the pool.
Network. The default policy is deny-all egress with a curated allowlist: the customer's git host, common package registries (npmjs.org, pypi.org, crates.io), and the LLM endpoint. Everything else is dropped at the host firewall. Customers can extend the allowlist per project, but never per run, and never via the agent itself - that would let prompt injection rewrite its own walls.
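The allowlist rules reduce to a small pure function. This is a sketch assuming hostnames are matched exactly or as subdomains of an allowlisted entry; `buildAllowlist`, `isEgressAllowed`, and the example hosts are hypothetical. The build function only accepts platform defaults and project-level extensions, so no run or agent has a parameter through which it could add hosts.

```typescript
// Platform defaults from the policy above (illustrative; real registries can
// need extra hosts, e.g. PyPI serves package files from files.pythonhosted.org).
const PLATFORM_DEFAULTS = ['npmjs.org', 'pypi.org', 'crates.io'];

// Effective allowlist. Project-level extras are the only customer input;
// there is deliberately no per-run or agent-supplied argument.
function buildAllowlist(gitHost: string, llmEndpoint: string, projectExtras: readonly string[]): Set<string> {
  return new Set([gitHost, llmEndpoint, ...PLATFORM_DEFAULTS, ...projectExtras].map(h => h.toLowerCase()));
}

// Deny by default: allow only exact matches or subdomains of an entry.
function isEgressAllowed(host: string, allowlist: Set<string>): boolean {
  const h = host.toLowerCase().replace(/\.$/, ''); // normalize a trailing root dot
  for (const entry of allowlist) {
    if (h === entry || h.endsWith('.' + entry)) return true;
  }
  return false;
}

// Usage
const allow = buildAllowlist('github.com', 'llm.example.com', []);
console.log(isEgressAllowed('registry.npmjs.org', allow)); // true: subdomain of npmjs.org
console.log(isEgressAllowed('evil-npmjs.org', allow)); // false: lookalike, not a subdomain
```

Matching on `'.' + entry` rather than a bare suffix is what stops lookalike domains such as `evil-npmjs.org` from slipping through.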
DNS is resolved through our own resolver that logs every lookup. A surprising number of exfiltration attempts show up as DNS queries to attacker-owned domains long before any HTTP request fires. We catch them there.
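A logging resolver can flag suspicious lookups as it records them. The sketch below assumes a hook called once per query; `auditDnsQuery` and `DnsLogRecord` are hypothetical, and the long-label heuristic for data smuggled into subdomains is our illustrative choice, not a description of our production detector. The 30-character threshold is an assumption to tune against real traffic.

```typescript
// Hypothetical per-query audit record emitted by the logging resolver.
interface DnsLogRecord {
  sandboxId: string;
  name: string;
  at: Date;
  allowed: boolean;
  suspicious: boolean; // heuristic flag for possible data-in-DNS exfiltration
}

function auditDnsQuery(
  sandboxId: string,
  name: string,
  allowedSuffixes: readonly string[],
  maxLabelLength = 30, // assumed threshold
): DnsLogRecord {
  const n = name.toLowerCase().replace(/\.$/, '');
  const allowed = allowedSuffixes.some(s => n === s || n.endsWith('.' + s));
  // Long labels are a common sign of encoded data, e.g. <base32-of-secret>.attacker.example
  const suspicious = !allowed && n.split('.').some(label => label.length > maxLabelLength);
  return { sandboxId, name: n, at: new Date(), allowed, suspicious };
}
```

Every query produces a record whether or not it is allowed, which is the point: the log shows the attempt even when the firewall would have dropped the follow-up request.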
Secrets handling - never leak, never persist
Every secrets bug we have ever shipped came from a secret that lived longer than it needed to. Our rule is: short-lived, narrowly scoped, never on disk, never in logs. Specifically:
- Per-step tokens, not per-run. A workflow with twelve steps mints twelve distinct credentials, each valid for the duration of that step plus a small grace window.
- Memory-only injection. Secrets are passed through a tmpfs-backed env file that unmounts on process exit. They never touch persistent storage.
- Log scrubbing at the streaming layer. Stdout and stderr pass through a redactor before they hit our database, the dashboard, or any human eyes.
- No agent-readable secret store. The agent cannot enumerate available secrets. It can only use the ones the workflow explicitly bound to the current step.
- Automatic revocation on anomaly. If our audit pipeline sees a credential used outside its declared scope, it is revoked within seconds.
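Log scrubbing at the streaming layer has one subtle trap: a secret can be split across two output chunks. A minimal sketch of a streaming redactor, assuming the redactor is given the values bound to the current step; the `StreamRedactor` class and the `[REDACTED]` mask are hypothetical. It holds back the last `longestSecret - 1` characters of each chunk so a secret that starts at the end of one chunk is never emitted half-visible.

```typescript
// Hypothetical streaming redactor for stdout/stderr.
class StreamRedactor {
  private buffer = '';
  private readonly secrets: string[];
  private readonly holdBack: number;

  constructor(secrets: readonly string[], private readonly mask = '[REDACTED]') {
    // Longest first, so a secret that contains another is masked whole.
    this.secrets = secrets.filter(s => s.length > 0).sort((a, b) => b.length - a.length);
    const longest = this.secrets.length > 0 ? this.secrets[0].length : 0;
    this.holdBack = Math.max(0, longest - 1);
  }

  private scrub(text: string): string {
    let out = text;
    for (const s of this.secrets) out = out.split(s).join(this.mask);
    return out;
  }

  // Feed a chunk; returns the text that is safe to emit right now.
  push(chunk: string): string {
    const scrubbed = this.scrub(this.buffer + chunk);
    const cut = Math.max(0, scrubbed.length - this.holdBack);
    this.buffer = scrubbed.slice(cut); // may hold the start of a secret
    return scrubbed.slice(0, cut);
  }

  // Call at end of stream to release whatever is still held back.
  flush(): string {
    const rest = this.scrub(this.buffer);
    this.buffer = '';
    return rest;
  }
}
```

The held-back tail adds a few characters of latency to the live log view, which is a cheap price for never writing a partial token to the database.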
Our compliance pages document this in detail - see SOC 2 and GDPR - but the engineering substance is the five bullets above.
Observability - recording every agent action for audit
An autonomous agent without a tape recorder is a liability. We stream every meaningful event out of the sandbox in real time and persist it to immutable storage:
- Every command executed, with arguments, working directory, exit code, and duration.
- Every file written or read above a configurable size threshold, with a content hash.
- Every outbound network connection, with the resolved IP, the destination port, and the number of bytes transferred.
- Every LLM call made from inside the sandbox, including model, token counts, and the tool calls the model emitted.
- Every secret materialization event, scrubbed of the secret value itself.
The result is a forensic timeline you can replay in our dashboard or pipe into your SIEM. When an enterprise security team asks "what did the agent actually do at 14:32 on March 4?", we answer in seconds, not days. The same pipeline feeds our nightly red-team - see the security overview for the adversarial setup.
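The event list above maps naturally onto a discriminated union, and "what did the agent do at 14:32?" becomes a time-window query. A sketch under assumptions: the `AuditEvent` field names, `eventsBetween`, and `countByKind` are hypothetical illustrations, not a published schema.

```typescript
// Hypothetical event shapes mirroring the list above (timestamps in epoch ms).
type AuditEvent =
  | { kind: 'command'; at: number; argv: string[]; cwd: string; exitCode: number; durationMs: number }
  | { kind: 'file'; at: number; path: string; op: 'read' | 'write'; bytes: number; sha256: string }
  | { kind: 'network'; at: number; host: string; ip: string; port: number; bytesOut: number; bytesIn: number }
  | { kind: 'llm'; at: number; model: string; inputTokens: number; outputTokens: number; toolCalls: string[] }
  | { kind: 'secret'; at: number; name: string }; // the secret value is never recorded

// Events in [t0, t1), oldest first, without mutating the stored timeline.
function eventsBetween(timeline: readonly AuditEvent[], t0: number, t1: number): AuditEvent[] {
  return timeline.filter(e => e.at >= t0 && e.at < t1).sort((a, b) => a.at - b.at);
}

// Quick per-kind summary for a window, e.g. to spot an unusual burst of network calls.
function countByKind(events: readonly AuditEvent[]): Record<AuditEvent['kind'], number> {
  const counts = { command: 0, file: 0, network: 0, llm: 0, secret: 0 };
  for (const e of events) counts[e.kind]++;
  return counts;
}
```

Making `kind` the discriminant lets the compiler check that every consumer, from the dashboard to a SIEM exporter, handles each event type explicitly.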
Lessons from a year running AI in sandboxes
Reading about isolation is one thing. Running it on real workloads is another. Here are the seven lessons we actually paid for, with the numbers attached.
Lesson 1 - shared caches are a security incident waiting to happen
Our first prototype shared an npm cache across sandboxes to shave 8 - 12 seconds off installs. It worked beautifully until an agent installed a typosquatted package whose post-install script wrote a backdoor into the cache directory. The next twenty runs picked it up. We removed the shared cache the same day. The fix was to bake the top 200 packages into the template image - that recovered most of the speedup without the shared-state risk.
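Picking which packages to bake into the template is a frequency count over past installs. A sketch, assuming install events are available as package-name strings (a hypothetical input, for example extracted from the command timeline); `topPackages` is a hypothetical name. Ties break alphabetically so repeated template builds produce the same list.

```typescript
// Choose the N most frequently installed packages to pre-install in the
// template image, instead of sharing a mutable cache across sandboxes.
function topPackages(installs: readonly string[], n = 200): string[] {
  const counts = new Map<string, number>();
  for (const pkg of installs) counts.set(pkg, (counts.get(pkg) ?? 0) + 1);
  return [...counts.entries()]
    .sort(([a, ca], [b, cb]) => cb - ca || a.localeCompare(b)) // frequency desc, then name
    .slice(0, n)
    .map(([pkg]) => pkg);
}

// Usage
console.log(topPackages(['react', 'lodash', 'react', 'zod', 'lodash', 'react'], 2)); // [ 'react', 'lodash' ]
```

The key property is that the baked image is read-only input to every run: an agent can still install a malicious package, but only into its own throwaway disk.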
Lesson 2 - pre-warmed pools beat optimized cold starts
We spent two weeks shaving 80 ms off the Firecracker cold path. Then we built a warm pool and got 280 ms back for free on every pool hit. Lesson: do not optimize the slow path before you have tried to take it off the common path. Today, 87% of our sandbox provisions hit the pool. Median provision time dropped from 1.4 s to 520 ms in a week.
Lesson 3 - long-lived tokens are an own goal
We used to mint a single git-push token per workflow run, valid for the whole run. It was convenient. It was also one prompt injection away from disaster. We rebuilt the credentials layer to mint per-step tokens with a 60-second TTL plus an explicit renewal handshake. Our worst-case credential exposure window dropped from 40 minutes to under 2.
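The per-step token model can be sketched in a few pure functions. Assumptions: the `StepToken` shape, `mintStepToken`, `renew`, and `authorize` are hypothetical; scopes are matched exactly (wildcards omitted); and the caller supplies the token id and clock so the logic is testable. Out-of-scope use returns `'revoke'` to mirror the revocation-on-anomaly rule, while use after expiry is simply denied.

```typescript
// Hypothetical per-step credential: one step, explicit scopes, 60-second TTL.
interface StepToken {
  id: string;
  stepId: string;
  scopes: ReadonlySet<string>; // e.g. 'git:push'
  expiresAt: number; // epoch ms
}

const TTL_MS = 60_000;

function mintStepToken(stepId: string, scopes: string[], now: number, id: string): StepToken {
  return { id, stepId, scopes: new Set(scopes), expiresAt: now + TTL_MS };
}

// Renewal handshake: only for the step that owns the token, only before expiry.
function renew(token: StepToken, activeStepId: string, now: number): StepToken | null {
  if (token.stepId !== activeStepId || now >= token.expiresAt) return null;
  return { ...token, expiresAt: now + TTL_MS };
}

function authorize(token: StepToken, scope: string, now: number): 'ok' | 'deny' | 'revoke' {
  if (!token.scopes.has(scope)) return 'revoke'; // out-of-scope use is an anomaly
  if (now >= token.expiresAt) return 'deny'; // expired: refuse and require renewal
  return 'ok';
}
```

Because an unrenewed token dies on its own within a minute, a leaked credential is only useful to an attacker who can also keep completing the renewal handshake from inside the right step.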
Lesson 4 - observability pays for itself in week three
The first time a customer asked why their workflow had taken 14 minutes longer than usual, we replayed the timeline, found a flaky npm install retrying against a slow mirror, and patched the template the same afternoon. That single incident justified the entire observability stack. Three months in, we use those timelines as a debugging tool more than a security one.
Lesson 5 - fair-share scheduling stops noisy neighbors
One customer ran a workflow that spawned 100 parallel test suites in a single sandbox. The sandbox was fine - but the host's I/O budget was not. After we added per-sandbox I/O quotas and per-tenant concurrency caps with a fair-share scheduler, the p99 latency for everyone else dropped by 38% and we never saw the problem again.
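The admission half of that fix can be sketched as a fair-share picker. Assumptions: `TenantState` and `nextToStart` are hypothetical, "fair share" here means "the eligible tenant with the fewest running sandboxes goes next", and ties go to the first tenant in insertion order. The per-sandbox I/O quotas are a separate layer; Firecracker, for example, exposes token-bucket rate limiters on block and network devices.

```typescript
// Hypothetical per-tenant admission state.
interface TenantState {
  running: number;
  cap: number; // per-tenant concurrency cap
  queue: string[]; // pending sandbox request ids
}

// Start the next request from the waiting tenant with the fewest running
// sandboxes, skipping tenants that are at their cap. Mutates the chosen tenant.
function nextToStart(tenants: Map<string, TenantState>): { tenant: string; request: string } | null {
  let best: [string, TenantState] | null = null;
  for (const entry of tenants) {
    const t = entry[1];
    if (t.queue.length === 0 || t.running >= t.cap) continue; // nothing queued, or capped
    if (best === null || t.running < best[1].running) best = entry;
  }
  if (best === null) return null;
  const [tenant, state] = best;
  state.running += 1;
  return { tenant, request: state.queue.shift()! };
}
```

A tenant that floods the queue still gets served, but only after tenants with fewer running sandboxes, and never beyond its cap.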
Lesson 6 - snapshots are a phase-one feature, not phase two
We treated sandbox snapshots as a nice-to-have for the first six months. Customers running multi-hour workflows ran into rate limits, transient model outages, or flaky tests and had to start from scratch every time. When we shipped snapshotting, average successful-run cost dropped 23% and customer-reported "workflow died for no reason" tickets went to zero in two weeks.
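The resume logic behind snapshotting is simple to sketch. Assumptions: steps are synchronous functions that throw on failure, `takeSnapshot` is a hypothetical callback that pauses the VM and persists its state, and `runWithCheckpoints` and `Checkpoint` are hypothetical names. On failure the runner returns the last good checkpoint instead of throwing it away.

```typescript
// Hypothetical checkpoint: the next step to run and the snapshot to restore first.
interface Checkpoint {
  nextStep: number;
  snapshotId: string | null;
}

type Step = () => void; // throws on failure (rate limit, model outage, flaky test)

function runWithCheckpoints(
  steps: readonly Step[],
  takeSnapshot: (afterStep: number) => string,
  from: Checkpoint = { nextStep: 0, snapshotId: null },
): { checkpoint: Checkpoint; error?: unknown } {
  // A real resume would first restore the VM from from.snapshotId.
  let checkpoint = from;
  for (let i = from.nextStep; i < steps.length; i++) {
    try {
      steps[i]();
    } catch (error) {
      return { checkpoint, error }; // resume later from the last good step
    }
    checkpoint = { nextStep: i + 1, snapshotId: takeSnapshot(i) };
  }
  return { checkpoint };
}
```

The cost saving comes from the loop starting at `from.nextStep`: completed steps are never paid for twice.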
Lesson 7 - the audit log is the product
Every enterprise sales cycle eventually reaches the question "prove the agent did exactly what it claims it did." We thought the answer was the result. The answer is the timeline. Treat your audit log as a first-class user interface, not a compliance artifact. Our deal velocity roughly doubled after we shipped the timeline view in our dashboard.
When NOT to use sandboxes
I am not religious about this. There are real cases where a sandbox is overkill or even counterproductive:
- Pure inference, no tool use. If the agent only reads from an API and writes a Markdown summary, you do not need a microVM. You need a stateless function.
- Highly latency-sensitive interactive UX where you control every input. A code-completion model running on a developer's own machine, with no untrusted input in the loop, is its own trust boundary.
- Deeply integrated IDE plugins. If the agent is by definition acting as the developer, sandboxing it from the developer's repo defeats the purpose. Sandbox the network, not the filesystem.
The line we draw at CodeCourier: any agent that executes code on behalf of someone other than the person operating it runs in a sandbox. Everything else is case-by-case. If you are running Issue Sessions, Personas, or the Workflow Builder, you are on the sandbox side of the line.
FAQ - AI agent sandboxes
What is an AI agent sandbox?
An AI agent sandbox is an ephemeral, hardware-isolated execution environment - usually a microVM - that an autonomous agent uses to run code, install packages, execute commands, and use tools. It is destroyed when the run ends, so nothing the agent does persists unless the workflow explicitly exports it.
Why is a microVM better than a Docker container for AI agents?
A Docker container shares the host kernel. A kernel exploit, a misconfigured capability, or a namespace escape gives an attacker the host. A microVM runs its own kernel under a hypervisor; an escape requires breaking the hypervisor, which is a much harder target. For untrusted code - and all LLM-generated code is, by definition, untrusted - microVMs are the conservative choice.
How does E2B achieve sub-second sandbox startup?
E2B uses Firecracker microVMs, which boot in 100 - 250 ms because they strip out every device, BIOS routine, and kernel module a serverless workload does not need. On top of that, we keep a pre-warmed pool of paused VMs per template. A pool hit restores from a memory snapshot in roughly 120 - 180 ms.
Can prompt injection escape a sandbox?
Prompt injection cannot escape the sandbox boundary itself - that is enforced by the hypervisor, not the model. What prompt injection can do is convince the agent to misuse the resources it legitimately has inside the sandbox: exfiltrate via an allowlisted endpoint, push to the wrong git remote, or burn credit. The defense is layered: strict egress allowlists, per-step credentials with narrow scopes, and observability that flags anomalous tool calls in real time.
How long does a sandbox live?
By default, fifteen minutes of idle time before automatic destruction, with an absolute maximum configurable per workflow. Long workflows use our snapshot protocol to pause and resume across the lifetime of a single run.
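The lifetime rule combines two clocks: an idle timer that resets on activity and a hard cap from creation. A minimal sketch, assuming millisecond timestamps and a per-workflow `absoluteMaxMs` supplied by the caller; `destroyAt` and `shouldDestroy` are hypothetical names.

```typescript
const DEFAULT_IDLE_TIMEOUT_MS = 15 * 60 * 1000; // fifteen minutes of idle time

// Destruction time is whichever comes first: idle expiry or the absolute cap.
function destroyAt(
  createdAt: number,
  lastActivityAt: number,
  absoluteMaxMs: number, // configurable per workflow
  idleTimeoutMs = DEFAULT_IDLE_TIMEOUT_MS,
): number {
  return Math.min(lastActivityAt + idleTimeoutMs, createdAt + absoluteMaxMs);
}

function shouldDestroy(now: number, createdAt: number, lastActivityAt: number, absoluteMaxMs: number): boolean {
  return now >= destroyAt(createdAt, lastActivityAt, absoluteMaxMs);
}
```

Activity keeps a busy sandbox alive, but the absolute cap guarantees that no sandbox outlives its workflow's budget, however chatty the agent is.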
Where do customer artifacts go after the sandbox is destroyed?
Wherever the workflow explicitly puts them. Branches push to your git host. Artifacts upload to your object storage. Nothing else survives. The block device is wiped before the slot returns to the pool.
Is the AI agent sandbox SOC 2 and GDPR compliant?
Yes. CodeCourier is SOC 2 Type II compliant and runs an EU-only data plane for GDPR-sensitive workloads. See our SOC 2 and GDPR pages for the full controls.
Where do I learn more?
Start with our guides, browse the engineering blog, or read about the team. If you want to talk to us about a specific workload, reach out.