April 16, 2026

A Big Issue in Agent Infrastructure

One of the biggest problems in agent infrastructure right now is that very different execution environments are being marketed with very similar security language. “Secure sandbox.” It sounds precise. It isn’t. And the cost of that ambiguity is real. Teams are deploying agents against production systems based on marketing language. When the boundary those agents run inside is weaker than expected, anything within the agent’s reach, including secrets, customer data, connected systems, and infrastructure, can be exposed.

Why “Secure Sandbox” is Becoming a Meaningless Term in Agent Infrastructure

When people say “sandbox,” they can mean fundamentally different things:
  • Same-host sandboxing (e.g. language isolates like V8 isolates, WebAssembly, or in-process runtimes). These run inside the host process. The code shares an address space or at a minimum shares the host kernel. A bug, a bad dependency install, or an agent misbehaving can compromise the entire process. There is no separate kernel. There is no VM boundary.
  • Container isolation with policy controls (e.g. namespaces, cgroups, seccomp filters). Better resource controls and filesystem restrictions, but still a shared host kernel. A container escape is a host escape. Every tenant on that host may be exposed.
  • Per-tenant VM or microVM environments (e.g. Firecracker, Cloud Hypervisor). Each tenant gets its own kernel. Syscalls land inside the guest, not on the host. The attack surface shrinks to a small, heavily audited hypervisor and device model. Shared-memory interfaces between guest and VMM remain part of the attack surface.
  • Per-tenant VM or microVM with hardware isolation (e.g. VFIO passthrough with IOMMU enforcement). Direct hardware access with memory isolation enforced at the hardware level. The guest interacts with the device through native drivers, not a virtualized interface. Cross-tenant memory access is blocked by the IOMMU. Escape requires a hypervisor-level bug.
  • Trusted Execution Environments (TEE / confidential computing). Hardware-encrypted memory with remote attestation. Even the infrastructure operator cannot inspect the workload at runtime.
These are not points on a continuum. They are categorically different trust models. They provide different isolation guarantees, different threat models, and very different blast-radius characteristics. But today, they are increasingly being described with the same language.
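To make the first category concrete, here is a minimal, illustrative Python sketch (our own example, not from any vendor) of why an in-process "sandbox" is not a boundary: even with builtins stripped, untrusted code can follow references from any host object it is handed back into host-process state. The names `SECRET_TOKEN` and `host_helper` are hypothetical.

```python
# Illustrative sketch: a same-host, in-process "sandbox" shares the
# host's address space, so there is no real boundary to escape.

SECRET_TOKEN = "hypothetical-db-password"  # host-process state

def host_helper() -> str:
    """A seemingly harmless function exposed to the sandbox."""
    return "ok"

# "Sandboxed" evaluation: builtins removed, only the helper is visible.
untrusted_code = "helper.__globals__['SECRET_TOKEN']"
leaked = eval(untrusted_code, {"__builtins__": {}, "helper": host_helper})

# The untrusted code walked from the helper's __globals__ straight to
# the secret: same process, same memory, no kernel or VM in between.
print(leaked)
```

Container isolation closes this particular hole by separating processes and filesystems, but the shared-kernel caveat in the second bullet still applies: a kernel-level escape exposes every tenant on the host.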

Why This Matters for Agents

Traditional serverless was designed for trusted web requests: deterministic code, written by known developers, running well-understood logic. Agents are different. They introduce autonomous decision making and dynamic execution of untrusted actions. The code being executed is generated at runtime, often from external inputs, and cannot be fully predicted ahead of time. Many agent tasks involve code execution under the hood, even when they do not look like coding on the surface. Data analysis, tool use, file manipulation, browser automation — these can all result in dynamic code running against real systems. Without a sandbox — or when something goes wrong in one that isn’t hardware-isolated — agent actions run with the same access as your application. Secrets, customer data, and connected systems all become reachable. Not all of these actions carry the same risk. They break into distinct classes:

  • Low risk — read-only, low-privilege, and easy to reverse.
  • Medium risk — touches real systems through narrow, predefined, allowlisted paths.
  • High risk — allows arbitrary or unpredictable execution, broad permissions, or failure modes that can materially impact the host, connected systems, secrets, customer data, or costs.
Different risk classes require different execution environments and different layers of defense. This confusion persists even among technically sophisticated teams, because many are evaluating agent execution through the lens of trusted developer code. But dynamic, untrusted agent execution is a fundamentally different problem. The boundary that works for trusted code is not necessarily sufficient for autonomous workloads generating actions from external inputs.

The Source of Confusion

The confusion starts when all of these environments get flattened into a single “secure agent sandbox” narrative. Multiple recent launches (from popular and “trusted” providers) have described their systems as “secure,” “isolated,” and “sandboxed” — without clearly stating what the actual execution boundary is. In some cases, platforms marketed as secure sandboxes for running agents are, according to their own public documentation, alpha-stage, or actively building toward stronger isolation. In other cases, the underlying boundary turns out to be container-based V8 isolates, or same-host sandboxes — which may be acceptable for lightweight serverless workloads, but are not a sufficient execution boundary for many agent tasks involving untrusted code, sensitive systems, or real-world side effects. This creates a gap between how the system is perceived and how the system is actually implemented. When developers hear “secure sandbox,” many will assume a stronger boundary than what is explicitly documented. And a lot of the current market is collapsing very different risk classes into one “agent tool use” bucket.

Controls Are Not the Same as Containment

Another common source of confusion: runtime controls are often presented as if they solve the same problem as isolation. They don’t. Allow/deny prompts, network controls, filesystem restrictions, runtime policies — these are important. But they are not a substitute for a strong execution boundary. They operate within a boundary. They do not define the boundary itself. The boundary caps the physical damage through resource limits and timeouts. Runtime controls catch the behavior earlier — tool-call budgets, policy gates, loop breakers — before a misfiring agent turns into a self-inflicted DoS, a noisy-neighbor on shared compute, or a runaway cost event. Neither truly stops the loop. Both contain it. Controls limit the damage a bad decision can cause. They do not make an agent’s reasoning correct, and they do not replace a strong execution boundary. The actual answer is both: a strong isolation boundary for containment, and runtime controls for behavior. They solve different problems.
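As a sketch of what "operating within a boundary" means in practice, here is one runtime control from the list above, a per-run tool-call budget, written with our own hypothetical names (this is not a real platform API). It trips before a looping agent becomes a runaway cost event; it limits behavior, but it provides no isolation.

```python
# Hedged sketch of a runtime control: a tool-call budget that stops
# a misfiring agent loop early. It shapes behavior inside a boundary;
# it does not define the boundary itself.

class ToolCallBudgetExceeded(RuntimeError):
    pass

class ToolCallBudget:
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def charge(self, tool_name: str) -> None:
        self.calls += 1
        if self.calls > self.max_calls:
            raise ToolCallBudgetExceeded(
                f"budget of {self.max_calls} calls exhausted at '{tool_name}'"
            )

def run_agent_step(budget: ToolCallBudget, tool_name: str) -> str:
    budget.charge(tool_name)          # checked before execution
    return f"executed {tool_name}"    # real tool dispatch would go here

budget = ToolCallBudget(max_calls=3)
for _ in range(3):
    run_agent_step(budget, "search")  # within budget: proceeds

try:
    run_agent_step(budget, "search")  # a looping agent trips the breaker
except ToolCallBudgetExceeded as exc:
    print(f"stopped: {exc}")
```

Note what this cannot do: if the tool itself is compromised, the budget does nothing to keep it away from host memory or other tenants. That containment has to come from the execution boundary.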

The Missing Piece: Execution Boundary Clarity

If a platform is going to be used for agent execution, the most important question is: What is the execution boundary? Specifically: Is this a same-host sandbox? Is this container-based isolation? Is there a per-tenant VM or microVM? Is there hardware-level isolation? And the required answer depends on the risk class:
  • For low-risk actions, same-host sandboxing with resource limits and timeouts may be acceptable.
  • For medium-risk actions, runtime controls with narrow interfaces and stronger isolation are needed.
  • For high-risk actions — arbitrary execution, credentials, customer data, production writes — the answer should be a hardware-isolated VM or microVM with its own kernel, paired with runtime controls. Not one or the other. Both.
Without that clarity, “secure sandbox” is not a meaningful description.
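The mapping above can be made explicit in code. The sketch below routes each risk class to a minimum acceptable boundary; the enum names and labels are our own shorthand for the article's categories, not any platform's API, and the medium-risk label is an assumption about what "stronger isolation with narrow interfaces" might be called.

```python
# Sketch: making the execution-boundary question explicit by routing
# each risk class to a minimum acceptable boundary (names are ours).

from enum import Enum

class Risk(Enum):
    LOW = "low"        # read-only, low-privilege, easy to reverse
    MEDIUM = "medium"  # narrow, allowlisted paths into real systems
    HIGH = "high"      # arbitrary execution, secrets, production writes

class Boundary(Enum):
    SAME_HOST_SANDBOX = "same-host sandbox with resource limits and timeouts"
    STRONGER_ISOLATION = "stronger isolation with narrow, allowlisted interfaces"
    HARDWARE_ISOLATED_VM = "hardware-isolated VM/microVM plus runtime controls"

# Minimum boundary per risk class, mirroring the article's mapping.
REQUIRED_BOUNDARY = {
    Risk.LOW: Boundary.SAME_HOST_SANDBOX,
    Risk.MEDIUM: Boundary.STRONGER_ISOLATION,
    Risk.HIGH: Boundary.HARDWARE_ISOLATED_VM,
}

def route(action_risk: Risk) -> Boundary:
    """Return the weakest boundary acceptable for this risk class."""
    return REQUIRED_BOUNDARY[action_risk]

print(route(Risk.HIGH).value)
```

The useful property of writing it down this way is that the routing decision becomes auditable: an action cannot silently land in a weaker environment than its risk class requires.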

What the Market Needs

The industry doesn’t need more “secure agent” messaging. It needs clearer definitions, explicit boundaries, and alignment between risk and containment. This is becoming more urgent, not less. Anthropic’s recent research reports that, among the longest-running sessions, the length of time Claude Code works autonomously before stopping is increasing fast. Trust in these systems is compounding. In fact, Anthropic’s Mythos Preview research makes this concrete. An autonomous AI agent was turned loose on a production memory-safe VMM. It identified a memory-corruption vulnerability that gave a malicious guest an out-of-bounds write to host process memory. But the agent was not able to produce a functional exploit — no code execution on the host, no full breakout. This is the point: the boundary class matters. In this case, the execution boundary is what prevented the discovered vulnerability from becoming a full breakout. As agents move into higher-stakes domains — where actions are harder to reverse and connected to real systems — the execution boundary becomes the constraint. Not the model’s capability. Agent security is not one bucket.

How We Think About It 🪐

At Buildfunctions, we approach this from a different starting point: Not all agent actions are equal. Different risk classes need different execution boundaries. Our architecture reflects that. We use a dual-plane model:
  • The Control Plane runs CPU and GPU Functions. These orchestrate trusted application and agent logic — the top-level coordination layer.
  • The Compute Plane runs nested, hardware-isolated CPU and GPU Sandboxes. These execute untrusted and dynamic agent actions inside their own VMs, each with its own kernel, using Cloud Hypervisor.
For GPU workloads, each sandbox receives its GPU through VFIO passthrough with IOMMU enforcement. Memory isolation is enforced at the hardware level. The guest operating system interacts with the GPU through native drivers, not a virtualized interface. Cross-tenant memory access is blocked by the IOMMU. This is not container isolation. It is a per-tenant VM boundary. On top of that isolation boundary, we built Runtime Controls — a guardrail layer around functions and tool calls during agent execution:
  • gate risky actions before execution
  • prevent loops and runaway tool usage
  • enforce tool-call budgets and policy gates
  • add circuit breakers, retries, and cancellation
  • provide runtime observability
Runtime Controls do not make an agent’s reasoning correct. They limit the damage a bad decision can cause, making long-running agent systems more viable in practice. Strong containment for the boundary. Runtime controls for the behavior. Both, not either-or.
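To illustrate the first item in that list, gating risky actions before execution, here is a minimal allowlist gate. This is a hedged sketch in our own names, not the Buildfunctions Runtime Controls API; `ALLOWED_TOOLS` and the tool names are hypothetical.

```python
# Hedged sketch of a policy gate: an allowlist checked before each
# tool call, so a risky action is blocked before it executes rather
# than cleaned up after.

ALLOWED_TOOLS = {"read_file", "search_docs"}  # hypothetical allowlist

class PolicyViolation(RuntimeError):
    pass

def gated_call(tool_name: str) -> str:
    if tool_name not in ALLOWED_TOOLS:
        raise PolicyViolation(f"tool '{tool_name}' blocked by policy")
    # real tool dispatch would go here
    return f"{tool_name} ok"

print(gated_call("read_file"))
try:
    gated_call("drop_table")  # a risky action never reaches execution
except PolicyViolation as exc:
    print(f"gated: {exc}")
```

In a dual-plane design like the one described above, a gate of this kind sits in front of the sandbox: even if it is misconfigured, the hardware-isolated VM boundary still contains what the blocked call could have done.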

The Bottom Line

“Secure sandbox” is not a sufficient description for agent infrastructure. If you are building agents that take actions against real systems, ask what the execution boundary actually is. Ask whether it is a shared kernel or a separate one. Ask whether controls are paired with containment or substituted for it. The execution boundary is not a detail. For agents, it is the foundation. Pair hardware-isolated CPU and GPU Sandboxes with Runtime Controls to start running agents with greater confidence.