OpenSandbox Capabilities — What It Does and Why You Need It#

OpenSandbox is a secure execution runtime for AI agents. When your agent decides to "do something" in the real world — run code, open a browser, read/write files — OpenSandbox is the layer that makes that action happen safely inside an isolated container.

It is not an agent framework. It is not a model. It is the safe pair of hands your agent uses to interact with the world.

The Core Idea — Think Docker for 100 Users on One Agent Server#

Imagine you have an AI agent application with 100 users.

Without OpenSandbox, every user's request, whether it is a bash script, a Python file, or a browser automation, runs directly inside your single application server process. One shared environment. One shared filesystem. One shared OS.

WITHOUT OpenSandbox:

  User 1  → runs bash script    ─┐
  User 2  → runs Python code    ─┤
  User 3  → runs bash script    ─┤──► Your App Server
  User 4  → opens a browser     ─┤         │
  ...                            ─┤    One process.
  User 100 → runs Python code   ─┘    One filesystem.
                                       One OS.

  Problems:
  ✗ User 2's infinite loop freezes everyone
  ✗ User 5's script can read User 1's files
  ✗ A bad LLM output can run `rm -rf` on your server
  ✗ One crash brings the whole app down

With OpenSandbox, every user gets their own isolated container — exactly like a separate Docker container on a Linux box. Your app server becomes the orchestrator, not the executor. It tells OpenSandbox "spin up a container, run this code, give me the result." The dangerous work never touches your application process.

WITH OpenSandbox:

  User 1  → runs bash script  ──► [ Container 1 ] isolated
  User 2  → runs Python code  ──► [ Container 2 ] isolated
  User 3  → runs bash script  ──► [ Container 3 ] isolated
  User 4  → opens a browser   ──► [ Container 4 ] isolated
  ...
  User 100 → runs Python code ──► [ Container 100 ] isolated
                                   Your App Server
                                   only orchestrates.
                                   Never runs the code itself.

  Benefits:
  ✓ User 2's infinite loop only stalls Container 2; nobody else is affected
  ✓ Each user's files are completely private
  ✓ A bad LLM output is trapped inside a container that gets destroyed
  ✓ Your app server stays healthy no matter what users run

Your app server becomes the brain, not the executor.

This is the entire point of OpenSandbox. It is the external isolation layer that sits between your agent application and the code that agents actually run. Just like Docker isolates processes on a Linux server, OpenSandbox isolates agent executions across all your users.
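
The "one workspace per user, app server only orchestrates" idea can be sketched locally. This is a minimal illustration, not OpenSandbox code: `LocalWorkspace` and `Orchestrator` are hypothetical names, and a temporary directory stands in for a container. A temp directory is not a security boundary; it only shows the bookkeeping.

```python
import tempfile
from pathlib import Path


class LocalWorkspace:
    """Hypothetical stand-in for one user's sandbox: a private temp directory."""

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.root = Path(tempfile.mkdtemp(prefix=f"sandbox-{user_id}-"))

    def write(self, name: str, text: str) -> None:
        (self.root / name).write_text(text)

    def listing(self) -> list[str]:
        return sorted(p.name for p in self.root.iterdir())


class Orchestrator:
    """Maps each user to exactly one workspace; it never runs user code itself."""

    def __init__(self) -> None:
        self._workspaces: dict[str, LocalWorkspace] = {}

    def workspace_for(self, user_id: str) -> LocalWorkspace:
        # Create the workspace lazily on first use, then always reuse it.
        if user_id not in self._workspaces:
            self._workspaces[user_id] = LocalWorkspace(user_id)
        return self._workspaces[user_id]


orchestrator = Orchestrator()
orchestrator.workspace_for("user-1").write("notes.txt", "private to user 1")
print(orchestrator.workspace_for("user-1").listing())  # ['notes.txt']
print(orchestrator.workspace_for("user-2").listing())  # []
```

User 2's workspace never sees user 1's file because the two never share a root directory. A real sandbox enforces the same separation at the container or VM level rather than by convention.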


Core Capabilities#

1. Code Execution (Multi-Language)#

Run code generated by an AI agent in an isolated interpreter. Supports Python, JavaScript, Shell, and more. The result is returned to the agent so it can reason about the output.

sandbox.run_code("python", "import pandas as pd; print(pd.__version__)")
# Returns: { stdout: "2.2.1", stderr: "", exit_code: 0 }

What the agent can do with this:

  • Write a script, run it, check the output, fix bugs, rerun — all autonomously
  • Process uploaded files (CSV, JSON, PDF) without touching your server
  • Validate its own outputs before responding to the user
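
To make the result shape concrete, here is a local stand-in that runs a snippet in a child Python process and returns the same three fields. It assumes nothing about the OpenSandbox SDK: `run_python_locally` is a hypothetical helper built on the standard library, and it provides no isolation.

```python
import subprocess
import sys


def run_python_locally(source: str, timeout: float = 10.0) -> dict:
    """Run `source` in a child Python process and return a result dict.

    The keys mirror the stdout / stderr / exit_code shape shown above.
    This is NOT isolation: the child still shares the host filesystem and OS.
    """
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }


ok = run_python_locally("print(6 * 7)")
bad = run_python_locally("raise SystemExit(3)")
print(ok["stdout"].strip(), ok["exit_code"])  # 42 0
print(bad["exit_code"])                       # 3
```

An agent can branch on `exit_code` and feed `stderr` back into its next attempt, which is what makes the structured result more useful than raw text.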

2. Shell Command Execution#

Execute arbitrary shell commands inside the sandbox. Install packages, move files, call CLIs — anything a terminal can do.

sandbox.run_command("pip install requests && python scraper.py")

What the agent can do with this:

  • Set up environments on the fly
  • Chain multi-step CLI workflows
  • Run build tools, test runners, linters

3. Filesystem Management#

Read, write, upload, and download files inside the sandbox. The agent has its own private filesystem that vanishes when the sandbox terminates.

sandbox.upload_file("data.csv", contents)
sandbox.run_code("python", "import pandas as pd; df = pd.read_csv('data.csv'); print(df.describe())")
result = sandbox.download_file("output.csv")

What the agent can do with this:

  • Accept user file uploads, process them, return results
  • Generate reports, charts, exports
  • Work with large datasets without storing them on your servers
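
The upload, process, download, vanish sequence can be imitated with a temporary directory. This is a sketch under stated assumptions: `summarize_csv_in_scratch_dir` and the `amount` column are made up for illustration, and the standard library's `tempfile` stands in for the sandbox filesystem.

```python
import csv
import tempfile
from pathlib import Path


def summarize_csv_in_scratch_dir(csv_text: str) -> tuple[str, Path]:
    """Upload -> process -> download inside a directory that is deleted on exit."""
    with tempfile.TemporaryDirectory() as scratch:
        workdir = Path(scratch)

        # "Upload": write the caller's data into the private workspace.
        (workdir / "data.csv").write_text(csv_text)

        # "Process": total the hypothetical `amount` column and write a summary file.
        with open(workdir / "data.csv", newline="") as handle:
            rows = list(csv.DictReader(handle))
        total = sum(float(row["amount"]) for row in rows)
        (workdir / "output.csv").write_text(f"rows,total\n{len(rows)},{total}\n")

        # "Download": read the result out before the workspace disappears.
        result = (workdir / "output.csv").read_text()
    return result, workdir


result, old_dir = summarize_csv_in_scratch_dir("amount\n10\n2.5\n")
print(result)            # rows,total then 2,12.5
print(old_dir.exists())  # False
```

Only the downloaded result survives; the input file and any intermediate files are gone once the block exits.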

4. Browser Automation (Chrome + Playwright)#

Spin up a full Chrome browser inside the sandbox. The agent can navigate pages, click buttons, fill forms, and extract data — without any risk to the host system.

sandbox.run_playwright("""
  page.goto("https://example.com")
  print(page.title())
""")

What the agent can do with this:

  • Scrape websites autonomously
  • Automate web-based workflows (form submissions, data extraction)
  • Test frontend applications
  • Complete tasks that require real browser interactions

5. Sandbox Lifecycle Management#

Create, monitor, pause, renew, and destroy sandboxes programmatically. Each sandbox is fully isolated — one per user, one per task, or one per agent.

sandbox = client.create(timeout=300)   # 5-minute sandbox
sandbox.renew(extra_seconds=120)       # extend if needed
sandbox.terminate()                    # clean teardown

What this enables:

  • Multi-tenant products where each user gets their own environment
  • Parallel execution — 10 agents running 10 sandboxes simultaneously
  • Automatic cleanup — no orphaned processes or leftover state
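
The timeout and renew semantics above amount to a deadline that can be pushed back. The sketch below models only that bookkeeping; `SandboxLease` is a hypothetical class, not part of any SDK, and a fake clock replaces real waiting so the behaviour is easy to follow.

```python
import time
from typing import Callable


class SandboxLease:
    """Hypothetical model of a sandbox's lifetime: a deadline that can be extended."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + timeout
        self._terminated = False

    def renew(self, extra_seconds: float) -> None:
        if not self.is_alive():
            raise RuntimeError("cannot renew a sandbox that has expired or been terminated")
        self._deadline += extra_seconds

    def terminate(self) -> None:
        self._terminated = True

    def is_alive(self) -> bool:
        return not self._terminated and self._clock() < self._deadline


# A fake clock: `now[0]` is the current time in seconds.
now = [0.0]
lease = SandboxLease(timeout=300, clock=lambda: now[0])

now[0] = 299.0
print(lease.is_alive())         # True
lease.renew(extra_seconds=120)  # deadline moves from 300 to 420
now[0] = 400.0
print(lease.is_alive())         # True
now[0] = 421.0
print(lease.is_alive())         # False
```

Refusing to renew an expired lease is a design choice in this sketch: it keeps "expired" a terminal state, so cleanup code never races with a late renewal.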

6. Port Exposure and Network Controls#

Expose a port from inside the sandbox (run a web server, a notebook, VS Code Web) and access it from outside. Simultaneously, lock down egress so the sandbox can't reach the internet unexpectedly.

sandbox.expose_port(8080)             # returns a public URL
sandbox.block_egress(["0.0.0.0/0"])  # prevent outbound calls

What this enables:

  • Run a Jupyter notebook or VS Code Web instance per user
  • Preview web apps built inside the sandbox
  • Compliance use cases where outbound data transfer must be blocked
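
Why does blocking `0.0.0.0/0` cut off outbound traffic? That CIDR range contains every IPv4 address. The check below uses Python's standard `ipaddress` module to show the matching rule; `is_egress_blocked` is a hypothetical helper, and how OpenSandbox itself evaluates egress rules is not something this sketch claims to reproduce.

```python
import ipaddress


def is_egress_blocked(destination: str, blocked_cidrs: list[str]) -> bool:
    """Return True if `destination` falls inside any blocked CIDR range."""
    address = ipaddress.ip_address(destination)
    for cidr in blocked_cidrs:
        network = ipaddress.ip_network(cidr)
        # An IPv4 rule cannot match an IPv6 address, and vice versa.
        if address.version == network.version and address in network:
            return True
    return False


print(is_egress_blocked("93.184.216.34", ["0.0.0.0/0"]))   # True
print(is_egress_blocked("10.0.0.5", ["192.168.0.0/16"]))   # False
print(is_egress_blocked("2001:db8::1", ["0.0.0.0/0"]))     # False
print(is_egress_blocked("2001:db8::1", ["::/0"]))          # True
```

The third line is the one to notice: an IPv4 catch-all says nothing about IPv6 destinations, so a full lockdown in CIDR terms needs `::/0` as well.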

7. Log and Metrics Streaming#

Stream real-time logs and resource metrics (CPU, memory) from inside the sandbox back to your application.

for line in sandbox.stream_logs():
    print(line)   # live output as the agent works

What this enables:

  • Show users a live progress feed ("Agent is installing dependencies...")
  • Detect runaway processes before they hit resource limits
  • Audit trails for compliance and debugging
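
Streaming means the caller sees each line as it is produced rather than one blob at the end. A local approximation, assuming only the standard library (`stream_output` is a hypothetical helper, not the SDK's `stream_logs`), is a generator over a child process's output:

```python
import subprocess
import sys
from typing import Iterator


def stream_output(source: str) -> Iterator[str]:
    """Yield a child process's output one line at a time while it is still running."""
    process = subprocess.Popen(
        [sys.executable, "-u", "-c", source],  # -u: unbuffered, so lines arrive promptly
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    )
    assert process.stdout is not None
    try:
        for line in process.stdout:
            yield line.rstrip("\n")
    finally:
        # Runs even if the consumer stops early, so no pipe or process is leaked.
        process.stdout.close()
        process.wait()


job = "for step in ('install', 'run', 'done'):\n    print('agent:', step)"
for line in stream_output(job):
    print(line)
```

Because the function is a generator, the application can forward each line to a progress feed or an audit log as soon as it arrives.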

8. Secure Container Runtimes#

OpenSandbox supports multiple isolation backends beyond standard Docker:

| Runtime | Isolation Level | Use When |
|---|---|---|
| Docker | Process-level | Development, low-risk workloads |
| gVisor | Kernel-level (user-space kernel) | Untrusted user code |
| Kata Containers | VM-level | High-security, multi-tenant |
| Firecracker | MicroVM | Maximum isolation, fast boot |

What Happens Without OpenSandbox#

Problem 1: Agent Runs Code on Your Server#

Without a sandbox, any code the AI generates runs directly on the host machine. One bad LLM output can:

  • Delete files (rm -rf)
  • Exfiltrate data (send your DB credentials to an external URL)
  • Install malware
  • Crash your server

With OpenSandbox: Code runs in a container. Your server never touches it.

Problem 2: One User Affects Another#

Without isolation, a Python script that eats 100% CPU for one user slows down every other user on the same server. A user who writes an infinite loop can bring your whole product down.

With OpenSandbox: Each user gets their own sandbox with resource limits. One bad script doesn't affect anyone else.
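
A wall-clock deadline is the simplest kind of resource limit and is enough to show the containment idea. The sketch below is a local illustration only: `run_with_deadline` is a hypothetical helper, and a container runtime would normally also cap CPU and memory, which this example does not attempt.

```python
import subprocess
import sys


def run_with_deadline(source: str, seconds: float) -> str:
    """Run `source` in a child process; kill it if it exceeds the deadline."""
    try:
        subprocess.run([sys.executable, "-c", source], timeout=seconds, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "killed"
    return "finished"


print(run_with_deadline("while True: pass", seconds=1.0))  # killed
print(run_with_deadline("pass", seconds=10.0))             # finished
```

The infinite loop is stopped after one second and the calling process carries on to the next job, which is the behaviour the per-user sandbox provides at a larger scale.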

Problem 3: No State Cleanup Between Sessions#

Without a sandbox, leftover files, processes, and environment variables from one session bleed into the next. This causes:

  • Data leakage between users
  • Unpredictable agent behavior ("why is this variable already set?")
  • Disk filling up over time

With OpenSandbox: Each sandbox is ephemeral. Terminate it, and everything inside it disappears.

Problem 4: Agent Can't Verify Its Own Work#

Without code execution, your agent can only generate text — it cannot check whether the code it wrote actually works. You get plausible-looking output with no guarantee it runs.

With OpenSandbox: The agent writes code → runs it → sees the output → fixes bugs → confirms it works. A self-correcting loop.
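
The write, run, read, fix loop can be sketched in a few lines. Everything here is illustrative: `write_run_fix` and `toy_revise` are hypothetical names, `toy_revise` stands in for the model's revision step, and a local child process stands in for the sandbox.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_once(source: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-c", source], capture_output=True, text=True, timeout=10
    )


def write_run_fix(
    first_draft: str, revise: Callable[[str, str], str], max_attempts: int = 3
) -> Optional[str]:
    """Run a draft; on failure, pass the code and its stderr to `revise` and retry."""
    source = first_draft
    for _ in range(max_attempts):
        result = run_once(source)
        if result.returncode == 0:
            return source  # confirmed to run
        source = revise(source, result.stderr)
    return None  # gave up after max_attempts


def toy_revise(source: str, stderr: str) -> str:
    """Stand-in for the model: fixes a NameError on `math` by adding the import."""
    if "NameError" in stderr and "math" in stderr:
        return "import math\n" + source
    return source


working = write_run_fix("print(math.sqrt(16))", toy_revise)
print(working)
```

The first draft fails with a `NameError`, the revision adds the import, and the second run exits cleanly. Capping the attempts keeps a model that cannot fix its own bug from looping forever.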

Problem 5: Browser Automation Is Dangerous Without Isolation#

Running Playwright or Chrome on your server without isolation means:

  • Any website can attempt to exploit the browser
  • Cookies and sessions from one user's browsing can persist
  • A crashed or runaway browser process can exhaust memory and CPU on the host and take your server down with it

With OpenSandbox: Chrome runs in a container. Crashes are contained. Sessions vanish on terminate.

Problem 6: No Audit Trail for Compliance#

In regulated industries (finance, insurance, healthcare), you need to prove what code ran, when, and what it returned. Without a sandbox layer, this is an afterthought bolted on later.

With OpenSandbox: Log streaming and execution records are built in from day one.

Problem 7: Scaling Is an Infrastructure Project#

Without a standard sandbox layer, scaling from 1 user to 10,000 means building your own orchestration — container provisioning, routing, teardown, resource limits. That's months of infrastructure work.

With OpenSandbox + Kubernetes: The runtime handles scaling. You add more nodes; the platform distributes sandboxes automatically.


Summary: The Without vs. With Comparison#

| Your Agent Wants To... | Without OpenSandbox | With OpenSandbox |
|---|---|---|
| Run code the LLM just generated | Runs on your live server; one bad output can delete files | Isolated container; your server never touches it |
| Serve multiple users at once | Shared environment; one user's script slows everyone | One sandbox per user, resource limits enforced |
| Keep sessions clean between tasks | Leftover files and state bleed into the next session | Ephemeral; terminate the sandbox and everything vanishes |
| Let the agent verify its own work | Not possible; the agent can only guess whether code works | Agent runs it, reads the output, fixes and reruns itself |
| Automate a browser safely | Chrome runs on your server; crashes and sessions persist | Isolated Chrome; crashes are contained, sessions vanish |
| Prove what ran for compliance | Custom logging bolted on later | Log streaming and execution records built in from day one |
| Scale from 1 to 10,000 users | Months of custom container orchestration work | Kubernetes-native; add nodes, platform distributes sandboxes |
| Handle untrusted or user-submitted code | Huge security risk; code runs with your server's permissions | gVisor / Kata / Firecracker; isolation at the kernel or VM boundary |

Bottom Line#

Every AI agent that can take real-world actions eventually needs a safe place to take them. OpenSandbox is that place.

Without it, you either limit your agent to text-only responses (safe but weak), or you let it act on your live infrastructure (powerful but dangerous). OpenSandbox is the path that lets you have both — a capable agent AND a secure system.


References: open-sandbox.ai | github.com/alibaba/OpenSandbox