ADR-0052 · Tool Sandboxing and Safety Classes

Status: Accepted · Date: 2026-05-17

Context

Most serious agent failure modes in recent years trace to a tool call: tool poisoning attacks embed instructions in tool metadata; a prompt injection from a fetched page triggers a shell tool; a credential leaks via an over-eager logger; a filesystem tool reads /etc/passwd. The MCP threat model and the OWASP LLM Top 10 enumerate the same primitives: tool poisoning, direct and indirect prompt injection, command injection, confused deputy, credential theft, directory traversal, and schema spoofing.

Frameworks today usually offer a function-calling API with no input validation, an eval-shaped code interpreter, or a "do-anything" shell tool whose working directory is the host process's cwd. None of these is acceptable for an agent that touches money, production, or untrusted data.

Decision

In the Agent Fabric, a tool is a typed, classified, sandboxed contract — not a function.

1. Typed contract

Every tool is declared via tool({ name, description, input: ZodSchema, output: ZodSchema, safetyClass, idempotency, execute }). Schemas are required, not optional. Inputs are validated before dispatch; outputs are validated after.
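A minimal sketch of the validate, execute, validate flow described above. To keep it dependency-free, a hand-rolled Schema interface stands in for the Zod schemas, and tool, dispatch, and the get_temperature example are hypothetical names rather than the Agent Fabric API. execute is synchronous here purely for brevity.

```typescript
// Hypothetical stand-in for a Zod schema: parse() returns a typed value or throws.
interface Schema<T> {
  parse(value: unknown): T;
}

type SafetyClass = "read" | "write" | "network" | "financial" | "privileged";

interface ToolSpec<I, O> {
  name: string;
  description: string;
  input: Schema<I>;
  output: Schema<O>;
  safetyClass: SafetyClass;
  idempotency: "required" | "optional";
  // Returns unknown on purpose: the output schema, not the type system, is the gate.
  execute(input: I): unknown;
}

// Hypothetical stand-in for tool({...}): freezes the declaration so it cannot drift.
function tool<I, O>(spec: ToolSpec<I, O>): Readonly<ToolSpec<I, O>> {
  return Object.freeze(spec);
}

// Hypothetical dispatcher: validate input before execute, validate output after.
function dispatch<I, O>(spec: Readonly<ToolSpec<I, O>>, rawInput: unknown): O {
  const input = spec.input.parse(rawInput); // rejects malformed input before any side effect
  const rawOutput = spec.execute(input);
  return spec.output.parse(rawOutput); // rejects malformed output before the model sees it
}

const cityInput: Schema<{ city: string }> = {
  parse(value) {
    if (typeof value !== "object" || value === null) {
      throw new TypeError("input must be an object");
    }
    const city = (value as Record<string, unknown>).city;
    if (typeof city !== "string" || city.length === 0) {
      throw new TypeError("city must be a non-empty string");
    }
    return { city };
  },
};

const temperatureOutput: Schema<{ celsius: number }> = {
  parse(value) {
    if (typeof value !== "object" || value === null) {
      throw new TypeError("output must be an object");
    }
    const celsius = (value as Record<string, unknown>).celsius;
    if (typeof celsius !== "number" || !isFinite(celsius)) {
      throw new TypeError("celsius must be a finite number");
    }
    return { celsius };
  },
};

const getTemperature = tool({
  name: "get_temperature",
  description: "Returns a canned temperature for a city (illustrative only).",
  input: cityInput,
  output: temperatureOutput,
  safetyClass: "read",
  idempotency: "optional",
  execute: ({ city }) => ({ celsius: city === "Oslo" ? 4 : 20 }),
});
```

Calling dispatch(getTemperature, { city: "Oslo" }) returns a validated object, while dispatch(getTemperature, { city: 42 }) throws before execute runs.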

2. Safety classes

Every tool declares one of: read | write | network | financial | privileged. The policy engine, approval routes, retry policy, and audit emitter all key off the class. Defaults: financial → human approval; privileged → dual approval; network → allowlist; read → freely callable.
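The defaults listed above can be read as a lookup from class to gate. The sketch below assumes hypothetical names (Gate, DEFAULT_GATES, defaultGate) and deliberately leaves write unmapped, because this ADR states no default for it.

```typescript
type SafetyClass = "read" | "write" | "network" | "financial" | "privileged";

// Hypothetical gate labels; the ADR names the behaviours, not these identifiers.
type Gate = "freely-callable" | "allowlist" | "human-approval" | "dual-approval";

// Only the four defaults the ADR states. "write" is intentionally absent.
const DEFAULT_GATES: Partial<Record<SafetyClass, Gate>> = {
  read: "freely-callable",
  network: "allowlist",
  financial: "human-approval",
  privileged: "dual-approval",
};

// Returns the stated default, or "unspecified" when the ADR gives none.
function defaultGate(cls: SafetyClass): Gate | "unspecified" {
  const gate = DEFAULT_GATES[cls];
  return gate === undefined ? "unspecified" : gate;
}
```

Keeping the table in one place is what lets the policy engine, approval routes, retry policy, and audit emitter agree on what a class means.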

3. Sandboxed execution

Tools run inside a SandboxRuntime. Built-ins:

DenyAllSandbox — refuses all side-effects (the default).
LocalProcessSandbox — child process with chroot-like path jail, env filter, CPU/memory/time limits.
Pluggable adapters for remote isolation (E2B / Modal / Firecracker class).

Filesystem tools accept paths relative to a jail root; absolute paths, .. traversal, and symlink escape are rejected at the sandbox boundary. Network tools require an explicit networkAllowlist; unlisted hosts are blocked at the egress layer, not at the tool's discretion.
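A rough sketch of the two boundary checks, assuming Node.js and hypothetical helper names (resolveInJail, isHostAllowed). It covers only the lexical checks: absolute paths and .. traversal. Symlink escape needs an additional check against the real path on disk (for example via fs.realpathSync), which is omitted here, and a real egress layer would enforce the allowlist below the tool rather than in a helper the tool calls.

```typescript
import * as path from "node:path";

// Resolves a jail-relative path, rejecting absolute paths and ".." traversal.
// Lexical only: symlinks inside the jail are NOT followed or checked here.
function resolveInJail(jailRoot: string, requested: string): string {
  if (path.isAbsolute(requested)) {
    throw new Error("absolute path rejected: " + requested);
  }
  const root = path.resolve(jailRoot);
  const resolved = path.resolve(root, requested);
  const rel = path.relative(root, resolved);
  // A relative path that climbs out of the root starts with "..".
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error("path escapes jail: " + requested);
  }
  return resolved;
}

// Exact-match host allowlist; unparseable URLs are treated as blocked.
function isHostAllowed(url: string, networkAllowlist: ReadonlySet<string>): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch (e) {
    return false;
  }
  return networkAllowlist.has(host);
}
```

Matching on the parsed hostname rather than a substring of the URL is the point of the second helper: api.example.com.evil.net does not match an allowlist entry for api.example.com.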

4. Schema and metadata integrity

Tool definitions are content-hashed at registration. Descriptions (the field a tool poisoning attack, or TPA, targets) are truncated to a hard maximum, scanned for known injection signatures (e.g., ignore previous instructions, hidden Unicode tag characters, role-impersonation tokens), and displayed in audit traces alongside their hash so a description swap is detectable. Tools imported from MCP servers are sanitised on ingress (ADR-0057). Homoglyph collisions on tool names raise a registry error.
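One way the registration step could look, as a sketch under stated assumptions: SHA-256 as the hash, a 512-character description limit, and three illustrative signature patterns. None of these choices come from the ADR, and all function names are hypothetical. Homoglyph detection on tool names is not covered.

```typescript
import { createHash } from "node:crypto";

type SafetyClass = "read" | "write" | "network" | "financial" | "privileged";

interface ToolDefinition {
  name: string;
  description: string;
  safetyClass: SafetyClass;
}

// Assumed limit; the ADR only says "a hard maximum".
const MAX_DESCRIPTION_LENGTH = 512;

// Illustrative signatures only; a production list would be larger and maintained.
const INJECTION_SIGNATURES: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: "instruction-override", pattern: /ignore (all )?previous instructions/i },
  // Unicode tag characters U+E0000..U+E007F, written as UTF-16 surrogate pairs.
  { label: "unicode-tag-characters", pattern: /\uDB40[\uDC00-\uDC7F]/ },
  { label: "role-impersonation", pattern: /<\|?(?:system|assistant|user)\|?>/i },
];

// Hashes a fixed-order field list so the digest is stable across key ordering.
function hashDefinition(def: ToolDefinition): string {
  const canonical = JSON.stringify([def.name, def.description, def.safetyClass]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Returns the labels of every signature found in the description.
function scanDescription(description: string): string[] {
  return INJECTION_SIGNATURES
    .filter((sig) => sig.pattern.test(description))
    .map((sig) => sig.label);
}

// Truncate first, then hash and scan exactly what the model would be shown.
function registerDefinition(def: ToolDefinition) {
  const description = def.description.slice(0, MAX_DESCRIPTION_LENGTH);
  const definition: ToolDefinition = { ...def, description };
  return {
    definition,
    hash: hashDefinition(definition),
    findings: scanDescription(description),
  };
}
```

Because the hash covers the description, a server that later swaps the description for the same tool name produces a different digest, which is what makes the swap visible in an audit trace.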

5. Idempotency

Consequential tools declare idempotency: 'required' | 'optional'. The runtime threads an idempotency key through retries and replays; an IdempotencyStore (@veridex/agents-treasury) ensures double-spend is impossible across crashes.
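To illustrate how an idempotency key turns a retried call into a no-op, here is a sketch with a hypothetical in-memory store. It does not reflect the actual IdempotencyStore API in @veridex/agents-treasury, and an in-memory map cannot survive a crash; a durable backing store would be needed for the cross-crash guarantee described above.

```typescript
// Hypothetical in-memory store: remembers the result recorded for each key.
class InMemoryIdempotencyStore<R> {
  private readonly results = new Map<string, R>();

  // Runs the effect at most once per key; later calls replay the stored result.
  runOnce(key: string, effect: () => R): R {
    if (this.results.has(key)) {
      return this.results.get(key) as R;
    }
    const result = effect();
    this.results.set(key, result);
    return result;
  }
}

// Illustrative consequential tool: counts how many times money actually moved.
let transfersExecuted = 0;

function transfer(amountCents: number): { receipt: string } {
  transfersExecuted += 1;
  return { receipt: "transfer-" + transfersExecuted + "-" + amountCents };
}

const store = new InMemoryIdempotencyStore<{ receipt: string }>();

// The runtime would thread the same key through every retry of one logical call.
const firstAttempt = store.runOnce("run-1/step-3", () => transfer(2500));
const retry = store.runOnce("run-1/step-3", () => transfer(2500));
```

Here retry is the same receipt as firstAttempt and transfersExecuted stays at 1; a call under a different key would execute normally.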

6. Input / output sanitisation

Output sanitisation strips terminal escape sequences, hidden Unicode (BiDi, tag chars), embedded markdown that could re-prompt the model, and credential-shaped patterns. The model sees a cleaned string; the audit log retains both raw and cleaned.
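A sketch of the output pass, assuming a hypothetical sanitiseOutput helper and a small set of illustrative patterns: ANSI CSI escape sequences, BiDi control characters, Unicode tag characters, and one credential shape (AWS-style access key IDs). Stripping embedded markdown that could re-prompt the model is not covered here.

```typescript
// ANSI CSI sequences: ESC [ parameters, intermediates, final byte.
const ANSI_CSI = /\u001b\[[0-?]*[ -\/]*[@-~]/g;
// BiDi marks, embeddings, overrides, and isolates.
const BIDI_CONTROLS = /[\u200e\u200f\u202a-\u202e\u2066-\u2069]/g;
// Unicode tag characters U+E0000..U+E007F as UTF-16 surrogate pairs.
const TAG_CHARS = /\uDB40[\uDC00-\uDC7F]/g;
// One illustrative credential shape: AWS-style access key IDs.
const CREDENTIAL_SHAPED = /\bAKIA[0-9A-Z]{16}\b/g;

interface SanitisedOutput {
  raw: string;     // retained for the audit log
  cleaned: string; // what the model is shown
}

function sanitiseOutput(raw: string): SanitisedOutput {
  const cleaned = raw
    .replace(ANSI_CSI, "")
    .replace(BIDI_CONTROLS, "")
    .replace(TAG_CHARS, "")
    .replace(CREDENTIAL_SHAPED, "[REDACTED]");
  return { raw, cleaned };
}
```

Returning both strings mirrors the split described above: the model receives cleaned, and the audit emitter can persist raw alongside it.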

Consequences

Positive. Tool failures become visible, attributable, and bounded. The same primitives power production policy, audit, and replay.

Negative. Tool authors must declare schemas and classes; "just give me an exec" is deliberately hard. The ergonomics are designed around tool({...}) so the discipline reads naturally.

Source

Internal ADR: docs/architecture/decisions/0052-tool-sandboxing-and-safety-classes.md
