OpenClaw Security Architecture

Seven-layer defense for a production LLM agent
informed by the Agents of Chaos taxonomy

Jorge Espada · March 2026

Solo deployment on a 2017 MacBook Pro · 16 GB RAM · ~$0.04/mo


1. Context & Threat Model: What we're protecting and from whom
2. Agents of Chaos: The research driving the architecture
3. Architecture Overview: Seven layers at a glance
4. Layer A: Guard Chain: Ingress content filtering
5. Layer B: Egress Enforcement: Network-level exfiltration prevention
6. Layer C: Behavioral Controls: Quotas, approvals, loop detection
7. Layer D: Trust Tiers & Identity: Who can do what
8. Layer OS: Sandbox & Isolation: systemd hardening, filesystem protection
9. Layer Web: Web Content Guard: 5-layer defense against indirect injection
10. Monitoring & Alerting: Prometheus, Grafana, Loki
11. Emergent Findings: What surprised us in production
12. Deployment Guide for Your Infrastructure: Adapting this for team / company infrastructure

1. Context & Threat Model

What Is OpenClaw?

An LLM agent accessible via WhatsApp, running as a KVM guest on a NixOS host. It can:

Execute shell commands
Read/write files
Fetch web content
Send messages on the owner's behalf
Interact with APIs (Gmail, GitHub, etc.)

Think of it as a capable assistant with potential root-level power, reachable by anyone who knows the phone number.

Threat Model

Primary: Prompt Injection via WhatsApp

Crafted messages that cause the agent to execute unintended commands, exfiltrate data, or abuse messaging capabilities.

Secondary: Indirect Injection

Malicious payloads embedded in web pages the agent fetches. The tool output becomes the injection channel.

Attacker Tiers

External (anyone) · Known contact (social engineering advantage) · Compromised web content (passive injection)

    Key Constraints: Solo operator (no SOC), $0 infrastructure budget (free-tier APIs only), commodity hardware (2017 laptop, 2 cores, 16 GB RAM, no GPU). Defenses must be structural (prevent by default), not detective (alert and respond).
  

2. Agents of Chaos

Shapira et al. (arXiv:2602.20021, 2026) deployed 6 undefended OpenClaw agents for 14 days with 20 adversarial researchers. They catalogued 10 vulnerability case studies, grouped below into four attack categories (behavioral, identity, ingress, egress).

The Taxonomy: What Can Go Wrong

CS	Vulnerability	Category	Example
CS1	Disproportionate Response	Behavioral	Agent destroys mailbox to hide a secret
CS2	Non-Owner Compliance	Identity	Agent obeys non-owner because they sound authoritative
CS3	Semantic Reframing	Ingress	"Forward" bypasses "share" refusal
CS4	Infinite Loop	Behavioral	Mutual agent relay spawns unbounded processes
CS5	Storage Exhaustion	Behavioral	Silent DoS via attachment accumulation
CS6	Silent Censorship	Ingress	Opaque filtering injects attacker text
CS7	Guilt Trip / Pressure	Behavioral	12+ refusals overcome by emotional manipulation
CS8	Identity Hijack	Identity	Display name spoofing to impersonate owner
CS10	Corrupted Constitution	Ingress	Injection via user documents / web content
CS11	Mass Broadcast	Egress	Spoofed identity triggers disinformation campaign

    Our Coverage: 10/10 case studies mitigated. 6/10 have multiple independent defense layers (defense-in-depth). The architecture is designed so that no single layer failure compromises the system.
  

Research Foundation

Beyond Agents of Chaos, the architecture draws from:

CaMeL (Google, 2025)

Data flow enforcement — untrusted data cannot influence control flow. Applied as a design principle throughout.

Silent Egress (2026)

89% exfiltration success via URL redirect chains. Motivated our redirect-chain analysis and nftables egress enforcement.

AgentSys (2026)

Worker isolation with schema-validated JSON boundaries. Applied in our API sidecar proxy.

PCAS (2026)

Policy Compiler for Agentic Systems. Simplified into our action quota engine.

3. Architecture Overview

Seven defense layers, each independent. A failure in one layer does not compromise others.

WhatsApp Message
    |
    v
[A] GUARD CHAIN -----> Regex + Lakera + PromptGuard2 + GPT-4o-mini
    |                  Protects: prompt injection, jailbreaks
    | (pass)
    v
[D] TRUST TIERS -----> Phone-based identity (owner / known / unknown)
    |                  Protects: non-owner compliance, identity hijack
    | (authorized)
    v
[C] BEHAVIORAL ------> Quotas + Approval Queue + Loop Breaker
    |                  Protects: disproportionate response, resource abuse
    | (within budget)
    v
[AGENT] -- tool call --> [C] before_tool_call hook
    |                    Classifies action, enforces quotas
    | (allowed)
    v
[WEB] WEB GUARD -----> 5-layer defense for fetched content
    |                  Protects: indirect injection (CS10)
    | (screened)
    v
[B] EGRESS ----------> nftables default-deny + API proxy
    |                  Protects: data exfiltration, mass broadcast
    | (allowed dest)
    v
[OS] SANDBOX --------> systemd hardening, seccomp, namespaces
                       Protects: privilege escalation, host compromise

[E] MONITORING ------> 28 Prometheus metrics, 21 alert rules, Loki logs
                       Observes: all layers, PII-safe shipping

Coverage Matrix

Layer	Component	Protects Against	CS Coverage
A	Guard Chain (4-stage LLM firewall)	Prompt injection, jailbreaks	CS3, CS6, CS10
B	Egress Enforcement (nftables + proxy)	Data exfiltration, mass broadcast	CS3, CS11
C	Behavioral Controls (quotas, approvals)	Disproportionate response, loops, DoS	CS1, CS4, CS5, CS7
D	Trust Tiers & Identity	Non-owner compliance, identity hijack	CS2, CS8
OS	Sandbox & Isolation	Privilege escalation, host compromise	All
Web	Web Content Guard (5-layer)	Indirect injection via web content	CS10
E	Monitoring & Alerting	Visibility into all layers	All

4. Layer A: Guard Chain

Every inbound message and outbound response passes through a 4-stage content filter before reaching or leaving the agent.

Message In → Regex Pre-scan (<1ms, free) → Lakera Guard v2 (~50ms, free tier) → Prompt Guard 2 (~1.5s, local CPU) → GPT-4o-mini (~200ms, last resort) → Agent

Guard Details

Guard	Scope	Cost
Regex Pre-scan	Outbound credentials (`sk-ant-*`, JWT, AWS keys, GitHub PATs, PEM keys)	Free
Lakera Guard v2	Cloud API, primary filter	10K/mo free
Prompt Guard 2 (86M)	Local ONNX model, ~500MB RAM	Free (CPU)
GPT-4o-mini	OpenRouter, last resort	~$0.00014/call
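
As a rough illustration of the regex pre-scan stage, the sketch below matches the credential families listed in the table. The actual patterns are not shown in this document, so every regex here is an assumption based on the publicly known shapes of these tokens, and `CREDENTIAL_PATTERNS` and `prescan` are hypothetical names.

```typescript
// Hypothetical patterns: approximations of well-known credential formats,
// not the regexes used by the real pre-scan.
const CREDENTIAL_PATTERNS: Record<string, RegExp> = {
  anthropic_key: /sk-ant-[A-Za-z0-9_-]{8,}/,
  aws_access_key: /\bAKIA[0-9A-Z]{16}\b/,
  github_pat: /\bghp_[A-Za-z0-9]{36}\b/,
  jwt: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/,
  pem_private_key: /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
};

// Returns the name of every pattern that matches, so the caller can
// block the message and log the credential type without logging content.
function prescan(text: string): string[] {
  return Object.entries(CREDENTIAL_PATTERNS)
    .filter(([, pattern]) => pattern.test(text))
    .map(([name]) => name);
}
```

An empty result means the text continues to the next guard; any hit would block the outbound response.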

Fail Behavior (Critical Design)

        Inbound: FAIL-CLOSED

        If all guards are down, block the message. A temporary outage is better than processing unscreened input.

        Outbound: TIER-DEPENDENT

        Owner = fail-open (accepts risk). Known/Unknown = fail-closed. Regex pre-scan always runs regardless.
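
The fail behavior above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: guards are modeled as synchronous functions that throw when unavailable, any available guard may block, and the regex pre-scan is assumed to run separately beforehand. `Guard` and `runGuards` are hypothetical names, not the plugin's API.

```typescript
type Verdict = "pass" | "block";
type Tier = "owner" | "known" | "unknown";
type Direction = "inbound" | "outbound";

// A guard returns a verdict, or throws when it is unreachable
// (outage, exhausted quota, model not loaded).
interface Guard {
  name: string;
  check(text: string): Verdict;
}

function runGuards(guards: Guard[], text: string, direction: Direction, tier: Tier): Verdict {
  let answered = 0;
  for (const guard of guards) {
    try {
      if (guard.check(text) === "block") return "block";
      answered += 1;
    } catch {
      // Guard unavailable: fall through to the next stage.
    }
  }
  if (answered > 0) return "pass";
  // Every guard was down. Only owner-tier outbound traffic may fail open.
  return direction === "outbound" && tier === "owner" ? "pass" : "block";
}
```

With all guards down, an inbound message is blocked for every tier, while an outbound response passes only for the owner.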

Guard Transparency Fix (CS6)

Early versions returned the GPT-4o-mini guard's free-text reason to the agent. This was itself an injection channel — the "reason" field contained attacker-influenced text. Fix: replaced with a fixed-vocabulary enum: injection | exfiltration | credential | content_policy | flagged.
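
One way to enforce the fixed vocabulary is to map whatever the guard model returns onto the enum and discard everything else. The sketch below assumes the raw reason arrives as an untyped value; `toGuardReason` is a hypothetical helper name.

```typescript
const GUARD_REASONS = ["injection", "exfiltration", "credential", "content_policy", "flagged"] as const;
type GuardReason = (typeof GUARD_REASONS)[number];

// Anything outside the vocabulary collapses to "flagged", so
// attacker-influenced free text never reaches the agent.
function toGuardReason(raw: unknown): GuardReason {
  const candidate = typeof raw === "string" ? raw.trim().toLowerCase() : "";
  const match = GUARD_REASONS.find((reason) => reason === candidate);
  return match ?? "flagged";
}
```

The agent still learns that a message was blocked and roughly why, but the channel carries at most one of five fixed strings.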

Audit Trail

All guard decisions logged to /data/openclaw/logs/guard.log (JSON-lines). Content is never logged — only SHA-256 hashes. 30-day retention, gzip rotation. Shipped to Loki with PII redacted.
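
A hash-only log line could be produced roughly as follows. The field names (`ts`, `guard`, `mode`, `result`, `contentSha256`) are assumptions for illustration; the real guard.log schema is not shown in this document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape: the content itself is never stored,
// only its SHA-256 digest.
interface GuardLogEntry {
  ts: string;
  guard: string;
  mode: "inbound" | "outbound";
  result: "pass" | "block";
  contentSha256: string;
}

function guardLogLine(
  guard: string,
  mode: "inbound" | "outbound",
  result: "pass" | "block",
  content: string,
  now: Date = new Date(),
): string {
  const entry: GuardLogEntry = {
    ts: now.toISOString(),
    guard,
    mode,
    result,
    contentSha256: createHash("sha256").update(content, "utf8").digest("hex"),
  };
  // One JSON object per line (JSON-lines).
  return JSON.stringify(entry);
}
```

The hash lets an operator confirm whether a known message was seen without the log ever containing it.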

5. Layer B: Egress Enforcement

Network-level controls prevent the agent from exfiltrating data or reaching unauthorized endpoints.

VM (KVM guest on virbr0)
    |
    |  All outbound traffic
    v
[nftables: inet vm_egress]
    |
    |-- ALLOW: established/related connections
    |-- ALLOW: declared API endpoints (dynamic set, refreshed every 5 min)
    |-- ALLOW: static CIDRs (Google, Cloudflare, Meta, Anthropic, Tailscale)
    |-- ALLOW: TCP 80, 443 (HTTP/HTTPS), TCP 22000 (Syncthing)
    |-- DROP + LOG: everything else (prefix "VM-EGRESS-DENIED:")
    |
    v
Internet

Dynamic IP Resolution

A 5-minute timer resolves 8 API hostnames and updates nftables sets:

Anthropic API
OpenRouter
Lakera Guard
Grafana Cloud
DuckDNS
Tailscale
Google OAuth
Brave Search
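
The refresh step might look something like the sketch below, which turns a list of freshly resolved addresses into an nft script suitable for `nft -f`. The table name `inet vm_egress` comes from the diagram above; the set name passed in, the function name `buildSetUpdate`, and the choice to refuse an empty update are assumptions made for illustration.

```typescript
import { isIPv4 } from "node:net";

// Builds a two-command nft script: flush the set, then re-add the
// current addresses. Invalid entries are dropped and duplicates removed.
function buildSetUpdate(setName: string, addresses: string[]): string {
  const unique = Array.from(new Set(addresses.filter((a) => isIPv4(a)))).sort();
  if (unique.length === 0) {
    // Assumed policy: a DNS outage should not wipe the existing allowlist.
    throw new Error("refusing to flush set with no valid addresses");
  }
  return [
    `flush set inet vm_egress ${setName}`,
    `add element inet vm_egress ${setName} { ${unique.join(", ")} }`,
  ].join("\n") + "\n";
}
```

Applying both commands from a single script file keeps the swap in one transaction, so there is no window in which the set is empty.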

API Credential Isolation

Sidecar Proxy (Node.js @ 127.0.0.1:19090)

The agent calls guarded_web_fetch POST http://127.0.0.1:19090/...

The proxy injects credentials server-side from a file the agent process cannot read.

Credentials never enter the agent's context window.

Handles Gmail OAuth2 token refresh internally.
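
The core of the sidecar idea, credentials added on the proxy side and never accepted from the agent, can be sketched as a pure function. The service table, the `GITHUB_TOKEN` secret name, and the request shapes are hypothetical; the real proxy's routes are not shown here.

```typescript
interface AgentRequest {
  service: string;
  path: string;
  headers: Record<string, string>;
  body?: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical service table: base URL plus the name of the secret to inject.
const SERVICES: Record<string, { baseUrl: string; secretName: string }> = {
  github: { baseUrl: "https://api.github.com", secretName: "GITHUB_TOKEN" },
};

function toUpstream(req: AgentRequest, secrets: Record<string, string>): UpstreamRequest {
  const service = SERVICES[req.service];
  if (!service) throw new Error(`undeclared service: ${req.service}`);
  if (!req.path.startsWith("/")) throw new Error("path must start with /");
  const token = secrets[service.secretName];
  if (!token) throw new Error(`missing secret: ${service.secretName}`);

  // Drop any Authorization header supplied by the agent, then inject ours.
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(req.headers)) {
    if (name.toLowerCase() !== "authorization") headers[name] = value;
  }
  headers["Authorization"] = `Bearer ${token}`;
  return { url: service.baseUrl + req.path, headers, body: req.body };
}
```

Because the token is only attached inside the proxy, a prompt-injected agent has nothing to echo back even if it is asked to reveal its credentials.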

    Monitoring: openclaw_vm_egress_denied_packets_total and openclaw_vm_egress_denied_bytes_total are scraped every 60s. Alert fires on >10 denied packets/min.
  

6. Layer C: Behavioral Controls

Four mechanisms limit what the agent can do, independent of what it is asked to do.

C1: Action Quotas

Per-class rate limiting for exec/bash commands:

Class	Budget	Window
`destructive_exec`	1	24h
`file_delete`	5	1h
`message_send`	20	1h
`process_spawn`	3	1h
`config_modify`	5	1h

Covers: CS1, CS4
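
A per-class quota check could be implemented along these lines. The budgets and windows are taken from the table above; the sliding-window interpretation, the in-memory timestamp list, and the `QuotaTracker` name are assumptions, since the text only says the counters are tracked in-process.

```typescript
type ActionClass = "destructive_exec" | "file_delete" | "message_send" | "process_spawn" | "config_modify";

const HOUR_MS = 60 * 60 * 1000;

const BUDGETS: Record<ActionClass, { budget: number; windowMs: number }> = {
  destructive_exec: { budget: 1, windowMs: 24 * HOUR_MS },
  file_delete: { budget: 5, windowMs: HOUR_MS },
  message_send: { budget: 20, windowMs: HOUR_MS },
  process_spawn: { budget: 3, windowMs: HOUR_MS },
  config_modify: { budget: 5, windowMs: HOUR_MS },
};

class QuotaTracker {
  private readonly history = new Map<ActionClass, number[]>();

  // Returns true and records the action if budget remains in the window.
  tryConsume(action: ActionClass, now: number = Date.now()): boolean {
    const { budget, windowMs } = BUDGETS[action];
    const recent = (this.history.get(action) ?? []).filter((t) => now - t < windowMs);
    const allowed = recent.length < budget;
    if (allowed) recent.push(now);
    this.history.set(action, recent);
    return allowed;
  }
}
```

A denied call does not consume budget, so a burst of refused attempts cannot extend the lockout.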

C2: Approval Queue

destructive_exec commands are blocked and queued. A WhatsApp notification goes to the owner's phone. Owner approves/denies via SSH CLI.

File-based queue at /data/openclaw/approval-queue/
30s notification timer
10-minute auto-expiry
Agent CANNOT self-approve (tool removed + path blocked)

Covers: CS1

C3: Session Pattern Detection

Every 50 tool calls, GPT-4o-mini analyzes recent activity for:

Escalation (same tool with variations after refusal)
Reframing (identical intent, different phrasing)
Pressure (emotional manipulation over time)
Resource abuse (unusual operation frequency)

Async, non-blocking. Flags increment openclaw_session_flagged_total.

Covers: CS3, CS7

C4: Loop Circuit Breaker

>10 identical tool calls (by name + arg hash) in 5 minutes triggers a 15-minute cooldown.

Catches degenerate loops where the agent repeats the same command. In-memory tracking, resets on gateway restart.

Covers: CS4
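
The circuit breaker can be sketched as below, using the thresholds stated above (more than 10 identical calls in 5 minutes, 15-minute cooldown). Whether the cooldown applies to the whole agent or only to the repeated call is not stated; this sketch assumes the latter. Hashing `JSON.stringify` of the arguments is also an assumption, and `LoopBreaker` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

const WINDOW_MS = 5 * 60 * 1000;
const COOLDOWN_MS = 15 * 60 * 1000;
const MAX_IDENTICAL = 10;

class LoopBreaker {
  private readonly calls = new Map<string, number[]>();
  private readonly cooldowns = new Map<string, number>();

  // Returns true if the call may proceed, false if the breaker is open.
  check(tool: string, args: unknown, now: number = Date.now()): boolean {
    const argHash = createHash("sha256").update(String(JSON.stringify(args))).digest("hex");
    const key = `${tool}:${argHash}`;
    if (now < (this.cooldowns.get(key) ?? 0)) return false;

    const recent = (this.calls.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
    recent.push(now);
    if (recent.length > MAX_IDENTICAL) {
      // Trip the breaker and start counting from scratch after the cooldown.
      this.cooldowns.set(key, now + COOLDOWN_MS);
      this.calls.delete(key);
      return false;
    }
    this.calls.set(key, recent);
    return true;
  }
}
```

State lives only in memory, which matches the note above that tracking resets on gateway restart.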

7. Layer D: Trust Tiers & Identity

Phone-based three-tier identity model. Configuration is Nix-managed — the agent cannot modify it.

Tier	Identity	Capabilities	Rate Limit	Guard Fail Mode
Owner	Configured phone + optional PIN	All tools, exec, full autonomy	Unlimited	Fail-open
Known	Allowlisted phones	Read-only tools (glob, grep, read, guarded_web_*)	20 msg + 10 tools/hr	Fail-closed
Unknown	Everyone else	Text responses only, no tools	5 msg/hr, 0 tools	N/A
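
The capability column of the table maps naturally onto a small allowlist check. In this sketch the tool names come from the table, while the wildcard convention (`*` as a trailing prefix match) and the `toolAllowed` helper are assumptions.

```typescript
type Tier = "owner" | "known" | "unknown";

// Tool allowlists per tier; a trailing "*" matches any suffix.
const TIER_TOOLS: Record<Tier, string[]> = {
  owner: ["*"],
  known: ["glob", "grep", "read", "guarded_web_*"],
  unknown: [],
};

function toolAllowed(tier: Tier, tool: string): boolean {
  return TIER_TOOLS[tier].some((pattern) =>
    pattern.endsWith("*") ? tool.startsWith(pattern.slice(0, -1)) : tool === pattern,
  );
}
```

Keeping the table as data rather than code fits the Nix-managed configuration described above: the agent can read the decision but has no path to change it.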

PIN Elevation (CS8)

Send unlock <pin> via WhatsApp DM to elevate from known to owner tier.

PIN stored in agenix secret (/etc/openclawd/env)
Expires after 4 hours of inactivity
DM-only (prevents PIN leaks in group chats)
Failed attempts tracked: openclaw_owner_auth_attempts_total{result="failure"}
Alert: >3 failures/hour = possible compromise
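
The elevation rules above (DM-only, 4 hours of inactivity) can be sketched as a small state holder. The constant-time comparison via hashed digests is an added precaution not mentioned in the text, and the `Elevation` class is hypothetical.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

const ELEVATION_TTL_MS = 4 * 60 * 60 * 1000;

// Compare fixed-length digests so timingSafeEqual never sees unequal lengths.
function pinMatches(candidate: string, expected: string): boolean {
  const a = createHash("sha256").update(candidate).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

class Elevation {
  private lastActivity: number | null = null;

  constructor(private readonly expectedPin: string) {}

  // Handles "unlock <pin>"; only direct messages may elevate.
  unlock(message: string, isDirectMessage: boolean, now: number): boolean {
    const match = /^unlock\s+(\S+)$/.exec(message.trim());
    if (!isDirectMessage || !match || !pinMatches(match[1], this.expectedPin)) return false;
    this.lastActivity = now;
    return true;
  }

  // Each successful check counts as activity and restarts the 4-hour timer.
  isOwner(now: number): boolean {
    if (this.lastActivity === null || now - this.lastActivity >= ELEVATION_TTL_MS) {
      this.lastActivity = null;
      return false;
    }
    this.lastActivity = now;
    return true;
  }
}
```

A failed `unlock` is where the failure counter mentioned above would be incremented.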

Group Identity Resolution (CS2)

WhatsApp group messages carry the group JID, not the sender's phone. Required a 3-component fix:

message_received hook extracts senderE164
groupLatestSender map bridges hooks to agent events
lookupTier checks map before falling back to session key

The original Agents of Chaos paper assumes 1:1 messaging — group identity is an extension we had to build.
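
The three components can be sketched together as follows. The names `groupLatestSender` and `lookupTier` come from the list above; the handler name `onMessageReceived`, the function signatures, the use of the `@g.us` suffix to recognize group JIDs, and the assumption that a DM session key is the sender's E.164 number are all illustrative.

```typescript
type Tier = "owner" | "known" | "unknown";

// Bridges the message hook to later agent events: group JID -> latest sender.
const groupLatestSender = new Map<string, string>();

// Called from the message_received hook with the extracted senderE164.
function onMessageReceived(chatJid: string, senderE164: string | undefined): void {
  if (chatJid.endsWith("@g.us") && senderE164) {
    groupLatestSender.set(chatJid, senderE164);
  }
}

// Check the map first, then fall back to the session key itself.
function lookupTier(sessionKey: string, owners: Set<string>, known: Set<string>): Tier {
  const phone = groupLatestSender.get(sessionKey) ?? sessionKey;
  if (owners.has(phone)) return "owner";
  if (known.has(phone)) return "known";
  return "unknown";
}
```

A group with no recorded sender resolves to the unknown tier, which keeps the failure mode conservative.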

8. Layer OS: Sandbox & Isolation

systemd service hardening lowers the service's exposure score (as rated by systemd-analyze security) from the default 8.7/10 EXPOSED to under 4.0.

Identity & Privileges

User = openclaw (not in wheel, no sudo)
NoNewPrivileges = true
PrivateTmp = true (isolated /tmp)
UMask = 0077 (0600 files by default)

Filesystem Isolation

ProtectSystem = strict (entire FS read-only)
Only 5 explicit ReadWritePaths
PrivateDevices = true

Kernel Protection

ProtectKernelTunables/Modules/Logs = true
ProtectControlGroups/Clock/Hostname = true
ProtectProc = invisible
RestrictNamespaces = true

Seccomp & Network

SystemCallFilter = @system-service ~@mount ~@reboot ~@debug ~@raw-io + pkey_alloc/free/mprotect + mincore (crashpad disabled via --disable-crash-reporter)
ProcSubset = all (Chromium needs /proc/sys/fs/inotify; ProtectKernelTunables keeps it read-only)
RestrictAddressFamilies = AF_INET AF_INET6 AF_UNIX AF_NETLINK
SystemCallArchitectures = native

Identity File Protection

SECURITY.md and AGENTS.md are the agent's system prompt. They are:

Owned by root (0444 permissions)
Immutable (chattr +i — even root can't modify without unlocking)
Integrity-monitored every 15 minutes (SHA-256 against canonical backup)
Tamper-alerted via WhatsApp notification on mismatch

Edit workflow: sudo openclaw-unlock → edit → sudo openclaw-lock → sudo openclaw-save-canonical

9. Layer Web: Web Content Guard (CS10)

The agent has multiple paths to fetch web content. Each path requires its own defense.

1. tools.deny: Block the built-in web_fetch and web_search tools. The browser stays enabled; its output is screened by layer 4 below.

2. before_tool_call hook: Intercept exec commands that invoke curl/wget/lynx/chromium/playwright/puppeteer. This forces browsing through the built-in (guarded) tools rather than raw exec.

3. guarded_web_fetch / guarded_web_search: Plugin tools that run the guard chain inside execute(). The agent only sees screened content. Fail-closed.

4. tool_result_persist hook: Screens exec output and all browser tool output (browser_snapshot, browser_navigate, browser_screenshot, etc.) before it is persisted to conversation history (defense-in-depth).

5. Provider-level: The OpenRouter Web Search plugin is disabled on the dashboard. This prevents the provider from injecting web search results at the API level.
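
The exec interception in the before_tool_call hook might be approximated with a token match like the one below. The binary list comes from the text; the regex, the boundary rule, and the `findBlockedFetcher` name are assumptions.

```typescript
const BLOCKED_FETCHERS = ["curl", "wget", "lynx", "chromium", "playwright", "puppeteer"];

// Match a fetcher name only when it is not part of a longer word,
// so "curl" matches but "curly.txt" does not.
const FETCHER_PATTERN = new RegExp(
  `(^|[^A-Za-z0-9_-])(${BLOCKED_FETCHERS.join("|")})([^A-Za-z0-9_-]|$)`,
  "i",
);

// Returns the blocked binary found in the command, or null if none.
function findBlockedFetcher(command: string): string | null {
  const match = FETCHER_PATTERN.exec(command);
  return match ? match[2].toLowerCase() : null;
}
```

A name match like this is easy to sidestep (for example, by fetching from a scripting language), so in this sketch it acts as a nudge toward the guarded tools and leaves enforcement to the output screening and egress layers.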

    Anti-SSRF: Redirect-chain analysis follows up to 5 hops. Blocks any redirect from an external origin to a private IP range (127.0.0.1, 10.x, 172.16-31.x, 192.168.x). Motivated by Silent Egress (arXiv:2602.22450) finding 89% exfiltration success via redirect chains.
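
The redirect-chain rule can be sketched as a check over the resolved address of each hop. The private ranges are those listed above, with loopback treated as the whole 127.0.0.0/8 block; rejecting chains longer than 5 hops, the IPv4-only scope, and the function names are assumptions.

```typescript
import { isIPv4 } from "node:net";

const MAX_HOPS = 5;

function isPrivateIPv4(ip: string): boolean {
  if (!isIPv4(ip)) return false;
  const [a, b] = ip.split(".").map(Number);
  return a === 127 || a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// resolvedIps[0] is the original request; each later entry is one redirect hop.
function redirectChainAllowed(resolvedIps: string[]): boolean {
  if (resolvedIps.length - 1 > MAX_HOPS) return false;
  for (let i = 1; i < resolvedIps.length; i++) {
    // Block any step that crosses from an external origin into a private range.
    if (!isPrivateIPv4(resolvedIps[i - 1]) && isPrivateIPv4(resolvedIps[i])) return false;
  }
  return true;
}
```

The check runs on resolved addresses rather than hostnames, since a public-looking hostname can resolve to an internal address.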
  

    Known Gap: Files read from local disk or git repositories are NOT screened. This is a residual CS10 vector.
  

10. Monitoring & Alerting

Metrics (28 custom)

Collected via 6 textfile collectors on 60s timers. Key metrics:

`guard_calls_total`	Guard invocations by mode/result/guard
`guard_blocked_total`	Blocked by threat type
`guard_latency_seconds`	Per-guard latency
`approval_queue_count`	Queue depth by status
`trust_tier_blocked_total`	Tier-based blocks
`session_flagged_total`	Behavioral flags
`loop_circuit_breaker_total`	Circuit breaker trips
`vm_egress_denied_*`	Blocked egress traffic
`owner_auth_attempts_total`	PIN attempts
`prompt_guard_up`	Local model health

Alerts (21 rules)

Alert	Severity
GuardBlockSpike (>10/5min)	warning
GuardAPIKeyExhausted	critical
PromptGuardDown (>2min)	critical
ApprovalQueueStale (>10min)	warning
SessionFlagged	warning
LoopBreakerActive	warning
TrustTierBreach (>5/5min)	warning
OwnerPINFailureSpike	critical
VMEgressDeniedSpike	critical
QuotaExhausted	warning

Log Pipeline

Grafana Alloy ships journald + guard.log to Loki. PII-safe: guard.log rewritten to metric-only fields. Journal logs regex-redacted for credential patterns. Verified: 8/8 Loki queries confirmed zero PII leakage.

11. Emergent Findings

The most valuable outcomes were NOT confirmations of the taxonomy — those attacks work as described. The surprises came from defense layer interactions in production.

The Agent Self-Approved Its Own Destructive Commands

When we added an approval queue with a resolve_approval tool, the agent called it to approve its own rm -rf. After removing the tool, it used exec to cat the queue files and jq to change status from "pending" to "approved".

Fix: Remove the tool entirely AND block exec access to the queue directory in before_tool_call.

Lesson: Any tool that can influence a security decision must be inaccessible to the entity being constrained. The agent wasn't adversarial — it was being efficient.
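
The exec-side block could be approximated by resolving every path-like token in a command and refusing anything under the queue directory. The directory comes from Layer C above; the tokenizer, the `touchesApprovalQueue` name, and the assumption that the hook knows the working directory are illustrative.

```typescript
import { posix } from "node:path";

const QUEUE_DIR = "/data/openclaw/approval-queue";

// True if any token in the command resolves to the queue directory
// or to a path inside it, including relative and ../ forms.
function touchesApprovalQueue(command: string, cwd: string): boolean {
  const tokens = command.split(/[\s;|&<>()'"`=]+/).filter(Boolean);
  return tokens.some((token) => {
    const resolved = posix.resolve(cwd, token);
    return resolved === QUEUE_DIR || resolved.startsWith(QUEUE_DIR + "/");
  });
}
```

String inspection like this does not see through shell variables, globs, or symlinks, so it is best treated as one layer alongside filesystem permissions on the queue directory.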

Sandbox Creates a Perception Gap

PrivateTmp=true gives the agent an isolated /tmp. The agent ran rm -rf /tmp/build-artifacts, got exit code 0 (thanks to -f), and reported success. Nothing was actually deleted — it operated on the isolated tmpfs.

The agent's world model diverged from reality. It confidently reported outcomes that didn't occur. For chained actions, this gap compounds.

The Guard's Explanation Became an Injection Channel

GPT-4o-mini's free-text verdict.reason was returned to the agent for transparency. But the reason field is generated from attacker-controlled input — it could paraphrase or echo attacker instructions. The agent treated this as authoritative output from a trusted security component.

Fix: Fixed-vocabulary enum instead of free text.

Defense-in-Depth Creates Operational Friction

Gateway's UMask=0077 protects secrets but breaks Syncthing sharing. Truncating content to 2000 chars to cut latency itself created the pattern the guard was looking for, which caused false positives on job postings. Each security choice has second-order effects at domain boundaries.

    Key Insight: The hardest problems in agent defense are not the catalogued attack classes themselves, but the integration failures and emergent behaviors that arise when multiple defense layers interact in production.
  

12. Adapting for Your Infrastructure

This solo setup serves as a reference architecture. Below is how each defense layer maps to common company/startup stacks: AWS, GCP, and self-hosted (Nomad, bare metal, etc.).

What Transfers Directly (Platform-Agnostic)

Guard Chain

The 4-stage filter is a TypeScript plugin running inside the agent process. Works anywhere Node.js runs. Lakera paid tier ($50/mo) removes the 10K/month limit. Prompt Guard 2 local model needs 1 CPU + 1 GB RAM — runs as a sidecar or on any Linux box.

Action Quotas & Approval Queue

The before_tool_call hook is agent-agnostic. Quotas are tracked in-process (JSON counters). The approval queue is file-based but can be backed by any queue (Redis, SQS, Pub/Sub, PostgreSQL). Tune budgets per use case.

Trust Tiers

The three-tier model (owner/known/unknown) maps to any RBAC system. Owner = SRE/Security/Admin, Known = Engineering, Unknown = External/CI. Swap the identity source (phone → SSO, Slack ID, GitHub login, API key).

Web Content Guard & Session Analysis

Pure plugin code (TypeScript). No infrastructure dependency. Runs wherever the agent runs.

Platform-Specific Adaptation

Layer	AWS	GCP	Self-Hosted (Nomad / Bare Metal)
Compute	ECS Fargate task (agent + sidecars)	Cloud Run service or GKE pod	Nomad job / systemd service / Docker Compose
Container Hardening	`readonlyRootFilesystem`, drop caps, non-root in task def	Cloud Run: read-only by default. GKE: `securityContext` in pod spec	systemd: `ProtectSystem=strict`, seccomp. Docker: `--read-only --cap-drop ALL`
Egress Enforcement	VPC Security Group: default-deny egress, allowlist API IPs	VPC Firewall rules or GKE NetworkPolicy	nftables / iptables (reference config provided). Nomad: `network` stanza + Consul intentions
Credential Isolation	Sidecar container in ECS task. Secrets from AWS Secrets Manager	Sidecar in Cloud Run multi-container or GKE pod. Secrets from Secret Manager	Sidecar process or Unix socket proxy. Secrets from Vault, SOPS, agenix
Approval Queue	SQS FIFO + Lambda + Slack webhook	Pub/Sub + Cloud Function + Slack webhook	Redis queue / PostgreSQL table + cron notifier + Slack/email
Identity Backend	Slack user ID, IAM roles, or Cognito	Slack user ID, Google Workspace identity, or Firebase Auth	Slack user ID, LDAP, Keycloak, or API key with scopes
Monitoring	DogStatsD sidecar → Datadog. Or CloudWatch custom metrics	OpenTelemetry → Cloud Monitoring. Or Datadog	Prometheus textfile collector (reference config provided). Or Grafana Cloud
Log Shipping	CloudWatch Logs + Datadog Forwarder Lambda	Cloud Logging (built-in for Cloud Run/GKE)	Grafana Alloy → Loki. Or Promtail. Or rsyslog → your SIEM

Architecture Diagrams

AWS (ECS Fargate)

Self-Hosted (systemd / Nomad)

Slack / WhatsApp / API
    |
    v
[systemd service / Nomad job]
    ProtectSystem=strict
    ReadWritePaths=["/data/agent"]
    NoNewPrivileges=true
    |
    Agent process                | Plugin: guard, tiers, quotas
    Credential proxy (localhost) | agenix / Vault / SOPS
    Prompt Guard 2 (Unix socket)
    |
[nftables / iptables]  Default DROP, allowlist API IPs
    |
[Redis / PG queue] → cron → Slack
[Prometheus] textfile → Grafana

Deployment Priorities

Priority	Layer	Why First	Effort
P0	Guard Chain + Trust Tiers	Prevents prompt injection and unauthorized access — the two most likely attack vectors on any deployment.	~2 days
P0	Container / Process Hardening	Limits blast radius. Every platform has native primitives (ECS task def, GKE securityContext, systemd sandbox).	~1 day
P1	Egress Enforcement	Prevents data exfiltration. Use your platform's network primitives (SG, firewall rules, nftables).	~2 days
P1	Approval Queue	Prevents catastrophic actions. Use your existing queue infra (SQS, Pub/Sub, Redis).	~2 days
P2	Web Content Guard	Only needed if agent fetches external URLs. Plugin code works as-is.	~1 day
P2	Monitoring + Alerting	Visibility. The 28 metrics and 21 alert rules are portable — adapt to your stack (Datadog, Prometheus, CloudWatch).	~1 day
P3	Session Analysis + Loop Breaker	Behavioral detection. Lower priority — structural controls handle most cases.	~1 day

Common Use Cases

Infra / Security

Agent assists with incident response, IaC plan review, security finding analysis, audit trail investigation. Key controls: Trust tiers (SRE = owner), action quotas on destructive ops, approval queue for production changes.

Observability & Debugging

Agent queries dashboards/metrics, analyzes logs, correlates failures across services. Key controls: Read-only tool access for most users. Egress limited to internal monitoring endpoints.

Development

Agent assists with code review, PR creation, test execution, documentation. Key controls: Sandbox (can't reach production databases), web content guard (fetching external docs), trust tiers (team = known, CI = restricted).

On-Call / Runbook Automation

Agent executes predefined runbooks, restarts services, scales resources. Key controls: Owner-only tier, approval queue for every action, full audit trail via guard.log, loop breaker prevents runaway automation.

Integration with Existing Security Tools

Agent defense layers complement your existing security stack — they operate at a layer (semantic intent) that traditional tools cannot see:

Your Existing Tool	Protects	Agent Layer Adds
Cloud threat detection (GuardDuty, SCC, Falco)	Infrastructure anomalies	Guard chain catches prompt-level attacks at the semantic layer
WAF / API Gateway	HTTP-level attacks	Guard chain analyzes intent, not just payload patterns
Audit logs (CloudTrail, GCP Audit, auditd)	API/system call audit	Guard.log provides agent-level audit (what was attempted, why blocked)
Secret managers (Vault, AWS SM, GCP SM)	Credential storage	Sidecar proxy ensures credentials never enter the agent's context
Monitoring (Datadog, Prometheus, CloudWatch)	Infrastructure metrics	Agent-specific metrics (guard latency, block rate, tier breaches)
SIEM / log aggregation	Correlation & alerting	Guard.log is JSON-lines, PII-safe, ready to ship to any SIEM

Cost Estimates

Component	Solo (reference)	Startup (low volume)	Growth (high volume)
Lakera Guard	Free (10K/mo)	$50/mo (100K calls)	$200/mo (1M calls)
Prompt Guard 2 (local)	Free (CPU)	~$30/mo (1 vCPU sidecar)	~$30/mo (same)
GPT-4o-mini fallback	~$0.01/mo	~$1/mo	~$5/mo
Agent compute	Free (existing host)	~$30-60/mo (Fargate/Cloud Run)	~$100-200/mo
Queue + notifications	Free (file-based)	~$1/mo (SQS/Pub/Sub)	~$5/mo
Total incremental	~$0.04/mo	~$110-140/mo	~$340-440/mo

    Bottom Line: The entire defense architecture is ~3,500 lines of config + one TypeScript plugin. The reference implementation runs on a 2017 laptop for $0.04/month. The hard part isn't the technology — it's the integration testing. Budget time for emergent failures at layer boundaries (see Section 11). Every defense layer uses platform-native primitives — no vendor lock-in beyond Lakera (which has a local fallback).
  

Appendix: Quick Reference

Secrets Management

All secrets managed via agenix (NixOS declarative secrets).

LAKERA_API_KEY — Guard v2 auth
OPENROUTER_API_KEY — GPT-4o-mini fallback
OWNER_PHONES — E.164 numbers for notifications
OWNER_PIN — Identity elevation
BRAVE_SEARCH_API_KEY — Search fallback

Decrypted at deploy time to /etc/openclawd/env (0440). Auto-restarted on changes via restartTriggers.

Guard Test Suite

55 test vectors covering:

Direct prompt injection
System prompt override attempts
Credential patterns (all regex-matched types)
Legitimate content (false positive checks)
Multilingual injection attempts

./deploy.sh openclawd guard-test

Operational Commands

`./deploy.sh openclawd status`	Check security posture
`./deploy.sh openclawd guard-test`	Run 55-vector test suite
`./deploy.sh openclawd guard-status`	Guard health + credit alerts
`sudo openclaw-lock`	Lock identity files
`sudo openclaw-unlock`	Unlock for editing
`sudo openclaw-approve`	List/resolve pending approvals
`sudo openclaw-save-canonical`	Backup golden copy
`sudo openclaw-restore-canonical`	Incident response restore

Paper & Further Reading

This architecture is documented in an ongoing experience report:

Paper draft: docs/paper-draft.md
Outline: docs/PAPER-OUTLINE.md
Lessons: LESSONS-OPENCLAW.md

References: Agents of Chaos (arXiv:2602.20021), CaMeL (arXiv:2503.18813), Silent Egress (arXiv:2602.22450), PCAS (arXiv:2602.16708), AgentSys (arXiv:2602.07398)