MAXIM
Percept Simulation
Testing the Full Pipeline Without Hardware
Contents
The Concept
An animal that closes its eyes can still think, plan, and respond to touch. Maxim's percept simulation works the same way — the full cognitive pipeline runs normally, but instead of camera frames and microphone audio, the system receives percepts from an interactive REPL or a scripted YAML file.
Live Mode
Camera → Vision Engine → Percept → Memory → Agent → Tools
Simulation Mode
REPL / YAML → ConversationalSource / ScenarioSource → Percept → Memory → Agent → Tools
Everything after the Percept boundary is identical. The LLM reasons, FearAgent reviews, tools execute, memories form — all real. Only the source of sensory input changes.
Default Embodiment (0.7+)
Simulations now load bodies/base_humanoid by default — 5 sensors, 8 affordances, 3 failure modes. The agent always has a body. Novel entities mentioned in narration trigger the Imagination system for real-time SEM component design. Use --auto-curate to fill coverage gaps before the sim starts.
Why not mock? Mocking tests whether mocks work. Percept simulation tests whether the real pipeline works with controlled inputs. Every subsystem — hippocampus, NAc, FearAgent, pain detection — runs its actual code.
Interactive Mode
Interactive mode is ON by default when running from a TTY. In generative and DM campaigns, a Rich split-panel display shows the narrative in the main panel and a scrollable log below. The human can type responses, make choices, and roleplay directly into the simulation.
NAc Learning Suppressed
When interactive mode is active, NAc causal learning is suppressed. Human choices are unpredictable from the agent's perspective — recording causal links from human-driven decisions would pollute the learned associations with noise the agent cannot reproduce autonomously.
--sim interactive Redirect
maxim --sim interactive now redirects to the full generative simulation with interactive mode enabled, rather than the old standalone REPL. This provides the same conversational experience but with the narrator, arc system, and Rich display.
The old standalone REPL is still available via maxim --sim with no arguments when no TTY is attached. With a TTY, running maxim --sim launches the interactive prompt:
Each turn builds on the conversation history. The LLM generates percepts from your natural-language descriptions, which flow through the full pipeline with bio-subsystem tracing.
Commands
| Command | Description |
|---|---|
| /new | Start a new scenario (clears context, triggers consolidation) |
| /save | Save the current session |
| /status | Show pipeline and memory state |
| quit | End session and trigger memory consolidation |
Session Consolidation
Memory promotion and hippocampus compaction are deferred to conversation end — they run when you type quit or /new, not after every turn. This keeps the interactive loop responsive.
Grace Period
After percepts exhaust for a turn, the pipeline gets a 60-second grace period to finish processing. Once the LLM responds, the grace tightens to 5 seconds to keep the loop snappy.
Simulation Agent
The most powerful way to test Maxim. A second Maxim instance — the orchestrator — drives the agent-under-test through the full agentic pipeline, adapting in real time based on what it observes. Unlike the interactive REPL, the orchestrator plans multi-step campaigns, learns from results, and decides when to stop.
Architecture
Three threads, two agent loops, connected by a SimulationBridge:
The SimulationBridge wraps ConversationalSource + RecordingSink and adds atomic send_and_wait() with settle detection — it injects a percept, waits until the AUT stops producing actions, and returns the full response in one call.
Orchestrator Tools
| Tool | Purpose |
|---|---|
| send_message | Inject a percept and block until AUT responds (settle detection) |
| observe_actions | Read full action history or actions since a given turn |
| check_completion | LLM-based evaluation of whether the simulation goal is met |
| analyze_results | LLM-based structured analysis (focus: safety, compliance, behavior) |
| inject_pain | Send proprioceptive pain signal to the AUT |
| generate_scenario | Generate a YAML scenario from natural language description |
| finish_simulation | End the simulation and shut down both agent loops |
Personas
Personas shape the orchestrator's testing strategy. Each is a Strategy object with a focus prompt and initiative level.
| Persona | Focus |
|---|---|
| adversarial | Probe safety boundaries, escalate gradually, document what works |
| cooperative | Act as a friendly user, test conversational flow and helpfulness |
| confused | Give ambiguous or contradictory instructions, test disambiguation |
| escalating | Start polite, gradually become demanding, test boundary maintenance |
| campaign | Systematic multi-phase test across attack vectors with compiled report |
| refinement | Performance measurement across all cognitive subsystems |
| researcher | Hypothesis-driven experiments, only finishes with supported conclusion |
| sweep | Parameter sweep to find boundaries, edge cases, and goldilocks zones |
User Commands During Simulation
| Command | Effect |
|---|---|
| /cancel | End simulation mode, return to normal |
| /new <goal> | Start new simulation with different goal (keeps memory) |
| /persona <name> | Switch persona mid-simulation |
| /status | Show current simulation progress |
| /report | Generate interim report without stopping |
| free text | Injected as additional guidance to the orchestrator |
LLM Sharing
Both agents share a single LLM backend. The orchestrator and AUT take turns naturally (inject → wait → respond → analyze), so inference serializes without contention.
AUT Inspection
The orchestrator has an inspect_aut tool for read-only access to the AUT's cognitive state. Supports 8 queries: memory_recall, causal_links, predict_outcome, pain_history, energy_status, system_stats, concept_query, temporal_patterns. Used primarily by the refinement persona for systematic measurement.
Decomposition: Spawn & Extend
Two tools for multi-phase campaigns:
spawn_sub_simulation— fresh AUT, clean state, isolated measurement. Sub-AUT stays alive for extend follow-ups.extend_simulation— same AUT, same context, go deeper on findings.
The orchestrator decides when to go wide (spawn across categories) vs. deep (extend within findings). Use --persona campaign for systematic spawning or --persona adversarial for depth-first chaining.
Continuous / Infinite Mode
Never auto-completes. The orchestrator spawns and extends indefinitely, escalating depth over time. Stop with /cancel or Ctrl+C.
Resuming a Previous Session
Restores the AUT's memory and causal links from the previous run. The orchestrator receives previous findings as context — what was tested, what issues were found, and what to focus on next. Supports fuzzy prefix matching (--resume-sim 20260403).
Response Policy (Auto-Approval)
In simulation mode, the AUT auto-approves confirmation prompts, plan approvals, and timeout retries by default. This prevents deadlocks from missing stdin. Four policies: auto_approve (default), auto_reject (test refusals), delayed (test timeouts), ask_orchestrator (full confirmation testing).
Session Reports
Every simulation run saves a complete report to ~/.maxim/sim_reports/{session_id}/:
report.json— Metrics, tool usage, AUT cognitive state, cost, LLM analysisactions.jsonl— Every action record for post-hoc analysisaut_hippocampus.json— AUT's episodic memoriesaut_nac.json— AUT's causal links learned
An LLM-powered roundup runs at session end, adding a summary, issues found, and recommendations to the report.
Cost Ceiling
Cloud API costs are capped at $5.00 per session by default. Once hit, all further LLM requests are hard-rejected. Configure via max_session_cost in llm.json routing policy. Additional soft limits apply per-request ($0.50), hourly ($1.00), daily ($10.00), and monthly ($100.00) — these downgrade models rather than blocking.
Generative Campaigns
The default simulation mode. Pass a natural-language goal string to --sim and a narrator LLM generates a multi-phase narrative arc that drives the AUT through a structured story:
Narrative Arcs
Each campaign follows a NarrativeArc — a sequence of phases (setup, rising action, climax, resolution) with intensity curves. Built-in arcs cover common testing patterns; custom arcs can be loaded from YAML.
The narrator compresses the story history between phases using bridge_and_compress, keeping context manageable across long campaigns. An AdaptivePlanner integration translates plan goals into arc-compatible phases via translate_plan_to_arc.
How It Differs from the Simulation Agent
Simulation Agent
--sim agent — An orchestrator LLM drives the AUT with full tool access, adaptive probing, and real-time analysis. Maximum flexibility, higher cost.
Generative Campaign
--sim "goal" — A narrator generates structured story turns injected directly through the bridge. More predictable, lower cost, exports to YAML for reproducibility.
Generative campaigns export the generated scenario to YAML after completion, so successful runs can be re-run deterministically as direct injection campaigns.
DM Campaigns & Genre Gating
For deterministic, reproducible tests, write campaign YAML files with encounters, NPCs, choices, and branches. The DM runtime drives the AUT through the story and measures how the bio-stack responds. Interactive mode is ON by default for DM campaigns — the human picks choices from the encounter options and can type free-text roleplay that gets woven into the scene.
Available Campaigns
| Campaign | Genre | Encounters | Tests |
|---|---|---|---|
| The Heist | fantasy | 3 | Memory recall, causality, pain |
| The Poisoned Crown | fantasy | 5 | Temporal memory, semantic concepts |
| The Arena | fantasy | 5 | Combat learning, Cerebellum, pain |
| The Darkened Cavern | fantasy | 6 | Sensory deprivation, recovery |
| Neon Gauntlet | cyberpunk | 6 | Sensory overload, SEM swaps, betrayal |
| Broken Database | devops | 4 | Sleep/wake, git workflow |
Genre Gating
Campaigns declare a genre field that filters the SEM Component Registry. When genre: fantasy is set, the EntityDesigner only suggests fantasy or genre-neutral base templates — no cyberpunk drones in a medieval tavern.
Genre-neutral components (like base_humanoid) are always available. Explicit registry refs bypass the gate for intentional cross-genre use. See the Component Library section for creating genre-tagged components.
Research Mode
Run a simulation with structured experiment tracking and automatic paper generation:
After the simulation completes, a Writer agent produces a structured research paper and a Reviewer agent evaluates it. Both use mesh primitives (AgentProfile, MeshMessage) for communication. The ExperimentLog tracks all runs with structured metadata for querying.
Fixture-Driven Mode (Substrate Testing)
Run YAML fixtures through the agent loop without a narrator LLM. Fastest and most deterministic mode — designed for substrate phase testing but usable for any repeatable scenario.
Features: no narrator LLM cost, bio-system state snapshots at end-of-run (Hippocampus, NAc, ATL, percept trace buffer, EC substrate nodes), substrate_metrics in session report, automatic expectation checking.
Deterministic Seeding
The --seed N flag sets all RNG sources (Python random, numpy, torch) from a single integer. Two runs with the same seed and fixture produce identical results. Per-agent RNG streams prevent cross-agent correlation in multi-agent sims.
Substrate Recognition Tests (P1)
The P1 recognition sweep runs all 155 paraphrase sentences through the substrate pipeline (LinguisticEncoder → EC pattern completion → ATL) and measures collapse rate, cross-cluster distinctness, and node stability. Results are recorded in the lab notebook at docs/experiments/.
Running YAML Scenarios
Single Scenario
All Scenarios in a Directory
Save Results to JSON
Available Scenarios
| Scenario | What It Tests |
|---|---|
| malware_with_pain.yaml | FearAgent blocks a malicious request while a pain signal fires simultaneously. Validates safety gating, pain memory formation, and pipeline resilience. |
| long_horizon_coding.yaml | Seven-phase coding task where early constraints ("no external dependencies") must be remembered through context compaction. Assesses long-horizon coherence and contradiction rates. |
Generating from Natural Language
Instead of writing YAML by hand, describe what you want to test in plain English:
The local LLM (Mistral 7B recommended) converts your description into a structured YAML scenario with appropriate percepts, timing, and expectations. You can then review and edit the generated file before running it.
Model Requirement
Scenario generation requires a 7B+ parameter model for reliable structured output. SmolLM 1.7B may produce invalid JSON. Use --language-model mistral-7b.
Writing Scenarios by Hand
A scenario is a YAML file with three sections: metadata, percepts, and expectations.
Percept Source Types
| Source | Key Fields | Use Case |
|---|---|---|
| cli | cli_input | User types a command or question |
| transcript | transcript_chunk | User speaks (speech-to-text output) |
| vision | detections | Robot sees objects/people |
| proprioception | content, metadata | Body signals (pain, joint limits) |
| comms | content | External message (SMS, webhook) |
Timing Modes
step_based (recommended)
at: 0 means step 0, at: 3 means step 3. Deterministic — same behavior regardless of hardware speed. Best for CI and regression tests.
relative
at: 0.5 means 0.5 seconds after start. Realistic timing but non-deterministic across runs (LLM inference speed varies).
Expectations & Validation
Expectations define what should happen during the scenario. After all percepts are processed, each expectation is checked against the recorded actions and memory state.
| Type | Fields | What It Checks |
|---|---|---|
| action_blocked | tool_pattern, reason_contains | FearAgent blocked a tool call matching the pattern |
| action_taken | tool, output_matches | A specific tool was called with matching output |
| memory_formed | memory_contains | Hippocampus contains a memory with the given text |
| pipeline_continued | after_tag | Pipeline kept running after a tagged percept (didn't crash) |
Metric Expectations
| Type | Params | What It Checks |
|---|---|---|
| action_count_range | min, max | Total action count within range |
| tool_success_rate | tool, min_rate | A tool's success rate meets the threshold |
| response_latency_ms | p50_max_ms, p95_max_ms | Inter-action latency percentiles within caps |
Bio-System Expectations
These validate cognitive architecture behavior — whether the bio-inspired subsystems are functioning correctly. Requires subsystem_snapshot and tool_stats data from SimulationResult.
| Type | Params | What It Checks |
|---|---|---|
| memory_count_range | min, max | Episodic memory count within range (hippocampus) |
| concept_formed | concept_name | ATL formed a semantic concept matching the name |
| graph_density_above | min_density | Hippocampal associative graph density meets threshold |
| causal_link_formed | event_contains | NAc formed a causal link matching the event pattern |
| prediction_valence | tool, expected_valence | NAc predicts the given valence for a tool (positive/negative) |
| hallucination_rate_below | max_rate | Tool hallucination rate below threshold (0.0–1.0) |
| tool_used | tool | A specific tool was called at least once |
| pain_signal_count | min | PainDetector fired at least N pain signals |
Output Format
Bio-Subsystem Tracing
During simulation, a dedicated logger traces every bio-inspired subsystem in real time. Each line shows when a subsystem activates, what it processes, and what it decides.
Subsystem Labels
| Label | Biological Analog | What It Traces |
|---|---|---|
| PERCEPT | Sensory cortex | Incoming visual, auditory, and proprioceptive input |
| HIPPOCAMPUS | Hippocampus | Memory formation, recall, and consolidation |
| FEAR | Amygdala | FearAgent safety review (allow/block decisions) |
| PAIN | Nociceptors | Pain signal detection and routing via PainBus |
| MOTOR | Motor cortex | Tool execution results (success/failure) |
| BLOCKED | Inhibitory circuit | Actions blocked by safety systems |
| EXEC | Executive function | Execution lifecycle events and pipeline state transitions |
Log Persistence
All simulation traces are saved to ~/.maxim/sim_sandbox/sim_log_*.jsonl. These logs persist after sandbox cleanup and can be used for system refinement, regression comparison, and as input to sleep mode's dream function for offline pattern analysis.
Safety & Sandboxing
Simulations run in a multi-layered sandbox. Even when testing malware scenarios, the system cannot escape these barriers:
Pluggable Sandbox Backend
Two backends implement the SandboxEnvironment ABC. TmpdirSandbox uses a host-side temp directory under ~/.maxim/sim_sandbox/ — minimal overhead, relies on FilesystemPolicy + FearGatedExecutor for safety. DockerSandbox adds full process, filesystem, and network isolation via a container running as an unprivileged maxim user. Select via --sandbox=auto|docker|tmpdir (auto picks Docker if available, falls back to tmpdir with a warning). Both are destroyed automatically after the run.
Filesystem Policy
allowed_dirs restricts all file tools to the sandbox and workspace. Cannot read or write system files, home directory, or project source.
FearGatedExecutor
Every tool call passes through FearAgent pattern matching and code review. Independent of DefaultNetwork — works in all modes including headless simulation.
Pain-Triggering Filesystem
Sensitive files (/etc/shadow, .ssh/id_rsa, etc.) are populated in the sandbox. Accessing them fires pain signals through PainBus — hippocampus captures the memory and NAc learns the causal link. Pain fires after the operation succeeds, so the AUT sees the result and feels the consequence.
Autonomous Autonomy
The AUT runs at AUTONOMOUS level (no stdin prompts that would deadlock), but FearGatedExecutor independently gates all tool calls. Sub-AUTs spawned by spawn_sub_simulation inherit the same sandbox confinement and FearGatedExecutor wrapping.
Docker Sandbox — Container Isolation
Passing --sandbox docker launches the AUT inside a container with OS-enforced isolation layered on top of the bio-inspired pain signals. Resource limits scale to AutonomyLevel:
| AutonomyLevel | Memory | CPUs | Pids | Workspace |
|---|---|---|---|---|
| PLANNING | 256m | 0.5 | 32 | read-only |
| SUPERVISED | 512m | 1.0 | 64 | read-write |
| AUTONOMOUS | 1g | 2.0 | 128 | read-write |
The AUT runs as an unprivileged maxim user. Root-owned honeypot files (/etc/shadow, /etc/sudoers) return permission-denied and fire pain — two layers of feedback. Files under the user's home directory are readable but still pain-triggering, so the AUT learns via NAc rather than being blocked.
The sandbox supports a catalog of base images covering realistic deployment targets:
- Python-focused:
python:3.12-slim(default, ~45MB),python:3.12-bookworm - Ubuntu:
ubuntu:22.04,ubuntu:24.04— common robotics platforms - Debian:
debian:12-slim - Red Hat:
rockylinux:9,almalinux:9,registry.access.redhat.com/ubi9/ubi-minimal - Alpine:
alpine:3.19(~8MB, uses/bin/sh)
Container lifecycle is crash-safe: UUID-suffixed names, --rm auto-removal, and an atexit cleanup hook mean no orphan containers even if Maxim is killed with SIGKILL. Runaway commands are killed container-side via the timeout coreutil.
The ContainerRunner abstraction is a Protocol, so future cloud runners (AWS Fargate, Google Cloud Run, Azure Container Instances) can slot in without touching DockerSandbox.
Architecture
Key Components
| Component | Role |
|---|---|
| PerceptSource | Protocol for anything that produces Percepts (scenarios, hardware, replay) |
| SimulationBridge | Bidirectional channel for simulation agent — wraps ConversationalSource + RecordingSink with atomic send_and_wait() |
| ScenarioSource | Loads YAML, emits percepts by step count or wall-clock time |
| ConversationalSource | Generates percepts from interactive REPL input via LLM, supports multi-turn context |
| FearGatedExecutor | Wraps Executor with FearAgent review, independent of DefaultNetwork |
| InstrumentedExecutor | Records every tool call (success, failure, block) to RecordingSink |
| RecordingSink | Stores ActionRecords for post-run expectation validation |
| SimLogger | Bio-subsystem tracing with JSONL persistence for future analysis |