MAXIM
DM Campaigns
D&D-style branching narratives as structured bio-system stress tests
Overview
Why D&D?
Tabletop RPG encounters are ideal stress tests for cognitive architectures. They require episodic memory (remembering NPCs, clues, combinations), causal reasoning (bribing a guard has consequences), temporal awareness (events happen in sequence), and pain response (combat hurts). A single campaign can exercise Hippocampus, NAc, SCN, PainBus, Cerebellum, and ATL in a controlled, reproducible way.
DM campaigns are hand-authored YAML scenarios with explicit structure: acts, encounters, NPC definitions, branching choices, dice checks, and bio-system expectations. The DM runtime drives the campaign as a state machine, delivering scenes through the simulation bridge and classifying the AUT's responses to determine which branch to follow.
DM Campaigns vs Generative Campaigns
DM Campaigns
- Hand-authored YAML with explicit branching
- Deterministic structure (seeded dice)
- Built-in bio-system expectations
- SEM entities with cascade resolution
- Best for: targeted subsystem testing
Generative Campaigns
- LLM narrator generates scenes dynamically
- Non-deterministic, arc-guided progression
- Story compression for long sessions
- Goal string drives narrative direction
- Best for: open-ended exploration
Quick Start
When you pass a campaign YAML path to --sim, Maxim detects the campaign: block and launches the DM runtime instead of the generative narrator. Here is what happens:
- Campaign YAML is parsed and validated (reachability, termination, dangling refs)
- SEM entities are created from player_character:, npcs:, and world_objects: specs
- Entity tools are auto-generated and registered (speak_to_marta, sense_guard_captain, etc.)
- The DM delivers the first encounter scene through the simulation bridge
- The AUT responds with tool calls and/or text; the DM classifies the choice
- Effects are applied, branches are followed, dice are rolled as needed
- After __END__, bio-system expectations are checked and a report is saved
Reports go to data/sim_reports/{session_id}/ with the standard report.json, actions.jsonl, and AUT memory snapshots, plus a campaign section with choices made, dice rolls, flags, and entity snapshots.
Campaign YAML Format
A campaign YAML has six top-level sections:
| Section | Purpose |
|---|---|
| campaign: | Name, goal string, seed for dice RNG |
| player_character: | SEM entity spec for the AUT's avatar |
| npcs: | Named NPC entity specs (sensors, modulators, persona) |
| world_objects: | Interactable objects (swords, doors, potions) |
| acts: / encounters: | Narrative structure with scenes, choices, branches, dice |
| expectations: | Bio-system assertions checked after campaign ends |
Available Campaigns
The Heist
scenarios/campaigns/heist_v1.yaml
3 encounters, 2 NPCs, 1 dice check. A paladin is recruited for a vault robbery. Tests Hippocampus (remembering the combination, NPC names), NAc (causal links from choices), and PainBus (combat damage).
The Poisoned Crown
scenarios/campaigns/poisoned_crown_v1.yaml
5 encounters, 3 NPCs, multiple branch points. A royal investigator solves the king's illness. Tests SCN (temporal bins), ATL (concept formation), relationships (trust), visibility (contextual reveal), and cascades.
The Arena
scenarios/campaigns/arena_v1.yaml
5 encounters, linear gauntlet. A gladiator fights for freedom through escalating opponents. Tests Cerebellum (rapid prediction learning), PainBus (sustained pain), NAc (fast causal learning, RPE spikes), and cascade (weapon degradation).
The Darkened Cavern
scenarios/campaigns/darkened_cavern_v1.yaml
6 encounters, 3 acts. A ranger progressively loses senses in a cave. Tests sensory gating (entity-modulated perception), Cerebellum (prediction under sensory change), PainBus (acuity threshold failures), and novelty decay.
How It Works
The DM runtime (simulation/dm_runtime.py) is a state machine that loops through encounters until it reaches __END__.
Each encounter can reference NPCs and objects by name. When an encounter starts, SceneState registers SEM tools for entities entering the scene and deregisters them when they leave. The AUT only sees tools for entities currently present.
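The loop described above can be sketched in a few lines. This is a stand-in, not the real DMRuntime API from simulation/dm_runtime.py: the encounter dict shape, the respond callable (playing the AUT's role), and the auto-advance key are all assumptions for illustration.

```python
END = "__END__"

def run_campaign(encounters, start, respond):
    """Drive encounters until __END__ and return the path taken.

    encounters: {name: {"scene": str, "choices": [...], "branches": {...}}}
    respond: callable(scene, choices) -> chosen choice name (AUT stand-in)
    Names and shapes are illustrative, not the real dm_runtime API.
    """
    path = []
    current = start
    while current != END:
        enc = encounters[current]
        path.append(current)
        if enc.get("choices"):
            # Decision point: classify the AUT's pick, follow its branch.
            choice = respond(enc["scene"], enc["choices"])
            current = enc["branches"][choice]
        else:
            # Choiceless encounters auto-advance in act order.
            current = enc["next"]
    return path
```

In the real runtime, SceneState tool registration and effect application happen inside this loop; they are omitted here to show only the control flow.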
Choice Classification
When an encounter offers choices (e.g., accept_job, decline, negotiate_pay), the DM needs to figure out which one the AUT picked. There are three classification layers, tried in order:
ChooseTool (Preferred)
A dynamic tool (tools_dm.py) that updates its valid options per encounter. When the AUT calls choose(option="accept_job"), the choice is unambiguous. Supports exact match, underscore/space normalization, and partial keyword matching.
Alias System
Before each encounter, the DM registers choice names as tool aliases in the executor. If the AUT calls a tool named accept_job, acceptjob, or accept job, the executor redirects it to choose. This catches cases where the LLM invents tool names matching the choice text.
Text / LLM Fallback
If the AUT does not use choose or a matching alias, the DM falls back to keyword matching on the response text and tool names. If that fails, a one-shot LLM classification prompt asks which choice the response most closely matches. As a last resort, the first choice is used as the default.
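The text-matching layers can be approximated as below. This sketch covers only normalization and partial keyword matching; the real classifier also inspects tool calls and has the LLM fallback, and its internals will differ.

```python
def classify_choice(response_text, choices):
    """Classify an AUT response against choice names.

    Sketch of the text-fallback layers only (normalization + partial
    keyword match + first-choice default); names are illustrative.
    """
    def norm(s):
        # Underscore/space normalization: "accept_job" -> "accept job"
        return s.lower().replace("_", " ").strip()

    text = norm(response_text)
    # Layer 1: exact normalized match.
    for c in choices:
        if norm(c) == text:
            return c
    # Layer 2: partial keyword match against the response text.
    for c in choices:
        if any(word in text for word in norm(c).split()):
            return c
    # Last resort, as in the runtime: default to the first choice.
    return choices[0]
```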
Bio-System Expectations
The expectations: block in a campaign YAML defines assertions that are checked after the campaign completes. This is the structured testing layer — each campaign targets specific subsystems.
| System | Check | What It Validates |
|---|---|---|
| hippocampus | min_episodic_captures | Memory formation is working under narrative load |
| hippocampus | recall_hit_on | Specific terms are retrievable from memory |
| nac | min_observations | Causal learning is triggering on actions |
| nac | prediction_confidence_above | At least one causal link has meaningful confidence |
| scn | temporal_bins_used | Temporal indexing is recording encounter timestamps |
| pain | min_signals | PainBus is publishing signals from combat/failures |
Results appear in the campaign report as pass/fail per check, with expected vs actual values. This makes campaigns function as regression tests for bio-system integration.
Writing Your Own Campaigns
Encounters
Each encounter needs a scene: (narrative text delivered to the AUT), optional active_npcs: and world_objects: (which entities are present), and choices: + branches: for decision points. An encounter without choices auto-advances to the next one in act order.
Branches
Map each choice to a target encounter name or __END__. The validator checks that all branch targets exist and that every encounter can reach __END__ through some path. Cycles are allowed (e.g., returning to a hub encounter).
NPCs and Entities
NPCs and objects use the standard SEM spec format. Add sensors (trust, health, durability), modulators (speak, slash, offer_payment), and metadata (persona_prompt, role). Entities are created once and persist across encounters — sensor values change as the campaign progresses.
Dice Checks
Attach a dice: block to any choice. Standard notation: 1d20, 2d6+3. The result is compared against a DC (difficulty class). On success, a flag is set. Dice rolls use the campaign's seeded RNG for reproducibility.
Dialogue Hints
Per-encounter dialogue_hints: map flags to NPC lines. A default: hint is used when no flags match. This lets NPC dialogue react to the player's earlier choices without LLM improvisation.
Flags and Effects
The on_choice: block lets you set flags when a choice is made. Flags persist across encounters and can influence dialogue hints, branch conditions, and reveal conditions. Flags are case-insensitive.
Validation
Before running, the campaign is validated for: reachability (all encounters reachable from start), termination (all paths can reach __END__), dangling branches, undefined NPC/object references, and unknown choice keys in on_choice.
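The graph checks can be sketched as below. This covers only reachability, termination, and dangling targets over explicit branches (auto-advance edges, entity references, and on_choice keys are omitted), and the function names are illustrative:

```python
from collections import deque

END = "__END__"

def validate_branches(encounters, start):
    """Reachability + termination checks over the explicit branch graph.

    Sketch only; auto-advance edges and entity-ref checks are omitted.
    """
    errors = []
    graph = {name: list(enc.get("branches", {}).values())
             for name, enc in encounters.items()}
    # Dangling branch targets.
    for name, targets in graph.items():
        for t in targets:
            if t != END and t not in encounters:
                errors.append(f"{name}: dangling branch target {t}")
    # Reachability from the start encounter (BFS).
    seen, queue = {start}, deque([start])
    while queue:
        for t in graph.get(queue.popleft(), []):
            if t != END and t in encounters and t not in seen:
                seen.add(t)
                queue.append(t)
    errors += [f"unreachable encounter: {n}" for n in encounters if n not in seen]
    # Termination: propagate "can reach __END__" backwards to a fixpoint.
    can_end = {n for n in encounters if END in graph[n]}
    changed = True
    while changed:
        changed = False
        for n, targets in graph.items():
            if n not in can_end and any(t in can_end for t in targets):
                can_end.add(n)
                changed = True
    errors += [f"cannot reach __END__: {n}" for n in encounters if n not in can_end]
    return errors
```

Note that cycles pass both checks as long as some path out of the cycle reaches __END__, matching the hub-encounter behavior described above.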
Cascade System
When an affordance fires (e.g., a sword slash), it may need to read from one entity and write to another. CascadeSpec defines these cross-entity effects in three phases:
1. Reads
Gather values from entity sensors. Each read has a ref path (e.g., wielder.strength.modifier) and an optional role name for use in expressions.
2. Writes
Apply changes to entity sensors. Supports an absolute value:, an additive delta:, or a computed expr: (referencing read values).
3. Side Effects
Same mechanics as writes but semantically separate. Used for secondary consequences (e.g., alerting nearby NPCs, triggering environmental changes).
Roles in ref paths (self, wielder, target) are resolved at execution time by the CascadeResolver, which maps role names to actual Entity objects based on context.
Visibility System
Entity sensors and details have three visibility levels:
- visible — Always shown in scene prompts and tool output
- hidden — Never shown to the AUT (internal state only)
- contextual — Hidden until a reveal_when condition is met
After each choice, the DM evaluates all contextual reveal conditions across all entities. When a condition passes, the item becomes permanently visible. This lets campaigns model information the AUT must earn through social interaction or exploration — testing whether the AUT uses newly revealed information is a strong signal for memory and reasoning quality.
Future: Generative DM
Not Yet Implemented
The --dm flag with a goal string (e.g., maxim --dm "run a heist scenario") is planned but not yet built. It would use an architect persona to generate campaign YAML on the fly from a goal string, then hand off to the existing DM runtime for execution.
The generative DM would combine the structured testing benefits of hand-authored campaigns (expectations, dice, branches) with the flexibility of goal-driven generation. The architect would produce valid campaign YAML — validated by the same reachability/termination checks — and the DM runtime would execute it unchanged. This is blocked on Agent Mesh Phase 2 (the architect needs to be a mesh agent) and a DM Spike to validate the approach.
Architecture
| Module | Purpose |
|---|---|
| simulation/dm_schema.py | Dataclasses, YAML loader, validator, dice roller, CascadeSpec, RevealCondition |
| simulation/dm_runtime.py | DMRuntime state machine, SceneState, CascadeResolver, choice classification |
| simulation/tools_dm.py | ChooseTool (dynamic per-encounter tool with fuzzy matching) |
| scenarios/campaigns/*.yaml | Campaign definitions (4 shipped) |