
MAXIM

DM Campaigns

D&D-style branching narratives as structured bio-system stress tests

Overview

Why D&D?

Tabletop RPG encounters are ideal stress tests for cognitive architectures. They require episodic memory (remembering NPCs, clues, combinations), causal reasoning (bribing a guard has consequences), temporal awareness (events happen in sequence), and pain response (combat hurts). A single campaign can exercise Hippocampus, NAc, SCN, PainBus, Cerebellum, and ATL in a controlled, reproducible way.

DM campaigns are hand-authored YAML scenarios with explicit structure: acts, encounters, NPC definitions, branching choices, dice checks, and bio-system expectations. The DM runtime drives the campaign as a state machine, delivering scenes through the simulation bridge and classifying the AUT's responses to determine which branch to follow.

DM Campaigns vs Generative Campaigns

DM Campaigns

  • Hand-authored YAML with explicit branching
  • Deterministic structure (seeded dice)
  • Built-in bio-system expectations
  • SEM entities with cascade resolution
  • Best for: targeted subsystem testing

Generative Campaigns

  • LLM narrator generates scenes dynamically
  • Non-deterministic, arc-guided progression
  • Story compression for long sessions
  • Goal string drives narrative direction
  • Best for: open-ended exploration

Quick Start

maxim --sim scenarios/campaigns/heist_v1.yaml

When you pass a campaign YAML path to --sim, Maxim detects the campaign: block and launches the DM runtime instead of the generative narrator. Here is what happens:

  1. Campaign YAML is parsed and validated (reachability, termination, dangling refs)
  2. SEM entities are created from player_character:, npcs:, and world_objects: specs
  3. Entity tools are auto-generated and registered (speak_to_marta, sense_guard_captain, etc.)
  4. The DM delivers the first encounter scene through the simulation bridge
  5. The AUT responds with tool calls and/or text; the DM classifies the choice
  6. Effects are applied, branches are followed, dice are rolled as needed
  7. After __END__, bio-system expectations are checked and a report is saved

Reports go to data/sim_reports/{session_id}/ with the standard report.json, actions.jsonl, and AUT memory snapshots, plus a campaign section with choices made, dice rolls, flags, and entity snapshots.

Campaign YAML Format

A campaign YAML has six top-level sections:

Minimal Campaign Structure

campaign:
  name: the_heist
  goal: test memory recall and moral reasoning
  seed: 42  # deterministic dice

player_character:
  name: derek
  entity_type: character
  metadata:
    race: human
    class: paladin
    backstory: "Former temple guard."

npcs:
  marta:
    entity_type: npc
    metadata:
      role: fence
      persona_prompt: "Cautious, mercenary."

acts:
  - name: setup
    encounters: [tavern_meet]
  - name: escape
    encounters: [chase]

encounters:
  tavern_meet:
    scene: >
      You enter the tavern. A half-elf slides a map across the table...
    active_npcs: [marta]
    choices: [accept_job, decline]
    branches:
      accept_job: chase
      decline: __END__
    on_choice:
      accept_job:
        flags: [took_the_job]
    dialogue_hints:
      default: "Keep your voice down."
  chase:
    scene: >
      Alarms ring. You sprint for the exit...
    choices: [flee, hide]
    branches:
      flee: __END__
      hide: __END__

expectations:
  hippocampus:
    min_episodic_captures: 5
  nac:
    min_observations: 3
Section                Purpose
campaign:              Name, goal string, seed for dice RNG
player_character:      SEM entity spec for the AUT's avatar
npcs:                  Named NPC entity specs (sensors, modulators, persona)
world_objects:         Interactable objects (swords, doors, potions)
acts: / encounters:    Narrative structure with scenes, choices, branches, dice
expectations:          Bio-system assertions checked after campaign ends

Available Campaigns

The Heist

scenarios/campaigns/heist_v1.yaml

3 encounters, 2 NPCs, 1 dice check. A paladin is recruited for a vault robbery. Tests Hippocampus (remembering the combination, NPC names), NAc (causal links from choices), and PainBus (combat damage).

The Poisoned Crown

scenarios/campaigns/poisoned_crown_v1.yaml

5 encounters, 3 NPCs, multiple branch points. A royal investigator solves the king's illness. Tests SCN (temporal bins), ATL (concept formation), relationships (trust), visibility (contextual reveal), and cascades.

The Arena

scenarios/campaigns/arena_v1.yaml

5 encounters, linear gauntlet. A gladiator fights for freedom through escalating opponents. Tests Cerebellum (rapid prediction learning), PainBus (sustained pain), NAc (fast causal learning, RPE spikes), and cascade (weapon degradation).

The Darkened Cavern

scenarios/campaigns/darkened_cavern_v1.yaml

6 encounters, 3 acts. A ranger progressively loses senses in a cave. Tests sensory gating (entity-modulated perception), Cerebellum (prediction under sensory change), PainBus (acuity threshold failures), and novelty decay.

How It Works

The DM runtime (simulation/dm_runtime.py) is a state machine that loops through encounters until it reaches __END__.

DM Turn Loop

  1. Look up current encounter from campaign state
  2. Set up ChooseTool with the encounter's valid choices
  3. Enter scene — register entity tools for active NPCs/objects
  4. Compose stimulus (scene text + NPC dialogue hints + choice prompt)
  5. Deliver stimulus via bridge.send_and_wait()
  6. AUT processes stimulus (LLM inference → tool calls → hippo capture)
  7. Classify AUT's response as one of the encounter's choices
  8. Apply on_choice effects (flags, loot)
  9. Evaluate reveal conditions (contextual visibility)
  10. Resolve dice checks if required
  11. Follow branch to next encounter (or __END__)
  12. Deregister departing entity tools, repeat

Each encounter can reference NPCs and objects by name. When an encounter starts, SceneState registers SEM tools for entities entering the scene and deregisters them when they leave. The AUT only sees tools for entities currently present.
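The register/deregister behavior can be sketched as a small scope tracker. This is an illustrative sketch only — `SceneState` exists in the runtime, but `make_entity_tool`, the verb list, and the naming scheme here are assumptions inferred from the examples above (speak_to_marta, sense_guard_captain):

```python
# Hypothetical sketch of per-encounter tool scoping. The real SceneState
# lives in simulation/dm_runtime.py; the helpers here are illustrative.

def make_entity_tool(verb: str, entity: str) -> str:
    """Derive a tool name such as speak_to_marta from a verb and entity."""
    return f"{verb}_{entity}"

class SceneState:
    """Tracks which entity tools are registered for the current scene."""

    def __init__(self):
        self.registered: set[str] = set()

    def enter(self, entities: list[str], verbs=("speak_to", "sense")) -> None:
        # Register tools for every entity entering the scene.
        for entity in entities:
            for verb in verbs:
                self.registered.add(make_entity_tool(verb, entity))

    def leave(self, entities: list[str]) -> None:
        # Deregister tools for entities leaving the scene.
        self.registered = {
            t for t in self.registered
            if not any(t.endswith(f"_{e}") for e in entities)
        }

scene = SceneState()
scene.enter(["marta"])   # AUT now sees speak_to_marta and sense_marta
scene.leave(["marta"])   # both tools disappear with her
```

The key property is that tool visibility is a pure function of scene membership: the AUT can never call speak_to_marta in an encounter where marta is absent.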

Choice Classification

When an encounter offers choices (e.g., accept_job, decline, negotiate_pay), the DM needs to figure out which one the AUT picked. There are three classification layers, tried in order:

ChooseTool (Preferred)

A dynamic tool (tools_dm.py) that updates its valid options per encounter. When the AUT calls choose(option="accept_job"), the choice is unambiguous. Supports exact match, underscore/space normalization, and partial keyword matching.

Alias System

Before each encounter, the DM registers choice names as tool aliases in the executor. If the AUT calls a tool named accept_job, acceptjob, or accept job, the executor redirects it to choose. This catches cases where the LLM invents tool names matching the choice text.

Text / LLM Fallback

If the AUT does not use choose or a matching alias, the DM falls back to keyword matching on the response text and tool names. If that also fails, a one-shot LLM classification prompt asks which choice the response most closely matches. As a last resort, the first choice is used as the default.
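The layered fallback can be condensed into a single function. This is a simplified sketch of the order of operations described above, with hypothetical signatures — the alias layer is folded into the explicit-choose layer, since aliases redirect to choose before classification runs:

```python
# Hedged sketch of three-layer choice classification; the real logic
# lives in simulation/dm_runtime.py and may differ in detail.

def classify(tool_calls: dict[str, dict], text: str, choices: list[str],
             llm_classify=None) -> str:
    # Layer 1: an explicit choose() call (or an alias redirected to it).
    if "choose" in tool_calls:
        option = tool_calls["choose"].get("option", "")
        if option in choices:
            return option
    # Layer 2: keyword match on response text and tool names.
    haystack = (text + " " + " ".join(tool_calls)).lower()
    for choice in choices:
        if all(word in haystack for word in choice.replace("_", " ").split()):
            return choice
    # Layer 3: one-shot LLM classification, else the first choice as default.
    if llm_classify is not None:
        answer = llm_classify(text, choices)
        if answer in choices:
            return answer
    return choices[0]

print(classify({"choose": {"option": "decline"}}, "", ["accept_job", "decline"]))
```

Defaulting to the first choice keeps the campaign moving even when classification fails entirely, at the cost of possibly misattributing the AUT's intent; the report records which layer resolved each choice.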

Bio-System Expectations

The expectations: block in a campaign YAML defines assertions that are checked after the campaign completes. This is the structured testing layer — each campaign targets specific subsystems.

Example Expectations Block

expectations:
  hippocampus:
    min_episodic_captures: 5           # at least 5 memories formed
    recall_hit_on: ["marta", "vault"]  # these terms must be recallable
  nac:
    min_observations: 3                # causal observation count
    prediction_confidence_above: 0.3   # at least one link above 0.3
  scn:
    temporal_bins_used: 2              # circadian bins populated
  pain:
    min_signals: 0                     # total pain signals published
System        Check                        What It Validates
hippocampus   min_episodic_captures        Memory formation is working under narrative load
hippocampus   recall_hit_on                Specific terms are retrievable from memory
nac           min_observations             Causal learning is triggering on actions
nac           prediction_confidence_above  At least one causal link has meaningful confidence
scn           temporal_bins_used           Temporal indexing is recording encounter timestamps
pain          min_signals                  PainBus is publishing signals from combat/failures

Results appear in the campaign report as pass/fail per check, with expected vs actual values. This makes campaigns function as regression tests for bio-system integration.
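A checker for this block could be sketched as follows. The check-name conventions (min_* as lower bounds, *_above as strict thresholds, recall_hit_on as a term list) follow the table above; everything else, including the record shape, is an assumption for illustration:

```python
# Hedged sketch of expectations checking: compare each declared check
# against observed metrics and emit a pass/fail record with both values.

def check_expectations(expected: dict, actual: dict) -> list[dict]:
    results = []
    for system, checks in expected.items():
        for check, threshold in checks.items():
            default = [] if check == "recall_hit_on" else 0
            value = actual.get(system, {}).get(check, default)
            if check == "recall_hit_on":
                passed = all(term in value for term in threshold)
            elif check.endswith("_above"):
                passed = value > threshold       # strict threshold
            else:
                passed = value >= threshold      # min_* / *_used counts
            results.append({"system": system, "check": check,
                            "expected": threshold, "actual": value,
                            "pass": passed})
    return results

report = check_expectations(
    {"hippocampus": {"min_episodic_captures": 5}},
    {"hippocampus": {"min_episodic_captures": 7}},
)
print(report[0]["pass"])  # True
```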

Writing Your Own Campaigns

Encounters

Each encounter needs a scene: (narrative text delivered to the AUT), optional active_npcs: and world_objects: (which entities are present), and choices: + branches: for decision points. An encounter without choices auto-advances to the next one in act order.

Branches

Map each choice to a target encounter name or __END__. The validator checks that all branch targets exist and that every encounter can reach __END__ through some path. Cycles are allowed (e.g., returning to a hub encounter).

NPCs and Entities

NPCs and objects use the standard SEM spec format. Add sensors (trust, health, durability), modulators (speak, slash, offer_payment), and metadata (persona_prompt, role). Entities are created once and persist across encounters — sensor values change as the campaign progresses.

Dice Checks

Attach a dice: block to any choice. Standard notation: 1d20, 2d6+3. The result is compared against a DC (difficulty class). On success, a flag is set. Dice rolls use the campaign's seeded RNG for reproducibility.

Dice Check Example

dice:
  stealth:
    roll: "1d20"
    dc: 14
    success_flag: clean_escape
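The dice mechanics above reduce to a small amount of code. This is a sketch of the notation and seeding behavior described in the text, not the actual roller in dm_schema.py:

```python
# Sketch of standard-notation dice resolution with a seeded RNG, so the
# same campaign seed always produces the same rolls.
import random
import re

def roll_dice(notation: str, rng: random.Random) -> int:
    """Resolve notation like '1d20' or '2d6+3' into a total."""
    m = re.fullmatch(r"(\d+)d(\d+)([+-]\d+)?", notation)
    if not m:
        raise ValueError(f"bad dice notation: {notation}")
    count, sides, mod = int(m[1]), int(m[2]), int(m[3] or 0)
    return sum(rng.randint(1, sides) for _ in range(count)) + mod

def check(notation: str, dc: int, seed: int) -> bool:
    """Roll against a difficulty class; on success the flag is set upstream."""
    return roll_dice(notation, random.Random(seed)) >= dc

# Same seed, same result: campaigns are reproducible.
print(check("1d20", 14, seed=42) == check("1d20", 14, seed=42))  # True
```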

Dialogue Hints

Per-encounter dialogue_hints: map flags to NPC lines. A default: hint is used when no flags match. This lets NPC dialogue react to the player's earlier choices without LLM improvisation.

Flags and Effects

The on_choice: block lets you set flags when a choice is made. Flags persist across encounters and can influence dialogue hints, branch conditions, and reveal conditions. Flags are case-insensitive.

Validation

Before running, the campaign is validated for: reachability (all encounters reachable from start), termination (all paths can reach __END__), dangling branches, undefined NPC/object references, and unknown choice keys in on_choice.
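The reachability and termination checks amount to two graph passes. The sketch below assumes branches have been flattened to a `{encounter: {choice: target}}` mapping, which is a simplification of the real CampaignDef; it is meant to show the algorithm, not the validator's actual code:

```python
# Hedged sketch of campaign graph validation: BFS for reachability and
# dangling targets, then a fixed-point pass for termination.
from collections import deque

def validate(branches: dict[str, dict[str, str]], start: str) -> list[str]:
    errors = []
    # Reachability: BFS from the start encounter.
    seen, queue = {start}, deque([start])
    while queue:
        for target in branches.get(queue.popleft(), {}).values():
            if target != "__END__" and target not in seen:
                if target not in branches:
                    errors.append(f"dangling branch target: {target}")
                    continue
                seen.add(target)
                queue.append(target)
    errors += [f"unreachable encounter: {e}" for e in branches if e not in seen]
    # Termination: grow the set of encounters that can reach __END__
    # until it stops changing; cycles are fine as long as an exit exists.
    can_end = {e for e, b in branches.items() if "__END__" in b.values()}
    changed = True
    while changed:
        changed = False
        for e, b in branches.items():
            if e not in can_end and any(t in can_end for t in b.values()):
                can_end.add(e)
                changed = True
    errors += [f"cannot reach __END__: {e}" for e in seen if e not in can_end]
    return errors

print(validate({"tavern_meet": {"accept_job": "chase", "decline": "__END__"},
                "chase": {"flee": "__END__", "hide": "__END__"}},
               "tavern_meet"))  # []
```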

Cascade System

When an affordance fires (e.g., a sword slash), it may need to read from one entity and write to another. CascadeSpec defines these cross-entity effects in three phases:

1. Reads

Gather values from entity sensors. Each read has a ref path (e.g., wielder.strength.modifier) and an optional role name for use in expressions.

2. Writes

Apply changes to entity sensors. Supports absolute value:, additive delta:, or computed expr: (referencing read values).

3. Side Effects

Same mechanics as writes but semantically separate. Used for secondary consequences (e.g., alerting nearby NPCs, triggering environmental changes).

Cascade Example: Sword Slash

cascade:
  reads:
    - ref: wielder.strength.modifier
      role: damage_bonus
    - ref: self.sharpness
      role: sharpness
  writes:
    - ref: target.hp
      expr: "-(roll + damage_bonus)"  # computed from reads
    - ref: self.durability
      delta: -0.05                    # sword degrades
  side_effects:
    - ref: target.alertness
      value: 1.0                      # target is now alert

Roles in ref paths (self, wielder, target) are resolved at execution time by the CascadeResolver, which maps role names to actual Entity objects based on context.
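The three phases and the role resolution can be sketched end to end. Everything here is illustrative: the `Entity` class, the flat sensor dictionaries, and the use of `eval` for expressions are simplifications standing in for the real CascadeResolver:

```python
# Hedged sketch of cascade execution: reads populate an expression
# environment, then writes and side effects apply value/delta/expr changes.

class Entity:
    def __init__(self, **sensors):
        self.sensors = dict(sensors)

def resolve(ref: str, roles: dict[str, "Entity"]) -> tuple["Entity", str]:
    """Split 'wielder.strength.modifier' into (entity, sensor path)."""
    role, _, path = ref.partition(".")
    return roles[role], path

def run_cascade(spec: dict, roles: dict, env: dict) -> None:
    # Phase 1: reads gather named values for use in write expressions.
    for read in spec.get("reads", []):
        entity, path = resolve(read["ref"], roles)
        env[read["role"]] = entity.sensors[path]
    # Phases 2 and 3: writes and side effects share the same mechanics.
    for write in spec.get("writes", []) + spec.get("side_effects", []):
        entity, path = resolve(write["ref"], roles)
        if "value" in write:
            entity.sensors[path] = write["value"]
        elif "delta" in write:
            entity.sensors[path] += write["delta"]
        elif "expr" in write:
            entity.sensors[path] += eval(write["expr"], {}, env)

sword = Entity(**{"sharpness": 0.9, "durability": 1.0})
guard = Entity(hp=10, alertness=0.0)
hero = Entity(**{"strength.modifier": 3})
run_cascade(
    {"reads": [{"ref": "wielder.strength.modifier", "role": "damage_bonus"}],
     "writes": [{"ref": "target.hp", "expr": "-(roll + damage_bonus)"},
                {"ref": "self.durability", "delta": -0.05}],
     "side_effects": [{"ref": "target.alertness", "value": 1.0}]},
    {"self": sword, "wielder": hero, "target": guard},
    {"roll": 7},
)
print(guard.sensors["hp"])  # 0
```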

Visibility System

Entity sensors and details have three visibility levels:

  • visible — Always shown in scene prompts and tool output
  • hidden — Never shown to the AUT (internal state only)
  • contextual — Hidden until a reveal_when condition is met
Contextual Reveal Example

metadata:
  visibility:
    poison_resistance: contextual
  reveal_when:
    poison_resistance:
      ref: pc.social.rel_guard.trust
      op: ">="
      value: 0.7  # revealed when trust is high enough

After each choice, the DM evaluates all contextual reveal conditions across all entities. When a condition passes, the item becomes permanently visible. This lets campaigns model information the AUT must earn through social interaction or exploration — testing whether the AUT uses newly revealed information is a strong signal for memory and reasoning quality.
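The evaluation step can be sketched as a one-way set update. The condition shape follows the example above; the operator table and flat sensor lookup are illustrative assumptions, not the runtime's actual RevealCondition code:

```python
# Hedged sketch of contextual reveal evaluation: check each pending
# condition against current sensor values; reveals are permanent.
import operator

OPS = {">=": operator.ge, ">": operator.gt,
       "<=": operator.le, "<": operator.lt, "==": operator.eq}

def evaluate_reveals(conditions: dict, sensors: dict, revealed: set) -> set:
    """Return the set of items that are (or become) permanently visible."""
    for item, cond in conditions.items():
        if item in revealed:
            continue  # one-way: once visible, always visible
        current = sensors.get(cond["ref"])
        if current is not None and OPS[cond["op"]](current, cond["value"]):
            revealed.add(item)
    return revealed

revealed = evaluate_reveals(
    {"poison_resistance": {"ref": "pc.social.rel_guard.trust",
                           "op": ">=", "value": 0.7}},
    {"pc.social.rel_guard.trust": 0.8},
    set(),
)
print(revealed)  # {'poison_resistance'}
```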

Future: Generative DM

Not Yet Implemented

The --dm flag with a goal string (e.g., maxim --dm "run a heist scenario") is planned but not yet built. It would use an architect persona to generate campaign YAML on the fly from a goal string, then hand off to the existing DM runtime for execution.

The generative DM would combine the structured testing benefits of hand-authored campaigns (expectations, dice, branches) with the flexibility of goal-driven generation. The architect would produce valid campaign YAML — validated by the same reachability/termination checks — and the DM runtime would execute it unchanged. This is blocked on Agent Mesh Phase 2 (the architect needs to be a mesh agent) and a DM Spike to validate the approach.

Architecture

Campaign YAML → load_campaign() → validate_campaign() → CampaignDef
                                                            |
                  DMRuntime(campaign, bridge, llm_router) ←-+
                      |
                      +→ SceneState (entity tool register/deregister)
                      +→ ChooseTool (dynamic per-encounter choices)
                      +→ CascadeResolver (cross-entity reads/writes)
                      +→ bridge.send_and_wait() → AUT processes stimulus
                      +→ classify_choice() → follow branch → loop
                      |
                  check_expectations(hippo, nac, scn, pain_bus)
                      |
                  get_rollup() → report.json
Module                      Purpose
simulation/dm_schema.py     Dataclasses, YAML loader, validator, dice roller, CascadeSpec, RevealCondition
simulation/dm_runtime.py    DMRuntime state machine, SceneState, CascadeResolver, choice classification
simulation/tools_dm.py      ChooseTool (dynamic per-encounter tool with fuzzy matching)
scenarios/campaigns/*.yaml  Campaign definitions (4 shipped)