Signature

MAXIM

Roadmap

Development Status, Priorities, and Research Directions

Maxim is built in waves. Each wave stabilizes before the next begins, and every initiative has clear dependencies. This page tracks what's shipped, what's next, and where the project is heading long-term.

Current Status

Every active initiative and its current state at a glance. Status badges reflect the most recent milestone reached.

Initiative | Status | Notes
Agent Mesh | Done | Complete (Phases Pre-7). Identity, protocol, transport, admission control, knowledge sharing, task delegation, distributed planning, SCN temporal coordination. mDNS + InferenceRouter deferred.
Generative Campaigns | Done | All stages shipped: narrative arcs, two-call narrator, planner integration, bridge-and-compress, ask_user tool, benchmark tiers, CLI simplification. 71 tests.
Embodiment Core | Done | All software phases shipped: SEM protocol, Cerebellum forward models, motor programs + engrams, composed failures, virtual entities. 164 tests. Hardware adapter deferred to future.
Simulation Benchmark | Done | All phases (0-6). Multi-model comparative testing, narrative transcriber, write-paper, Tier 3 hooks. maxim --sim benchmark --models X,Y --campaign Z
Python API | Done | Verb-based interface (run, imagine, connect, diagnose, observe, configure). Package: pymaxim.
Lane Tier Architecture | Done | FunctionRouter with tier routing (large/medium/small), fallback chains, auto-detection from hardware. Legacy lane names aliased.
Research Protocol | Done | Complete: mesh primitives, research tools, Writer + Reviewer, dual-LLM. maxim --sim research
Multi-LLM Scaling | Done | Complete. LeaderProxy, admission control, LaneMetrics, heartbeat, remote update.
Tool Refactoring | Done | All phases: say, think, examine, introspection tools, alias map, usage tracking, proactive tool list
Introspection API | Done | All phases. Observer (renamed from AUTIntrospector) + standalone run_campaign() shipped
Docker Sandbox | Done | Phase A (TmpdirSandbox + pain triggers) + Phase B (DockerSandbox + ContainerRunner + image catalog + unprivileged user) both shipped
Bio-System Wiring Hardening | Done | All phases shipped. Percept abstraction (SensoryModality, SensoryTag, SensoryGate), pipeline correctness, energy→NAc metabolic cost learning, decision_rationale provenance. Archived.
Mode System Refactor | Done | ~1,800 LOC removed. Strategies, exploration policy, and LiveModeIntent deleted. Sleep is now a tool. Skills module folded into Cerebellum. Dead runtime modules cleaned up.
DM MVP | Done | All 3 slices shipped: dm_schema.py, dm_runtime.py, tools_dm.py. 7 campaigns (heist, poisoned_crown, arena, darkened_cavern, kings_duel, neon_gauntlet, broken_database). ChooseTool + alias system, bio-system expectations checker, mid-campaign entity swaps.
Foundational Buildout | Done | Phases 0-12a: package hygiene, SEM component registry, encounter library, agent factory + pool, party DM mode, hippocampus recall, interactive runtime, generative architect, API expansion, 10 cloud providers, store protocols, security hardening. ~16K LOC, 239 tests.
API Surface Hardening | Done | All phases: wired stub API verbs, fixed research protocol, error handling on user-facing paths, integration tests, README overhaul. ~75 new tests.
Module Compartmentalization | Done | 5 god-modules decomposed: agent_loop, orchestrator, cli, router, lane_backends. 7 new focused files extracted (~1,120 LOC moved). 125 new tests. All import paths preserved.
Foundations F0.1–F0.8 | Done | NAc save/load, PerceptContext typed schema, agent_id threading, PerceptTraceBuffer, tier enforcement, SensoryTag population, Percept factory consolidation. Prereqs for substrate work.
Reaction Abstraction Ph1–4 | Done | Percept/Reaction dual-surface architecture. ReactionBus with refractory periods, producer protocols, typed Percept factories, runtime unification. Phase 5 folded into substrate P2.
Simulator Upgrades S1–S4 | Done | FixtureDrivenOrchestrator (S1), LLMBackend Protocol + MockLLMBackend (S2), subprocess persistence harness (S3), deterministic seeding with --seed (S4). 72 tests.
Substrate P0 Pilot | Done | Baseline pinned at 78.5% collapse ([email protected]). 55 clusters, 155 sentences, 3 difficulty tiers. Fixtures validated in the 60–85% well-calibrated range. P1 sanity floor = 73.5%.
Substrate B1 + P1 Recognition | Done | LinguisticEncoder, EC pattern completion with centroid update, ATL modality-tagged nodes, PromptAssembler. 91.7% ± 2.9% collapse with paraphrase-mpnet-base-v2 @ 0.40. 100% persistence round-trip. 92.5pp degenerate gap.
Substrate P2 Reward Modulation | Done | NAc per-node reward bias, eligibility traces, EC threshold modulation, CausalLink.percept_refs. Real-embedding sweep at [email protected], reward 2.0: +56.0 pp target gain, 0.0 pp distractor drift, 94% monotone, 9-of-10 seeds. SEM pain cascade end-to-end verified.
Bio-Stack Unification (Waves 1–3) | Done | build_bio_stack(*, persistence_dir) canonical. Structural enforcement for PainBus, ReactionBus, MemoryHub, DefaultNetwork. Four production callers.
Valence Annotation (Stages 1–3) | Done | Reactions → Episode.valence → Edge.metadata["valence"] → spreading_activation(propagate_valence=True). retrieve_on_cue(include_valence=True) for affective retrieval.
SEM Learning Loop | Done | Cerebellum activation via BioStack, distribute_reward (ReactionBus → NAc reward bias → EC threshold adjustment), CerebellumModulator success reactions (positive valence, negativity bias), pain spike episode boundaries (salience_spike_rule). PoC: 11/11 + 13/13.
Behavioral Convergence Wiring | Done | Valence in PromptAssembler, observe_episode_event in agent loop, energy→Reaction bridge (hunger/fatigue/satiation), food/water/poison SEM specs. Experiment 2: 13/13 hypotheses confirmed.
Behavioral Convergence Experiments (All 3 Tiers) | Done | 41/41 hypotheses, 4 experiments, all 3 tiers PASS. Tier 1: substrate learns (Exp 1+2). Tier 2: LLM acts on learning (Exp 3, 10/10 vs 0/10). Tier 3: organic LLM learning (Exp 4, teal rate 0%→25%→100%, fresh control DIED). v0.3.0.
B4 Replanning (1.0 Gate) | Done | Failure diagnosis + prior-attempt retrieval + Jaccard metric + anti-repetition. Blind A/B: treatment 100% vs control 0%, Jaccard 0.894. 1.0 gate CLOSED. v0.5.0.
P6 Extinction + P8 Sleep Replay | Done | P6: Hebbian decay beats LRU (10 seeds). P8: sleep replay F1 improves vs control (10 seeds). v0.5.0.
F2 AgentFactory CLI Migration | Done | create_full_agent() composes bio-stack + executor + fear gate. CLI non-sim bootstrap replaced. v0.5.0.
Interactive Mode + Input Standardization | Done | Bidirectional interactive, raw terminal, request_interaction, NAc suppression, unified input handling. Scale: 20/20 seeds, p = 3.87e-6. v0.3.1–0.4.0.
Generalizable Embodiment (E0–E3) | Done | E0: sim affordance tools. E1: Asset Foundry (generate + validate + gauntlet + score). E2: foundry --llm wiring + entity context. E2.5: ComponentIndex two-layer discovery. E3: Auto-Curation CLI. v0.6–0.7.
Imagination System (I1–I3) | Done | I1+I2: Entity extraction, ComponentIndex lookup, DN arousal gate, EntityDesigner LLM, ephemeral registration, provenance tagging. I3: Scene-scoped tools with active cap + executor gate. v0.7.0.
Acting Coach (B3.1) | Done | Meta-prompt scaffolding: NAc caution annotations, pain anticipation, cerebellum forward-model predictions. Bio-modulated exploration directive. v0.7.0.
Agent Factory (F3–F5) | Done | Sim orchestrator, Reachy embodied runtime, and headless API all migrated to AgentFactory.create_full_agent(). v0.7.0.

What We Just Shipped

v0.7.0: Imagination system, default embodiment, Acting Coach, scene-scoped tools, Asset Foundry + Auto-Curation, ComponentIndex, AgentFactory F3–F5. The agent now has a body by default, imagines novel entities in real time from narration text, and activates or deactivates tools per scene. All 1.0 gates are closed. Earlier: generalizable embodiment + Asset Foundry (v0.6), B4 replanning (v0.5), behavioral convergence (v0.3).

Imagination System (I1+I2+I3) (v0.7)

Entity noun-phrase extraction from percepts, ComponentIndex two-layer lookup (alias + embedding), DN arousal gate, EntityDesigner LLM-driven design, ephemeral registration with provenance tagging (imagined=True), 50% confidence decay at session end. I3: scene-scoped tool activation with active cap + executor gate.

Acting Coach + Factory F3–F5 (v0.7)

B3.1 Acting Coach: meta-prompt scaffolding with NAc caution, pain anticipation, cerebellum predictions. F3–F5: sim orchestrator, Reachy, and headless API all migrated to AgentFactory.create_full_agent().

Asset Foundry + Auto-Curation (E1–E3) (v0.6–0.7)

E1: LLM-driven entity generation + validation + 3-encounter gauntlet + 4-dimension scoring. E2: foundry --llm wiring + entity context in AUT prompt. E2.5: ComponentIndex (alias hash + embedding cosine). E3: --auto-curate pre-sim coverage gap filling with dedup.
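The two-layer ComponentIndex lookup (alias hash first, embedding cosine second) can be pictured with a small sketch. This is not the project's implementation: the class name TwoLayerIndex, its methods, the 0.8 similarity threshold, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class TwoLayerIndex:
    """Hypothetical stand-in for a two-layer component lookup."""

    def __init__(self, threshold=0.8):  # threshold is an assumed value
        self.aliases = {}   # normalized alias -> component id
        self.vectors = {}   # component id -> embedding vector
        self.threshold = threshold

    def add(self, component_id, aliases, vector):
        for alias in aliases:
            self.aliases[alias.strip().lower()] = component_id
        self.vectors[component_id] = vector

    def lookup(self, name, vector=None):
        # Layer 1: exact alias hit via a hash lookup.
        key = name.strip().lower()
        if key in self.aliases:
            return self.aliases[key], "alias"
        # Layer 2: nearest embedding, accepted only above the threshold.
        if vector is not None and self.vectors:
            best_id, best_sim = max(
                ((cid, cosine(vector, v)) for cid, v in self.vectors.items()),
                key=lambda pair: pair[1],
            )
            if best_sim >= self.threshold:
                return best_id, "embedding"
        return None, "miss"


index = TwoLayerIndex()
index.add("torch", ["torch", "lantern"], [1.0, 0.0])
print(index.lookup("Lantern"))                    # alias layer answers
print(index.lookup("glowing stick", [0.9, 0.1]))  # falls through to embeddings
print(index.lookup("rock", [0.0, 1.0]))           # neither layer matches
```

Checking the cheap alias layer first means the embedding comparison only runs for names the index has never seen.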

Default Sim Embodiment (E0) (v0.6)

bodies/base_humanoid loads by default in sim mode — 5 sensors, 8 affordances, 3 failure modes. --embodiment works with --sim. 10 integration tests.

B4 Replanning (v0.5 — 1.0 GATE CLOSED)

Failure diagnosis with prior-attempt retrieval via hippocampus episodes. Jaccard distance metric for structural novelty. Anti-repetition prompt constraint. Blind A/B validation: treatment (replanning) 100% vs control (no replanning) 0%, mean Jaccard 0.894. 48 tests. The replanning 1.0 gate is closed.
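As a rough illustration of the Jaccard distance used to score structural novelty, the sketch below compares two plans as sets of steps. How Maxim actually tokenizes a plan into comparable elements is not stated here, so the step labels and the jaccard_distance helper are hypothetical.

```python
def jaccard_distance(plan_a, plan_b):
    """Return 1 - |A ∩ B| / |A ∪ B| over the sets of steps in two plans."""
    a, b = set(plan_a), set(plan_b)
    if not a and not b:
        return 0.0  # two empty plans are treated as identical
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical plans, each a list of step labels.
failed_plan = ["examine door", "push door", "push door harder"]
retry_plan = ["examine door", "find key", "unlock door"]

# One shared step out of five distinct steps.
print(jaccard_distance(failed_plan, retry_plan))
```

A distance near 1.0 means the retry shares almost nothing with the failed attempt; a distance near 0.0 means the agent is repeating itself.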

P6 Extinction + P8 Sleep Replay (v0.5)

P6: DependencyGraph.decay_edges() — multiplicative Hebbian decay with pruning. Beats LRU across 10 seeds. P8: memory/sleep_replay.py — offline consolidation. Episode ranking by NAc reward_bias + valence, replay with 1.5× consolidation multiplier. F1 improves vs no-replay control. Activates memory_consolidation_practice.md.
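Multiplicative decay with pruning can be sketched on a plain dictionary of edge weights. The function below is a standalone stand-in, not the real DependencyGraph.decay_edges() signature; the decay rate of 0.9 and the pruning floor of 0.05 are assumed values.

```python
def decay_edges(weights, rate=0.9, prune_below=0.05):
    """Multiply every edge weight by `rate`, then drop edges under the floor."""
    decayed = {edge: w * rate for edge, w in weights.items()}
    return {edge: w for edge, w in decayed.items() if w >= prune_below}


def reinforce(weights, edge, amount=0.2):
    """Strengthen an edge that was used again, capped at 1.0."""
    updated = dict(weights)
    updated[edge] = min(1.0, updated.get(edge, 0.0) + amount)
    return updated


# Hypothetical causal edges: (cause, effect) -> weight.
edges = {("lever", "door_opens"): 0.8, ("whistle", "door_opens"): 0.06}

edges = decay_edges(edges)  # the weak edge drops to about 0.054 and survives
edges = decay_edges(edges)  # about 0.0486 is under the floor, so it is pruned
edges = reinforce(edges, ("lever", "door_opens"))

print(sorted(edges))
```

Unlike LRU eviction, which looks only at recency, this scheme lets a frequently reinforced edge outlast a recently touched but weak one.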

F2 AgentFactory Migration (v0.5)

AgentFactory.create_full_agent() composes build_bio_stack + build_executor + FearGatedExecutor. CLI non-sim bootstrap (~100 lines) replaced with one factory call. Z1 design: per-instance Executor built once. The Sim/Reachy/API migrations (F3–F5) were still open at v0.5 and shipped later in v0.7.0.

Interactive Mode + Input Standardization (v0.3.1–0.4)

Bidirectional interactive mode: raw terminal input, request_interaction agent-to-user prompting, set_scene dynamic headers, slash commands, NAc suppression. Unified input standardization across all sim modes. Scale validation: 20/20 seeds, p = 3.87e-6.

SEM Learning Loop & Valence Annotation

CerebellumModulator success/failure reactions → ReactionBus → hippocampus (Episode.valence, Edge.metadata["valence"]) + NAc (distribute_reward for reward bias + EC threshold adjustment). spreading_activation(propagate_valence=True) + retrieve_on_cue(include_valence=True). Pain spike episode boundaries via salience_spike_rule. Cerebellum activation through BioStack + build_executor.

Behavioral Convergence Wiring

Closes the gap between substrate learning and LLM decisions. Valence in PromptAssembler, observe_episode_event in agent loop, energy→Reaction bridge (hunger/fatigue/satiation from energy depletion), food/water/poison SEM specs. Experiment 2: 13/13 hypotheses confirmed — food +0.753, water +0.135, poison -0.495.

DM MVP

Bundled SEM characters with cascade DAG for narrative branching. ChooseTool + alias system for encounter choices. Bio-system expectations checker validates campaign results. 4 campaigns: heist, poisoned_crown, arena, darkened_cavern. maxim --sim scenarios/campaigns/heist_v1.yaml

Bio-System Wiring Hardening

Percept abstraction layer (SensoryModality, SensoryTag, SensoryGate), pipeline correctness fixes, energy→NAc metabolic cost learning, decision_rationale provenance field on Perception. 14/14 pipeline audit checks passing.

Mode System Refactor

~1,800 LOC removed. Strategies, exploration policy, and LiveModeIntent deleted. Sleep is now a tool. Skills module folded into Cerebellum motor programs. Dead runtime modules cleaned up (resilient.py, session.py, debug_status_server.py, monitor_registry.py).

Agent Mesh

Full mesh protocol through Phase Pre-7: AgentProfile identity, UMR naming, MeshMessage envelopes, LocalMessageBus, knowledge sharing between agents, task delegation, distributed planning, and SCN temporal coordination. mDNS + InferenceRouter deferred.

Generative Campaign Mode

LLM-driven narrative arcs (4 builtin + custom YAML), two-call narrator with AdaptivePlanner integration, bridge-and-compress for multi-arc continuation, ask_user tool, tiered benchmarks, --sim "goal" CLI simplification. 71 tests.

Lane Tier Architecture

FunctionRouter routes functions to capability tiers (large/medium/small) with fallback chains. Tiers are auto-detected from hardware VRAM. The legacy lane names (infer/review/record) are aliased to the new tiers.
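A minimal sketch of tier routing with fallback chains follows. Only the tier names come from the text above; the VRAM thresholds, the function names, and the chain layout are assumptions, and the real FunctionRouter API may look quite different.

```python
# Tier names come from the roadmap text; everything else here is hypothetical.
TIERS = ("large", "medium", "small")


def detect_tier(vram_gb):
    """Pick the largest tier the hardware can host (thresholds are made up)."""
    if vram_gb >= 40:
        return "large"
    if vram_gb >= 12:
        return "medium"
    return "small"


def resolve_tier(function_name, fallback_chains, available):
    """Return the first tier in the function's chain that is available."""
    for tier in fallback_chains[function_name]:
        if tier in available:
            return tier
    raise LookupError(f"no available tier for {function_name!r}")


# Each function lists its preferred tier first, then its fallbacks.
chains = {
    "plan": ["large", "medium"],
    "summarize": ["small", "medium", "large"],
}
available = {"medium", "small"}

print(resolve_tier("plan", chains, available))       # large is absent, so medium
print(resolve_tier("summarize", chains, available))  # small
```

Ordering each chain by preference keeps the routing decision a simple first-match scan.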

Simulation Benchmark

Phases 0-6 complete: BenchmarkRunner, CLI (maxim --sim benchmark), 6 scenarios, narrative transcriber, write-paper pipeline, Tier 3 hooks for multi-model comparative testing.

Python API (pymaxim)

Verb-based interface: run, imagine, connect, diagnose, observe, configure. Lazy imports, structured return types. Published as pymaxim on PyPI.

Embodiment Core

SEM protocol (Sensor-Entity-Modulator), Cerebellum forward models, motor programs + engrams, composable failures, virtual entities. 164 tests. Hardware adapter deferred to future.

Multi-LLM Scaling

LeaderProxy with authentication + GPU metrics, admission control (concurrency caps + rate limiting), LaneMetrics per-tier counters, system heartbeat with stall detection, remote peer management (maxim peer update/restart/llm). Plan 4 mesh management surface: maxim peer list-nodes for live status, --node X drain|resume for graceful traffic shaping, init-mesh / add-node / remove-node for topology setup, all backed by ~/.config/maxim/mesh.yml (declarative) + ~/.maxim/util/drained_nodes.{role}.txt (mutable runtime state, filelock-serialized).
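Admission control that combines a concurrency cap with rate limiting is commonly built from an in-flight counter plus a token bucket. The sketch below shows that general pattern under assumed names and parameters (AdmissionController, rate_per_sec, burst); it does not describe how LeaderProxy implements it.

```python
import time


class AdmissionController:
    """Hypothetical gate: concurrency cap plus token-bucket rate limit."""

    def __init__(self, max_concurrent, rate_per_sec, burst, clock=time.monotonic):
        self.max_concurrent = max_concurrent
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock          # injectable so the behavior can be tested
        self.in_flight = 0
        self.tokens = float(burst)
        self.last = clock()

    def _refill(self):
        # Add tokens for the elapsed time, never exceeding the burst size.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_admit(self):
        """Admit a request only if both limits allow it."""
        self._refill()
        if self.in_flight >= self.max_concurrent or self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.in_flight += 1
        return True

    def release(self):
        """Mark one admitted request as finished."""
        self.in_flight = max(0, self.in_flight - 1)


gate = AdmissionController(max_concurrent=2, rate_per_sec=1.0, burst=3)
print(gate.try_admit(), gate.try_admit(), gate.try_admit())  # third hits the cap
```

Rejected callers get an immediate False rather than blocking, which leaves the retry or queueing policy to the caller.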

Research Protocol

Mesh primitives, research tools (record_experiment, query_experiments), Writer + Reviewer agents, dual-LLM orchestration. maxim --sim research

Next Up

2026-05-09 architectural pivot

Maxim is moving toward a parallel-mode architecture where the bio-substrate (NAc + EC + ATL + Hippocampus + Default Network + reflexes) drives action selection directly, with LLMs demoted to supporting roles (orchestrator, NPCs, optional AUT). The existing LLM-AUT mode remains the user-facing default; substrate-primary mode ships in parallel as opt-in via --aut-mode substrate-primary. The Phase −1 prototype shipped 2026-05-09: NAc.recommend_action() can generate non-reflex actions from learned causal links + drive heuristics with no LLM proposal needed (11 unit tests pass). The federated Maxim Hivemind + Oasis layer (planned, not yet implemented) will let multiple Maxims share distilled bio-substrate across instances. See Substrate-Primary Mode and Maxim Hivemind + Oasis.

The path forward: 1.0 ships with B5 substrate-primary harness (Phase −1 + Phase 0 + Hivemind shareability infrastructure, all behind experimental flag) alongside D1–D3 docs. 1.1 lands substrate-primary AUT mode + first hostable Maxim Oasis. 1.2 ships the full Hivemind P2P protocol.

1. B5 Substrate-Primary Harness (v1.0)

Phase −1 ✓ shipped (NAc action proposal + 11 tests).

Phase 0 harness ✓ shipped: --aut-mode substrate-primary CLI flag + cradle-prelinguistic arc variant + motor-only AUT prompt + per-tick telemetry.

Roy harness ✓ shipped (2026-05-10): R1 curriculum runner + R2 substrate_diff + R3 three-arm iteration runner + R4 idempotent log generator + R5 process-global invariants.

G3 fail-fast LLM preflight ✓ shipped (2026-05-11, PR #235+#238): _MaximPeerBackend.health_check() probe with env-then-peer.yml resolution; aborts in ≤3s on unreachable leader.

G4 cluster_id reward wire ✓ shipped (2026-05-11, PR #236+#237): closes the deferred Track 2 wire. Substrate-primary tool outcomes now populate NAc._cluster_reward_bias, persist to aut_nac.json, and surface in substrate_diff. Empirically validated: live Roy-0 run produced cluster_reward_bias_l2 = 2.4587 on A-vs-blank pairs (~11.6× blank-vs-blank noise floor).

Roy iteration arc Roy-1a–Roy-4 ✓ shipped (2026-05-11 → 2026-05-13): six follow-up iterations reproducing the wire 6× on the same priming and localizing the behavioral-expression gap to LinguisticEncoder → EC alignment; Roy-4 (PR #246) cancelled the 1.1 Hebbian binding plan via a pre-registered cheap-gate experiment.

0.9.1 Wire-A ships as the operator-visible interim that surfaces the surviving tool-level signal at the LLM prompt regardless of encoder drift.

1.1+ reframe: roy_5_encoder_alignment_disambiguator.md (PR #247) replaces the cancelled binding plan with a diagnostic-first ladder; Stage 1 (Roy-5a cosine analysis on existing Roy-4 data, zero new sim runs) decodes the gap to one of three sub-hypotheses that scope the 1.1+ fix.

Hivemind shareability infrastructure remains: portable substrate-snapshot bundle format + nac.merge() / ec.merge() Bayesian aggregation + provenance tags + identity-bearing concept detection + substrate domains + export/import CLI verbs.

2. D1–D3 Documentation Pass (v1.0, parallel to B5)

Agent memory transfer docs (D1), API/CLI surface review (D2), final docs pass (D3). Runs parallel to B5; 1.0 ships when both complete.

3. Substrate-Primary AUT Mode + First Maxim Oasis (v1.1)

Substrate-primary AUT mode lands as opt-in. Phase 0 validation runs (raw substrate, no Hivemind bootstrap). First hostable Maxim Oasis (~800 LOC); LLM-AUT users opt in to contribute via maxim contribute --to oasis://.... Direct Oasis-to-Oasis sync supported.

4. Maxim Hivemind P2P Protocol (v1.2)

Full peer-to-peer substrate-snapshot exchange (~600 LOC): peer discovery, conflict-resolution semantics (Bayesian confidence aggregation), poison-resistance defenses (multi-source consensus, domain curation, provenance blacklists). Substrate-primary Maxims pull bootstrap from Hivemind, contribute back as they learn.

Research Directions

These are speculative, long-term directions. None are scheduled—they represent where the architecture could go once the current engineering work stabilizes.

ATL Self-Extension through Mechanism Discovery

Can Maxim's Anterior Temporal Lobe discover new concept categories and relationship types on its own? Today the taxonomy is hand-coded. A self-extending ATL would let the semantic memory grow in ways its designers didn't anticipate.

Federated Embodiments

Multiple Maxim instances sharing memory and causal models across different physical bodies. A robot that learns to open a door could transfer that knowledge to a different robot with different actuators—adapting the motor plan while keeping the causal structure.

Cross-Agent Affordance Delegation

When one agent discovers an affordance it can't act on (e.g., "this door has a handle but I have no gripper"), it could delegate to an agent that can. This requires a shared affordance vocabulary and a trust model for delegation.

Distributed Embodiment Construction

Multiple agents collaboratively assembling a physical structure, each contributing sensors and actuators. The Agent Mesh provides the communication substrate; this research explores what shared representations are needed for coordinated physical action.

Uncertainty-as-Pain

Mapping epistemic uncertainty to the PainDetector system. High uncertainty about a prediction would register as discomfort, motivating the agent to gather more information before acting—a bio-inspired approach to active learning and cautious exploration.

Dependency Graph

All 1.0 gates are closed. Remaining work is polish and packaging.

Path to 1.0

v0.3.0: 41/41 behavioral convergence (cross-session learning)
v0.5.0: B4 replanning (1.0 GATE CLOSED), P6+P8, F2
v0.6.0: E0-E3 generalizable embodiment (1.0 GATE CLOSED)
v0.7.0: Imagination system (I1+I2+I3), Acting Coach, F3-F5, E2.5+E3
v0.8.0 SHIPPED
    P5 Stress Persistence (final 1.0 gate CLOSED)
    Cradle sensorimotor (3-layer sensation, drives, 7 stages)
    Affordance Concept Transfer (substrate-native cross-entity learning)
    Reflex system (innate body responses)
    Pre-deliberation (Layer 1 ThoughtGate + bio-enrichment)
    SCN oscillator (B2 anticipatory pre-activation)
    Roy harness foundation (R1 curriculum + R2 substrate_diff + R3 iteration runner + R4 idempotent log generator + R5 invariants)
    │
    ├──> 2026-05-11 Roy-0 harness validated end-to-end on live leader
    ├──> 2026-05-11 G3 fail-fast LLM preflight probe (PR #235 + #238 peer.yml fallback)
    ├──> 2026-05-11 G4 substrate-primary cluster_id reward wire (PR #236 + #237)
    │        cluster_reward_bias_l2 = 2.4587 on Roy-0 A-vs-blank pairs
    │
    ├──> 2026-05-11 → 2026-05-13 Roy iteration arc (Roy-1a/1b/2/2pc/2c)
    │        Cluster wire reproduces 6×; H1 confirmed: LinguisticEncoder → EC alignment is the block
    │        0.9.1 Wire-A surfaces the surviving tool-level signal at the LLM prompt
    ├──> 2026-05-13 Roy-4 EC instrumentation + Hebbian sweep (PR #246)
    │        FAIL: cancels cross_modal_substrate_binding.md Stages 2-6
    │        Zero priming↔test bound edges at every parameter sweep point
    ├──> 2026-05-13 roy_5_encoder_alignment_disambiguator.md opened (PR #247)
    │        Stage 1 (Roy-5a cosine analysis on existing Roy-4 data) queued; verdict scopes 1.1+ fix
    │
    ├──> B3.2-B3.3 Acting Coach extensions
    ├──> B5 substrate-primary harness flagged (Phase −1 ✓, Phase 0 harness ✓, validation 1.1+)
    ├──> Hivemind shareability infra (export/import, merge(), provenance; 1.0 reservation)
    │
    └──> v1.0: All gates closed. D1-D3 docs + stale-tagging cleanup remain. Package + publish.

All architectural gates have been closed; v0.8.0 ships P5, the final gate. The Roy harness shipped 2026-05-10 with G3 + G4 follow-ups landing 2026-05-11, giving 1.0 a working substrate-primary closed loop validated end-to-end against a live leader. The Roy iteration arc (Roy-1a through Roy-4, completed 2026-05-13) extends the validation surface from "the wire fires" to "the wire's behavioral expression is gated at the encoder-alignment layer"; the 1.1 Hebbian binding plan was cancelled by Roy-4 in favor of a diagnostic-first reframe (roy_5_encoder_alignment_disambiguator.md). Remaining 1.0 work is D1–D3 docs + stale-tagging cleanup; 1.0 is a packaging milestone independent of the 1.1+ encoder-alignment research direction.

Contributing

Maxim is open source. Contributions are welcome—especially on items marked Not Started in the status table above.

Getting Started

  1. Clone the repo from github.com/dennys246/maxim
  2. Read CLAUDE.md in the project root—it covers architectural invariants, testing commands, and the module map
  3. Run maxim doctor to verify your environment
  4. Run the test suite: python -m pytest tests/ -x -q --ignore=tests/integration/test_memory_hub.py
  5. Pick an initiative from the status table and open an issue or PR

Before You Start

The project has strong architectural invariants (one-way memory tiers, separate EpisodicMemory instances, LLM access only through the router). Read the CLAUDE.md section on invariants before making structural changes. The bio-system class names (Hippocampus, ATL, NAc, SCN, EC, AngularGyrus) are intentional and should not be renamed.