
MAXIM

Technical Deep Dive

Architecture, Threading, Bridges, and the Orchestration Loop

The Dependency Graph

Maxim enforces a strict one-way dependency graph. This isn't a suggestion; the architecture prevents circular dependencies at the module level. Higher layers may call lower layers, never the reverse.

Architecture

┌──────────────────────────────────────────────────────┐
│                        AGENTS                        │
│          Goal reasoning, intent generation           │
│                  (NO side effects)                   │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                       PLANNING                       │
│         Plan generation, goal decomposition          │
│                    (NO execution)                    │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                   DECISION ENGINE                    │
│            Action selection, arbitration             │
│          (NO planning, memory mutation, or           │
│                   tool execution)                    │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                       RUNTIME                        │
│              Agentic orchestration loop              │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                       EXECUTOR                       │
│            Tool invocation, motor control            │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                 TOOLS / ENVIRONMENT                  │
│   Side effects (I/O, network) │ World observation    │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                    STATE / MEMORY                    │
│     Single source of truth │ Storage & retrieval     │
│                 (NO decision making)                 │
└──────────────────────────────────────────────────────┘

Why this matters: when memory doesn't make decisions and agents don't have side effects, you can reason about each layer independently. A bug in planning can't corrupt memory. A bad tool can't bypass safety checks. The decision engine is the single chokepoint for all actions.
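A one-way rule like this can be enforced mechanically. The following is a minimal sketch (layer names follow the diagram above; the function and edge list are illustrative, not Maxim's actual code) that flags any dependency edge pointing "upward":

```python
# Illustrative one-way dependency check. Layer names mirror the diagram;
# the helper itself is hypothetical, not part of Maxim's codebase.
LAYER_ORDER = [
    "agents", "planning", "decision_engine",
    "runtime", "executor", "tools", "memory",
]
RANK = {layer: i for i, layer in enumerate(LAYER_ORDER)}

def violations(edges):
    """Return every edge that points upward (a lower layer calling higher)."""
    return [(src, dst) for src, dst in edges if RANK[src] > RANK[dst]]

edges = [
    ("agents", "planning"),         # OK: higher layer calls lower
    ("memory", "decision_engine"),  # BAD: memory must not make decisions
]
```

A check like this can run in CI so a circular dependency fails the build instead of surfacing at runtime.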

The Orchestration Loop

Everything runs through selfy.py, the core of the system. The name isn't accidental: it is the self-model, the inner loop where perception, cognition, and action come together.

Maxim.run() — simplified

def run(self):
    # 1. Start capture threads (video, audio)
    self._start_video_capture()
    self._start_audio_capture()

    # 2. Main observation loop
    while self.alive:
        # Get latest frame
        frame = self._latest_frame_queue.get(timeout=0.1)

        # Vision inference (YOLO)
        detections = self.detect(frame)

        # Novelty tracking
        novel, familiar = self.novelty_tracker.update(detections)

        # Voice command check
        transcript = self._check_audio()
        if transcript:
            self._handle_voice_command(transcript)

        # Mode-specific behavior
        if self.mode == "exploration":
            self._exploration_step(novel, familiar)
        elif self.mode == "live":
            self._agentic_step(detections, transcript)
        elif self.mode == "sleep":
            self._sleep_step()

        # Keyboard input (if interactive)
        self._handle_keyboard()

        self.current_epoch += 1

    # 3. Cleanup
    self._shutdown()

The loop runs at whatever rate the vision system can sustain (typically 15-30 fps on the Reachy Mini's camera). Each iteration is an "epoch" in Maxim's parlance.

Threading Model

Maxim uses a carefully designed multi-threaded architecture. Each thread has a single responsibility and communicates through bounded queues:

Threading diagram

Main Thread (observation loop)
│
├─ Video Capture Thread
│   └─ Reads from RobotController.get_video_stream()
│      Writes to: video_save_queue (bounded)
│                 latest_frame_queue (size 1, drops old)
│
├─ Video Writer Thread
│   └─ Consumes video_save_queue
│      Writes MP4 to data/videos/
│
├─ Audio Capture Thread
│   └─ Reads from RobotController.get_audio_stream()
│      Writes to: audio_save_queue
│                 transcription_queue
│
├─ Audio Writer Thread
│   └─ Consumes audio_save_queue
│      Writes WAV to data/audio/
│
├─ Transcription Process
│   └─ Consumes audio chunks
│      Runs Whisper inference
│      Writes JSONL to data/transcript/
│
├─ WorkerPool (typed lanes)
│   ├─ infer lane (1 worker, GPU) — LLM inference
│   ├─ review lane (1 worker, CPU) — evaluation
│   └─ record lane (2 workers) — memory writes, I/O
│
├─ Hippocampus Capture Thread
│   └─ Own FIFO queue (bounded, 100 items)
│      Non-blocking capture_from_loop_async()
│      Drops oldest on overflow
│
├─ EC NeuralEmbedder Thread
│   └─ Async semantic embedding queue
│      Triggered by hippocampus capture callbacks
│
└─ Motor Executor Thread
    └─ Exclusive RobotController access for motor commands
       Prevents concurrent motor operations

Key Design Decision: Bounded Queues

The video save queue is bounded (blocks on backpressure), while the latest frame queue has size 1 and drops old frames. This means the observation loop always processes the freshest frame available, never falling behind. Video recording might skip frames under load, but real-time perception never stalls.
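The drop-old semantics can be sketched with a size-1 queue wrapper. This is an illustrative pattern, not Maxim's actual implementation; the class name is hypothetical:

```python
import queue

class LatestFrameQueue:
    """Size-1 queue that always holds the freshest item.
    Sketch of the drop-old pattern described above; names are illustrative."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def put(self, frame):
        # Discard the stale frame (if any) so put never blocks the producer.
        try:
            self._q.get_nowait()
        except queue.Empty:
            pass
        try:
            self._q.put_nowait(frame)
        except queue.Full:
            pass  # Another producer won the race; its frame is fresher anyway

    def get(self, timeout=None):
        return self._q.get(timeout=timeout)
```

The consumer always sees the newest frame; the producer never waits, which is exactly the backpressure behavior the observation loop needs.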

Async Worker Pool

The two biggest blocking operations in the 30Hz main loop — LLM inference and hippocampus memory writes — are now handled by three independent async systems that eliminate contention:

WorkerPool architecture

WorkerPool (runtime/worker_pool.py)
├─ Lane: Named work category with own PriorityQueue + ThreadPoolExecutor
│   Default lanes:
│     infer  — 1 worker, GPU-bound (LLM calls)
│     review — 1 worker, CPU-bound (evaluation)
│     record — 2 workers (memory writes, I/O)
│
├─ Job: Unit of work with priority, optional DependencySpec
│   Lifecycle: PENDING → RUNNING → COMPLETED | FAILED | CANCELLED
│
├─ DependencyGate: Per-job blocker with two-phase prefetch
│   prefetch_early — runs on submission (gather stable data)
│   prefetch_late  — runs after deps resolve (gather fresh data)
│
├─ JobRegistry: Thread-safe lifecycle tracker
│   Uses threading.Event per job for cross-job waits
│   GC'd completions still resolve dependency checks
│
└─ GC Thread: Prunes completed jobs every 60s (TTL: 300s)
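The typed-lane idea can be reduced to a small sketch: one executor plus one priority heap per lane. Lane names and worker counts mirror the defaults above; everything else (class names, submit signature) is illustrative, not the real worker_pool.py API:

```python
# Minimal sketch of typed lanes: each worker runs the highest-priority job
# queued at the moment it becomes free. Illustrative, not Maxim's API.
import heapq
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

class Lane:
    def __init__(self, name: str, workers: int):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._heap = []
        self._lock = threading.Lock()
        self._counter = itertools.count()  # FIFO tie-break for equal priorities

    def submit(self, priority: int, fn, *args):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._counter), fn, args))
        return self._pool.submit(self._run_next)

    def _run_next(self):
        with self._lock:
            _, _, fn, args = heapq.heappop(self._heap)
        return fn(*args)

lanes = {name: Lane(name, n)
         for name, n in [("infer", 1), ("review", 1), ("record", 2)]}
```

Keeping GPU inference on a 1-worker lane serializes LLM calls (avoiding GPU contention) while the 2-worker record lane lets memory writes proceed in parallel.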

Passive Hippocampus

The hippocampus uses its own independent FIFO queue rather than the WorkerPool — a deliberate choice because captures don't need dependency gates and FIFO ordering is more appropriate than priority scheduling:

Async capture flow

Main loop calls capture_from_loop_async()
→ Snapshots state immediately (prevents stale-reference bugs)
→ Creates immutable _CaptureRequest
→ Puts on bounded queue (max 100, drops oldest on overflow)
→ Returns immediately (non-blocking)

Hippocampus worker thread:
→ Drains queue, calls capture_from_loop() per request
→ Acquires write lock, stores memory, builds index
→ Forms associative edges, fires capture callbacks
→ EC schedule_embedding() chains to NeuralEmbedder queue

flush(timeout) blocks until queue is drained
→ Called before session-end consolidation
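The flow above (bounded queue, drop-oldest on overflow, background drain, flush barrier) can be sketched as follows. The class and handler are illustrative stand-ins, not the hippocampus code itself:

```python
# Sketch of the passive capture pattern: non-blocking enqueue with
# drop-oldest overflow, a daemon drain thread, and a flush() barrier.
import queue
import threading

class CaptureQueue:
    def __init__(self, maxsize=100, handler=print):
        self._q = queue.Queue()
        self._maxsize = maxsize
        self._handler = handler  # Stands in for capture_from_loop()
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def capture_async(self, request):
        # Non-blocking: on overflow, drop the oldest request, never stall the caller.
        if self._q.qsize() >= self._maxsize:
            try:
                self._q.get_nowait()
                self._q.task_done()  # Keep join() accounting balanced
                self.dropped += 1
            except queue.Empty:
                pass
        self._q.put(request)

    def _drain(self):
        while True:
            request = self._q.get()
            self._handler(request)
            self._q.task_done()

    def flush(self):
        self._q.join()  # Blocks until every queued capture has been handled
```

The main loop only pays the cost of an enqueue; the expensive write-lock work happens on the drain thread, and flush() gives consolidation a clean barrier.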

Three Independent Async Systems

The async architecture deliberately uses three separate systems rather than one monolithic pool:

  • WorkerPool — typed lanes with dependency gates for LLM inference
  • Hippocampus capture thread — own FIFO queue for memory writes
  • EC NeuralEmbedder — own async queue for semantic embedding, triggered by hippocampus callbacks

This separation means LLM inference, memory capture, and semantic embedding can all proceed concurrently without blocking the 30Hz observation loop.

The Tool System

Tools are the only way Maxim affects the world. Every side effect, from moving a motor to searching the internet, goes through a tool.

Tool base class

class Tool(ABC):
    name: str                     # "move_head", "internet_search"
    description: str              # For LLM context
    input_schema: dict[str, Any]  # JSON Schema for parameters

    def run(self, **kwargs) -> ToolResult:
        self._validate_input(kwargs)     # Schema check
        output = self.execute(**kwargs)  # Subclass implements
        return ToolResult(success=True, output=output)

    @abstractmethod
    def execute(self, **kwargs) -> Any:
        """Perform the actual side effect."""
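A concrete subclass makes the contract clearer. This sketch re-declares Tool and ToolResult in simplified form so it runs standalone; the real classes differ (in particular, _validate_input here is a required-keys stand-in for full JSON Schema validation), and the SpeakTool body is illustrative:

```python
# Standalone sketch of the Tool contract; simplified re-declarations, not
# Maxim's actual classes.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolResult:
    success: bool
    output: Any

class Tool(ABC):
    name: str
    description: str
    input_schema: dict[str, Any]

    def _validate_input(self, kwargs):
        # Minimal stand-in for JSON Schema validation: required keys only.
        for key in self.input_schema.get("required", []):
            if key not in kwargs:
                raise ValueError(f"missing required parameter: {key}")

    def run(self, **kwargs) -> ToolResult:
        self._validate_input(kwargs)
        return ToolResult(success=True, output=self.execute(**kwargs))

    @abstractmethod
    def execute(self, **kwargs) -> Any: ...

class SpeakTool(Tool):
    name = "speak"
    description = "Synthesize speech via TTS engine"
    input_schema = {"required": ["text"]}

    def execute(self, text: str) -> str:
        return f"[tts] {text}"  # Real implementation would call the TTS engine
```

Because run() owns validation and result wrapping, subclasses only implement the side effect itself, which keeps every tool uniform from the executor's point of view.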

Available Tools

Category       Tool                         What It Does
Robot Control  MoveTool                     Move head to absolute pose (x, y, z, roll, pitch, yaw)
               FocusInterestsTool           Focus on objects of interest; optionally specify a target class to prioritize
               TrackTargetTool              Track and follow an object ("center" or "follow" strategy)
               NoveltyTrackTool             Query novelty/familiarity scores for detected objects
               MaximCommandTool             Send state machine commands (sleep, wake, shutdown)
Filesystem     ReadFileTool                 Read files (path traversal blocked)
               WriteFileTool                Write files to sandbox directory only
               ExecuteFileTool              Execute Python scripts with timeout enforcement
               GlobTool                     Pattern-based file search within allowed directories
               BashTool                     Shell command execution with containment
Network        InternetSearchTool           DuckDuckGo search (returns title, URL, snippet)
               HttpFetchTool                Fetch and parse web pages (blocks localhost)
               InternetAccessTool           General internet access gating
Math           MathTool                     Mathematical cognition: routes between IPS (compare, trend, anomaly) and Angular Gyrus (compute, analyze, matrix ops); supports natural-language aliases (sqrt, factorial, squared, cubed)
Response       RespondTool                  Send text response to user
               SpeakTool                    Synthesize speech via TTS engine
Mode Control   ModeSwitchTool               Switch between operating modes (passive, active, singularity)
               AutonomyLevelTool            Adjust autonomy level within the current mode
Live Intent    DefineLiveModeIntentTool     Define a new intent for live-mode self-evolution
               ReviewLiveModeIntentTool     Review the current live-mode intent and progress
               RecordLiveIntentInsightTool  Record an insight relevant to the active intent
               RecordLiveOutcomeTool        Record an outcome observation for intent tracking
Communication  SendMessageTool              Send a message through the gateway
               CallUserTool                 Initiate a call to the user via gateway

Tool Invocation Flow

Pseudocode

Agent proposes intent
→ Planner decomposes into tools
→ Decision Engine selects
→ FearAgent safety review
→ Executor calls tool.run(params) → ToolResult
→ Memory records action + outcome
→ NAc learns causal link

LLM Integration

Maxim runs LLMs locally using llama-cpp-python (GGUF format) for CPU + Metal GPU acceleration. No cloud API calls for inference.

Model Options

Model        Size    Context  Best For
SmolLM 1.7B  ~1.1GB  4096     CPU-only, low RAM
Phi-3 Mini   ~2.3GB  4096     Balanced performance
Mistral 7B   ~4.4GB  4096     General agentic tasks
Qwen2 7B     ~4.4GB  8192     Extended context
Llama 3 8B   ~4.9GB  8192     Maximum capability

Per-Mode Response Sizing

Each operating mode gets different context windows and response limits, tuned for its cognitive demands:

Response configuration

observe:     max_tokens=128,  context=512   (fast, minimal)
sleep:       max_tokens=64,   context=256   (barely conscious)
exploration: max_tokens=256,  context=1024  (medium processing)
live:        max_tokens=512,  context=2048  (full interaction)
reflection:  max_tokens=1024, context=3072  (deep introspection)
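The limits above can be expressed as a plain config table with a lookup helper. The values come from the list above; the NamedTuple, dict, and fallback behavior are illustrative assumptions, not Maxim's actual config code:

```python
# Per-mode response limits as a config table. Values from the document;
# the helper and its fallback rule are illustrative.
from typing import NamedTuple

class ResponseConfig(NamedTuple):
    max_tokens: int
    context: int

MODE_CONFIG = {
    "observe":     ResponseConfig(max_tokens=128,  context=512),
    "sleep":       ResponseConfig(max_tokens=64,   context=256),
    "exploration": ResponseConfig(max_tokens=256,  context=1024),
    "live":        ResponseConfig(max_tokens=512,  context=2048),
    "reflection":  ResponseConfig(max_tokens=1024, context=3072),
}

def response_config(mode: str) -> ResponseConfig:
    # Unknown modes fall back to the cheapest profile (a conservative default).
    return MODE_CONFIG.get(mode, MODE_CONFIG["sleep"])
```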

Agent Types

LLMAgent

Raw text completion. Single prompt in, text out. Used for classification and simple reasoning.

ChatLLMAgent

Multi-turn conversation with context retention. Used for interactive dialogue with users.

Hardware Abstraction

The RobotController abstract base class defines everything Maxim needs from a robot. Concrete implementations handle the actual hardware:

RobotController interface (simplified)

class RobotController(ABC):
    # Connection
    def connect(self, timeout: float = 30.0) -> bool
    def disconnect(self) -> None
    def reconnect(self, timeout: float, max_attempts: int) -> bool

    # Motion
    def goto_target(self, target: MotionTarget) -> bool
    def look_at_pixel(self, target: PixelTarget) -> bool
    def get_current_pose(self) -> dict[str, float]
    def center_vision(self, duration: float) -> bool

    # Lifecycle
    def wake_up(self) -> bool
    def goto_sleep(self) -> bool

    # Recording
    def start_recording(self) -> bool
    def stop_recording(self) -> bool

    # Streams
    def get_video_stream(self) -> VideoStream | None
    def get_audio_stream(self) -> AudioStream | None
MotionTarget — the lingua franca of movement

@dataclass
class MotionTarget:
    head_roll: float | None = None   # Radians
    head_pitch: float | None = None
    head_yaw: float | None = None
    body_yaw: float | None = None
    antenna_left: float | None = None
    antenna_right: float | None = None
    duration: float = 1.0            # Seconds
    method: str = "minimum_jerk"     # Smooth trajectory
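The "minimum_jerk" method refers to the standard quintic minimum-jerk profile, s(t) = 10t³ − 15t⁴ + 6t⁵ over normalized time. A sketch of how a controller might interpolate a single joint with it (illustrative, not Maxim's actual trajectory code):

```python
# Standard minimum-jerk position profile over normalized time t in [0, 1].
# Illustrative sketch, not the actual controller implementation.
def minimum_jerk(t: float) -> float:
    t = max(0.0, min(1.0, t))  # Clamp so the pose holds after the move ends
    return 10 * t**3 - 15 * t**4 + 6 * t**5

def interpolate(start: float, goal: float,
                elapsed: float, duration: float) -> float:
    """Joint angle at `elapsed` seconds into a `duration`-second move."""
    return start + (goal - start) * minimum_jerk(elapsed / duration)
```

The profile starts and ends with zero velocity and acceleration, which is why it is the usual choice for smooth, natural-looking head motion.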

The ReachyMiniController connects via Zenoh peer discovery on the local network, streaming video through GStreamer and audio through WebRTC. The SimulatedController provides a virtual 640x480 stream for testing without hardware.

Multi-Robot Support

The RobotRegistry (singleton) allows connecting multiple robots simultaneously:

Example

registry = RobotRegistry()
registry.register_controller_type("reachy_mini", ReachyMiniController)
registry.register_controller_type("simulated", SimulatedController)

primary = registry.connect_robot("primary", "reachy_mini", set_primary=True)
secondary = registry.connect_robot("secondary", "simulated")

The Bridge System

Bridges are the connective tissue between biological memory systems and the rest of the architecture. Each bridge creates bidirectional learning between two or more components.

Why Bridges?

The strict layered architecture means memory can't directly influence decisions, and decisions can't directly write to memory. Bridges provide the controlled channels for information to flow between layers, maintaining architectural purity while enabling integrated learning.

SpatialMemoryBridge

Hippocampus + EC ↔ SpatialMap + AttentionNetwork

Stores multi-session object location priors. "The mug was on the counter 3 out of 4 times" translates to an attention boost for the counter region when searching for mugs.

Example

boosts = spatial_bridge.boost_attention_for_goal("find mug")
# Returns: [(position=(320, 180), weight=0.82), ...]
# Counter region gets priority based on prior success

PainCircuitBridge

PainDetector ↔ NAc ↔ FearAgent

Two modes of harm prevention:

Mode        Latency                 Mechanism                                   When
Predictive  Zero (pre-execution)    Physics-based velocity/limit analysis       Before every motor command
Learned     After first occurrence  NAc pattern matching from past pain events  After experiencing pain once

PlanHistoryBridge

Hippocampus ↔ NAc

Retrieves successful plan templates from memory. If "find mug" succeeded with the tool sequence [look_at_counter, track_object, approach], that template is offered for similar future goals.

EscalationLearningBridge

Hippocampus + SCN ↔ NAc

Learns when to ask a human for help. If confidence in an action drops below a learned threshold (different per goal type, per time of day), Maxim escalates rather than acting autonomously.
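The escalate-or-act decision reduces to a threshold lookup keyed by goal type and time of day. The sketch below is illustrative: the threshold values, keys, and function name are assumptions, not Maxim's learned state:

```python
# Sketch of the escalation decision: per-(goal type, time bin) learned
# thresholds. All names and values here are illustrative.
thresholds = {
    ("find_object", "morning"): 0.55,
    ("find_object", "evening"): 0.70,  # Hypothetical: evenings learned riskier
}
DEFAULT_THRESHOLD = 0.80  # Conservative until a threshold has been learned

def should_escalate(goal_type: str, time_bin: str, confidence: float) -> bool:
    limit = thresholds.get((goal_type, time_bin), DEFAULT_THRESHOLD)
    return confidence < limit  # Below threshold: ask the human instead
```

A high default means unfamiliar goal types escalate by default, and autonomy is earned per context as thresholds are learned downward.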

EnergyCircuitBridge

Energy Tracking ↔ NAc

Reports resource costs as valence signals. Expensive LLM calls get negative valence. Cheap local inferences get positive valence. Over time, the NAc learns to predict energy costs and factor them into planning.

SalienceMemoryBridge

Hippocampus + EC ↔ SalienceNetwork

Enriches real-time salience scores with long-term interaction history. Objects you've had positive experiences with become more salient. Objects associated with failures get suppressed.

Bridge Coordination: MemoryHub

MemoryHub — central coordinator

hub = MemoryHub(
    hippocampus=hippo,
    spatial_map=spatial_map,
    nac=nac,
    fear_agent=fear_agent,
    pain_detector=pain_detector,
)

# Session lifecycle
hub.on_session_start()                # Load all priors
hub.record_action(sig, tool, params)  # During execution
hub.record_outcome(sig, success, v)   # After execution
hub.consolidate_during_sleep()        # Compress & clean

Persistence & Checkpointing

Maxim persists learned state across sessions. Every biological system saves its learned parameters:

Component        File                                What's Saved
Hippocampus      data/util/hippocampus.json          All episodic memories + associative indices
NAc              data/util/nac_state.json            Learned causal links (action → outcome)
SCN              data/util/scn_state.json            Temporal bin distributions
FocusLearner     data/util/focus_learner.json        Movement gain values (Rescorla-Wagner)
WorkspaceBounds  data/util/learned_bounds.json       Discovered reachable space limits
FearCircuit      data/util/fear_learning.json        Learned aversive action patterns
PainDetector     data/util/pain_detector.json        Pain threshold adaptations
Thresholds       data/util/adaptive_thresholds.json  Learned escalation/pain thresholds

Goal Tree Checkpointing

Before risky operations, Maxim checkpoints its entire goal tree so it can recover if something goes wrong:

Example

checkpoint_id = persistence.checkpoint(
    tree=goal_tree,
    config=config,
    reason="pre_risky_op",
)

# If things go wrong...
tree, config, budgets = persistence.recover(checkpoint_id)

Memory Consolidation (Sleep)

Step         Mechanism                          Effect
1. Score     Access frequency + recency         Rank memories by importance
2. Compress  EpisodicMemory → CompressedMemory  ~2.5KB → ~200 bytes
3. Prune     Remove unaccessed (>7 days)        Cap total memory footprint
4. Protect   Exempt high-value memories         User interactions, successes preserved
5. Reindex   SCN-aware rebalancing              Maintain temporal coverage
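Step 1 (scoring by access frequency and recency) can be sketched as frequency weighted by exponential recency decay. The weighting and half-life below are illustrative assumptions, not Maxim's actual parameters:

```python
# Sketch of consolidation scoring: access count scaled by exponential
# recency decay. Half-life and formula are illustrative assumptions.
import math

def importance(access_count: int, hours_since_access: float,
               half_life_hours: float = 24.0) -> float:
    recency = math.exp(-math.log(2) * hours_since_access / half_life_hours)
    return access_count * recency

def consolidation_order(memories):
    """Lowest-importance first: these get compressed or pruned first."""
    return sorted(memories, key=lambda m: importance(m["accesses"], m["age_h"]))
```

With a 24-hour half-life, a memory accessed ten times but a day ago scores the same as one accessed five times just now, so both frequency and recency matter.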

Energy Tracking

Maxim tracks resource expenditure across six domains, creating a unified cost model for decision-making:

Energy types and model multipliers

Energy Domains:
  LLM_TOKENS        # Input + output token consumption
  LLM_LATENCY       # Wall-clock time waiting for inference
  MOTOR_COMMAND     # Movement execution cost
  VISION_INFERENCE  # YOLO detection cost
  AUDIO_PROCESSING  # Whisper transcription cost
  ATTENTION         # Cognitive focus allocation

Model Cost Multipliers:
  local:             0.2  (cheapest)
  ollama:            0.3
  gpt-4o-mini:       0.4
  claude-3-haiku:    0.5
  claude-haiku-4-5:  0.6
  claude-3-sonnet:   1.0  (baseline)
  claude-sonnet-4-5: 1.2
  gpt-4o:            1.5
  gpt-4-turbo:       1.8
  claude-3-opus:     2.0
  claude-opus-4-5:   2.5  (most expensive)

Energy signals flow to the NAc. High costs produce negative valence. Low costs produce positive valence. Over time, Maxim learns which strategies are efficient and which are wasteful, without being explicitly programmed with cost tables.
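One way to picture the cost-to-valence mapping: multiplier-scaled cost squashed into (−1, 1), negative for expensive calls. The multipliers are from the table above; the mapping function, budget parameter, and use of tanh are illustrative assumptions:

```python
# Sketch of energy cost -> valence: multipliers from the document's table,
# the tanh mapping and budget are illustrative assumptions.
import math

MULTIPLIER = {
    "local": 0.2, "ollama": 0.3,
    "claude-3-sonnet": 1.0, "claude-opus-4-5": 2.5,
}

def energy_valence(model: str, base_cost: float, budget: float = 1.0) -> float:
    cost = base_cost * MULTIPLIER[model]
    # tanh maps (budget - cost) smoothly into (-1, 1): cheap -> positive,
    # expensive -> negative, baseline at exactly zero.
    return math.tanh(budget - cost)
```

A smooth, bounded signal like this is what a valence-learning system such as the NAc can consume directly, without hand-tuned cost tables.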

The Full Picture

Every component described here, from the layered architecture to the bridge system to energy tracking, serves a single purpose: enabling an embodied agent that learns from experience while maintaining safety guarantees.

The biological metaphors aren't decoration. They're engineering decisions. Hippocampal indexing is faster than SQL for the access patterns Maxim needs. Rescorla-Wagner learning converges more smoothly than gradient descent for small-sample motor adaptation. Pain circuits provide safety guarantees that policy-only approaches can't match.

The result is a system where you can plug in a different robot, swap the LLM, or add new tools, and the cognitive architecture adapts. Because that's what brains do.