
MAXIM

Technical Deep Dive

Architecture, Threading, Bridges, and the Orchestration Loop

The Dependency Graph

Maxim enforces a strict one-way dependency graph. This isn't a suggestion; the architecture prevents circular dependencies at the module level. Higher layers may call lower layers, never the reverse.

Architecture

┌──────────────────────────────────────────────────────┐
│                        AGENTS                        │
│          Goal reasoning, intent generation           │
│                  (NO side effects)                   │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                       PLANNING                       │
│         Plan generation, goal decomposition          │
│                    (NO execution)                    │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                   DECISION ENGINE                    │
│            Action selection, arbitration             │
│          (NO planning, memory mutation, or           │
│                   tool execution)                    │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                       RUNTIME                        │
│              Agentic orchestration loop              │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                       EXECUTOR                       │
│            Tool invocation, motor control            │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                 TOOLS / ENVIRONMENT                  │
│   Side effects (I/O, network) │ World observation    │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│                    STATE / MEMORY                    │
│     Single source of truth │ Storage & retrieval     │
│                 (NO decision making)                 │
└──────────────────────────────────────────────────────┘

Why this matters: when memory doesn't make decisions and agents don't have side effects, you can reason about each layer independently. A bug in planning can't corrupt memory. A bad tool can't bypass safety checks. The decision engine is the single chokepoint for all actions.
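A one-way rule like this can be enforced mechanically. The following is a minimal sketch (layer names follow the diagram above; the function and edge list are illustrative, not Maxim's actual code) that flags any dependency edge pointing "upward":

```python
# Illustrative one-way dependency check. Layer names mirror the diagram;
# the helper itself is hypothetical, not part of Maxim's codebase.
LAYER_ORDER = [
    "agents", "planning", "decision_engine",
    "runtime", "executor", "tools", "memory",
]
RANK = {layer: i for i, layer in enumerate(LAYER_ORDER)}

def violations(edges):
    """Return every edge that points upward (a lower layer calling higher)."""
    return [(src, dst) for src, dst in edges if RANK[src] > RANK[dst]]

edges = [
    ("agents", "planning"),         # OK: higher layer calls lower
    ("memory", "decision_engine"),  # BAD: memory must not make decisions
]
```

A check like this can run in CI so a circular dependency fails the build instead of surfacing at runtime.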

The Orchestration Loop

Everything runs through selfy.py, the core of the system. The name isn't accidental: it is the self-model, the inner loop where perception, cognition, and action come together.

Maxim.run() — simplified

def run(self):
    # 1. Start capture threads (video, audio)
    self._start_video_capture()
    self._start_audio_capture()

    # 2. Main observation loop
    while self.alive:
        # Get latest frame
        frame = self._latest_frame_queue.get(timeout=0.1)

        # Vision inference (YOLO)
        detections = self.detect(frame)

        # Novelty tracking
        novel, familiar = self.novelty_tracker.update(detections)

        # Voice command check
        transcript = self._check_audio()
        if transcript:
            self._handle_voice_command(transcript)

        # Mode-specific behavior
        if self.mode == "exploration":
            self._exploration_step(novel, familiar)
        elif self.mode == "live":
            self._agentic_step(detections, transcript)
        elif self.mode == "sleep":
            self._sleep_step()

        # Keyboard input (if interactive)
        self._handle_keyboard()

        self.current_epoch += 1

    # 3. Cleanup
    self._shutdown()

The loop runs at whatever rate the vision system can sustain (typically 15-30 fps on the Reachy Mini's camera). Each iteration is an "epoch" in Maxim's parlance.

Threading Model

Maxim uses a carefully designed multi-threaded architecture. Each thread has a single responsibility and communicates through bounded queues:

Threading diagram

Main Thread (observation loop)
│
├─ Video Capture Thread
│   └─ Reads from RobotController.get_video_stream()
│      Writes to: video_save_queue (bounded)
│                 latest_frame_queue (size 1, drops old)
│
├─ Video Writer Thread
│   └─ Consumes video_save_queue
│      Writes MP4 to data/videos/
│
├─ Audio Capture Thread
│   └─ Reads from RobotController.get_audio_stream()
│      Writes to: audio_save_queue
│                 transcription_queue
│
├─ Audio Writer Thread
│   └─ Consumes audio_save_queue
│      Writes WAV to data/audio/
│
├─ Transcription Process
│   └─ Consumes audio chunks
│      Runs Whisper inference
│      Writes JSONL to data/transcript/
│
├─ WorkerPool (typed lanes)
│   ├─ infer lane (1 worker, GPU) — LLM inference
│   ├─ review lane (1 worker, CPU) — evaluation
│   └─ record lane (2 workers) — memory writes, I/O
│
├─ Hippocampus Capture Thread
│   └─ Own FIFO queue (bounded, 100 items)
│      Non-blocking capture_from_loop_async()
│      Drops oldest on overflow
│
├─ EC NeuralEmbedder Thread
│   └─ Async semantic embedding queue
│      Triggered by hippocampus capture callbacks
│
└─ Motor Executor Thread
    └─ Exclusive RobotController access for motor commands
       Prevents concurrent motor operations

Key Design Decision: Bounded Queues

The video save queue is bounded (blocks on backpressure), while the latest frame queue has size 1 and drops old frames. This means the observation loop always processes the freshest frame available, never falling behind. Video recording might skip frames under load, but real-time perception never stalls.
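The drop-old semantics can be sketched with a size-1 queue wrapper. This is an illustrative pattern, not Maxim's actual implementation; the class name is hypothetical:

```python
import queue

class LatestFrameQueue:
    """Size-1 queue that always holds the freshest item.
    Sketch of the drop-old pattern described above; names are illustrative."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def put(self, frame):
        # Discard the stale frame (if any) so put never blocks the producer.
        try:
            self._q.get_nowait()
        except queue.Empty:
            pass
        try:
            self._q.put_nowait(frame)
        except queue.Full:
            pass  # Another producer won the race; its frame is fresher anyway

    def get(self, timeout=None):
        return self._q.get(timeout=timeout)
```

The consumer always sees the newest frame; the producer never waits, which is exactly the backpressure behavior the observation loop needs.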

Async Worker Pool

The two biggest blocking operations in the 30Hz main loop — LLM inference and hippocampus memory writes — are now handled by three independent async systems that eliminate contention:

WorkerPool architecture

WorkerPool (runtime/worker_pool.py)
├─ Lane: Named work category with own PriorityQueue + ThreadPoolExecutor
│   Default lanes:
│     infer  — 1 worker, GPU-bound (LLM calls)
│     review — 1 worker, CPU-bound (evaluation)
│     record — 2 workers (memory writes, I/O)
│
├─ Job: Unit of work with priority, optional DependencySpec
│   Lifecycle: PENDING → RUNNING → COMPLETED | FAILED | CANCELLED
│
├─ DependencyGate: Per-job blocker with two-phase prefetch
│   prefetch_early — runs on submission (gather stable data)
│   prefetch_late  — runs after deps resolve (gather fresh data)
│
├─ JobRegistry: Thread-safe lifecycle tracker
│   Uses threading.Event per job for cross-job waits
│   GC'd completions still resolve dependency checks
│
└─ GC Thread: Prunes completed jobs every 60s (TTL: 300s)
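The typed-lane idea can be reduced to a small sketch: one executor plus one priority heap per lane. Lane names and worker counts mirror the defaults above; everything else (class names, submit signature) is illustrative, not the real worker_pool.py API:

```python
# Minimal sketch of typed lanes: each worker runs the highest-priority job
# queued at the moment it becomes free. Illustrative, not Maxim's API.
import heapq
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

class Lane:
    def __init__(self, name: str, workers: int):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._heap = []
        self._lock = threading.Lock()
        self._counter = itertools.count()  # FIFO tie-break for equal priorities

    def submit(self, priority: int, fn, *args):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._counter), fn, args))
        return self._pool.submit(self._run_next)

    def _run_next(self):
        with self._lock:
            _, _, fn, args = heapq.heappop(self._heap)
        return fn(*args)

lanes = {name: Lane(name, n)
         for name, n in [("infer", 1), ("review", 1), ("record", 2)]}
```

Keeping GPU inference on a 1-worker lane serializes LLM calls (avoiding GPU contention) while the 2-worker record lane lets memory writes proceed in parallel.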

Passive Hippocampus

The hippocampus uses its own independent FIFO queue rather than the WorkerPool — a deliberate choice because captures don't need dependency gates and FIFO ordering is more appropriate than priority scheduling:

Async capture flow

Main loop calls capture_from_loop_async()
→ Snapshots state immediately (prevents stale-reference bugs)
→ Creates immutable _CaptureRequest
→ Puts on bounded queue (max 100, drops oldest on overflow)
→ Returns immediately (non-blocking)

Hippocampus worker thread:
→ Drains queue, calls capture_from_loop() per request
→ Acquires write lock, stores memory, builds index
→ Forms associative edges, fires capture callbacks
→ EC schedule_embedding() chains to NeuralEmbedder queue

flush(timeout) blocks until queue is drained
→ Called before session-end consolidation
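The flow above (bounded queue, drop-oldest on overflow, background drain, flush barrier) can be sketched as follows. The class and handler are illustrative stand-ins, not the hippocampus code itself:

```python
# Sketch of the passive capture pattern: non-blocking enqueue with
# drop-oldest overflow, a daemon drain thread, and a flush() barrier.
import queue
import threading

class CaptureQueue:
    def __init__(self, maxsize=100, handler=print):
        self._q = queue.Queue()
        self._maxsize = maxsize
        self._handler = handler  # Stands in for capture_from_loop()
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def capture_async(self, request):
        # Non-blocking: on overflow, drop the oldest request, never stall the caller.
        if self._q.qsize() >= self._maxsize:
            try:
                self._q.get_nowait()
                self._q.task_done()  # Keep join() accounting balanced
                self.dropped += 1
            except queue.Empty:
                pass
        self._q.put(request)

    def _drain(self):
        while True:
            request = self._q.get()
            self._handler(request)
            self._q.task_done()

    def flush(self):
        self._q.join()  # Blocks until every queued capture has been handled
```

The main loop only pays the cost of an enqueue; the expensive write-lock work happens on the drain thread, and flush() gives consolidation a clean barrier.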

Three Independent Async Systems

The async architecture deliberately uses three separate systems rather than one monolithic pool:

  • WorkerPool — typed lanes with dependency gates for LLM inference
  • Hippocampus capture thread — own FIFO queue for memory writes
  • EC NeuralEmbedder — own async queue for semantic embedding, triggered by hippocampus callbacks

This separation means LLM inference, memory capture, and semantic embedding can all proceed concurrently without blocking the 30Hz observation loop.

The Tool System

Tools are the only way Maxim affects the world. Every side effect, from moving a motor to searching the internet, goes through a tool.

Tool base class

class Tool(ABC):
    name: str                     # "move_head", "internet_search"
    description: str              # For LLM context
    input_schema: dict[str, Any]  # JSON Schema for parameters

    def run(self, **kwargs) -> ToolResult:
        self._validate_input(kwargs)     # Schema check
        output = self.execute(**kwargs)  # Subclass implements
        return ToolResult(success=True, output=output)

    @abstractmethod
    def execute(self, **kwargs) -> Any:
        """Perform the actual side effect."""
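A concrete subclass makes the contract clearer. This sketch re-declares Tool and ToolResult in simplified form so it runs standalone; the real classes differ (in particular, _validate_input here is a required-keys stand-in for full JSON Schema validation), and the SpeakTool body is illustrative:

```python
# Standalone sketch of the Tool contract; simplified re-declarations, not
# Maxim's actual classes.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolResult:
    success: bool
    output: Any

class Tool(ABC):
    name: str
    description: str
    input_schema: dict[str, Any]

    def _validate_input(self, kwargs):
        # Minimal stand-in for JSON Schema validation: required keys only.
        for key in self.input_schema.get("required", []):
            if key not in kwargs:
                raise ValueError(f"missing required parameter: {key}")

    def run(self, **kwargs) -> ToolResult:
        self._validate_input(kwargs)
        return ToolResult(success=True, output=self.execute(**kwargs))

    @abstractmethod
    def execute(self, **kwargs) -> Any: ...

class SpeakTool(Tool):
    name = "speak"
    description = "Synthesize speech via TTS engine"
    input_schema = {"required": ["text"]}

    def execute(self, text: str) -> str:
        return f"[tts] {text}"  # Real implementation would call the TTS engine
```

Because run() owns validation and result wrapping, subclasses only implement the side effect itself, which keeps every tool uniform from the executor's point of view.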

Available Tools

Category       Tool                         What It Does
Robot Control  MoveTool                     Move head to absolute pose (x, y, z, roll, pitch, yaw)
               FocusInterestsTool           Focus on objects of interest; optionally specify a target class to prioritize
               TrackTargetTool              Track and follow an object ("center" or "follow" strategy)
               NoveltyTrackTool             Query novelty/familiarity scores for detected objects
               MaximCommandTool             Send state machine commands (sleep, wake, shutdown)
Filesystem     ReadFileTool                 Read files (path traversal blocked)
               WriteFileTool                Write files to sandbox directory only
               ExecuteFileTool              Execute Python scripts with timeout enforcement
               GlobTool                     Pattern-based file search within allowed directories
               BashTool                     Shell command execution with containment
Network        InternetSearchTool           DuckDuckGo search (returns title, URL, snippet)
               HttpFetchTool                Fetch and parse web pages (blocks localhost)
               InternetAccessTool           General internet access gating
Math           MathTool                     Mathematical cognition: routes between IPS (compare, trend, anomaly) and Angular Gyrus (compute, analyze, matrix ops); supports natural-language aliases (sqrt, factorial, squared, cubed)
Response       RespondTool                  Send text response to user
               SpeakTool                    Synthesize speech via TTS engine
Mode Control   ModeSwitchTool               Switch between operating modes (passive, active, singularity)
               AutonomyLevelTool            Adjust autonomy level within the current mode
Live Intent    DefineLiveModeIntentTool     Define a new intent for live-mode self-evolution
               ReviewLiveModeIntentTool     Review the current live-mode intent and progress
               RecordLiveIntentInsightTool  Record an insight relevant to the active intent
               RecordLiveOutcomeTool        Record an outcome observation for intent tracking
Communication  SendMessageTool              Send a message through the gateway
               CallUserTool                 Initiate a call to the user via gateway

Tool Invocation Flow

Pseudocode

Agent proposes intent
→ Planner decomposes into tools
→ Decision Engine selects
→ FearAgent safety review
→ Executor calls tool.run(params) → ToolResult
→ Memory records action + outcome
→ NAc learns causal link

LLM Integration

Maxim runs LLMs locally using llama-cpp-python (GGUF format) for CPU + Metal GPU acceleration. No cloud API calls for inference.

Model Options

Model        Size    Context  Best For
SmolLM 1.7B  ~1.1GB  4096     CPU-only, low RAM
Phi-3 Mini   ~2.3GB  4096     Balanced performance
Mistral 7B   ~4.4GB  4096     General agentic tasks
Qwen2 7B     ~4.4GB  8192     Extended context
Llama 3 8B   ~4.9GB  8192     Maximum capability

Per-Mode Response Sizing

Each operating mode gets different context windows and response limits, tuned for its cognitive demands:

Response configuration

observe:     max_tokens=128,  context=512   (fast, minimal)
sleep:       max_tokens=64,   context=256   (barely conscious)
exploration: max_tokens=256,  context=1024  (medium processing)
live:        max_tokens=512,  context=2048  (full interaction)
reflection:  max_tokens=1024, context=3072  (deep introspection)
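The limits above can be expressed as a plain config table with a lookup helper. The values come from the list above; the NamedTuple, dict, and fallback behavior are illustrative assumptions, not Maxim's actual config code:

```python
# Per-mode response limits as a config table. Values from the document;
# the helper and its fallback rule are illustrative.
from typing import NamedTuple

class ResponseConfig(NamedTuple):
    max_tokens: int
    context: int

MODE_CONFIG = {
    "observe":     ResponseConfig(max_tokens=128,  context=512),
    "sleep":       ResponseConfig(max_tokens=64,   context=256),
    "exploration": ResponseConfig(max_tokens=256,  context=1024),
    "live":        ResponseConfig(max_tokens=512,  context=2048),
    "reflection":  ResponseConfig(max_tokens=1024, context=3072),
}

def response_config(mode: str) -> ResponseConfig:
    # Unknown modes fall back to the cheapest profile (a conservative default).
    return MODE_CONFIG.get(mode, MODE_CONFIG["sleep"])
```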

Agent Types

LLMAgent

Raw text completion. Single prompt in, text out. Used for classification and simple reasoning.

ChatLLMAgent

Multi-turn conversation with context retention. Used for interactive dialogue with users.

Hardware Abstraction

The RobotController abstract base class defines everything Maxim needs from a robot. Concrete implementations handle the actual hardware:

RobotController interface (simplified)

class RobotController(ABC):
    # Connection
    def connect(self, timeout: float = 30.0) -> bool
    def disconnect(self) -> None
    def reconnect(self, timeout: float, max_attempts: int) -> bool

    # Motion
    def goto_target(self, target: MotionTarget) -> bool
    def look_at_pixel(self, target: PixelTarget) -> bool
    def get_current_pose(self) -> dict[str, float]
    def center_vision(self, duration: float) -> bool

    # Lifecycle
    def wake_up(self) -> bool
    def goto_sleep(self) -> bool

    # Recording
    def start_recording(self) -> bool
    def stop_recording(self) -> bool

    # Streams
    def get_video_stream(self) -> VideoStream | None
    def get_audio_stream(self) -> AudioStream | None
MotionTarget — the lingua franca of movement

@dataclass
class MotionTarget:
    head_roll: float | None = None   # Radians
    head_pitch: float | None = None
    head_yaw: float | None = None
    body_yaw: float | None = None
    antenna_left: float | None = None
    antenna_right: float | None = None
    duration: float = 1.0            # Seconds
    method: str = "minimum_jerk"     # Smooth trajectory
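The "minimum_jerk" method refers to the standard quintic minimum-jerk profile, s(t) = 10t³ − 15t⁴ + 6t⁵ over normalized time. A sketch of how a controller might interpolate a single joint with it (illustrative, not Maxim's actual trajectory code):

```python
# Standard minimum-jerk position profile over normalized time t in [0, 1].
# Illustrative sketch, not the actual controller implementation.
def minimum_jerk(t: float) -> float:
    t = max(0.0, min(1.0, t))  # Clamp so the pose holds after the move ends
    return 10 * t**3 - 15 * t**4 + 6 * t**5

def interpolate(start: float, goal: float,
                elapsed: float, duration: float) -> float:
    """Joint angle at `elapsed` seconds into a `duration`-second move."""
    return start + (goal - start) * minimum_jerk(elapsed / duration)
```

The profile starts and ends with zero velocity and acceleration, which is why it is the usual choice for smooth, natural-looking head motion.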

The ReachyMiniController connects via Zenoh peer discovery on the local network, streaming video through GStreamer and audio through WebRTC. The SimulatedController provides a virtual 640x480 stream for testing without hardware.

Multi-Robot Support

The RobotRegistry (singleton) allows connecting multiple robots simultaneously:

Example

registry = RobotRegistry()
registry.register_controller_type("reachy_mini", ReachyMiniController)
registry.register_controller_type("simulated", SimulatedController)

primary = registry.connect_robot("primary", "reachy_mini", set_primary=True)
secondary = registry.connect_robot("secondary", "simulated")

The Bridge System

Bridges are the connective tissue between biological memory systems and the rest of the architecture. Each bridge creates bidirectional learning between two or more components.

Why Bridges?

The strict layered architecture means memory can't directly influence decisions, and decisions can't directly write to memory. Bridges provide the controlled channels for information to flow between layers, maintaining architectural purity while enabling integrated learning.

SpatialMemoryBridge

Hippocampus + EC ↔ SpatialMap + AttentionNetwork

Stores multi-session object location priors. "The mug was on the counter 3 out of 4 times" translates to an attention boost for the counter region when searching for mugs.

Example

boosts = spatial_bridge.boost_attention_for_goal("find mug")
# Returns: [(position=(320, 180), weight=0.82), ...]
# Counter region gets priority based on prior success

PainCircuitBridge

PainDetector ↔ NAc ↔ FearAgent

Two modes of harm prevention:

Mode        Latency                 Mechanism                                   When
Predictive  Zero (pre-execution)    Physics-based velocity/limit analysis       Before every motor command
Learned     After first occurrence  NAc pattern matching from past pain events  After experiencing pain once

PlanHistoryBridge

Hippocampus ↔ NAc

Retrieves successful plan templates from memory. If "find mug" succeeded with the tool sequence [look_at_counter, track_object, approach], that template is offered for similar future goals.

EscalationLearningBridge

Hippocampus + SCN ↔ NAc

Learns when to ask a human for help. If confidence in an action drops below a learned threshold (different per goal type, per time of day), Maxim escalates rather than acting autonomously.
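The escalate-or-act decision reduces to a threshold lookup keyed by goal type and time of day. The sketch below is illustrative: the threshold values, keys, and function name are assumptions, not Maxim's learned state:

```python
# Sketch of the escalation decision: per-(goal type, time bin) learned
# thresholds. All names and values here are illustrative.
thresholds = {
    ("find_object", "morning"): 0.55,
    ("find_object", "evening"): 0.70,  # Hypothetical: evenings learned riskier
}
DEFAULT_THRESHOLD = 0.80  # Conservative until a threshold has been learned

def should_escalate(goal_type: str, time_bin: str, confidence: float) -> bool:
    limit = thresholds.get((goal_type, time_bin), DEFAULT_THRESHOLD)
    return confidence < limit  # Below threshold: ask the human instead
```

A high default means unfamiliar goal types escalate by default, and autonomy is earned per context as thresholds are learned downward.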

EnergyCircuitBridge

Energy Tracking ↔ NAc

Reports resource costs as valence signals. Expensive LLM calls get negative valence. Cheap local inferences get positive valence. Over time, the NAc learns to predict energy costs and factor them into planning.

SalienceMemoryBridge

Hippocampus + EC ↔ SalienceNetwork

Enriches real-time salience scores with long-term interaction history. Objects you've had positive experiences with become more salient. Objects associated with failures get suppressed.

Bridge Coordination: MemoryHub

MemoryHub — central coordinator

hub = MemoryHub(
    hippocampus=hippo,
    spatial_map=spatial_map,
    nac=nac,
    fear_agent=fear_agent,
    pain_detector=pain_detector,
)

# Session lifecycle
hub.on_session_start()                # Load all priors
hub.record_action(sig, tool, params)  # During execution
hub.record_outcome(sig, success, v)   # After execution
hub.consolidate_during_sleep()        # Compress & clean

Persistence & Checkpointing

Maxim persists learned state across sessions. Every biological system saves its learned parameters:

Component        File                                What's Saved
Hippocampus      data/util/hippocampus.json          All episodic memories + associative indices
NAc              data/util/nac_state.json            Learned causal links (action → outcome)
SCN              data/util/scn_state.json            Temporal bin distributions
FocusLearner     data/util/focus_learner.json        Movement gain values (Rescorla-Wagner)
WorkspaceBounds  data/util/learned_bounds.json       Discovered reachable space limits
FearCircuit      data/util/fear_learning.json        Learned aversive action patterns
PainDetector     data/util/pain_detector.json        Pain threshold adaptations
Thresholds       data/util/adaptive_thresholds.json  Learned escalation/pain thresholds

Goal Tree Checkpointing

Before risky operations, Maxim checkpoints its entire goal tree so it can recover if something goes wrong:

Example

checkpoint_id = persistence.checkpoint(
    tree=goal_tree,
    config=config,
    reason="pre_risky_op",
)

# If things go wrong...
tree, config, budgets = persistence.recover(checkpoint_id)

Memory Consolidation (Sleep)

Step         Mechanism                          Effect
1. Score     Access frequency + recency         Rank memories by importance
2. Compress  EpisodicMemory → CompressedMemory  ~2.5KB → ~200 bytes
3. Prune     Remove unaccessed (>7 days)        Cap total memory footprint
4. Protect   Exempt high-value memories         User interactions, successes preserved
5. Reindex   SCN-aware rebalancing              Maintain temporal coverage
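Step 1 (scoring by access frequency and recency) can be sketched as frequency weighted by exponential recency decay. The weighting and half-life below are illustrative assumptions, not Maxim's actual parameters:

```python
# Sketch of consolidation scoring: access count scaled by exponential
# recency decay. Half-life and formula are illustrative assumptions.
import math

def importance(access_count: int, hours_since_access: float,
               half_life_hours: float = 24.0) -> float:
    recency = math.exp(-math.log(2) * hours_since_access / half_life_hours)
    return access_count * recency

def consolidation_order(memories):
    """Lowest-importance first: these get compressed or pruned first."""
    return sorted(memories, key=lambda m: importance(m["accesses"], m["age_h"]))
```

With a 24-hour half-life, a memory accessed ten times but a day ago scores the same as one accessed five times just now, so both frequency and recency matter.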

Energy Tracking

Maxim tracks resource expenditure across six domains, creating a unified cost model for decision-making:

Energy types and model multipliers

Energy Domains:
  LLM_TOKENS        # Input + output token consumption
  LLM_LATENCY       # Wall-clock time waiting for inference
  MOTOR_COMMAND     # Movement execution cost
  VISION_INFERENCE  # YOLO detection cost
  AUDIO_PROCESSING  # Whisper transcription cost
  ATTENTION         # Cognitive focus allocation

Model Cost Multipliers:
  local:             0.2  (cheapest)
  ollama:            0.3
  gpt-4o-mini:       0.4
  claude-3-haiku:    0.5
  claude-haiku-4-5:  0.6
  claude-3-sonnet:   1.0  (baseline)
  claude-sonnet-4-5: 1.2
  gpt-4o:            1.5
  gpt-4-turbo:       1.8
  claude-3-opus:     2.0
  claude-opus-4-5:   2.5  (most expensive)

Energy signals flow to the NAc. High costs produce negative valence. Low costs produce positive valence. Over time, Maxim learns which strategies are efficient and which are wasteful, without being explicitly programmed with cost tables.
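One way to picture the cost-to-valence mapping: multiplier-scaled cost squashed into (−1, 1), negative for expensive calls. The multipliers are from the table above; the mapping function, budget parameter, and use of tanh are illustrative assumptions:

```python
# Sketch of energy cost -> valence: multipliers from the document's table,
# the tanh mapping and budget are illustrative assumptions.
import math

MULTIPLIER = {
    "local": 0.2, "ollama": 0.3,
    "claude-3-sonnet": 1.0, "claude-opus-4-5": 2.5,
}

def energy_valence(model: str, base_cost: float, budget: float = 1.0) -> float:
    cost = base_cost * MULTIPLIER[model]
    # tanh maps (budget - cost) smoothly into (-1, 1): cheap -> positive,
    # expensive -> negative, baseline at exactly zero.
    return math.tanh(budget - cost)
```

A smooth, bounded signal like this is what a valence-learning system such as the NAc can consume directly, without hand-tuned cost tables.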

The Full Picture

Every component described here, from the layered architecture to the bridge system to energy tracking, serves a single purpose: enabling an embodied agent that learns from experience while maintaining safety guarantees.

The biological metaphors aren't decoration. They're engineering decisions. Hippocampal indexing is faster than SQL for the access patterns Maxim needs. Rescorla-Wagner learning converges more smoothly than gradient descent for small-sample motor adaptation. Pain circuits provide safety guarantees that policy-only approaches can't match.

The result is a system where you can plug in a different robot, swap the LLM, or add new tools, and the cognitive architecture adapts. Because that's what brains do.