MAXIM
Technical Deep Dive
Architecture, Threading, Bridges, and the Orchestration Loop
The Dependency Graph
Maxim enforces a strict one-way dependency graph. This isn't a suggestion; the architecture prevents circular dependencies at the module level. Higher layers may call lower layers, never the reverse.
Why this matters: when memory doesn't make decisions and agents don't have side effects, you can reason about each layer independently. A bug in planning can't corrupt memory. A bad tool can't bypass safety checks. The decision engine is the single chokepoint for all actions.
The Orchestration Loop
Everything runs through selfy.py, the conscious core of the system. The name isn't accidental: selfy.py is the self-model, the inner loop where perception, cognition, and action come together.
The loop runs at whatever rate the vision system can sustain (typically 15-30 fps on the Reachy Mini's camera). Each iteration is an "epoch" in Maxim's parlance.
Threading Model
Maxim uses a carefully designed multi-threaded architecture. Each thread has a single responsibility and communicates through bounded queues.
Key Design Decision: Bounded Queues
The video save queue is bounded (blocks on backpressure), while the latest frame queue has size 1 and drops old frames. This means the observation loop always processes the freshest frame available, never falling behind. Video recording might skip frames under load, but real-time perception never stalls.
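This two-queue policy can be sketched with Python's standard `queue` module. A minimal illustration, not Maxim's actual code; the names `save_queue`, `latest_frame`, and `publish_frame` are assumptions:

```python
import queue

# Bounded save queue: put() blocks when full, applying backpressure upstream.
save_queue: "queue.Queue[bytes]" = queue.Queue(maxsize=64)

# Latest-frame queue: size 1; a new frame evicts the stale one instead of blocking.
latest_frame: "queue.Queue[bytes]" = queue.Queue(maxsize=1)

def publish_frame(frame: bytes) -> None:
    """Offer a frame to the observation loop, dropping the old one if unread."""
    try:
        latest_frame.put_nowait(frame)
    except queue.Full:
        try:
            latest_frame.get_nowait()   # discard the stale frame
        except queue.Empty:
            pass                        # the consumer grabbed it first
        latest_frame.put_nowait(frame)  # safe with a single producer
```

The drop-then-put dance is only race-free with one producer, which matches the single camera thread described above.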
Async Worker Pool
The two biggest blocking operations in the 30Hz main loop — LLM inference and hippocampus memory writes — are handled by three independent async systems that eliminate contention.
Passive Hippocampus
The hippocampus uses its own independent FIFO queue rather than the WorkerPool — a deliberate choice because captures don't need dependency gates and FIFO ordering is more appropriate than priority scheduling.
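The capture thread's shape can be sketched as a plain FIFO drained on a background daemon thread. This is an illustration of the design, not Maxim's real API; `submit`, `stored`, and the sentinel shutdown are assumed names and mechanics:

```python
import queue
import threading

class HippocampusCaptureThread:
    """Drains a FIFO queue of episodic captures on a background thread.

    No priority lanes, no dependency gates — captures are processed in
    arrival order, and submit() never blocks the 30Hz observation loop.
    """

    def __init__(self) -> None:
        self._queue: "queue.Queue" = queue.Queue()
        self.stored: list = []            # stand-in for the real memory store
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, capture: dict) -> None:
        """Called from the main loop; the disk write happens off-thread."""
        self._queue.put(capture)

    def _run(self) -> None:
        while True:
            capture = self._queue.get()
            if capture is None:           # sentinel: shut down cleanly
                break
            self.stored.append(capture)   # real code would persist to disk

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
```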
Three Independent Async Systems
The async architecture deliberately uses three separate systems rather than one monolithic pool:
- WorkerPool — typed lanes with dependency gates for LLM inference
- Hippocampus capture thread — own FIFO queue for memory writes
- EC NeuralEmbedder — own async queue for semantic embedding, triggered by hippocampus callbacks
This separation means LLM inference, memory capture, and semantic embedding can all proceed concurrently without blocking the 30Hz observation loop.
The Tool System
Tools are the only way Maxim affects the world. Every side effect, from moving a motor to searching the internet, goes through a tool.
Available Tools
| Category | Tool | What It Does |
|---|---|---|
| Robot Control | MoveTool | Move head to absolute pose (x, y, z, roll, pitch, yaw) |
| | FocusInterestsTool | Focus on objects of interest; optionally specify a target class to prioritize |
| | TrackTargetTool | Track and follow an object ("center" or "follow" strategy) |
| | NoveltyTrackTool | Query novelty/familiarity scores for detected objects |
| | MaximCommandTool | Send state machine commands (sleep, wake, shutdown) |
| Filesystem | ReadFileTool | Read files (path traversal blocked) |
| | WriteFileTool | Write files to sandbox directory only |
| | ExecuteFileTool | Execute Python scripts with timeout enforcement |
| | GlobTool | Pattern-based file search within allowed directories |
| | BashTool | Shell command execution with containment |
| Network | InternetSearchTool | DuckDuckGo search (returns title, URL, snippet) |
| | HttpFetchTool | Fetch and parse web pages (blocks localhost) |
| | InternetAccessTool | General internet access gating |
| Math | MathTool | Mathematical cognition — routes between IPS (compare, trend, anomaly) and Angular Gyrus (compute, analyze, matrix ops). Supports natural-language aliases (sqrt, factorial, squared, cubed) |
| Response | RespondTool | Send text response to user |
| | SpeakTool | Synthesize speech via TTS engine |
| Mode Control | ModeSwitchTool | Switch between operating modes (passive, active, singularity) |
| | AutonomyLevelTool | Adjust autonomy level within the current mode |
| Live Intent | DefineLiveModeIntentTool | Define a new intent for live mode self-evolution |
| | ReviewLiveModeIntentTool | Review current live mode intent and progress |
| | RecordLiveIntentInsightTool | Record an insight relevant to the active intent |
| | RecordLiveOutcomeTool | Record an outcome observation for intent tracking |
| Communication | SendMessageTool | Send a message through the gateway |
| | CallUserTool | Initiate a call to the user via gateway |
Tool Invocation Flow
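The flow from LLM output to side effect can be sketched as: parse the tool call, look the tool up in a registry, execute it, and report failures as data rather than exceptions. A minimal sketch under assumed names (`ToolRegistry`, a JSON `{"tool": ..., "args": ...}` call format); Maxim's real dispatch code will differ:

```python
import json

class ToolRegistry:
    """Minimal tool-dispatch sketch: every side effect funnels through invoke()."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def register(self, name: str, fn) -> None:
        self._tools[name] = fn

    def invoke(self, llm_output: str) -> dict:
        """Parse an LLM tool call like {"tool": "...", "args": {...}} and run it."""
        try:
            call = json.loads(llm_output)
        except json.JSONDecodeError:
            return {"ok": False, "error": "unparseable tool call"}
        fn = self._tools.get(call.get("tool"))
        if fn is None:
            return {"ok": False, "error": f"unknown tool: {call.get('tool')}"}
        try:
            return {"ok": True, "result": fn(**call.get("args", {}))}
        except Exception as exc:       # a failed tool must not crash the loop
            return {"ok": False, "error": str(exc)}

registry = ToolRegistry()
registry.register("respond", lambda text: f"said: {text}")
```

Returning errors as values keeps the orchestration loop alive when a tool misbehaves, which matches the single-chokepoint design described earlier.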
LLM Integration
Maxim runs LLMs locally using llama-cpp-python (GGUF format) for CPU + Metal GPU acceleration. No cloud API calls for inference.
Model Options
| Model | Size | Context | Best For |
|---|---|---|---|
| SmolLM 1.7B | ~1.1GB | 4096 | CPU-only, low RAM |
| Phi-3 Mini | ~2.3GB | 4096 | Balanced performance |
| Mistral 7B | ~4.4GB | 4096 | General agentic tasks |
| Qwen2 7B | ~4.4GB | 8192 | Extended context |
| Llama 3 8B | ~4.9GB | 8192 | Maximum capability |
Per-Mode Response Sizing
Each operating mode gets different context windows and response limits, tuned for its cognitive demands.
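Conceptually this is a per-mode budget table with a conservative fallback. The numbers below are placeholders for illustration, not the project's actual settings; only the mode names (passive, active, singularity) come from the tool table above:

```python
# Illustrative per-mode budgets; real values live in Maxim's config.
MODE_BUDGETS = {
    "passive":     {"n_ctx": 2048, "max_response_tokens": 128},
    "active":      {"n_ctx": 4096, "max_response_tokens": 512},
    "singularity": {"n_ctx": 8192, "max_response_tokens": 1024},
}

def budget_for(mode: str) -> dict:
    """Fall back to the most conservative budget for unknown modes."""
    return MODE_BUDGETS.get(mode, MODE_BUDGETS["passive"])
```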
Agent Types
LLMAgent
Raw text completion. Single prompt in, text out. Used for classification and simple reasoning.
ChatLLMAgent
Multi-turn conversation with context retention. Used for interactive dialogue with users.
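The split between the two agent types can be sketched as follows. The inference function is injected so the sketch stays model-agnostic; class and method names mirror the doc but the internals are assumptions:

```python
class LLMAgent:
    """Single-shot completion: one prompt in, raw text out."""

    def __init__(self, complete) -> None:
        self._complete = complete          # injected inference function

    def run(self, prompt: str) -> str:
        return self._complete(prompt)

class ChatLLMAgent(LLMAgent):
    """Multi-turn variant that retains conversation history across calls."""

    def __init__(self, complete) -> None:
        super().__init__(complete)
        self.history: list = []

    def chat(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in self.history)
        reply = self._complete(transcript)
        self.history.append({"role": "assistant", "content": reply})
        return reply
```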
Hardware Abstraction
The RobotController abstract base class defines everything Maxim needs from a robot. Concrete implementations handle the actual hardware:
The ReachyMiniController connects via Zenoh peer discovery on the local network, streaming video through GStreamer and audio through WebRTC. The SimulatedController provides a virtual 640x480 stream for testing without hardware.
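The abstraction can be sketched with Python's `abc` module. Method names and the frame format here are assumptions; only the class names and the 640x480 simulated stream come from the text:

```python
from abc import ABC, abstractmethod

class RobotController(ABC):
    """Everything Maxim needs from a robot, hardware details excluded."""

    @abstractmethod
    def get_frame(self) -> bytes: ...

    @abstractmethod
    def move_head(self, x: float, y: float, z: float,
                  roll: float, pitch: float, yaw: float) -> None: ...

class SimulatedController(RobotController):
    """Virtual 640x480 RGB stream for testing without hardware."""
    WIDTH, HEIGHT = 640, 480

    def __init__(self) -> None:
        self.pose = (0.0,) * 6

    def get_frame(self) -> bytes:
        return bytes(self.WIDTH * self.HEIGHT * 3)   # all-black RGB frame

    def move_head(self, x, y, z, roll, pitch, yaw) -> None:
        self.pose = (x, y, z, roll, pitch, yaw)      # no motors to drive
```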
Multi-Robot Support
The RobotRegistry (singleton) allows connecting multiple robots simultaneously.
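A minimal sketch of the singleton pattern involved; the `connect`/`get` method names are assumptions:

```python
class RobotRegistry:
    """Process-wide singleton mapping names to connected controllers."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._robots = {}     # initialized exactly once
        return cls._instance

    def connect(self, name: str, controller) -> None:
        self._robots[name] = controller

    def get(self, name: str):
        return self._robots[name]

    def names(self) -> list:
        return sorted(self._robots)
```

Every call site sees the same registry, so any component can address any connected robot by name.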
The Bridge System
Bridges are the connective tissue between biological memory systems and the rest of the architecture. Each bridge creates bidirectional learning between two or more components.
Why Bridges?
The strict layered architecture means memory can't directly influence decisions, and decisions can't directly write to memory. Bridges provide the controlled channels for information to flow between layers, maintaining architectural purity while enabling integrated learning.
SpatialMemoryBridge
Hippocampus + EC ↔ SpatialMap + AttentionNetwork
Stores multi-session object location priors. "The mug was on the counter 3 out of 4 times" translates to an attention boost for the counter region when searching for mugs.
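The "3 out of 4 times" statistic becomes an attention boost via a simple location prior. A sketch of the idea under assumed names (`SpatialPrior`, string region labels), not the bridge's actual implementation:

```python
from collections import Counter

class SpatialPrior:
    """Per-object location counts turned into an attention boost."""

    def __init__(self) -> None:
        self._sightings: dict = {}

    def record(self, obj: str, region: str) -> None:
        self._sightings.setdefault(obj, Counter())[region] += 1

    def attention_boost(self, obj: str, region: str) -> float:
        """P(region | obj) from past sessions; 0.0 for unknown objects."""
        counts = self._sightings.get(obj)
        if not counts:
            return 0.0
        return counts[region] / sum(counts.values())
```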
PainCircuitBridge
PainDetector ↔ NAc ↔ FearAgent
Two modes of harm prevention:
| Mode | Latency | Mechanism | When |
|---|---|---|---|
| Predictive | Zero (pre-execution) | Physics-based velocity/limit analysis | Before every motor command |
| Learned | After first occurrence | NAc pattern matching from past pain events | After experiencing pain once |
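The predictive mode amounts to a physics check that runs before any command reaches a motor. A sketch of that gate; the joint limits and velocity cap below are illustrative values, not Reachy Mini specifications:

```python
def predictive_pain_check(target: float, current: float, dt: float,
                          max_velocity: float = 1.0,
                          limits: tuple = (-1.57, 1.57)) -> bool:
    """Zero-latency safety gate evaluated before every motor command.

    Returns True when the command is safe to execute. Limits are in
    radians and the velocity cap in rad/s — placeholder numbers.
    """
    lo, hi = limits
    if not (lo <= target <= hi):
        return False                   # would exceed a joint limit
    if abs(target - current) / dt > max_velocity:
        return False                   # would demand an unsafe velocity
    return True
```

Because the check is pure arithmetic on the proposed command, it costs nothing at runtime and needs no prior pain experience, unlike the learned NAc path.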
PlanHistoryBridge
Hippocampus ↔ NAc
Retrieves successful plan templates from memory. If "find mug" succeeded with the tool sequence [look_at_counter, track_object, approach], that template is offered for similar future goals.
EscalationLearningBridge
Hippocampus + SCN ↔ NAc
Learns when to ask a human for help. If confidence in an action drops below a learned threshold (different per goal type, per time of day), Maxim escalates rather than acting autonomously.
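The decision reduces to a threshold lookup keyed by goal type and time of day. A sketch only; the key shape, the day/night split, and the 0.5 fallback are assumptions, not the bridge's learned representation:

```python
def should_escalate(confidence: float, goal_type: str, hour: int,
                    thresholds: dict) -> bool:
    """Escalate to a human when confidence falls below the learned threshold.

    `thresholds` maps (goal_type, period) to a learned cutoff; unseen
    combinations fall back to a conservative 0.5 (illustrative value).
    """
    period = "night" if hour < 7 or hour >= 22 else "day"
    return confidence < thresholds.get((goal_type, period), 0.5)
```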
EnergyCircuitBridge
Energy Tracking ↔ NAc
Reports resource costs as valence signals. Expensive LLM calls get negative valence. Cheap local inferences get positive valence. Over time, the NAc learns to predict energy costs and factor them into planning.
SalienceMemoryBridge
Hippocampus + EC ↔ SalienceNetwork
Enriches real-time salience scores with long-term interaction history. Objects you've had positive experiences with become more salient. Objects associated with failures get suppressed.
Bridge Coordination: MemoryHub
Persistence & Checkpointing
Maxim persists learned state across sessions. Every biological system saves its learned parameters:
| Component | File | What's Saved |
|---|---|---|
| Hippocampus | data/util/hippocampus.json | All episodic memories + associative indices |
| NAc | data/util/nac_state.json | Learned causal links (action → outcome) |
| SCN | data/util/scn_state.json | Temporal bin distributions |
| FocusLearner | data/util/focus_learner.json | Movement gain values (Rescorla-Wagner) |
| WorkspaceBounds | data/util/learned_bounds.json | Discovered reachable space limits |
| FearCircuit | data/util/fear_learning.json | Learned aversive action patterns |
| PainDetector | data/util/pain_detector.json | Pain threshold adaptations |
| Thresholds | data/util/adaptive_thresholds.json | Learned escalation/pain thresholds |
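Since every file above is JSON, the save path benefits from the standard write-then-rename pattern so a crash mid-write can't corrupt learned state. A sketch of that pattern, not Maxim's actual persistence code; `save_state`/`load_state` are assumed names:

```python
import json
import os
import tempfile

def save_state(path: str, state: dict) -> None:
    """Atomically persist learned state: write a temp file, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)              # atomic on POSIX filesystems

def load_state(path: str, default: dict) -> dict:
    """Return defaults on first run or after a corrupted save."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```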
Goal Tree Checkpointing
Before risky operations, Maxim checkpoints its entire goal tree so it can recover if something goes wrong.
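The mechanism can be sketched as a deep-copy snapshot stack. The tree structure below is a toy stand-in, not Maxim's real goal representation:

```python
import copy

class GoalTree:
    """Toy goal tree with checkpoint/rollback."""

    def __init__(self) -> None:
        self.root = {"goal": "idle", "children": []}
        self._checkpoints: list = []

    def checkpoint(self) -> None:
        """Snapshot the whole tree before a risky operation."""
        self._checkpoints.append(copy.deepcopy(self.root))

    def rollback(self) -> None:
        """Restore the most recent snapshot after a failure."""
        self.root = self._checkpoints.pop()
```

Deep-copying matters: a shallow copy would share child lists with the live tree, so mutations made during the risky operation would silently corrupt the checkpoint.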
Memory Consolidation (Sleep)
| Step | Mechanism | Effect |
|---|---|---|
| 1. Score | Access frequency + recency | Rank memories by importance |
| 2. Compress | EpisodicMemory → CompressedMemory | ~2.5KB → ~200 bytes |
| 3. Prune | Remove unaccessed (>7 days) | Cap total memory footprint |
| 4. Protect | Exempt high-value memories | User interactions, successes preserved |
| 5. Reindex | SCN-aware rebalancing | Maintain temporal coverage |
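The score/prune/protect steps of the pipeline can be sketched as one pass over the memory list. The dict shape (`id`, `access_count`, `last_access`) is an assumed stand-in for the real EpisodicMemory type, and compression and reindexing are omitted:

```python
SEVEN_DAYS = 7 * 24 * 3600

def consolidate(memories: list, now: float, protected: set) -> list:
    """Prune stale memories, exempt high-value ones, rank the survivors."""
    kept = []
    for m in memories:
        if m["id"] in protected:                    # step 4: exempt high-value
            kept.append(m)
        elif now - m["last_access"] <= SEVEN_DAYS:  # step 3: prune stale
            kept.append(m)
    # step 1: rank survivors by access frequency, then recency
    kept.sort(key=lambda m: (m["access_count"], m["last_access"]), reverse=True)
    return kept
```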
Energy Tracking
Maxim tracks resource expenditure across six domains, creating a unified cost model for decision-making.
Energy signals flow to the NAc. High costs produce negative valence. Low costs produce positive valence. Over time, Maxim learns which strategies are efficient and which are wasteful, without being explicitly programmed with cost tables.
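The cost-to-valence mapping can be sketched as a bounded signed ratio around a baseline. The baseline and curve shape here are illustrative assumptions, not Maxim's actual model:

```python
def cost_to_valence(cost: float, baseline: float = 1.0) -> float:
    """Map a resource cost to a valence signal in [-1, 1].

    Costs above the baseline read as negative valence (wasteful),
    cheaper ones as positive (efficient); baseline cost maps to 0.
    """
    v = (baseline - cost) / (baseline + cost)
    return max(-1.0, min(1.0, v))
```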
The Full Picture
Every component described here, from the layered architecture to the bridge system to energy tracking, serves a single purpose: enabling an embodied agent that learns from experience while maintaining safety guarantees.
The biological metaphors aren't decoration. They're engineering decisions. Hippocampal indexing is faster than SQL for the access patterns Maxim needs. Rescorla-Wagner learning converges more smoothly than gradient descent for small-sample motor adaptation. Pain circuits provide safety guarantees that policy-only approaches can't match.
The result is a system where you can plug in a different robot, swap the LLM, or add new tools, and the cognitive architecture adapts. Because that's what brains do.