
MAXIM

Usage Guide

Installing, Configuring, and Running Maxim

Requirements

  • Python 3.12+
  • Hardware: Reachy Mini robot (or simulation mode for development)
  • RAM: 4GB minimum (8GB+ for larger LLMs)
  • GPU: Optional. Metal (macOS) or CUDA (Linux) supported
  • Network: Same LAN as Reachy Mini for Zenoh discovery

Blackwell GPU Note

RTX 50-series (Blackwell) GPUs have a known GStreamer/CUDA incompatibility. Maxim auto-detects this and falls back to CPU mode. You can force CPU mode manually with CUDA_VISIBLE_DEVICES="".

Installation

  1. Clone the repository
    git clone https://github.com/dennys246/Maxim.git
    cd Maxim
  2. Create a virtual environment
    python -m venv maxim-env
    source maxim-env/bin/activate
  3. Install core package
    pip install -e .
  4. Install LLM support (enables the agentic runtime)
    pip install -e '.[llm]'
  5. Download a language model
    ./scripts/download_models.sh --llm --enable

    This downloads SmolLM 1.7B (~1.1GB), the default CPU-friendly model.

Optional Extras

| Extra | Install Command | What It Enables |
| --- | --- | --- |
| TTS | pip install -e '.[tts]' | Text-to-speech (Piper TTS) |
| Semantic | pip install -e '.[semantic]' | Neural similarity (SentenceTransformer) |
| Torch LLM | pip install -e '.[llm-torch]' | PyTorch transformers backend |

Configuration

Configuration Files

All configuration lives in data/util/:

| File | Purpose |
| --- | --- |
| llm.json | LLM model selection, quantization, per-mode response sizing |
| robots.yaml | Robot connection settings (type, name, timeout) |
| phrase_responses.json | Voice command mappings ("Maxim sleep" → sleep mode) |
| key_responses.json | Keyboard shortcut bindings |
| whisper.json | Audio transcription settings (model size, compute type) |
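Configuration files are plain JSON, so they are easy to read programmatically. The sketch below shows one way to load llm.json; the key names queried here are illustrative, not a documented schema, so check your own file before relying on them.

```python
import json
from pathlib import Path

def load_llm_config(path="data/util/llm.json"):
    """Return the parsed LLM config, or {} if the file is missing."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

config = load_llm_config()
# "profile" is an assumed key for illustration; inspect your llm.json for the real schema.
print("profile:", config.get("profile", "smollm-1.7b-instruct"))
```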

Environment Variables

    # Core settings
    MAXIM_LLM_ENABLED=1                      # Enable LLM inference
    MAXIM_LLM_PROFILE=smollm-1.7b-instruct   # Model profile
    MAXIM_LLM_QUANTIZATION=Q4_K_M            # Quantization level
    MAXIM_PROMPT_PROFILE=standard            # Prompt optimization tier
    MAXIM_ROBOT_NAME=reachy_mini             # Robot identifier

    # GPU control
    CUDA_VISIBLE_DEVICES=""                  # Force CPU-only
    MAXIM_LLM_N_GPU_LAYERS=0                 # CPU inference
    MAXIM_LLM_N_GPU_LAYERS=-1                # All layers on GPU

    # Display and audio
    MAXIM_DISABLE_IMSHOW=1                   # Disable OpenCV windows
    MAXIM_WHISPER_COMPUTE_TYPE=float32       # Whisper precision fallback
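A script that wraps Maxim can read these variables the same way the runtime presumably does. This sketch uses the documented variable names; the default values are illustrative, not necessarily Maxim's actual fallbacks.

```python
import os

def llm_settings():
    """Read Maxim's documented LLM environment variables.
    Defaults here are illustrative, not Maxim's actual fallbacks."""
    return {
        "enabled": os.environ.get("MAXIM_LLM_ENABLED", "0") == "1",
        "profile": os.environ.get("MAXIM_LLM_PROFILE", "smollm-1.7b-instruct"),
        "quantization": os.environ.get("MAXIM_LLM_QUANTIZATION", "Q4_K_M"),
        "n_gpu_layers": int(os.environ.get("MAXIM_LLM_N_GPU_LAYERS", "0")),
    }

os.environ["MAXIM_LLM_ENABLED"] = "1"
print(llm_settings()["enabled"])  # → True
```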

Prompt Profiles

Three tiers of cognitive effort, matched to your hardware:

| Profile | Max Depth | LLM Calls | Parallelism | Best For |
| --- | --- | --- | --- | --- |
| minimal | 2 levels | 8 max | None | CPU-only, low RAM, Raspberry Pi |
| standard | 5 levels | 20 max | 4 workers | Laptop with GPU or fast CPU |
| rich | 7 levels | 50 max | 8 workers | Desktop with dedicated GPU |
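The tiers above can be encoded as plain data, which is handy when scripting around Maxim. The limits come from the table; the selection heuristic is hypothetical, not Maxim's own logic, and `workers: 1` stands in for the "None" parallelism of minimal.

```python
# Illustrative encoding of the profile table above.
PROFILES = {
    "minimal":  {"max_depth": 2, "max_llm_calls": 8,  "workers": 1},
    "standard": {"max_depth": 5, "max_llm_calls": 20, "workers": 4},
    "rich":     {"max_depth": 7, "max_llm_calls": 50, "workers": 8},
}

def pick_profile(has_gpu: bool, ram_gb: int) -> str:
    """Rough hardware-to-profile heuristic (hypothetical, not Maxim's logic)."""
    if has_gpu and ram_gb >= 16:
        return "rich"
    if has_gpu or ram_gb >= 8:
        return "standard"
    return "minimal"

print(pick_profile(has_gpu=False, ram_gb=4))  # → minimal
```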

Operating Modes

Maxim uses a decomposed mode architecture combining processing states (awake/sleep), operational modes (passive/active/singularity), and strategies (observe/explore/research/assist/reflect/learn). For backward compatibility, the CLI accepts legacy mode names that map to specific combinations. See the Operating Modes deep dive for full details.

Operational Modes & Autonomy

Operational modes map directly to autonomy levels:

| Mode | Autonomy | Behavior | Max Initiative |
| --- | --- | --- | --- |
| passive | planning | Proposes actions, waits for human approval | 0.3 |
| active | supervised | Acts within defined boundaries, escalates on uncertainty | 0.7 |
| singularity | autonomous | Full agency with self-correction (safety constraints still apply) | 1.0 |
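One way to think about the "max initiative" column: an action whose initiative score exceeds the mode's cap needs human approval. The helper below is a hypothetical model of that gating, not Maxim's internal API.

```python
# Initiative caps from the table above, encoded for illustration.
AUTONOMY_CAPS = {"planning": 0.3, "supervised": 0.7, "autonomous": 1.0}

def requires_approval(action_initiative: float, autonomy: str) -> bool:
    """True if an action exceeds the mode's cap and must wait for a human.
    Hypothetical helper; Maxim's internal gating may differ."""
    return action_initiative > AUTONOMY_CAPS[autonomy]

print(requires_approval(0.5, "planning"))    # → True
print(requires_approval(0.5, "supervised"))  # → False
```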

CLI Reference

Basic usage

    maxim [OPTIONS]

Connection

    --robot-name TEXT             Robot identifier (default: reachy_mini)
    --home-dir PATH               Data directory (default: data)
    --timeout FLOAT               Connection timeout in seconds (default: 30.0)

Execution

    --mode MODE                   exploration|live|observe|sleep|reflection|train|research
    --epochs INT                  Stop after N cycles (0 = unlimited)
    --audio true|false            Enable audio recording (default: true)
    --audio_len FLOAT             Transcription chunk duration (default: 5.0s)
    --interactive true|false      Enable keyboard input (default: true)

Agentic mode

    --language-model TEXT         LLM profile (e.g., mistral-7b)
    --prompt-profile TIER         minimal|standard|rich
    --autonomy LEVEL              planning|supervised|autonomous
    --autonomy-duration FLOAT     Timed autonomy in seconds
    --memory-path PATH            Memory persistence file
    --reset                       Clear memory on startup
    --enable-embeddings           Enable semantic similarity (Phase 4)

Network

    --internet-access             Enable internet (default)
    --no-internet                 Disable internet access

Audio / TTS

    --tts                         Enable text-to-speech
    --tts-model TEXT              Voice model (default: en_US-lessac-medium)

Exploration

    --explore [FOCUS]             Start exploration with optional focus topic
    --exploration-duration FLOAT  Session duration in seconds
    --exploration-autonomy LEVEL  supervised|autonomous
    --exploration-allow-scripts   Allow Python script generation
    --exploration-allow-training  Allow model training during exploration
    --resume-session ID           Resume a previous session
    --list-sessions               List available sessions

Maintenance

    --clear-cache                 Remove __pycache__ directories
    --clear-memory [TYPE]         Clear persistent memory and exit
    --verbosity 0|1|2             Logging level
    --agentic-verbosity 0|1|2|3   Agentic system logging detail
    --no-agentic-console          Suppress agentic event output

Voice Commands

All voice commands begin with the wake word "Maxim":

| Command | Effect |
| --- | --- |
| "Maxim sleep" | Enter sleep mode (audio monitoring only) |
| "Maxim wake up" | Return to previous active mode |
| "Maxim observe" | Switch to observe strategy |
| "Maxim explore" | Switch to explore strategy |
| "Maxim assist" | Switch to assist strategy |
| "Maxim reflect" | Switch to reflect strategy |
| "Maxim passive" / "active" / "singularity" | Switch operational mode |
| "Maxim shutdown" | Clean shutdown |

See the Operating Modes page for the full list. Custom voice commands can be added in data/util/phrase_responses.json.
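Adding a custom command means adding an entry to that JSON file. The helper below sketches this under the assumption that phrase_responses.json is a flat phrase→response object; your Maxim version's schema may differ, so inspect the file first.

```python
import json
from pathlib import Path

def add_phrase(path: str, phrase: str, response: str) -> None:
    """Add one phrase→response mapping. Assumes a flat JSON object,
    which may not match the exact schema of phrase_responses.json."""
    p = Path(path)
    data = json.loads(p.read_text()) if p.exists() else {}
    data[phrase] = response
    p.write_text(json.dumps(data, indent=2, ensure_ascii=False))
```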

Keyboard Controls

Available in interactive mode (default):

| Key | Action |
| --- | --- |
| c | Center vision |
| u | Mark trainable |
| 0 | Label: no errors |
| 1-9 | Label: error code |
| q | Quit |

Common Recipes

First Run (No Robot)

Test the system without hardware using simulation:

maxim --mode exploration

Full Agentic Mode with Mistral

    maxim --mode live \
      --language-model mistral-7b \
      --prompt-profile standard \
      --autonomy supervised

CPU-Only with Small Model

    CUDA_VISIBLE_DEVICES="" maxim \
      --mode live \
      --language-model smollm-1.7b \
      --prompt-profile minimal

Timed Autonomous Exploration

    maxim --explore "kitchen objects" \
      --exploration-duration 300 \
      --exploration-autonomy autonomous

Verbose Debugging Session

    maxim --mode live \
      --verbosity 2 \
      --agentic-verbosity 3 \
      --language-model phi-3-mini

Resume a Previous Session

    # List available sessions
    maxim --list-sessions

    # Resume by ID
    maxim --resume-session abc123

Memory Management

Clearing Memory

    # Clear everything
    maxim --clear-memory all

    # Clear specific types
    maxim --clear-memory focus,bounds   # Movement learning only
    maxim --clear-memory nac,hippo      # Decision learning + episodes
    maxim --clear-memory pain,fear      # Safety learning only

Available Memory Types

| Type | What It Clears | Effect |
| --- | --- | --- |
| focus | FocusLearner gains | Resets movement calibration |
| bounds | Workspace limits | Relearns reachable space |
| nac | Causal links | Forgets action-outcome predictions |
| scn | Temporal patterns | Forgets time-of-day associations |
| hippo | Episodic memories | Complete amnesia |
| pain | Pain thresholds | Resets pain sensitivity |
| fear | Fear associations | Forgets learned dangers |
| escalation | Escalation thresholds | Resets when to ask for help |
| threshold | Adaptive thresholds | Resets all learned limits |
| semantic | Neural embeddings | Resets similarity cache |
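Since --clear-memory takes a comma-separated list, a wrapper script may want to validate the types before invoking Maxim. This is a hypothetical helper built from the table above, not Maxim's actual CLI parser.

```python
# Known --clear-memory types from the table above.
MEMORY_TYPES = {"focus", "bounds", "nac", "scn", "hippo",
                "pain", "fear", "escalation", "threshold", "semantic"}

def parse_clear_arg(arg: str) -> set:
    """Validate a comma-separated --clear-memory value ('all' = everything).
    Hypothetical helper, not Maxim's actual CLI parser."""
    if arg == "all":
        return set(MEMORY_TYPES)
    requested = {t.strip() for t in arg.split(",")}
    unknown = requested - MEMORY_TYPES
    if unknown:
        raise ValueError(f"Unknown memory types: {sorted(unknown)}")
    return requested
```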

Where Data Lives

Directory structure

    data/
    ├── util/                 # Learned state (JSON)
    │   ├── hippocampus.json
    │   ├── nac_state.json
    │   ├── scn_state.json
    │   ├── focus_learner.json
    │   ├── learned_bounds.json
    │   ├── fear_learning.json
    │   └── ...
    ├── videos/               # Recorded video (MP4)
    ├── audio/                # Recorded audio (WAV)
    ├── transcript/           # Transcriptions (JSONL)
    ├── training/             # Motor training samples (JSONL)
    ├── plans/checkpoints/    # Goal tree snapshots
    ├── models/               # ML models
    │   ├── LLM/              # GGUF language models
    │   ├── MotorCortex/      # Vision-to-motor model
    │   └── YOLO/             # Object detection
    ├── sandbox/              # Safe space for generated files
    └── logs/                 # Session logs

Debugging

Verbosity Levels

| Level | What You See |
| --- | --- |
| 0 | Errors only |
| 1 | Key events (mode changes, goals, tool calls) |
| 2 | Detailed processing (every detection, memory query, decision) |
| 3 (agentic only) | Full trace (LLM prompts, raw responses, bridge activity) |

Inspecting Learned State

    # Python: inspect memory
    from maxim.memory.hippocampus import Hippocampus

    hippo = Hippocampus.load_from_file("data/util/hippocampus.json")
    print(f"Total memories: {len(hippo.memories)}")
    for mem_id, mem in list(hippo.memories.items())[:5]:
        print(f"  {mem.action.tool_name} → {mem.outcome.valence}")

    # Python: inspect NAc learning
    from maxim.decisions.nac import NAc

    nac = NAc.load_from_file("data/util/nac_state.json")
    for sig, pred in nac._learned_associations.items():
        print(f"  {sig}: conf={pred.confidence:.2f}, val={pred.valence}")

Hardware Diagnostics

maxim-diagnostics --host 192.168.1.100

Checks Zenoh connection, motor controller, video stream, and audio stream availability.

Log Files

Session logs are written to data/logs/reachy_log_YYYY-MM-DD_HHMMSS.log with timestamps, thread IDs, and structured event data.
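When collecting or pruning logs programmatically, the documented naming scheme is easy to match with a regular expression. Both `log_filename` and `LOG_PATTERN` below are hypothetical helpers mirroring that scheme, not part of Maxim.

```python
import re
from datetime import datetime

# Pattern matching the documented name: reachy_log_YYYY-MM-DD_HHMMSS.log
LOG_PATTERN = re.compile(r"reachy_log_(\d{4}-\d{2}-\d{2})_(\d{6})\.log")

def log_filename(now=None):
    """Build a log filename for a given moment (defaults to now).
    Hypothetical helper mirroring the documented naming scheme."""
    now = now or datetime.now()
    return now.strftime("reachy_log_%Y-%m-%d_%H%M%S.log")

print(log_filename(datetime(2025, 1, 2, 3, 4, 5)))  # → reachy_log_2025-01-02_030405.log
```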

Getting Help

Maxim is open source. If you run into issues, check the GitHub repository for the latest documentation and issue tracker.