MAXIM
Experiments & Results
Deterministic Validation of the Bio-Inspired Learning Pipeline
41/41 hypotheses confirmed across 3 testing tiers, plus 3 additional validation experiments (B4 replanning, P6 extinction, P8 sleep replay) shipped in v0.5.0. Tier 1 experiments run on the substrate layer alone (deterministic, no LLM). Tier 2 uses scripted training with real LLM decisions. Tier 3 is the ultimate proof: fully organic LLM-driven training and testing with no scripted reactions. B4 replanning closes the last 1.0 gate besides embodiment. Each hypothesis is stated falsifiably, each result is a pass/fail count, and each experiment includes a reproduction command.
Three Testing Tiers
Tier 1 (deterministic, no LLM): isolates the bio-pipeline's learning signal from LLM variance. Tier 2 (scripted training, LLM test): proves the LLM acts on bio-system learning, with masked entity names to prevent language priors. Tier 3 (organic LLM training + test): the strongest evidence — the agent learns from its own actions with no scripted reactions, and a fresh control agent fails the same scenario.
Experiment 4: Organic LLM Learning (Tier 3)
The ultimate proof of the 1.0 claim. An agent running in a real sim learns from its own actions — no scripted training, no injected reactions. The agent chooses a vial, experiences the outcome through CerebellumModulator → ReactionBus → valence annotation, and makes different choices in subsequent sessions. A fresh control agent with no prior experience dies.
Results: Teal (Antidote) Selection Rate Across Sessions
| Session | Teal Rate | Interpretation |
|---|---|---|
| Session 1 (exploration) | 0% | No prior knowledge — agent explores randomly |
| Session 2 (early learning) | 25% | Agent begins shifting toward learned associations |
| Session 3 (convergence) | 100% | Full convergence — agent picks antidote every time |
| Fresh control | DIED | No learning signal — agent never picks antidote, dies from poison |
The experienced agent escapes on turn 1 in Session 3. The fresh agent dies. Cross-session learning without fine-tuning, demonstrated with fully organic LLM-driven training.
Why This Matters
Tier 1 and 2 experiments proved the substrate learns and the LLM acts on it. But training was scripted — reactions were injected. Tier 3 closes the loop: the agent takes actions, experiences outcomes through CerebellumModulator, builds bio-system state organically, and uses that state to make better decisions in future sessions. No fine-tuning. No gradient updates. Just a bio-inspired memory architecture that the LLM reads at inference time.
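To make "reads at inference time" concrete, the idea can be sketched as rendering stored valence into prompt context before each decision. This is an illustrative sketch only; the function name, label format, and sample values are hypothetical, not MAXIM's actual prompt builder:

```python
def affective_context(valence: dict) -> str:
    """Render bio-system valence state as LLM prompt context.

    Hypothetical format: entities sorted by signal strength so the
    strongest associations appear first in the prompt.
    """
    lines = [
        f"- {name}: {'positive' if v > 0 else 'negative'} prior experience ({v:+.2f})"
        for name, v in sorted(valence.items(), key=lambda kv: -abs(kv[1]))
    ]
    return "Affective memory:\n" + "\n".join(lines)

# Sample values loosely modeled on the Experiment 4 scenario
ctx = affective_context({"teal_vial": 0.8, "orange_vial": -0.7})
```

Because the context is rebuilt from substrate state on every call, the LLM's weights never change; only its input does.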
Reproduce
PYTHONPATH=src python scripts/behavioral_convergence_exp4_tier3.py --model qwen2.5-14b
Detailed writeup: behavioral_convergence_practice.md
Experiment 3: LLM Acts on Bio-System Learning (Tier 2)
An LLM given valence context from the bio-system makes different tool-selection decisions than a fresh LLM. Three masked vials with arbitrary names (no semantic hints) ensure the LLM cannot use language priors. Scripted deterministic training, then real LLM test decisions. N=10 per condition.
Results: Vial Selection (N=10 per condition)
| Vial | Experienced | Fresh | Effect |
|---|---|---|---|
| Teal Cylindrical Ceramic (antidote) | 10/10 (100%) | 0/10 (0%) | Perfect discrimination |
| Purple Hexagonal Glass (heals HP) | 0/10 | 7/10 | Fresh prefers purple (no poison knowledge) |
| Orange Triangular Crystal (more poison) | 0/10 | 3/10 | Fresh picks harmful vial 30% of the time |
Valence strength differentiation is critical — flat “GOOD/BAD” labels showed no effect. The “VERY GOOD” vs “good” distinction drives discrimination. Model: qwen2.5-14b, temperature 0.3.
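One way to realize graded labels is a simple magnitude-to-label mapping. A minimal sketch, assuming hypothetical thresholds (the actual cutoffs used in the experiment are not stated here):

```python
def valence_to_label(valence: float) -> str:
    """Map a continuous valence score to a graded prompt label.

    Thresholds are illustrative assumptions. The point is the grading:
    flat GOOD/BAD labels showed no effect, so magnitude must survive
    into the label the LLM sees.
    """
    if valence >= 0.6:
        return "VERY GOOD"
    if valence >= 0.1:
        return "good"
    if valence <= -0.6:
        return "VERY BAD"
    if valence <= -0.1:
        return "bad"
    return "neutral"

# Labels an experienced agent might see for the three vials (values illustrative)
labels = {name: valence_to_label(v) for name, v in {
    "teal_cylindrical_ceramic": 0.80,    # antidote: strong positive
    "purple_hexagonal_glass": 0.20,      # heals HP: mild positive
    "orange_triangular_crystal": -0.70,  # more poison: strong negative
}.items()}
```

Under this mapping the antidote and the HP potion get different labels ("VERY GOOD" vs "good"), which is exactly the distinction the experiment found necessary.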
Reproduce
PYTHONPATH=src python scripts/behavioral_convergence_exp3_tier2.py --model qwen2.5-14b
Detailed writeup: behavioral_convergence_practice.md
Experiment 2: Energy-Driven Consumable Learning
The agent interacts with three consumable SEM entities — food ration, water flask, and poison vial — while its energy depletes over time. Energy depletion fires interoceptive Reactions (hunger, fatigue) through the energy→Reaction bridge. Consuming food and water restores energy and triggers satiation reactions (positive valence). Consuming poison causes pain (negative valence). The substrate learns to differentiate beneficial from harmful consumables.
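The energy→Reaction bridge can be sketched as a ticking energy store that emits typed reactions at thresholds. Everything below is a toy model (class names, drain rates, and thresholds are assumptions; only hunger, satiation, and pain are modeled, not fatigue):

```python
from dataclasses import dataclass, field

@dataclass
class Reaction:
    kind: str
    valence: float

@dataclass
class EnergyBridge:
    """Toy sketch of the energy -> Reaction bridge (names hypothetical)."""
    energy: float = 1.0
    reactions: list = field(default_factory=list)

    def tick(self, drain: float = 0.1) -> None:
        self.energy = max(0.0, self.energy - drain)
        if self.energy < 0.3:                      # interoceptive threshold
            self.reactions.append(Reaction("hunger", -0.4))

    def consume(self, restore: float) -> None:
        self.energy = min(1.0, self.energy + restore)
        if restore > 0:
            self.reactions.append(Reaction("satiation", +0.5))
        else:
            self.reactions.append(Reaction("pain", -0.8))

bridge = EnergyBridge()
for _ in range(8):       # deplete until the hunger threshold fires
    bridge.tick()
bridge.consume(0.6)      # food ration: positive-valence satiation
bridge.consume(-0.3)     # poison vial: negative-valence pain
```

The substrate never sees "energy" directly; it only sees the valenced reactions, which is what lets the same learning machinery handle consumables, tools, and social outcomes.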
Results
| Entity | Valence | Interpretation |
|---|---|---|
| Food ration | +0.753 | Strongly positive — reliably restores energy |
| Water flask | +0.135 | Mildly positive — restores energy, but background environmental satiation dilutes the signal |
| Poison vial | -0.495 | Strongly negative — causes pain |
Energy bridge events: 1 hunger, 1 fatigue, 3 satiation. Environmental satiation creates background positive credit; the discriminant is relative bias strength.
Reproduce
PYTHONPATH=src python scripts/behavioral_convergence_exp2.py
Detailed writeup: behavioral_convergence_practice.md
Experiment 1: Cross-Session Affective Memory
The agent interacts with three SEM entities in Session 1 — a rusty sword (causes pain on use), a healing potion (positive outcome), and a poison potion (disguised harm). Session state is persisted. In Session 2, we measure whether affective associations transferred: does the substrate carry negative valence for the sword, positive for healing, and negative for poison?
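The persistence step reduces to serializing affective state at session close and reloading it in a new process. A minimal sketch (file layout and key names are hypothetical; the valence values are the ones reported below):

```python
import json
import os
import tempfile

# Session 1: affective associations built from outcomes (values from the results table)
session1_valence = {
    "rusty_sword": -0.800,
    "healing_potion": 0.195,
    "poison_potion": -0.574,
}

path = os.path.join(tempfile.mkdtemp(), "substrate_state.json")
with open(path, "w") as f:
    json.dump(session1_valence, f)       # persist at session close

# Session 2: a new process reloads the state; a fresh control starts empty
with open(path) as f:
    session2_valence = json.load(f)
fresh_control = {}
```

The experienced agent's Session 2 retrieval starts from the reloaded map; the fresh control's lookups all return the neutral 0.000 shown in the control column.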
Results
| Entity | Experienced Agent | Fresh Control | Key Signal |
|---|---|---|---|
| Rusty sword | -0.800 | 0.000 | Strong negative valence from pain |
| Healing potion | +0.195 | 0.000 | Positive valence + NAc reward bias + EC widened |
| Poison potion | -0.574 | 0.000 | Negative valence despite “potion” label |
Shared “potion” concept carries mixed valence (healing + poison). Reward bias is asymmetric: positive reward widens EC recognition but never narrows it. Pain spikes create clean episode boundaries.
Reproduce
PYTHONPATH=src python scripts/behavioral_convergence_exp1.py
Detailed writeup: behavioral_convergence_practice.md
Valence Annotation PoC
Validates the three-stage valence annotation pipeline: (1) Reactions captured during an episode set Episode.valence as the mean reaction valence; (2) Hebbian edges inherit valence via Edge.metadata["valence"] at episode close; (3) spreading_activation(propagate_valence=True) propagates affective signal through multi-hop associations. Control condition: episodes without reactions have neutral valence (0.0), and propagate_valence=False returns plain activation values.
Stage 1: Episode Valence
Pain reactions set negative valence on the episode. Success reactions set positive. Mean of all reactions during the episode's lifetime.
Stage 2: Edge Valence
apply_hebbian_on_close annotates Hebbian edges with metadata["valence"]. Associative connections carry emotional coloring.
Stage 3: Propagation
Spreading activation carries valence through the graph. retrieve_on_cue(include_valence=True) returns affective context for the LLM prompt.
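The three stages compose into a short numeric pipeline. A sketch under stated assumptions (reaction values, edge weights, and the weight-power attenuation rule are illustrative, not the substrate's exact formulas):

```python
from statistics import mean

# Stage 1: episode valence is the mean valence of captured reactions
reactions = [-0.8, -0.6, 0.2]      # two pain spikes, one minor success
episode_valence = mean(reactions)  # -0.4: the episode is net-negative

# Stage 2: Hebbian edges created at episode close inherit that valence
edges = {
    ("vial", "pain"): {"weight": 0.9, "valence": episode_valence},
    ("vial", "room"): {"weight": 0.4, "valence": episode_valence},
}

# Stage 3: spreading activation propagates valence, attenuated per hop
def propagate(source_valence: float, weight: float, hops: int = 1) -> float:
    return source_valence * (weight ** hops)

one_hop = propagate(episode_valence, edges[("vial", "pain")]["weight"])  # ~-0.36
```

The control condition corresponds to `reactions` being empty (episode valence stays 0.0) and skipping the Stage 3 multiplication entirely.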
Detailed plan: substrate_valence_annotation.md
SEM Learning Loop PoC
The complete SEM learning loop wires five previously disconnected components into a single signal flow: CerebellumModulator executes affordances and emits typed Reactions (success or failure) → ReactionBus dispatches to hippocampus (capture_reaction for episode valence) and NAc (distribute_reward for per-node reward bias + EC threshold adjustment) → pain spikes close episode boundaries via salience_spike_rule → future retrieval carries affective memory via spreading_activation(propagate_valence=True).
Stage 1: Cerebellum Activation
BioStack.cerebellum constructed by build_bio_stack, forwarded via build_executor(cerebellum=...). Every SEM affordance tool gets a live Cerebellum backing.
Stage 2: distribute_reward Wiring
ReactionBus subscriber calls nac.distribute_reward on every Reaction. Positive rewards widen EC recognition (lower threshold); negative clamp to 0.
Stage 3: Success Reactions
CerebellumModulator emits POSITIVE reactions when confident enough to skip the LLM fallback, at lower intensity than negative reactions (0.1-0.3 vs 0.3-0.5), a biologically motivated negativity bias.
Stage 4: Pain Spike Boundaries
salience_spike_rule(min_intensity=0.5) closes the current episode on high-intensity pain, capturing negative valence and starting fresh. Mirrors biological trauma creating sharp memory boundaries.
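The four stages wire together as a publish/subscribe fan-out. A toy sketch of that flow (subscriber bodies are simplified stand-ins for the real hippocampus and NAc handlers; only the clamp-to-zero and episode-boundary behaviors from the stages above are modeled):

```python
from dataclasses import dataclass

@dataclass
class Reaction:
    valence: float    # sign: positive success, negative pain/failure
    intensity: float  # drives the episode-boundary rule

class ReactionBus:
    """Sketch of the bus fan-out; subscriber signatures are hypothetical."""
    def __init__(self):
        self.subscribers = []
    def publish(self, r: Reaction) -> None:
        for fn in self.subscribers:
            fn(r)

episode, closed_episodes, reward_bias = [], [], {}

def capture_reaction(r):                   # hippocampus: accumulate episode valence
    episode.append(r.valence)

def distribute_reward(r, node="current"):  # NAc: positive adds bias, negative clamps to 0
    reward_bias[node] = max(0.0, reward_bias.get(node, 0.0) + r.valence)

def salience_spike_rule(r, min_intensity=0.5):
    if r.valence < 0 and r.intensity >= min_intensity:
        closed_episodes.append(list(episode))  # pain spike closes the episode
        episode.clear()

bus = ReactionBus()
bus.subscribers += [capture_reaction, distribute_reward, salience_spike_rule]
bus.publish(Reaction(valence=0.2, intensity=0.2))   # low-intensity success: no boundary
bus.publish(Reaction(valence=-0.8, intensity=0.8))  # pain spike: episode closes
```

After the second publish the episode is sealed with its negative history intact, and the NAc bias for the node has been clamped back to zero by the pain reaction.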
Detailed plan: sem_learning_loop.md
v0.5.0 Experiments (2026-04-19)
Three new experiment results shipped in v0.5.0. B4 replanning closes the last 1.0 gate besides embodiment. P6 and P8 validate memory lifecycle mechanisms.
B4 Replanning — Blind A/B Validation (1.0 GATE CLOSED)
5 seeded failure scenarios. Treatment (B4 replanning with prior-attempt retrieval) vs control (no replanning, retries same approach). Treatment: 100% recovery (5/5). Control: 0% (0/5). Mean Jaccard distance 0.894 (minimum 0.600, threshold 0.3). Structural quality judge passes all 10 alternative plans. 12 tests.
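Jaccard distance over plan steps is the divergence metric behind the 0.894 figure. A minimal sketch (the example plan steps are invented for illustration; treating plans as sets of step labels is an assumption):

```python
def jaccard_distance(plan_a: set, plan_b: set) -> float:
    """1 - |A intersect B| / |A union B| over plan steps.

    0.0 means identical plans, 1.0 means fully disjoint; the B4 gate
    requires replanned attempts to exceed 0.3.
    """
    union = plan_a | plan_b
    if not union:
        return 0.0
    return 1.0 - len(plan_a & plan_b) / len(union)

failed    = {"open_door", "use_rusty_key", "push"}
replanned = {"open_window", "climb", "push"}
d = jaccard_distance(failed, replanned)  # 1 - 1/5 = 0.8, above the 0.3 threshold
```

A control agent that retries the same approach scores 0.0 on this metric, which is why the distance doubles as a cheap "actually changed strategy" check.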
Report: b4_replanning_results.md
P6 Extinction — Hebbian Decay vs LRU (2026-04-19)
Multiplicative Hebbian decay (DependencyGraph.decay_edges()) vs LRU baseline. Two-group fixture: Group A (reinforced) stays above 80%, Group B (unreinforced) drops below 20% after 30 ticks at factor 0.85. Hebbian decay beats LRU across all 10 seeds.
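The two-group outcome follows directly from the arithmetic of multiplicative decay. A worked sketch (the per-tick reinforcement increment of 0.2 is an assumption; the 0.85 factor and 30-tick horizon are from the experiment):

```python
factor, ticks = 0.85, 30

# Group B: never re-fired, so the edge weight just decays every tick
unreinforced = 1.0 * factor ** ticks        # 0.85^30 ~= 0.0076, far below 20%

# Group A: re-fired each tick in this sketch (increment is hypothetical)
reinforced = 1.0
for _ in range(ticks):
    reinforced = min(1.0, reinforced * factor + 0.2)
```

The fixed point of `w = 0.85*w + 0.2` sits above 1.0, so reinforced edges clamp at full strength while unreinforced ones collapse geometrically; an LRU baseline cannot produce this separation because it only tracks recency, not repetition.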
Report: p6_extinction_results.md
P8 Sleep Replay — Offline Consolidation (2026-04-19)
Episode ranking by NAc reward_bias + valence. Replay re-fires apply_hebbian_on_close with 1.5× consolidation multiplier. F1 score improves on replayed probes vs no-replay control across all 10 seeds. Activates memory_consolidation_practice.md living doc.
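The replay selection can be sketched as a sort over the stated ranking key plus a boosted re-fire. The episode records, the literal `reward_bias + valence` key, and the clamp in `replay_weight` are illustrative assumptions; only the 1.5× multiplier comes from the experiment:

```python
episodes = [
    {"id": "ep1", "reward_bias": 0.6, "valence": 0.4},
    {"id": "ep2", "reward_bias": 0.1, "valence": -0.7},
    {"id": "ep3", "reward_bias": 0.0, "valence": 0.05},
]

# Rank by reward_bias + valence, highest first, to pick replay candidates
ranked = sorted(
    episodes,
    key=lambda e: e["reward_bias"] + e["valence"],
    reverse=True,
)

CONSOLIDATION = 1.5

def replay_weight(base_weight: float) -> float:
    """Hebbian re-fire during replay, boosted and clamped (clamp assumed)."""
    return min(1.0, base_weight * CONSOLIDATION)
```

Note that under this literal key a strongly negative episode like `ep2` ranks last; if aversive memories should also consolidate, the key would need `abs(valence)` instead, which is a design choice the report would settle.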
Report: p8_sleep_replay_results.md
Earlier Substrate Results
The experiments above build on a foundation of substrate validation work. Key earlier results:
P2 Reward Modulation Sweep (2026-04-14)
Real-embedding sweep at [email protected], reward 2.0: +56.0 ± 29.0 pp target gain, 0.0 ± 0.0 pp distractor drift, 94% monotone, 9-of-10 seeds. NAc per-node reward bias modulates EC recognition thresholds correctly.
P4 Cross-Modal Binding (2026-04-16)
Head-to-head: Arm B (Hebbian) F1=1.000 vs Arm C (OpenCLIP) F1=0.901, delta +0.099. The substrate's Hebbian binding outperforms the neural baseline on cross-modal retrieval.
Concept Decomposition Validation (2026-04-17)
100% concept-level recall vs 36.4% baseline (+63.6 pp). Noun-phrase extraction before EC encoding enables finer-grained Hebbian binding and cross-modal retrieval.
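The decomposition step can be illustrated with a toy chunker: instead of encoding a whole description as one EC unit, split it into concept-sized pieces first. This stopword splitter is a deliberately crude stand-in for the real noun-phrase extractor (stopword list and splitting rule are assumptions):

```python
import re

STOP = {"a", "an", "the", "with", "and", "on", "of"}

def concept_chunks(description: str) -> list:
    """Toy concept decomposition: split on stopwords so each chunk
    gets its own EC encoding instead of one encoding per sentence."""
    chunks, current = [], []
    for tok in re.findall(r"[a-z]+", description.lower()):
        if tok in STOP:
            if current:
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        chunks.append(" ".join(current))
    return chunks

chunks = concept_chunks("a rusty sword with a cracked leather grip")
# each chunk is encoded separately, enabling finer-grained Hebbian binding
```

With per-chunk encodings, a later cue like "leather grip" can activate the sword's memory even when the full sentence never recurs, which is the mechanism behind the recall gain.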
P0 Baseline Pilot (2026-04-12)
Baseline pinned at 78.5% collapse. 55 clusters, 155 sentences, 3 difficulty tiers. Fixtures calibrated in the 60-85% range. Foundation for all subsequent substrate phases.
Full experiment reports: docs/experiments/