# Maxim: Concept Decomposition

*Breaking text into meaningful substrate nodes*
In biological brains, perception doesn't encode entire sentences as indivisible units. When you hear "I see a blue mug on the table," your brain activates separate representations for mug, table, and blue—each independently associable with memories, emotions, and motor programs. Maxim's concept decomposition mirrors this: a Protocol-based text pre-processor that extracts noun-phrase concepts before they enter the substrate encoding pipeline.
## The Problem
### Without Decomposition
The entire sentence "I see a blue mug on the table" becomes one opaque substrate node. This node won't pattern-complete against a bare "mug" from a previous session. The Hebbian edge from "mug" to vision-mug is unreachable from the sentence node—silently breaking cross-modal retrieval for naturalistic inputs.
### With Decomposition
"blue mug" and "table" each become their own substrate node. Each can independently bind to vision embeddings, accumulate Hebbian associations across episodes, and receive valence annotations from pain/pleasure reactions. The agent learns about things, not sentences about things.
## How It Works
The key property: nothing below the decomposer changes. EC, Hippocampus, Hebbian binding, cross-modal retrieval, NAc reward modulation, persistence—all stay the same. Decomposition is purely additive pre-processing.
### What Gets Extracted

| Input | Extracted Concepts |
|---|---|
| "I see a blue mug on the table" | `["blue mug", "table"]` |
| "The red plate is next to the green cup" | `["red plate", "green cup"]` |
| "The rusty sword feels heavy" | `["rusty sword"]` |
| "lotus" | `["lotus"]` (identity: no decomposition needed) |
#### Extracted
- Noun phrases—"blue mug", "red plate", "rusty sword"
- Named entities—proper nouns captured by noun chunker
- Single nouns—"table", "lotus", "cat"
#### Filtered Out
- Pronouns—"he", "it", "they" (near-meaningless embeddings)
- Determiners—"the", "a", "this" (function words)
- Bare verbs—"see", "go" (ambiguous without objects)
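The filtering rules above can be sketched as a simple membership check. This is illustrative only: the real decomposer applies these rules through its NLP pipeline rather than literal word lists, and the lists below are hypothetical examples.

```python
# Illustrative stop lists (hypothetical; the actual filter is POS-based).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}
BARE_VERBS = {"see", "go", "do", "be", "have"}

def keep_concept(candidate: str) -> bool:
    """Return True if a candidate chunk survives filtering."""
    word = candidate.strip().lower()
    return word not in PRONOUNS | DETERMINERS | BARE_VERBS

candidates = ["blue mug", "it", "the", "table", "see"]
print([c for c in candidates if keep_concept(c)])  # ['blue mug', 'table']
```

Multi-word noun phrases like "blue mug" never collide with the stop lists, so only near-meaningless single tokens are dropped.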
## Pluggable Strategies
The decomposer uses a `DecompositionStrategy` Protocol. Any class with an `extract(text) -> list[ConceptChunk]` method can serve as a strategy: spaCy, LLM-based extraction, regex, or domain-specific parsers.
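The structural contract might look like the sketch below. The `ConceptChunk` here is a minimal stand-in for the real data type, and `WhitespaceStrategy` is a toy example used only to show that any class with a matching `extract()` satisfies the Protocol.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass(frozen=True)
class ConceptChunk:
    text: str  # minimal stand-in; the real type is richer

@runtime_checkable
class DecompositionStrategy(Protocol):
    def extract(self, text: str) -> list[ConceptChunk]: ...

class WhitespaceStrategy:
    """Toy strategy: every word becomes its own chunk."""
    def extract(self, text: str) -> list[ConceptChunk]:
        return [ConceptChunk(w) for w in text.split()]

# Structural typing: no inheritance needed to satisfy the Protocol.
strategy: DecompositionStrategy = WhitespaceStrategy()
print([c.text for c in strategy.extract("blue mug")])  # ['blue', 'mug']
```

Because the Protocol is structural, swapping strategies requires no changes to the encoding pipeline that consumes the chunks.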
### SpaCyNounChunkStrategy (default)

Uses the `en_core_web_sm` noun chunker. Thread-safe singleton. Strips determiners automatically.
### IdentityStrategy (fallback)

Returns the input unchanged as a single chunk. Active when spaCy is not installed or decomposition is disabled.
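The fallback behavior is simple enough to sketch in full. The `ConceptChunk` below is a minimal stand-in for the real data type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptChunk:
    text: str

class IdentityStrategy:
    """Fallback: return the whole input as one chunk (no decomposition)."""
    def extract(self, text: str) -> list[ConceptChunk]:
        return [ConceptChunk(text)]

print([c.text for c in IdentityStrategy().extract("lotus")])  # ['lotus']
```

With this strategy active, behavior degrades gracefully to the pre-decomposition pipeline: one node per percept.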
### Your Strategy (custom)

Implement the `DecompositionStrategy` Protocol. LLM-based, regex, domain-specific: plug it in.
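As one hypothetical example, a domain-specific regex strategy might capture runs of capitalized words as a crude named-entity heuristic. Everything here (class name, pattern) is illustrative, not part of Maxim:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptChunk:
    text: str

class RegexEntityStrategy:
    """Hypothetical custom strategy: treat runs of capitalized
    words as concepts (a crude named-entity heuristic)."""
    PATTERN = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*\b")

    def extract(self, text: str) -> list[ConceptChunk]:
        return [ConceptChunk(m.group()) for m in self.PATTERN.finditer(text)]

chunks = RegexEntityStrategy().extract("Alice handed Bob the Rusty Sword")
print([c.text for c in chunks])  # ['Alice', 'Bob', 'Rusty Sword']
```

Since it has a matching `extract()` method, it satisfies the Protocol structurally and drops into the pipeline without any registration step.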
## Modality Gate
Decomposition applies to text-modality percepts only. Visual percepts (CLIP embeddings), proprioceptive readings, and SEM affordance labels bypass decomposition automatically. This mirrors how the brain processes language in Wernicke's area but routes visual input through the ventral stream without linguistic decomposition.
### Enforced at the Encoder Layer
The modality gate lives inside `LinguisticEncoder.encode_decomposed()`. Callers don't need to check; non-text modalities are automatically handled as single nodes.
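A minimal sketch of that gate, assuming a hypothetical `Percept` type with a `modality` field (the real encoder's signature may differ):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Percept:
    modality: str   # e.g. "text", "vision", "proprioception"
    content: str

def encode_decomposed(percept: Percept, decompose) -> list[str]:
    """Sketch of the modality gate: only text percepts are decomposed;
    everything else passes through as a single node."""
    if percept.modality != "text":
        return [percept.content]        # bypass: one node, unchanged
    return decompose(percept.content)   # text: split into concept nodes

# Toy decomposer for illustration: keep words longer than 3 characters.
toy = lambda text: [w for w in text.split() if len(w) > 3]
print(encode_decomposed(Percept("vision", "clip-embedding-id"), toy))
print(encode_decomposed(Percept("text", "blue mug on table"), toy))
```

Centralizing the check inside the encoder keeps the contract in one place, so no caller can accidentally decompose a CLIP embedding or a proprioceptive reading.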
## Configuration

| Variable | Default | Description |
|---|---|---|
| `MAXIM_SUBSTRATE_PATH` | off | Prerequisite: enables the substrate encoding path |
| `MAXIM_CONCEPT_DECOMPOSITION` | off | Set to `1` to enable concept decomposition |
## Connection to Valence Annotation
Concept decomposition is the prerequisite for valence annotation—the mechanism by which the substrate learns that certain concepts are associated with pain or pleasure. When an agent interacts with a SEM entity (e.g., a rusty sword) and experiences pain, decomposition ensures "rusty sword" is its own substrate node. The valence annotation system can then mark Hebbian edges connected to that node with negative valence, so future encounters with similar concepts carry the affective signal through spreading activation.
Without decomposition, the valence would land on the entire sentence blob—less precise and harder to generalize across different contexts where the same concept appears.
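To make the precision argument concrete, here is a heavily simplified sketch of valence landing on a concept node's edges. The edge store, function name, and neighbor labels are all hypothetical; the real system annotates Hebbian edges inside the substrate.

```python
# Hypothetical edge-valence store: (node, neighbor) -> valence in [-1, 1].
edges_valence: dict[tuple[str, str], float] = {}

def annotate_valence(node: str, neighbors: list[str], valence: float) -> None:
    """Mark every edge touching `node` with a valence value."""
    for nb in neighbors:
        edges_valence[(node, nb)] = valence

# With decomposition, pain from the sword lands on its own node:
annotate_valence("rusty sword", ["vision:sword", "touch:pain"], -0.8)
print(edges_valence[("rusty sword", "touch:pain")])  # -0.8
```

Had the node been the whole sentence ("The rusty sword feels heavy"), the negative valence would attach to that blob and never fire when "rusty sword" appears in a different sentence.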
## The ConceptChunk Data Type

`ConceptChunk` is a rich return type that supports future extensions (role-tagged edges, confidence-weighted binding) without breaking the Stage 1 interface. Downstream code accesses `chunk.text` for encoding.
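The doc only guarantees the `text` field; the sketch below assumes `role` and `confidence` fields, extrapolated from the "role-tagged edges" and "confidence-weighted binding" extensions mentioned above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ConceptChunk:
    text: str                   # what downstream encoders consume today
    role: Optional[str] = None  # assumed field: e.g. "subject", "object"
    confidence: float = 1.0     # assumed field: extractor confidence

chunk = ConceptChunk("blue mug", role="object", confidence=0.92)
print(chunk.text)  # blue mug
```

Because the extra fields default, existing Stage 1 callers that only read `chunk.text` keep working unchanged when extensions land.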