# Maxim: Concept Decomposition

*Breaking text into meaningful substrate nodes*
In biological brains, perception doesn't encode entire sentences as indivisible units. When you hear "I see a blue mug on the table," your brain activates separate representations for mug, table, and blue—each independently associable with memories, emotions, and motor programs. Maxim's concept decomposition mirrors this: a Protocol-based text pre-processor that extracts noun-phrase concepts before they enter the substrate encoding pipeline.
## The Problem
### Without Decomposition
The entire sentence "I see a blue mug on the table" becomes one opaque substrate node. This node won't pattern-complete against a bare "mug" from a previous session. The Hebbian edge from "mug" to vision-mug is unreachable from the sentence node—silently breaking cross-modal retrieval for naturalistic inputs.
### With Decomposition
"blue mug" and "table" each become their own substrate node. Each can independently bind to vision embeddings, accumulate Hebbian associations across episodes, and receive valence annotations from pain/pleasure reactions. The agent learns about things, not sentences about things.
## How It Works
The key property: nothing below the decomposer changes. EC, Hippocampus, Hebbian binding, cross-modal retrieval, NAc reward modulation, persistence—all stay the same. Decomposition is purely additive pre-processing.
### What Gets Extracted

| Input | Extracted Concepts |
|---|---|
| "I see a blue mug on the table" | `["blue mug", "table"]` |
| "The red plate is next to the green cup" | `["red plate", "green cup"]` |
| "The rusty sword feels heavy" | `["rusty sword"]` |
| "lotus" | `["lotus"]` (identity: no decomposition needed) |
#### Extracted
- Noun phrases—"blue mug", "red plate", "rusty sword"
- Named entities—proper nouns captured by noun chunker
- Single nouns—"table", "lotus", "cat"
#### Filtered Out
- Pronouns—"he", "it", "they" (near-meaningless embeddings)
- Determiners—"the", "a", "this" (function words)
- Bare verbs—"see", "go" (ambiguous without objects)
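The filtering rules above can be sketched as a simple membership check. This is illustrative only: the real decomposer applies these rules through its NLP pipeline rather than literal word lists, and the lists below are hypothetical examples.

```python
# Illustrative stop lists (hypothetical; the actual filter is POS-based).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}
BARE_VERBS = {"see", "go", "do", "be", "have"}

def keep_concept(candidate: str) -> bool:
    """Return True if a candidate chunk survives filtering."""
    word = candidate.strip().lower()
    return word not in PRONOUNS | DETERMINERS | BARE_VERBS

candidates = ["blue mug", "it", "the", "table", "see"]
print([c for c in candidates if keep_concept(c)])  # ['blue mug', 'table']
```

Multi-word noun phrases like "blue mug" never collide with the stop lists, so only near-meaningless single tokens are dropped.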
## Pluggable Strategies
The decomposer uses a `DecompositionStrategy` Protocol. Any class with an `extract(text) -> list[ConceptChunk]` method can serve as a strategy: spaCy, LLM-based extraction, regex, or domain-specific parsers.
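The structural contract might look like the sketch below. The `ConceptChunk` here is a minimal stand-in for the real data type, and `WhitespaceStrategy` is a toy example used only to show that any class with a matching `extract()` satisfies the Protocol.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass(frozen=True)
class ConceptChunk:
    text: str  # minimal stand-in; the real type is richer

@runtime_checkable
class DecompositionStrategy(Protocol):
    def extract(self, text: str) -> list[ConceptChunk]: ...

class WhitespaceStrategy:
    """Toy strategy: every word becomes its own chunk."""
    def extract(self, text: str) -> list[ConceptChunk]:
        return [ConceptChunk(w) for w in text.split()]

# Structural typing: no inheritance needed to satisfy the Protocol.
strategy: DecompositionStrategy = WhitespaceStrategy()
print([c.text for c in strategy.extract("blue mug")])  # ['blue', 'mug']
```

Because the Protocol is structural, swapping strategies requires no changes to the encoding pipeline that consumes the chunks.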
### SpaCyNounChunkStrategy (default)

Uses the `en_core_web_sm` noun chunker. Thread-safe singleton. Strips determiners automatically.
### IdentityStrategy (fallback)

Returns the input unchanged as a single chunk. Active when spaCy is not installed or decomposition is disabled.
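The fallback behavior is simple enough to sketch in full. The `ConceptChunk` below is a minimal stand-in for the real data type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptChunk:
    text: str

class IdentityStrategy:
    """Fallback: return the whole input as one chunk (no decomposition)."""
    def extract(self, text: str) -> list[ConceptChunk]:
        return [ConceptChunk(text)]

print([c.text for c in IdentityStrategy().extract("lotus")])  # ['lotus']
```

With this strategy active, behavior degrades gracefully to the pre-decomposition pipeline: one node per percept.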
### Your Strategy (custom)

Implement the `DecompositionStrategy` Protocol. LLM-based, regex, domain-specific: plug it in.
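As one hypothetical example, a domain-specific regex strategy might capture runs of capitalized words as a crude named-entity heuristic. Everything here (class name, pattern) is illustrative, not part of Maxim:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptChunk:
    text: str

class RegexEntityStrategy:
    """Hypothetical custom strategy: treat runs of capitalized
    words as concepts (a crude named-entity heuristic)."""
    PATTERN = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*\b")

    def extract(self, text: str) -> list[ConceptChunk]:
        return [ConceptChunk(m.group()) for m in self.PATTERN.finditer(text)]

chunks = RegexEntityStrategy().extract("Alice handed Bob the Rusty Sword")
print([c.text for c in chunks])  # ['Alice', 'Bob', 'Rusty Sword']
```

Since it has a matching `extract()` method, it satisfies the Protocol structurally and drops into the pipeline without any registration step.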
## Modality Gate
Decomposition applies to text-modality percepts only. Visual percepts (CLIP embeddings), proprioceptive readings, and SEM affordance labels bypass decomposition automatically. This mirrors how the brain processes language in Wernicke's area but routes visual input through the ventral stream without linguistic decomposition.
### Enforced at the Encoder Layer
The modality gate lives inside `LinguisticEncoder.encode_decomposed()`. Callers don't need to check; non-text modalities are automatically handled as single nodes.
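A minimal sketch of that gate, assuming a hypothetical `Percept` type with a `modality` field (the real encoder's signature may differ):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Percept:
    modality: str   # e.g. "text", "vision", "proprioception"
    content: str

def encode_decomposed(percept: Percept, decompose) -> list[str]:
    """Sketch of the modality gate: only text percepts are decomposed;
    everything else passes through as a single node."""
    if percept.modality != "text":
        return [percept.content]        # bypass: one node, unchanged
    return decompose(percept.content)   # text: split into concept nodes

# Toy decomposer for illustration: keep words longer than 3 characters.
toy = lambda text: [w for w in text.split() if len(w) > 3]
print(encode_decomposed(Percept("vision", "clip-embedding-id"), toy))
print(encode_decomposed(Percept("text", "blue mug on table"), toy))
```

Centralizing the check inside the encoder keeps the contract in one place, so no caller can accidentally decompose a CLIP embedding or a proprioceptive reading.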
## Configuration

| Variable | Default | Description |
|---|---|---|
| `MAXIM_SUBSTRATE_PATH` | off | Prerequisite: enables the substrate encoding path |
| `MAXIM_CONCEPT_DECOMPOSITION` | off | Set to `1` to enable concept decomposition |
## Connection to Valence Annotation
Concept decomposition is the prerequisite for valence annotation—the mechanism by which the substrate learns that certain concepts are associated with pain or pleasure. When an agent interacts with a SEM entity (e.g., a rusty sword) and experiences pain, decomposition ensures "rusty sword" is its own substrate node. The valence annotation system can then mark Hebbian edges connected to that node with negative valence, so future encounters with similar concepts carry the affective signal through spreading activation.
Without decomposition, the valence would land on the entire sentence blob—less precise and harder to generalize across different contexts where the same concept appears.
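To make the precision argument concrete, here is a heavily simplified sketch of valence landing on a concept node's edges. The edge store, function name, and neighbor labels are all hypothetical; the real system annotates Hebbian edges inside the substrate.

```python
# Hypothetical edge-valence store: (node, neighbor) -> valence in [-1, 1].
edges_valence: dict[tuple[str, str], float] = {}

def annotate_valence(node: str, neighbors: list[str], valence: float) -> None:
    """Mark every edge touching `node` with a valence value."""
    for nb in neighbors:
        edges_valence[(node, nb)] = valence

# With decomposition, pain from the sword lands on its own node:
annotate_valence("rusty sword", ["vision:sword", "touch:pain"], -0.8)
print(edges_valence[("rusty sword", "touch:pain")])  # -0.8
```

Had the node been the whole sentence ("The rusty sword feels heavy"), the negative valence would attach to that blob and never fire when "rusty sword" appears in a different sentence.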
## The ConceptChunk Data Type

`ConceptChunk` is a rich return type that supports future extensions (role-tagged edges, confidence-weighted binding) without breaking the Stage 1 interface. Downstream code accesses `chunk.text` for encoding.
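The doc only guarantees the `text` field; the sketch below assumes `role` and `confidence` fields, extrapolated from the "role-tagged edges" and "confidence-weighted binding" extensions mentioned above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ConceptChunk:
    text: str                   # what downstream encoders consume today
    role: Optional[str] = None  # assumed field: e.g. "subject", "object"
    confidence: float = 1.0     # assumed field: extractor confidence

chunk = ConceptChunk("blue mug", role="object", confidence=0.92)
print(chunk.text)  # blue mug
```

Because the extra fields default, existing Stage 1 callers that only read `chunk.text` keep working unchanged when extensions land.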