
MAXIM

Attention & Salience

Deciding Where to Look and What Matters

Your visual system processes about 10 million bits per second. Your conscious awareness handles maybe 50. The gap is attention: a ruthless filter that decides what reaches awareness and what gets ignored. Maxim implements two complementary systems: Attention (where to look) and Salience (what matters).

The Attention Network: Spatial Focus

🧬 Biological Inspiration

The brain's attention network includes the frontal eye fields (directing gaze), the parietal cortex (spatial awareness), and the superior colliculus (rapid orienting). Together, they create a priority map of where to direct limited processing resources.

Maxim's AttentionNetwork tracks where the robot has looked using a spatial grid:

Figure: simulated attention heat map (brighter = more attention)

For each cell in the 10x10 grid (configurable), the system tracks:

  • Visit count: How often has this region been examined?
  • Success/failure history: Did looking here lead to good outcomes?
  • Last visit time: When was this region last attended?
  • Dwell time: How long was spent examining this area?

This enables exploration vs exploitation trade-offs. Unexplored regions become increasingly attractive (novelty-seeking), while regions with successful interaction history get priority when relevant.
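A minimal sketch of what one grid cell might track, assuming hypothetical field and method names (the real AttentionNetwork's API may differ). The exploration score rewards cells that are unvisited, stale, or historically successful:

```python
from dataclasses import dataclass

# Hypothetical sketch of one cell in the attention grid; field names
# and weights are illustrative, not Maxim's actual implementation.
@dataclass
class AttentionCell:
    visits: int = 0          # how often this region was examined
    successes: int = 0       # looks here that led to good outcomes
    failures: int = 0
    last_visit: float = 0.0  # timestamp of most recent attention
    dwell_time: float = 0.0  # cumulative seconds spent here

    def exploration_score(self, now: float) -> float:
        """Unvisited or long-unvisited cells score higher (novelty-seeking);
        cells with a good success record also score higher (exploitation)."""
        staleness = min(1.0, (now - self.last_visit) / 10.0)  # saturates after 10 s
        novelty = 1.0 / (1.0 + self.visits)
        success_rate = self.successes / max(1, self.visits)
        return 0.4 * novelty + 0.3 * staleness + 0.3 * success_rate

# The default 10x10 grid from the text
grid = [[AttentionCell() for _ in range(10)] for _ in range(10)]
```

With these weights, a never-visited cell outscores a just-visited cell with a poor success record, which is the exploration/exploitation trade-off described above.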

Temporal Decay

Attention salience decays over time (default: 2 seconds). This prevents "attention lock," where the robot fixates on one spot, and ensures continuous environmental scanning.
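One way to implement this, sketched with an exponential curve calibrated so salience is nearly gone at the 2-second mark (the actual decay curve Maxim uses is not specified and may be linear):

```python
import math

ATTENTION_DECAY_S = 2.0  # default decay window from the text

def decayed_salience(initial: float, elapsed_s: float,
                     decay_window: float = ATTENTION_DECAY_S) -> float:
    """Exponentially decay attention salience so that a region attended
    `decay_window` seconds ago retains only ~5% of its initial value.
    Illustrative sketch, not Maxim's actual decay function."""
    rate = math.log(20.0) / decay_window  # ln(20) => 5% left after the window
    return initial * math.exp(-rate * elapsed_s)
```

For example, `decayed_salience(1.0, 0.0)` returns 1.0, and by 2.0 seconds the value has fallen to 0.05, low enough for other regions to win the priority competition.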

Figure: attention decay over time, from full salience at 0s to fully decayed at 2.0s

The Salience Network: What Matters

🧬 Biological Inspiration

The salience network (anterior insula and ACC) determines the importance of stimuli. A snake in the grass pops out. Your name across a crowded room catches your ear. These aren't accidents; they're salience computations.

While attention answers "where," salience answers "what." The SalienceNetwork computes an importance score for each detected object based on multiple factors:

Novelty (Instance + Class)

Never-before-seen objects get a boost. Instance novelty decays over 30 seconds; class-level novelty habituates as more unique instances of a category are seen.

🕐 Recency

Recently seen objects maintain salience. Decays over 5 seconds to enable re-detection.

🎯 Interest Matching

Objects matching interest classes (person, animals) get a 2x salience boost. All 80 COCO classes are always detected; interests control prioritization, not visibility.

📊 Confidence

High-confidence detections are weighted higher than uncertain ones.

💬 Interaction History

Objects with positive past interactions are boosted (via SalienceMemoryBridge).
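The factors above could be combined along these lines. The weights and the clamp are assumptions for illustration; only the 2x interest boost is stated in the text, and the real SalienceNetwork's weighting is likely configurable:

```python
def salience_score(novelty: float, recency: float, confidence: float,
                   history_boost: float, is_interest: bool) -> float:
    """Hypothetical weighted combination of the salience factors.
    All inputs are assumed normalized to [0, 1]."""
    base = (0.35 * novelty + 0.25 * recency
            + 0.25 * confidence + 0.15 * history_boost)
    if is_interest:          # interest classes (e.g. person) get a 2x boost
        base *= 2.0
    return min(base, 1.0)    # clamp so downstream comparisons stay bounded
```

Note that the boost multiplies the score rather than gating detection: as the text says, all 80 COCO classes are always detected; interests only change priority.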

Class-Level Novelty: Categorical Habituation

🧬 Biological Inspiration

The brain habituates to categories, not just instances. You notice your first penguin at the zoo immediately — by the twentieth, penguins barely register. But if a flamingo appears among the penguins, it pops out despite being "just another bird." This is categorical habituation: the brain tracks how familiar a type of stimulus is, separate from whether this specific instance is new. The superior temporal sulcus and inferotemporal cortex maintain category-level representations that modulate how strongly individual exemplars activate the salience network.

Maxim's NoveltyTracker implements two layers of novelty that work together:

Instance Novelty

Tracks individual objects by their tracking ID. A new object starts at maximum novelty and decays over ~30 seconds as it becomes familiar.

"Is this specific object new to me?"

Class Novelty

Tracks how many unique instances of each COCO class have been seen over the robot's lifetime. Uses logarithmic decay: novelty halves every 10 unique instances, but never drops below 30% of maximum.

"How familiar am I with this type of object?"

The effective novelty score blends both layers. A new instance of a familiar category (the 21st chair) scores lower than a new instance of a rare category (the first cup), even though both are individually new. This prevents the robot from being perpetually distracted by common objects while staying alert to genuinely novel ones.
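The class-level rule stated above (novelty halves every 10 unique instances, floored at 30%) and a simple blend of the two layers can be sketched as follows; the equal-weight blend is an assumption, as the NoveltyTracker's actual mixing formula is not given:

```python
def class_novelty(unique_instances_seen: int) -> float:
    """Categorical habituation: novelty halves every 10 unique instances
    of a class, but never drops below 30% of maximum."""
    return max(0.3, 0.5 ** (unique_instances_seen / 10.0))

def effective_novelty(instance_novelty: float,
                      unique_instances_seen: int) -> float:
    """Blend instance and class novelty (equal weights here for
    illustration; the real NoveltyTracker's blend may differ)."""
    return 0.5 * instance_novelty + 0.5 * class_novelty(unique_instances_seen)
```

This reproduces the example in the text: the 21st chair (a new instance of a well-habituated class) scores lower than the first cup (a new instance of an unseen class), even though both instances are brand new.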

Sensitization: The Opposite of Habituation

When connected to the SalienceMemoryBridge, the NoveltyTracker can sensitize to object classes with significant interaction histories. Both strongly positive interactions (cups that were successfully grasped) and strongly negative ones (objects that caused errors) amplify class novelty, resisting the natural habituation that count-based decay provides. Only classes with neutral or no interaction history habituate normally.
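Sensitization might look like the following sketch, where `interaction_valence` is a hypothetical summary of past outcomes in [-1, 1] and the 0.5 threshold is an assumed cutoff for "significant" history:

```python
def sensitized_class_novelty(base_novelty: float,
                             interaction_valence: float,
                             threshold: float = 0.5) -> float:
    """Amplify class novelty when a class has a strongly positive OR
    strongly negative interaction history; classes with neutral history
    habituate normally. Names and threshold are illustrative."""
    if abs(interaction_valence) >= threshold:
        # Magnitude matters, not sign: both reward and aversion sensitize.
        return min(1.0, base_novelty * (1.0 + abs(interaction_valence)))
    return base_novelty
```

The symmetry in `abs()` captures the stove/cookie-jar point: strongly rewarding and strongly aversive categories both resist habituation.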

This mirrors VTA dopaminergic modulation in the brain: significant outcomes — whether rewarding or aversive — slow sensory cortex habituation for those stimulus categories. A burned child watches the stove; a rewarded child watches the cookie jar. Both resist habituation because the brain learned that these stimuli predict important outcomes.

Attention vs Salience: The Distinction

👁️ Attention Network

  • Focus: Where to look
  • Unit: Spatial regions
  • Tracks: Visit history, coverage
  • Goal: Explore efficiently
  • Question: "Have I checked there?"

⭐ Salience Network

  • Focus: What matters
  • Unit: Detected objects
  • Tracks: Importance scores
  • Goal: Prioritize relevance
  • Question: "Should I care?"

Integration: The Perception Loop

These systems work together in Maxim's perception cycle:

Perception cycle:

  1. CAPTURE: Camera captures frame
  2. DETECT: Vision model identifies objects
  3. SALIENCE: SalienceNetwork scores each detection
  4. ATTENTION: AttentionNetwork suggests unexplored regions
  5. GAZE: Select target (balance salience + exploration)
  6. MOVE: Orient toward selected target
  7. UPDATE: Record where we looked and what we saw
  → Repeat
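As code, one iteration of the cycle might be wired up like this. Every object and method here is a hypothetical stand-in for the corresponding Maxim component, shown only to make the data flow concrete:

```python
def perception_cycle(camera, detector, salience_net, attention_net, gaze):
    """One pass of the perception loop; all collaborators are
    hypothetical stand-ins for Maxim's actual components."""
    frame = camera.capture()                            # 1. CAPTURE
    detections = detector.detect(frame)                 # 2. DETECT
    scored = [(salience_net.score(d), d)                # 3. SALIENCE
              for d in detections]
    regions = attention_net.suggest_regions()           # 4. ATTENTION
    target = gaze.select(scored, regions)               # 5. GAZE
    gaze.move_to(target)                                # 6. MOVE
    attention_net.record_visit(target, detections)      # 7. UPDATE
```

The key design point is step 5: gaze selection sees both the salience-ranked detections and the attention network's under-explored regions, so "something important is here" and "I haven't looked there" compete for the same motor resource.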

Memory Integration: SalienceMemoryBridge

The SalienceMemoryBridge connects perception to long-term memory:

  • On detection: Query Hippocampus for past interactions with similar objects
  • Positive history: Object gets salience boost (we liked it before)
  • Negative history: Object may get suppressed (we should avoid it)
  • Unknown: Novelty scoring applies (might be interesting)

This creates a feedback loop: the robot learns what's worth paying attention to based on actual experience, not just hardcoded rules.
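The bridge's effect on a single detection could be sketched as below, assuming the Hippocampus query returns a valence in [-1, 1] for known objects and `None` for unknown ones (the real API is not specified here):

```python
from typing import Optional

def memory_adjusted_salience(base: float,
                             past_valence: Optional[float]) -> float:
    """Adjust a detection's salience using interaction history retrieved
    via the SalienceMemoryBridge. Sketch only; `past_valence` is a
    hypothetical summary: None = never seen, else in [-1, 1]."""
    if past_valence is None:
        return base                                   # unknown: novelty scoring applies
    if past_valence > 0:
        return min(1.0, base * (1.0 + past_valence))  # positive history: boost
    return base * (1.0 + past_valence)                # negative history: suppress
```

A cup with strongly positive history (valence 0.8) nearly doubles its salience, while an object that caused errors (valence -0.8) is suppressed to a fifth of its base score.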

Example: Learning to Attend to Faces

Initial state: The robot treats all objects roughly equally.

Day 1: A human talks to the robot while their face is detected. Positive interaction recorded.

Day 2: When a face appears, SalienceMemoryBridge retrieves the positive history. Salience boosted.

Day 3+: The robot naturally orients toward faces faster because past experience taught it: faces = interesting interactions.

No one programmed "faces are important." The system learned it.