Semantic Interaction Information mediates compositional generalization in latent space

John Schwarcz

Abstract

Are there still barriers to generalization once all relevant variables are known? We address this question via a framework that casts compositional generalization as a variational inference problem over latent variables with parametric interactions. To explore this, we develop the Cognitive Gridworld, a stationary Partially Observable Markov Decision Process (POMDP) where observations are generated jointly by multiple latent variables, yet feedback is provided for only a single goal variable. This setting allows us to define Semantic Interaction Information (SII): a metric measuring the contribution of latent variable interactions to task performance. Using SII, we analyze Recurrent Neural Networks (RNNs) provided with these interactions, finding that SII explains the accuracy gap between Echo State and Fully Trained networks. Our analysis also uncovers a theoretically predicted failure mode where confidence decouples from accuracy, suggesting that utilizing interactions between relevant variables is a non-trivial capability. We then address a harder regime where the interactions must be learned by an embedding model. Learning how latent variables interact requires accurate inference, yet accurate inference depends on knowing those interactions. The Cognitive Gridworld reveals this circular dependence as a core challenge for continual meta-learning. We approach this dilemma via Representation Classification Chains (RCCs), a JEPA-style architecture that disentangles these processes: variable inference and variable embeddings are learned by separate modules through Reinforcement Learning and self-supervised learning, respectively. Lastly, we demonstrate that RCCs facilitate compositional generalization to novel combinations of relevant variables. Together, these results establish a grounded setting for evaluating goal-directed generalist agents.

Paper Structure

This paper contains 45 sections, 29 equations, 14 figures, and 4 algorithms.

Figures (14)

  • Figure 1: Environment schematic for $C=2$. Observations $\mathbf{o}_t$ are generated stochastically. State interactions $Z$ parameterize the likelihood $P_Z(\mathbf{o} \mid \mathbf{r})$ and state realizations $(r_1, r_2)$ fix the probability of sampling each observable $o^i$.
  • Figure 2: The cost of Naive Bayes grows with time and interactions. (a) Joint (matrices) and independent (vectors) likelihoods for example observables. (b) Top: Accuracy of Joint (left) and Naive (right) Bayes across varying context sizes. Bottom: Relative accuracy (left) and Semantic Interaction Information (right) of Joint versus Naive Bayes. Circles mark four equidistant reference time-points throughout inference.
  • Figure 3: Recurrent Neural Networks align with theoretical predictions. (a) Architecture (left) and gradient flow (right) of the Classifier. Only the goal belief-state receives a gradient. (b-d) Same as Figure 2b (for $C = 1, 2$) with Fully Trained and Echo State Networks.
  • Figure 4: Failure to capture SII can induce hallucinations. (a) Sequential updating of example posteriors under Joint and Naive inference. (b) Distribution of hits and misses at each step, pooled over episodes. Misinterpreting evidence yields episodes with performance below chance.
  • Figure 5: Goals enable generalization from experience through compositional embeddings. Schematic demonstrating compositional generalization to a novel combination of states.
  • ...and 9 more figures
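The Joint versus Naive Bayes gap described in Figure 2 can be illustrated with a minimal sketch. The XOR-style likelihood table, the fixed observation sequence, and all variable names below are hypothetical stand-ins rather than the paper's Cognitive Gridworld likelihoods; the point is only that when evidence is carried by the interaction of two latents, a joint belief recovers the goal variable (here, the parity of $r_1, r_2$) while independent naive beliefs remain at chance.

```python
import numpy as np

# Hypothetical XOR-style likelihood P(o | r1, r2): the observable depends
# only on the interaction r1 XOR r2, so each latent is individually
# uninformative about it. (Toy values, not the paper's environment.)
P = np.empty((2, 2, 2))
for r1 in range(2):
    for r2 in range(2):
        P[r1, r2] = [0.8, 0.2] if (r1 ^ r2) == 0 else [0.2, 0.8]

# A typical observation sequence from a parity-0 state such as (1, 1).
obs = [0] * 8 + [1] * 2

joint = np.full((2, 2), 0.25)   # Joint Bayes: belief over (r1, r2) pairs
b1 = np.array([0.5, 0.5])       # Naive Bayes: independent belief on r1
b2 = np.array([0.5, 0.5])       # ... and on r2
for o in obs:
    joint = joint * P[:, :, o]
    joint /= joint.sum()
    b1 = b1 * P[:, :, o].mean(axis=1)   # likelihood marginalized over r2
    b1 /= b1.sum()
    b2 = b2 * P[:, :, o].mean(axis=0)   # likelihood marginalized over r1
    b2 /= b2.sum()

# Belief that the goal variable (the parity) is 0 under each model.
p_joint = joint[0, 0] + joint[1, 1]
naive = np.outer(b1, b2)
p_naive = naive[0, 0] + naive[1, 1]
print(f"joint: {p_joint:.3f}  naive: {p_naive:.3f}")  # → joint: 1.000  naive: 0.500
```

Under this likelihood, the marginalized ("naive") update is flat for every observation, so the independent beliefs never move, while the joint belief concentrates on the correct parity class. This is the regime in which Semantic Interaction Information is large and, per Figure 2, the accuracy gap between the two inference schemes widens.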