Compositional Generalization Across Distributional Shifts with Sparse Tree Operations

Paul Soulos, Henry Conklin, Mattia Opper, Paul Smolensky, Jianfeng Gao, Roland Fernandez

TL;DR

The paper tackles compositional generalization under distributional shifts without relying on extensive pretraining. It introduces Sparse Coordinate Trees (SCT) to encode tree structure in vector space and unifies neural and symbolic computation within a differentiable framework, extending the Differentiable Tree Machine (DTM) to the sparse, seq2seq setting as sDTM. Key innovations include Pooling by Attention, Tree Pruning, Lexical Regularization, and seq2seq/seq2tree capabilities, enabling scalable, zero-shot lexical generalization across diverse tasks. Empirical results across Active<->Logical, FOR2LAM, GeoQuery, and SCAN demonstrate strong generalization with substantial parameter and memory reductions (around 75x) while preserving performance, highlighting the potential of neurosymbolic architectures for robust, scalable generalization.

Abstract

Neural networks continue to struggle with compositional generalization, and this issue is exacerbated by a lack of massive pre-training. One successful approach for developing neural systems which exhibit human-like compositional generalization is hybrid neurosymbolic techniques. However, these techniques run into the core issues that plague symbolic approaches to AI: scalability and flexibility. The reason for this failure is that at their core, hybrid neurosymbolic models perform symbolic computation and relegate the scalable and flexible neural computation to parameterizing a symbolic system. We investigate a unified neurosymbolic system where transformations in the network can be interpreted simultaneously as both symbolic and neural computation. We extend a unified neurosymbolic architecture called the Differentiable Tree Machine in two central ways. First, we significantly increase the model's efficiency through the use of sparse vector representations of symbolic structures. Second, we enable its application beyond the restricted set of tree2tree problems to the more general class of seq2seq problems. The improved model retains its prior generalization capabilities and, since there is a fully neural path through the network, avoids the pitfalls of other neurosymbolic techniques that elevate symbolic computation over neural computation.

Paper Structure

This paper contains 28 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Generalization ability of our approach (sDTM) compared with baselines across various out-of-distribution shifts, averaged over different datasets. See the results section.
  • Figure 2: An example representation using Sparse Coordinate Trees (SCT). The values are N-dimensional vectors, and the tree positional indices are integer representations of positions in the tree. The absent child nodes of "The" (indices 4 and 6) are skipped with SCT.
  • Figure 3: Left: Performing the left (orange) and right (blue) operations. Right: Visualizing the left transformation, which results in DP being placed at the root. Tree positional indices of $0$ and their corresponding values are discarded.
  • Figure 4: A schematic of how the three core components of the DTM (agent, interpreter, and memory) relate to each other. Adapted from Soulos et al. (2023).
  • Figure 5: Left: The memory state is initialized as a sequence of trees where only the root node contains a token. Right: An output sequence is embedded in a tree using the left-aligned uniform-depth (LAUD) scheme. <NT> and <EOB> are special tokens not in the original output sequence.
  • ...and 1 more figure
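
The captions of Figures 2 and 3 describe a sparse tree encoding (integer positional indices paired with values, absent nodes skipped) and a left operation that discards index 0. The following is a minimal Python sketch of one encoding that is consistent with those captions; it is not the authors' implementation. It assumes that path bits are read from the least-significant bit upward (0 for left, 1 for right) beneath a leading 1, which is inferred only from the statement that the children of the node at index 2 sit at indices 4 and 6. The names `child_index` and `take_subtree`, the example tree, and the use of strings in place of N-dimensional vectors are all hypothetical.

```python
from typing import Dict


def child_index(parent: int, go_right: bool) -> int:
    """Index of a child under the ASSUMED encoding (root = 1).

    The path from the root is stored in the low bits, first step in the
    least-significant bit, with a leading 1 marking the depth.
    """
    depth = parent.bit_length() - 1        # number of path bits in the parent
    path = parent & ((1 << depth) - 1)     # strip the leading 1
    path |= int(go_right) << depth         # record the new step above the old ones
    return path | (1 << (depth + 1))       # restore the leading 1, one level deeper


def take_subtree(tree: Dict[int, str], right: bool) -> Dict[int, str]:
    """Extract the left or right subtree of a sparse tree.

    Nodes on the chosen side are shifted right by one bit; every other node
    is mapped to index 0. Index 0 entries are then discarded, as in Figure 3.
    """
    out: Dict[int, str] = {}
    for idx, value in tree.items():
        new_idx = idx >> 1 if (idx & 1) == int(right) else 0
        if new_idx != 0:                   # the old root always lands on 0
            out[new_idx] = value
    return out


# Hypothetical tree: only occupied positions are stored. The node "The" at
# index 2 has no children, so indices 4 and 6 never appear.
tree = {1: "S", 2: "The", 3: "NP", 5: "fun", 7: "person"}

left_subtree = take_subtree(tree, right=False)
right_subtree = take_subtree(tree, right=True)
```

In this sketch the left operation moves "The" to the root, and the right operation moves "NP" to the root with its two children following it. Storing only occupied indices is what keeps the representation sparse: memory grows with the number of nodes present rather than with the number of positions in a full tree of the same depth.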