Fully Distributed, Flexible Compositional Visual Representations via Soft Tensor Products

Bethia Sun, Maurice Pagnucco, Yang Song

TL;DR

This work tackles the challenge of learning distributed, compositional representations in vision by extending Smolensky's Tensor Product Representations to a continuous, flexible form called Soft TPR. A dedicated Soft TPR Autoencoder learns these representations by constraining the encoder output to stay close to an explicit TPR, while using weak supervision and a TPR decoder to preserve semantic structure. Empirically, Soft TPR yields state-of-the-art disentanglement, faster convergence, and superior sample efficiency on downstream tasks, outperforming symbolic slot-based baselines and traditional TPR methods. The findings suggest that relaxing the strict algebraic constraints of classical TPR in favor of a flexible, distributed form can better align with deep learning's distributed and gradient-driven learning dynamics, with potential extensions to hierarchical compositionality.

Abstract

Since the inception of the classicalist vs. connectionist debate, it has been argued that the ability to systematically combine symbol-like entities into compositional representations is crucial for human intelligence. In connectionist systems, the field of disentanglement has gained prominence for its ability to produce explicitly compositional representations; however, it relies on a fundamentally symbolic, concatenative representation of compositional structure that clashes with the continuous, distributed foundations of deep learning. To resolve this tension, we extend Smolensky's Tensor Product Representation (TPR) and introduce Soft TPR, a representational form that encodes compositional structure in an inherently distributed, flexible manner, along with Soft TPR Autoencoder, a theoretically principled architecture designed specifically to learn Soft TPRs. Comprehensive evaluations in the visual representation learning domain demonstrate that the Soft TPR framework consistently outperforms conventional disentanglement alternatives -- achieving state-of-the-art disentanglement, boosting representation learner convergence, and delivering superior sample efficiency and low-sample regime performance in downstream tasks. These findings highlight the promise of a distributed and flexible approach to representing compositional structure, which may align better with the core principles of deep learning than the conventional symbolic approach.
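
For orientation, the classical TPR that the abstract extends binds each filler vector to a role vector with an outer product and sums the bindings. The following is a sketch in generic notation; the symbols $f_i$, $r_i$, $u_j$ and $N$ are assumptions introduced here and may differ from the paper's own.

$$
\psi_{tpr} = \sum_{i=1}^{N} f_i \otimes r_i = \sum_{i=1}^{N} f_i r_i^{\top},
\qquad
\psi_{tpr}\, u_j = \sum_{i=1}^{N} f_i \,(r_i^{\top} u_j) = f_j
\quad \text{when } r_i^{\top} u_j = \delta_{ij}.
$$

When the role vectors are orthonormal, the unbinding vector $u_j$ is simply $r_j$. Every entry of $\psi_{tpr}$ mixes contributions from all $N$ bindings, which is the sense in which the representation is distributed rather than slot-based.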

Paper Structure

This paper contains 57 sections, 38 equations, 36 figures, 48 tables.

Figures (36)

  • Figure 1: (a) Disentangled representations can be conceptualised as a concatenation of factor-of-variation (FoV) tokens (coloured blocks), enforcing a symbolic, string-like compositional structure in which each FoV is allocated to a discrete slot in the representation. We instead consider a distributed representation of compositional structure, (b), in which information from densely encoded FoVs (first 6 waves) is continuously combined to form the representation, $\psi(x)$ (in red), effectively distributing the information from multiple FoVs into a single dimension of $\psi(x)$. (c) Only a subset of points (stars) in the underlying representational space (rainbow manifold) satisfies the TPR specification. The Soft TPR relaxes this, capturing larger, continuous regions of the underlying representational space (the translucent circles), while approximately preserving the TPR's key properties.
  • Figure 2: Diagram illustrating the Soft TPR Autoencoder. We encourage the encoder $E$'s output, $z$, to have the form of a Soft TPR by penalising its distance from the greedily defined, explicit TPR, $\psi_{tpr}^{*}$ of Equation \ref{eq:5}, that $z$ best approximates. $\psi_{tpr}^{*}$ is recovered using a 3-step process performed by our TPR decoder (center rectangle): 1) unbinding, 2) quantisation, and 3) TPR construction. The decoder, $D$, reconstructs the input image using $\psi_{tpr}^{*}$.
  • Figure 3: Factor score convergence on the Cars3D dataset
  • Figure 5: BetaVAE score convergence on the Cars3D dataset
  • Figure 7: Factor score convergence on the Shapes3D dataset
  • ...and 31 more figures
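
The three-step TPR decoder named in the Figure 2 caption (unbinding, quantisation, TPR construction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `tpr_decode` and `closeness_penalty`, the orthonormal role vectors, the single shared filler codebook, the Euclidean nearest-neighbour quantisation, and the squared-error penalty are all assumptions made for illustration.

```python
import numpy as np


def tpr_decode(z, roles, codebook):
    """Hypothetical sketch: unbind, quantise, then rebuild an explicit TPR.

    z:        (d_f * d_r,) flattened encoder output
    roles:    (n_roles, d_r) role vectors, ASSUMED orthonormal
    codebook: (n_fillers, d_f) candidate filler embeddings, ASSUMED shared by all roles
    """
    d_f, d_r = codebook.shape[1], roles.shape[1]
    Z = z.reshape(d_f, d_r)

    # 1) Unbinding: with orthonormal roles, Z @ r_i yields a soft filler for role i.
    soft_fillers = (Z @ roles.T).T                      # (n_roles, d_f)

    # 2) Quantisation: snap each soft filler to its nearest codebook entry
    #    (Euclidean distance is an assumption of this sketch).
    diffs = soft_fillers[:, None, :] - codebook[None, :, :]
    idx = np.linalg.norm(diffs, axis=-1).argmin(axis=1)  # (n_roles,)
    hard_fillers = codebook[idx]                         # (n_roles, d_f)

    # 3) TPR construction: sum over roles of the outer product filler_i r_i^T.
    psi = np.einsum("if,ir->fr", hard_fillers, roles)
    return psi.reshape(-1), idx


def closeness_penalty(z, psi):
    """Squared Euclidean distance between z and the explicit TPR it approximates."""
    return float(np.sum((z - psi) ** 2))
```

In this sketch, a `z` that is a slightly perturbed explicit TPR decodes back to the same filler indices, and the penalty measures only the perturbation. In a trained model the penalty would be one term alongside the reconstruction loss; how the terms are weighted is not specified in the text above.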

Theorems & Definitions (5)

  • proof
  • proof
  • proof
  • proof
  • proof