Bridging Neural and Symbolic Representations with Transitional Dictionary Learning

Junyan Cheng, Peter Chin

TL;DR

The paper tackles the challenge of unifying neural and symbolic representations by introducing Transitional Dictionary Learning (TDL), which learns symbolic-like parts and relations implicitly while reconstructing inputs through a neural encoder/decoder. Core innovations include a game-theoretic, diffusion-driven decomposition, online prototype clustering to create predicate dictionaries across arities, and an EM-inspired optimization that maximizes the likelihood of meaningful symbolic structures. To evaluate interpretability and compositionality, the authors propose Clustering Information Gain (CIG) and a shape score, demonstrating substantial improvements over unsupervised part-segmentation baselines on three abstract datasets, along with symbol grounding and transfer-learning capabilities. Human studies corroborate that the learned decompositions are highly interpretable and align with the proposed metrics, underscoring the practical impact of learning transitional neural-symbolic representations without supervision.

Abstract

This paper introduces a novel Transitional Dictionary Learning (TDL) framework that can implicitly learn symbolic knowledge, such as visual parts and relations, by reconstructing the input as a combination of parts with implicit relations. We propose a game-theoretic diffusion model that decomposes the input into visual parts using learned dictionaries; the dictionaries are in turn learned from the decomposition results by the Expectation Maximization (EM) algorithm, implemented as online prototype clustering. Additionally, two metrics, clustering information gain and a heuristic shape score, are proposed to evaluate the model. Experiments are conducted on three abstract compositional visual object datasets, which require the model to utilize the compositionality of the data instead of simply exploiting visual features. Three tasks were then carried out on these datasets: symbol grounding to predefined classes of parts and relations, transfer learning to unseen classes, and a human evaluation. The results show that the proposed method discovers compositional patterns and significantly outperforms state-of-the-art unsupervised part segmentation methods that rely on visual features from pre-trained backbones. Furthermore, the proposed metrics are consistent with human evaluations.
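
To make the phrase "EM implemented as online prototype clustering" more concrete, the following is a minimal sketch of a generic online k-means update with hard assignments. It is not the authors' implementation: the function names (`nearest`, `online_update`, `fit_online`), the squared Euclidean distance, the running-mean learning rate, and the toy 2-D data are all assumptions chosen for illustration. The assignment step plays the role of the E-step and the prototype update plays the role of the M-step.

```python
import random


def nearest(prototypes, x):
    """E-step for one sample: index of the closest prototype (squared Euclidean)."""
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x)) for p in prototypes]
    return min(range(len(prototypes)), key=dists.__getitem__)


def online_update(prototypes, counts, x):
    """M-step for one sample: move the assigned prototype toward x (running mean)."""
    k = nearest(prototypes, x)
    counts[k] += 1
    lr = 1.0 / counts[k]  # assumed schedule: exact running mean of assigned samples
    prototypes[k] = [p_i + lr * (x_i - p_i) for p_i, x_i in zip(prototypes[k], x)]
    return k


def fit_online(stream, init_prototypes):
    """Process samples one at a time, updating prototypes in place as they arrive."""
    prototypes = [list(p) for p in init_prototypes]
    counts = [0] * len(prototypes)
    for x in stream:
        online_update(prototypes, counts, x)
    return prototypes, counts


# Toy usage: two well-separated 2-D blobs standing in for part representations.
rng = random.Random(0)
samples = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(200)]
samples += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(200)]
rng.shuffle(samples)

learned, sizes = fit_online(samples, init_prototypes=[[1.0, 1.0], [4.0, 4.0]])
print(learned, sizes)
```

In this sketch each prototype ends up as the mean of the samples assigned to it, so the dictionary can be refined incrementally as new decompositions arrive rather than re-clustered from scratch. How the paper initializes prototypes, handles soft assignments, or builds separate dictionaries per arity is not specified in this summary and is left out here.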

Paper Structure (65 sections, 4 equations, 15 figures, 6 tables, 2 algorithms)

This paper contains 65 sections, 4 equations, 15 figures, 6 tables, 2 algorithms.

Figures (15)

  • Figure 1: Decomposing samples from three datasets into visual parts marked with different colors; shallowness represents confidence. Odd columns are inputs, even columns are decompositions.
  • Figure 2: Overview of our architecture. The decomposition process takes $K$ steps to iteratively refine the generated visual parts. $N_P$ models generate in parallel, each generating one part and communicating through "Broadcast". The mapped representations are stored in a memory bank for clustering.
  • Figure 3: Examples from the OmniGlot test set. Our method generates multiple interpretable strokes to reconstruct the input hand-written characters. In comparison, the baseline methods segment the input into colored parts that are not valid strokes, revealing a failure to learn compositionality.
  • Figure 4: Left: Results of the human evaluation. Right: The qualitative scores compared to metrics.
  • Figure 5: Illustration of the model architecture. Blocks in italic font are optional optimizations. Different colors mark the information flow of the 5 different modules.
  • ...and 10 more figures