Bridging Neural and Symbolic Representations with Transitional Dictionary Learning
Junyan Cheng, Peter Chin
TL;DR
The paper tackles the challenge of unifying neural and symbolic representations by introducing Transitional Dictionary Learning (TDL). TDL learns symbolic-like parts and relations implicitly while reconstructing inputs through a neural encoder/decoder. Its core components are a decomposition driven by a game-theoretic diffusion model, online prototype clustering that builds predicate dictionaries for predicates of different arities, and an EM-inspired optimization that maximizes the likelihood of meaningful symbolic structures. To evaluate interpretability and compositionality, the authors propose two metrics, Clustering Information Gain (CIG) and a heuristic shape score. They report substantial improvements over unsupervised part-segmentation baselines on three abstract compositional datasets, together with results on symbol grounding and transfer learning. A human evaluation supports the claim that the learned decompositions are highly interpretable and shows that human judgments agree with the proposed metrics. Together, these results underscore the practical value of learning transitional neural-symbolic representations without supervision.
Abstract
This paper introduces a novel Transitional Dictionary Learning (TDL) framework that can implicitly learn symbolic knowledge, such as visual parts and relations, by reconstructing the input as a combination of parts with implicit relations. We propose a game-theoretic diffusion model that decomposes the input into visual parts using dictionaries. These dictionaries are learned from the decomposition results by the Expectation-Maximization (EM) algorithm, implemented as online prototype clustering. Additionally, we propose two metrics for evaluating the model: clustering information gain and a heuristic shape score. Experiments are conducted on three abstract compositional visual object datasets, which require the model to exploit the compositionality of the data rather than simply rely on visual features. On these datasets, we carry out three tasks covering symbol grounding to predefined classes of parts and relations and transfer learning to unseen classes, followed by a human evaluation. The results show that the proposed method discovers compositional patterns and significantly outperforms state-of-the-art unsupervised part segmentation methods that rely on visual features from pre-trained backbones. Furthermore, the proposed metrics are consistent with human evaluations.
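The abstract states that the dictionaries are learned by EM, implemented as online prototype clustering over the decomposition results. The following is a minimal, generic sketch of that idea. It assumes each decomposed part is already encoded as a fixed-length embedding vector. It also uses hard assignments, which makes the procedure an online k-means variant of EM. This is not the authors' implementation, and the class `OnlinePrototypeDictionary` and its methods are hypothetical names.

```python
import numpy as np


class OnlinePrototypeDictionary:
    """Hypothetical sketch: online prototype clustering viewed as hard EM.

    E-step: assign each part embedding to its nearest prototype.
    M-step: move each used prototype toward the mean of its assigned
    embeddings, using a running-mean step size.
    """

    def __init__(self, init_prototypes: np.ndarray):
        # Assumption: prototypes are initialized from given vectors (e.g. early samples).
        self.prototypes = np.asarray(init_prototypes, dtype=float).copy()
        # Number of embeddings each prototype has absorbed so far.
        self.counts = np.zeros(len(self.prototypes), dtype=np.int64)

    def assign(self, x: np.ndarray) -> np.ndarray:
        """E-step: index of the nearest prototype (squared L2) for each row of x."""
        dists = ((x[:, None, :] - self.prototypes[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    def update(self, x: np.ndarray) -> np.ndarray:
        """One online EM step on a batch of embeddings x with shape (n, dim)."""
        labels = self.assign(x)
        for k in np.unique(labels):
            members = x[labels == k]
            self.counts[k] += len(members)
            # Running mean: prototype k equals the mean of all embeddings assigned to it
            # so far (the initial vector gets zero weight once data arrives).
            lr = len(members) / self.counts[k]
            self.prototypes[k] += lr * (members.mean(axis=0) - self.prototypes[k])
        return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy "part" clusters in a 2-D embedding space.
    d = OnlinePrototypeDictionary(np.array([[1.0, 1.0], [9.0, 9.0]]))
    for _ in range(5):
        batch = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                           rng.normal(10.0, 0.1, size=(20, 2))])
        d.update(batch)
    print(d.prototypes, d.counts)
```

The abstract also mentions predicate dictionaries of different arities. A simple way to mimic this in the sketch is to keep one dictionary per arity, for example a Python dict mapping arity 1 (parts) and arity 2 (pairwise relations) to separate `OnlinePrototypeDictionary` instances. This grouping is an illustrative assumption; the summary does not specify how the paper organizes them.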
