NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization

Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi

TL;DR

NeSyCoCo addresses compositional generalization in vision-language reasoning by grounding symbolic predicates in distributed linguistic embeddings and executing them with differentiable, soft logic. It augments NL-to-program generation with dependency parsing, employs linguistically motivated predicate representations, and uses soft composition of normalized predicate scores to align symbolic and neural reasoning, all built on top of LEFT. The approach achieves state-of-the-art results on ReaSCAN and CLEVR-CoGenT and shows robust zero-shot generalization on CLEVR-SYN, and the paper includes ablations and analyses of predicate representations and language variety. The result is a more interpretable and flexible neuro-symbolic approach to visual reasoning on synthetic benchmarks.

Abstract

Compositional generalization is crucial for artificial intelligence agents to solve complex vision-language reasoning tasks. Neuro-symbolic approaches have demonstrated promise in capturing compositional structures, but they face critical challenges: (a) reliance on predefined predicates for symbolic representations, which limits adaptability, (b) difficulty in extracting predicates from raw data, and (c) the use of non-differentiable operations for combining primitive concepts. To address these issues, we propose NeSyCoCo, a neuro-symbolic framework that leverages large language models (LLMs) to generate symbolic representations and map them to differentiable neural computations. NeSyCoCo introduces three innovations: (a) augmenting natural language inputs with dependency structures to enhance the alignment with symbolic representations, (b) employing distributed word representations to link diverse, linguistically motivated logical predicates to neural modules, and (c) using the soft composition of normalized predicate scores to align symbolic and differentiable reasoning. Our framework achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks and demonstrates robust performance with novel concepts in the CLEVR-SYN benchmark.
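
To make "soft composition of normalized predicate scores" concrete, here is a minimal sketch in plain Python. It is an illustration only: the abstract does not specify the operators, so the sigmoid normalization, the product used for conjunction, and the maximum used for the existential quantifier are all assumptions, as are the function names and the toy logits.

```python
import math


def normalize(logit):
    """Squash a raw predicate logit into (0, 1) with a sigmoid (assumed normalization)."""
    return 1.0 / (1.0 + math.exp(-logit))


def soft_and(a, b):
    """Soft conjunction as an elementwise product of per-object scores (assumed operator)."""
    return [x * y for x, y in zip(a, b)]


def soft_exists(scores):
    """Soft existential quantifier, approximated here by the maximum score (assumed operator)."""
    return max(scores)


# Hypothetical raw logits for three objects in a scene.
blue_logits = [4.0, -3.0, 2.0]
cube_logits = [3.0, 5.0, -4.0]

# Normalize each predicate's scores, then compose them.
blue = [normalize(v) for v in blue_logits]
cube = [normalize(v) for v in cube_logits]
blue_cube = soft_and(blue, cube)   # per-object score for "blue AND cube"
answer = soft_exists(blue_cube)    # score for "is there a blue cube?"
```

Because every score lies in (0, 1) before composition, no single predicate with a large raw magnitude can dominate the product, which is one plausible reading of why normalization matters here.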

Paper Structure

This paper contains 25 sections, 1 equation, 5 figures, 9 tables.

Figures (5)

  • Figure 1: The overall framework of NeSyCoCo. The language-to-program module generates a logical program based on the input query. Predicates, such as blue, serve as symbolic representations connected to neural modules that process representations of visual elements. These modules produce scores indicating the applicability of the concept to these elements. Differentiable soft compositional operations are then applied to the scores, executing the program and generating the answer to the query.
  • Figure 2: Language to program conversion procedure.
  • Figure 3: Differentiable predicate function computing the score for blue in NeSyCoCo (one FFN shared by all predicates) compared to LEFT (a separate FFN per predicate).
  • Figure 4: Boxplot comparing concept scores of LEFT and NeSyCoCo on 10k CLEVR validation samples. Dotted and solid lines represent the mean and median, respectively.
  • Figure 5: Relationship between cosine similarity of word embeddings and correlation of their predicate scores.
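
The captions for Figures 3 and 5 can be illustrated with a small sketch: one scoring function shared by all predicates and conditioned on the predicate's word embedding, so that predicates with similar embeddings produce similar scores. This is not the paper's architecture. A bilinear scorer stands in for the shared FFN, and the embeddings, the identity weight matrix, and the object features are hypothetical toy values.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def shared_score(obj_feat, word_emb, weight):
    """One scorer for every predicate: project the object feature with a
    shared matrix, compare it with the predicate's word embedding, and
    squash the result into (0, 1)."""
    projected = [dot(row, obj_feat) for row in weight]
    return 1.0 / (1.0 + math.exp(-dot(projected, word_emb)))


# Hypothetical 3-d word embeddings; "blue" and "azure" are close, "cube" is not.
embeddings = {
    "blue": [1.0, 0.1, 0.0],
    "azure": [0.9, 0.2, 0.0],
    "cube": [0.0, 0.1, 1.0],
}

# Identity matrix as a stand-in for learned shared parameters.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical object features for three objects.
objects = [[2.0, 0.0, -2.0], [-2.0, 0.0, 2.0], [1.0, 0.0, 1.0]]

# Every predicate goes through the same function; only the embedding changes.
scores = {
    name: [shared_score(obj, emb, weight) for obj in objects]
    for name, emb in embeddings.items()
}
```

With a predicate-specific design, an unseen word such as "azure" would have no module at all. With a shared scorer it inherits behavior from nearby embeddings, which is the kind of relationship Figure 5 examines.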