The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning

Daniel Cunnington, Mark Law, Jorge Lobo, Alessandra Russo

TL;DR

The paper presents NeSyGPT, a hybrid architecture that uses a vision-language foundation model to extract symbolic features from raw data and an answer set programming (ASP) learner to reason over those features on downstream tasks. By fine-tuning BLIP and learning interpretable ASP rules, NeSyGPT reduces labelling requirements and scales to complex neuro-symbolic problems, and the programmatic interface between its neural and symbolic components can be generated by a large language model. Across MNIST Arithmetic, Follow Suit, Plant Hitting Sets, and CLEVR-Hans, NeSyGPT achieves higher accuracy than the baselines, is data-efficient, and learns rules in an interpretable form. The work highlights practical benefits for safe AI deployment and suggests future directions: further reducing manual engineering via LLMs and expanding the symbolic reasoning capabilities.

Abstract

Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems, as interpretable symbolic techniques provide formal behaviour guarantees. The challenge is how to effectively integrate neural and symbolic computation, to enable learning and reasoning from raw data. Existing pipelines that train the neural and symbolic components sequentially require extensive labelling, whereas end-to-end approaches are limited in terms of scalability, due to the combinatorial explosion in the symbol grounding problem. In this paper, we leverage the implicit knowledge within foundation models to enhance the performance in NeSy tasks, whilst reducing the amount of data labelling and manual engineering. We introduce a new architecture, called NeSyGPT, which fine-tunes a vision-language foundation model to extract symbolic features from raw data, before learning a highly expressive answer set program to solve a downstream task. Our comprehensive evaluation demonstrates that NeSyGPT has superior accuracy over various baselines, and can scale to complex NeSy tasks. Finally, we highlight the effective use of a large language model to generate the programmatic interface between the neural and symbolic components, significantly reducing the amount of manual engineering required.

Paper Structure (23 sections, 1 equation, 13 figures, 9 tables)

Figures (13)

  • Figure 1: NeSyGPT architecture with one data point in the Follow Suit task. The goal is to learn the rules of the game: the winner is the player with the highest-ranked card of the same suit as Player 1. (a) BLIP is fine-tuned using playing card images and natural language questions and answers. (b) A hypothesis is learned from BLIP predictions. Note that in (a), fine-tuning uses both suit and rank answers.
  • Figure 2: Task accuracy when the symbolic rules are given.
  • Figure 3: Task accuracy when the symbolic rules are learned.
  • Figure 4: Follow Suit task accuracy when the symbolic rules are given.
  • Figure 5: Follow Suit task accuracy when the symbolic rules are learned.
  • ...and 8 more figures
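
The Follow Suit rule stated in the Figure 1 caption can be made concrete with a small sketch. In the paper this rule is learned as an answer set program from BLIP predictions; the Python below is only an illustrative stand-in for the target rule, not the authors' implementation. The function name `follow_suit_winner`, the `(rank, suit)` card encoding, and the ace-high rank ordering are assumptions introduced here.

```python
from typing import List, Tuple

# Assumption: ace-high ordering. The source does not state the ordering used.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king", "ace"]

# Hypothetical encoding of one symbolic feature pair: (rank, suit).
Card = Tuple[str, str]


def follow_suit_winner(cards: List[Card]) -> int:
    """Return the 1-based index of the winning player.

    cards[0] is Player 1's card. Only cards that match Player 1's suit
    are eligible, and the highest-ranked eligible card wins.
    """
    if not cards:
        raise ValueError("at least one card is required")
    lead_suit = cards[0][1]
    # Pair each eligible card's rank position with its player number.
    eligible = [
        (RANKS.index(rank), player)
        for player, (rank, suit) in enumerate(cards, start=1)
        if suit == lead_suit
    ]
    # Player 1 always follows their own suit, so eligible is never empty.
    return max(eligible)[1]


# Player 2 holds an ace, but in the wrong suit, so Player 3's king wins.
hand = [("7", "hearts"), ("ace", "spades"), ("king", "hearts"), ("2", "hearts")]
winner = follow_suit_winner(hand)
```

Here the `(rank, suit)` pairs play the role of the symbolic features that the fine-tuned vision-language model would extract from card images; the rule itself depends only on those features, which is what allows it to be learned symbolically.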