Abduction of Domain Relationships from Data for VQA

Al Mehdi Saadat Chowdhury, Paulo Shakarian, Gerardo I. Simari

TL;DR

This work tackles VQA when the image and query are represented as ASP programs lacking domain data. It introduces a domain-abduction framework that learns domain relationships via a practical heuristic, FAST-DAP, yielding a set of domain facts $\Pi^D$ (assign($A$, $D$)) that, when added to the existing ASP programs, substantially improve question answering accuracy on the GQA dataset from $59.98\%$ to about $81.0\%$ using only a small amount of data. The method is orthogonal to knowledge-graph approaches, offering a neurosymbolic, scalable way to inject domain knowledge into reasoning. Key contributions include formalizing the domain abduction problem, proposing a fast and regularized learning algorithm, and demonstrating strong data efficiency and practical performance on real-world data. The work highlights the potential of combining logical representations with abductive inference to enhance VQA under domain uncertainty, while noting the lack of theoretical guarantees and pointing to meta-cognitive AI as a path for future improvement.
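
To make the role of the assign($A$, $D$) facts concrete, here is a minimal Python sketch of why a query can fail when domain data is missing and succeed once such facts are added. It is a toy stand-in for the ASP programs, not the paper's encoding and not FAST-DAP: the object and attribute names, the `scene` and `domain_facts` sets, and the `query_attribute` helper are all hypothetical and chosen only for illustration.

```python
# Toy stand-in for a scene program: (object, attribute) pairs.
# These facts are invented for illustration, not taken from GQA.
scene = {("car", "red"), ("car", "parked"), ("sky", "blue")}

# Hypothetical domain facts mirroring assign(A, D):
# attribute A belongs to domain D.
domain_facts = {("red", "color"), ("blue", "color"), ("parked", "state")}


def query_attribute(scene, assign, obj, domain):
    """Return the attributes of `obj` whose assigned domain is `domain`."""
    return sorted(
        attr
        for (o, attr) in scene
        if o == obj and (attr, domain) in assign
    )


# Without domain facts, "what color is the car?" has nothing to match.
without_domain = query_attribute(scene, set(), "car", "color")

# With the assign facts added, the same query resolves.
with_domain = query_attribute(scene, domain_facts, "car", "color")

print(without_domain, with_domain)
```

In this sketch the scene facts never change; only the presence of the attribute-to-domain mapping decides whether the question can be answered, which is the gap the abduced facts are meant to fill.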

Abstract

In this paper, we study the problem of visual question answering (VQA) where the image and query are represented by ASP programs that lack domain data. We provide an approach that is orthogonal and complementary to existing knowledge augmentation techniques: we abduce domain relationships of image constructs from past examples. After framing the abduction problem, we provide a baseline approach and an implementation that significantly improves the accuracy of query answering yet requires few examples.

Paper Structure

This paper contains 5 sections, 7 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: An image (left) and a section of its corresponding scene graph (right). In the scene graph, square nodes represent objects, oval nodes represent attributes, and solid edges connect objects to attributes. Shaded nodes represent domain knowledge, connected to attributes by dashed edges.
  • Figure 2: Accuracy and running time on different training subsets.

Theorems & Definitions (2)

  • Example 2.1
  • Example 2.2