
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering

Amir Rahimi, Vanessa D'Amario, Moyuru Yamada, Kentaro Takemoto, Tomotake Sasaki, Xavier Boix

TL;DR

New evidence from Visual Question Answering (VQA) shows that the diversity of simple tasks plays a key role in achieving systematic generalization, which implies that gathering a large and varied set of complex tasks, which could be costly to obtain, may not be essential.

Abstract

Systematic generalization is a crucial aspect of intelligence, which refers to the ability to generalize to novel tasks by combining known subtasks and concepts. One critical factor that has been shown to influence systematic generalization is the diversity of training data. However, diversity can be defined in various ways, as data have many factors of variation. A more granular understanding of how different aspects of data diversity affect systematic generalization is lacking. We present new evidence in the problem of Visual Question Answering (VQA) that reveals that the diversity of simple tasks (i.e. tasks formed by a few subtasks and concepts) plays a key role in achieving systematic generalization. This implies that it may not be essential to gather a large and varied number of complex tasks, which could be costly to obtain. We demonstrate that this result is independent of the similarity between the training and testing data and applies to well-known families of neural network architectures for VQA (i.e. monolithic architectures and neural module networks). Additionally, we observe that neural module networks leverage all forms of data diversity we evaluated, while monolithic architectures require more extensive amounts of data to do so. These findings provide a first step towards understanding the interactions between data diversity design, neural network architectures, and systematic generalization capabilities.

Paper Structure (24 sections, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Data diversity and its impact on systematic generalization in VQA. (top) Impact of question complexity distribution on systematic generalization. In our study, question complexity is varied in two aspects: (middle) Attribute composition. In this example, the training set (in-distribution) contains Count and Exist questions with different compositions (left), while Count or Exist questions have novel combinations of attributes in the test set (out-of-distribution). (bottom) Length. The in-distribution question is shorter (different syntactic structure) than the out-of-distribution question.
  • Figure 2: Specification of different biases for train and test datasets. The gray boxes display the question type, while the valid attributes corresponding to each question type are shown in white boxes below. A valid example question for each dataset is shown below the boxes.
  • Figure 3: Change in accuracy after replacing $30\%$ of the base training questions (2-Hop A) with different types of questions, following D3, to increase diversity. In most cases, diversity significantly helps systematic generalization.
  • Figure 4: Qualitative results. The questions and ground truth answers are displayed in the top line above each image, while the predictions of MAC using 2-Hop A, D3(+1-Hop A), D3(+1-Hop B), D3(+1-Hop (Full)) are presented from left to right in the second line above each image.
  • Figure 5: Change in accuracy after applying D3 to the 2-Hop A dataset using the MiniGPT-v2 model.
  • ...and 5 more figures
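
The caption of Figure 3 describes replacing 30% of the base training questions with questions of a different type while keeping the training set size fixed. A minimal sketch of such a replacement step is given below. It is not the authors' implementation: the function name `replace_fraction`, the uniform random choice of which questions to drop and add, and the placeholder question strings are all assumptions made for illustration.

```python
import random


def replace_fraction(base, extra, fraction=0.3, seed=0):
    """Return a list the same size as `base` in which `fraction` of the
    items are swapped for items drawn from `extra` (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_swap = round(len(base) * fraction)
    if n_swap > len(extra):
        raise ValueError("not enough extra questions to swap in")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    # Assumption: dropped and added questions are chosen uniformly at random.
    kept = rng.sample(base, len(base) - n_swap)
    added = rng.sample(extra, n_swap)
    mixed = kept + added
    rng.shuffle(mixed)
    return mixed


# Placeholder data standing in for real question records.
base = [f"2-hop-A-{i}" for i in range(10)]
extra = [f"1-hop-{i}" for i in range(10)]
mixed = replace_fraction(base, extra)
```

Keeping the total size constant means any change in accuracy can be attributed to the composition of the training set rather than to the amount of data.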