Zero-shot Commonsense Reasoning over Machine Imagination

Hyuntae Park, Yeachan Kim, Jun-Hyung Park, SangKeun Lee

TL;DR

This work proposes Imagine (Machine Imagination-based Reasoning), a novel zero-shot commonsense reasoning framework designed to complement textual inputs with visual signals derived from machine-generated images, highlighting the strength of machine imagination in mitigating reporting bias and enhancing generalization capabilities.

Abstract

Recent approaches to zero-shot commonsense reasoning have enabled Pre-trained Language Models (PLMs) to learn a broad range of commonsense knowledge without being tailored to specific situations. However, they often suffer from human reporting bias inherent in textual commonsense knowledge, leading to discrepancies in understanding between PLMs and humans. In this work, we aim to bridge this gap by introducing an additional information channel to PLMs. We propose Imagine (Machine Imagination-based Reasoning), a novel zero-shot commonsense reasoning framework designed to complement textual inputs with visual signals derived from machine-generated images. To achieve this, we enhance PLMs with imagination capabilities by incorporating an image generator into the reasoning process. To guide PLMs in effectively leveraging machine imagination, we create a synthetic pre-training dataset that simulates visual question-answering. Our extensive experiments on diverse reasoning benchmarks and analysis show that Imagine outperforms existing methods by a large margin, highlighting the strength of machine imagination in mitigating reporting bias and enhancing generalization capabilities.

Paper Structure

This paper contains 36 sections, 9 equations, 9 figures, 18 tables.

Figures (9)

  • Figure 1: Example from PIQA (Bisk et al., 2020) with model predictions. Compared to the existing methods, Imagine performs reasoning with imagination.
  • Figure 2: Overall procedures for (a) constructing a Synthetic VQA dataset and (b) the inference/optimization phase of Imagine (ours) using the given QA pair. The process starts with the textual pair consisting of a question and its answers, followed by the generation of visual signals (i.e., imagination) conditioned on the question. The two distinct features from visual and textual models are then utilized to derive a comprehensive prediction.
  • Figure 3: Examples of the Synthetic VQA dataset. The examples on the left are sourced from AbstractATOMIC (Wang et al., 2023), while the two examples on the right are sourced from VCR (Zellers et al., 2019). Bold indicates the correct answer, and underline denotes the generated image caption.
  • Figure 4: Comparison of model predictions and their correctness for Imagine and the existing model (Wang et al., 2023) on five commonsense reasoning tasks.
  • Figure 5: Comparison of generated images. The sentences are the queries used to generate the images.
  • ...and 4 more figures