Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning

Dongmin Park; Zhaofang Qian; Guangxing Han; Ser-Nam Lim

TL;DR

This paper presents an evaluation benchmark built by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues. The dialogues are produced by the novel Adversarial Question Generator (AQG), which automatically generates image-related yet adversarial dialogues by applying adversarial attacks to LVLMs.

Abstract

Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhance their reliability as general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dialogues. To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues powered by our novel Adversarial Question Generator (AQG), which can automatically generate image-related yet adversarial dialogues by adopting adversarial attacks on LVLMs. On our benchmark, the zero-shot performance of state-of-the-art LVLMs drops significantly for both the VQA and Captioning tasks. Next, we further reveal that this hallucination is mainly due to the prediction bias toward preceding dialogues rather than visual content. To reduce this bias, we propose Adversarial Instruction Tuning (AIT), which robustly fine-tunes LVLMs against hallucinatory dialogues. Extensive experiments show that our proposed approach successfully reduces dialogue hallucination while maintaining performance.
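The benchmark construction described above (prepending a dialogue to an existing test question, then comparing performance with and without it) can be illustrated with a minimal sketch. This is not the authors' code: the `Example` class, the `prepend_dialogue` and `accuracy_drop` helpers, and the USER/ASSISTANT prompt template are all hypothetical names chosen for illustration, and the sketch covers only the text side of the prompt, not image encoding or the AQG attack itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    """One benchmark test case: an image reference plus the target question."""
    image_id: str
    question: str


def prepend_dialogue(example: Example, dialogue: List[Tuple[str, str]]) -> str:
    """Build a prompt in which prior (user, system) rounds precede the target question.

    The USER/ASSISTANT template is an assumption, not the paper's format.
    """
    lines = [f"<image:{example.image_id}>"]
    for user_turn, system_turn in dialogue:
        lines.append(f"USER: {user_turn}")
        lines.append(f"ASSISTANT: {system_turn}")
    # The original test question always comes last, after the prepended rounds.
    lines.append(f"USER: {example.question}")
    lines.append("ASSISTANT:")
    return "\n".join(lines)


def accuracy_drop(clean_correct: List[bool], attacked_correct: List[bool]) -> float:
    """Accuracy without a prepended dialogue minus accuracy with one."""
    if len(clean_correct) != len(attacked_correct) or not clean_correct:
        raise ValueError("need two non-empty result lists of equal length")
    n = len(clean_correct)
    return sum(clean_correct) / n - sum(attacked_correct) / n
```

Under these assumptions, a model would be queried once with `prepend_dialogue(example, [])` and once with a non-empty dialogue, and `accuracy_drop` would summarize how much the prepended rounds hurt.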

Paper Structure (29 sections, 4 equations, 13 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Instruction-following LVLMs
Hallucinations of LVLMs
Adversarial Attacks on Language Models
Dialogue Hallucination and Evaluation Benchmark
Dialogue Hallucination of LVLMs
EvalDial: An Evaluation Benchmark
Adversarial Question Generator
Adversarial Instruction Tuning
Input Token Attention Analysis
Adversarial Instruction Tuning (AIT)
Experiments
Main Results on EvalDial
In-depth Analysis of AQG
...and 14 more sections

Figures (13)

Figure 1: (a) An example of dialogue hallucination generated by an LVLM (e.g., LLaVA [liu2023visual]) for a test example in the ScienceQA dataset; (b) the average performance drop of LLaVA and AIT on EvalDial for the VQA and Captioning tasks with prepended adversarial dialogues.
Figure 1: Summary of DTAR scores for correct (non-hallucinated) and hallucinated cases.
Figure 2: Overview of dialogue hallucinations on EvalDial. A test example on ScienceQA that LLaVA originally answers correctly becomes hallucinated after three types of prepended dialogues, i.e., General, Random, and Adversarial.
Figure 3: Overview of AQG, which generates an adversarial dialogue $\text{X}^{adv}_\texttt{dialogue}$ (in the yellow box) to hallucinate the answer $\text{X}^{adv}_\texttt{a}$ (in the green box) by incorporating an extra LVLM into the optimization of the adversarial attack.
Figure 3: Comparison of AQG with different attacking methods. The lower, the more effective in attacking.
...and 8 more figures

Theorems & Definitions (2)

Definition 3.1
Definition 4.1
