VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration

Jaeyoon Jung; Yejun Yoon; Seunghyun Yoon; Kunwoo Park

TL;DR

This paper describes VILLAIN, a multimodal fact-checking system that verifies image-text claims through prompt-based multi-agent collaboration. The system ranked first on the AVerImaTeC shared task leaderboard across all evaluation metrics.

Abstract

This paper describes VILLAIN, a multimodal fact-checking system that verifies image-text claims through prompt-based multi-agent collaboration. For the AVerImaTeC shared task, VILLAIN employs vision-language model agents across multiple stages of fact-checking. Textual and visual evidence is retrieved from the knowledge store enriched through additional web collection. To identify key information and address inconsistencies among evidence items, modality-specific and cross-modal agents generate analysis reports. In the subsequent stage, question-answer pairs are produced based on these reports. Finally, the Verdict Prediction agent produces the verification outcome based on the image-text claim and the generated question-answer pairs. Our system ranked first on the leaderboard across all evaluation metrics. The source code is publicly available at https://github.com/ssu-humane/VILLAIN.
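The stages named in the abstract can be sketched as a simple pipeline. This is a minimal illustration only, not the authors' implementation: the names `Claim`, `Evidence`, `verify`, the agent keys, and all `stub_*` functions are hypothetical, the agents are plain callables standing in for prompted vision-language models, and the verdict label is a placeholder. Whether the cross-modal agent also consumes the modality-specific reports is not stated in the abstract, so here every analysis agent simply receives the same claim and evidence.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical agent type: takes a payload dict, returns model output.
Agent = Callable[[dict], Any]


@dataclass
class Claim:
    text: str
    image_path: str


@dataclass
class Evidence:
    text: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)


def verify(
    claim: Claim,
    retrieve: Callable[[Claim], Evidence],
    analysis_agents: Dict[str, Agent],
    qa_agent: Agent,
    verdict_agent: Agent,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run the four stages in order and return (verdict, qa_pairs)."""
    # Stage 1: retrieve textual and visual evidence.
    evidence = retrieve(claim)
    # Stage 2: each analysis agent writes a report on the evidence.
    reports = {
        name: agent({"claim": claim, "evidence": evidence})
        for name, agent in analysis_agents.items()
    }
    # Stage 3: produce question-answer pairs from the reports.
    qa_pairs = qa_agent({"claim": claim, "reports": reports})
    # Stage 4: predict the verdict from the claim and the QA pairs.
    verdict = verdict_agent({"claim": claim, "qa_pairs": qa_pairs})
    return verdict, qa_pairs


# --- Stub components for illustration only (all hypothetical) ---
def stub_retrieve(claim: Claim) -> Evidence:
    return Evidence(text=["retrieved passage"], images=["retrieved.png"])


def make_stub_analyst(name: str) -> Agent:
    def analyst(payload: dict) -> str:
        n_text = len(payload["evidence"].text)
        return f"{name} report over {n_text} text item(s)"
    return analyst


stub_analysis_agents = {
    name: make_stub_analyst(name) for name in ("text", "image", "cross_modal")
}


def stub_qa_agent(payload: dict) -> List[Tuple[str, str]]:
    # One question-answer pair per report, purely for demonstration.
    return [
        (f"What does the {name} report say?", report)
        for name, report in payload["reports"].items()
    ]


def stub_verdict_agent(payload: dict) -> str:
    # Placeholder label; the real label set comes from the shared task.
    return "PLACEHOLDER_LABEL"
```

In a real system each stub would be replaced by a retrieval component or a prompted model call; passing the agents in as arguments keeps the stage order separate from how any single agent is implemented.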

Paper Structure (27 sections, 3 equations, 6 figures, 6 tables)

Introduction
Related Works
Task Description
Our System
Evidence Retrieval
Textual Evidence Retrieval
Visual Evidence Retrieval
Evidence Analysis
Question-Answer Generation
Verdict Prediction
Evaluation Experiments
Experimental Setups
Experimental Results
Development Set Results
Test Set Results
...and 12 more sections

Figures (6)

Figure 1: Overview of VILLAIN.
Figure 2: Prompt used for $\mathcal{A}_{TT}$ with an example of the input and output. Blue text indicates actual model output.
Figure 3: Prompt used for $\mathcal{A}_{IT}$ with an example of the input and output. Blue text indicates actual model output.
Figure 4: Prompt used for $\mathcal{A}_{CM}$ with an example of the input and output. Blue text indicates actual model output.
Figure 5: Prompt used for $\mathcal{A}_{QA}$ with an example of the input and output. Blue text indicates actual model output in JSON format.
...and 1 more figure
