Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification

Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu

TL;DR

This work tackles multimodal fact verification by integrating fine-grained knowledge from multiple sources. It introduces MultiKE-GAT, which builds an undirected heterogeneous graph from textual entities, visual entities, and key information, and applies a Knowledge-aware Graph Fusion module to suppress noise via global guidance. The model updates node representations with a graph-attention mechanism in a shared latent space, then aggregates features for final verification. Empirical results on FACTIFY and MOCHEG show state-of-the-art performance, with ablations confirming the importance of multi-source knowledge, the fusion module, and global guidance. The approach offers a robust framework for incorporating diverse knowledge to improve cross-modal reasoning and veracity judgments in multimodal claims.

Abstract

Multimodal fact verification is an emerging, under-explored field that has gained increasing attention in recent years. The goal is to assess the veracity of claims that involve multiple modalities by analyzing the retrieved evidence. The main challenge in this area is to effectively fuse features from different modalities to learn meaningful multimodal representations. To this end, we propose a novel model named Multi-Source Knowledge-enhanced Graph Attention Network (MultiKE-GAT). MultiKE-GAT introduces external multimodal knowledge from different sources and constructs a heterogeneous graph to capture complex cross-modal and cross-source interactions. We exploit a Knowledge-aware Graph Fusion (KGF) module to learn knowledge-enhanced representations for each claim and its evidence and to eliminate the inconsistencies and noise introduced by redundant entities. Experiments on two public benchmark datasets demonstrate that our model outperforms the compared methods, showing the effectiveness and superiority of the proposed model.

Paper Structure (14 sections, 5 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Examples of the effectiveness of entities in multimodal fact verification. Support means the entailment of claim and evidence. Multimodal means similar multimodal contents.
  • Figure 2: The overall architecture of MultiKE-GAT. First, multi-source knowledge (textual entities, visual objects, and keyphrases) is extracted from texts and images to form an undirected heterogeneous graph. The knowledge-oriented graph fusion network then fuses this diverse fine-grained knowledge under the guidance of global nodes to learn multimodal representations. The resulting representation is the input to the MLP classifier for verification.
  • Figure 3: Two cases from FACTIFY. Red rectangles mark objects; bold words are entities and key information.
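
The pipeline described in the TL;DR and the Figure 2 caption (an undirected heterogeneous graph with a global node, a graph-attention update in a shared latent space, then aggregation) can be illustrated with a minimal sketch. This is not the paper's KGF module, whose details are not given here; it is a generic single-head graph-attention layer used as a stand-in. The function name `gat_layer`, the toy graph, the feature sizes, and the mean pooling at the end are all assumptions made for illustration.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph-attention update (generic GAT, assumed stand-in).

    h:   (n, d_in) node features, assumed already in a shared latent space
    adj: (n, n) symmetric 0/1 adjacency of the undirected graph
    W:   (d_in, d_out) weight matrix
    a:   (2 * d_out,) attention vector
    """
    z = h @ W                                   # projected features, (n, d_out)
    d_out = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into two dot products
    e = (z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Keep only neighbours; self-loops guarantee every row has one entry
    mask = (adj + np.eye(len(h))) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

# Hypothetical toy graph: node 0 is a global node linked to every other node.
node_types = ["global", "text_entity", "text_entity", "visual_object", "keyphrase"]
n, d_in, d_out = len(node_types), 8, 4
adj = np.zeros((n, n), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 3), (2, 4)]:
    adj[i, j] = adj[j, i] = 1                   # undirected edges

rng = np.random.default_rng(0)
h = rng.normal(size=(n, d_in))                  # random placeholder features
W = rng.normal(size=(d_in, d_out))
a = rng.normal(size=2 * d_out)

out, alpha = gat_layer(h, adj, W, a)
graph_repr = out.mean(axis=0)                   # assumed pooling before a classifier
```

Each row of `alpha` is a distribution over a node's neighbours, so a node can down-weight a noisy neighbour while still receiving information through the global node. The pooled `graph_repr` would then be passed to a classifier such as the MLP mentioned in the Figure 2 caption.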