Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification
Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu
TL;DR
This work tackles multimodal fact verification by integrating fine-grained knowledge from multiple sources. It introduces MultiKE-GAT, which builds an undirected heterogeneous graph from textual entities, visual entities, and key information, and applies a Knowledge-aware Graph Fusion module to suppress noise via global guidance. The model updates node representations with a graph-attention mechanism in a shared latent space, then aggregates features for final verification. Empirical results on FACTIFY and MOCHEG show state-of-the-art performance, with ablations confirming the importance of multi-source knowledge, the fusion module, and global guidance. The approach offers a robust framework for incorporating diverse knowledge to improve cross-modal reasoning and veracity judgments in multimodal claims.
Abstract
Multimodal fact verification is an emerging, under-explored field that has attracted increasing attention in recent years. The goal is to assess the veracity of claims that involve multiple modalities by analyzing retrieved evidence. The main challenge is to fuse features from different modalities effectively so as to learn meaningful multimodal representations. To this end, we propose a novel model named Multi-Source Knowledge-enhanced Graph Attention Network (MultiKE-GAT). MultiKE-GAT introduces external multimodal knowledge from different sources and constructs a heterogeneous graph to capture complex cross-modal and cross-source interactions. We use a Knowledge-aware Graph Fusion (KGF) module to learn knowledge-enhanced representations for each claim and its evidence and to eliminate the inconsistencies and noise introduced by redundant entities. Experiments on two public benchmark datasets demonstrate that our model outperforms the comparison methods, showing the effectiveness of the proposed model.
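To make the graph-attention update over the heterogeneous graph more concrete, the following is a minimal sketch of a generic single-head graph attention layer in NumPy, followed by mean pooling. It is an illustration under stated assumptions, not the MultiKE-GAT implementation: the function name `gat_layer`, the toy four-node graph, the feature dimensions, the random weights, and the use of mean pooling for aggregation are all hypothetical, and the KGF module and global guidance are not modeled.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """Generic single-head graph attention update (not the paper's exact layer).

    h:   (N, F) node features, assumed already mapped to a shared latent space
    adj: (N, N) symmetric 0/1 adjacency matrix of an undirected graph
    W:   (F, D) shared linear projection
    a:   (2 * D,) attention vector
    """
    n = h.shape[0]
    z = h @ W                                  # projected features, (N, D)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target parts
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Attend only over neighbours; self-loops keep every row non-empty
    mask = (adj + np.eye(n)) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)       # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

rng = np.random.default_rng(0)
# Hypothetical toy graph: nodes 0-1 are textual entities, node 2 is a visual
# entity, node 3 is a key-information node.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
h = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 5))
a = rng.normal(size=(10,))

h_new, alpha = gat_layer(h, adj, W, a)
graph_vec = h_new.mean(axis=0)  # mean pooling as a stand-in for aggregation
```

In this sketch each row of `alpha` is a distribution over a node's neighbours (plus itself), so nodes that share no edge contribute nothing to each other's update. The pooled vector `graph_vec` would then feed a classifier for the final verification decision.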
