MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection
Kaiying Yan, Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li
TL;DR
The paper tackles multimodal fake news detection, where conflicting optimization signals across text, audio, and visuals hinder effective fusion. It introduces MTPareto, a hierarchical fusion framework that employs Targeted Pareto gradient integration to align objectives across three fusion levels, prioritizing all-modal information while controlling gradient conflicts. Key contributions include the three-level fusion architecture with attention-based modules, the TPareto training scheme with level-specific losses and gradient rules, and extensive experiments on FakeSV and FVC showing consistent accuracy gains and informative ablations. The approach demonstrates robust all-modal fusion and provides a transferable methodology for other multimodal tasks that face inter-modal optimization conflicts.
Abstract
Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information. Significant differences in the form and content of multimodal information intensify optimization conflicts, which hinders effective model training and reduces the effectiveness of existing fusion methods designed for bimodal input. To address this problem, we propose the MTPareto framework to optimize multimodal fusion, using a Targeted Pareto (TPareto) optimization algorithm for fusion-level-specific objective learning that focuses on the all-modal objective. Based on the designed hierarchical fusion network, the algorithm defines three fusion levels with corresponding losses and implements all-modal-oriented Pareto gradient integration for each. This approach achieves superior multimodal fusion by using the information obtained from intermediate fusion to benefit the entire process. Experimental results on the FakeSV and FVC datasets show that the proposed framework outperforms the baselines and that the TPareto optimization algorithm improves accuracy by 2.40% and 1.89%, respectively.
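
To make the idea of Pareto gradient integration concrete, the following is a minimal sketch of the generic two-objective min-norm combination from multi-objective optimization. It is not the paper's TPareto rule, whose level-specific details are not given in this abstract. The function names (`dot`, `min_norm_pair`) and the gradient names (`g_all` for an all-modal loss, `g_aux` for a lower fusion level's loss) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_pair(g_all: List[float], g_aux: List[float]) -> Tuple[float, List[float]]:
    """Min-norm convex combination of two gradients.

    Finds alpha in [0, 1] minimizing ||alpha * g_all + (1 - alpha) * g_aux||^2.
    Setting the derivative to zero gives
        alpha = ((g_aux - g_all) . g_aux) / ||g_all - g_aux||^2,
    which is then clipped to [0, 1].
    NOTE: this is the generic two-objective solution, assumed here for
    illustration; it is not the TPareto rule described in the paper.
    """
    diff = [a - b for a, b in zip(g_all, g_aux)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any convex combination gives the same vector.
        return 0.5, list(g_all)
    alpha = dot([b - a for a, b in zip(g_all, g_aux)], g_aux) / denom
    alpha = min(1.0, max(0.0, alpha))
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_all, g_aux)]
    return alpha, combined


if __name__ == "__main__":
    # Two partially conflicting gradients (their inner product is negative).
    g_all = [1.0, 0.0]
    g_aux = [-1.0, 1.0]
    alpha, g = min_norm_pair(g_all, g_aux)
    # The combined direction has a non-negative inner product with both inputs,
    # so a small step along -g does not increase either loss to first order.
    print(alpha, g, dot(g, g_all), dot(g, g_aux))
```

In this sketch the two objectives are treated symmetrically. An all-modal-oriented scheme such as the one the abstract describes would presumably bias the combination toward the all-modal gradient, but how it does so is not specified here.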
