Multi-MLLM Knowledge Distillation for Out-of-Context News Detection

Yimeng Gu, Zhao Tong, Ignacio Castro, Shu Wu, Gareth Tyson

TL;DR

This work tackles the challenge of detecting multimodal out-of-context news with resource-efficient models. It introduces MMKD, a two-stage, multi-teacher knowledge distillation framework. MMKD prompts multiple large MLLMs to produce labels and rationales augmented with web evidence, then distills this knowledge into a lightweight student through global knowledge learning and complementary knowledge fusion, using LoRA and Direct Preference Optimization (DPO). Empirically, MMKD achieves state-of-the-art accuracy on NewsCLIPpings with less than 10% of the labeled data and a 7B student, substantially reducing annotation and compute costs. The approach demonstrates that exploiting multi-teacher reasoning and targeted refinement on hard cases can yield robust out-of-context detection while keeping deployment practical, which suggests strong potential for real-world misinformation filtering in low-resource settings.
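
Part of the efficiency claim rests on LoRA. Instead of updating the student's full weight matrices, LoRA freezes each pretrained weight W and trains a low-rank update B A of rank r. The pure-Python sketch below shows a LoRA-adapted linear layer and the parameter saving for one layer. It is an illustration of the general technique and not the authors' code. The function names, toy shapes, and `scaling` factor are assumptions; in common implementations `scaling` is set to lora_alpha / r and B is initialized to zeros.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, scaling: float) -> Vector:
    """LoRA-adapted linear layer: h = W x + scaling * B (A x).

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = matvec(W, x)                 # frozen path: W x
    update = matvec(B, matvec(A, x))    # trainable rank-r path: B (A x)
    return [b + scaling * u for b, u in zip(base, update)]


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """Parameters updated by full fine-tuning vs. LoRA for one d_out x d_in layer."""
    return d_in * d_out, r * (d_in + d_out)


# Toy example (hypothetical values): d_in = d_out = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]       # shape (r=1, d_in=2)
B = [[0.5], [0.0]]     # shape (d_out=2, r=1)
x = [2.0, 3.0]

print(lora_forward(W, A, B, x, scaling=1.0))  # [4.5, 3.0]
print(trainable_params(4096, 4096, 8))        # (16777216, 65536)
```

With B set to zeros the layer reproduces W x exactly, so fine-tuning starts from the pretrained student's behavior; for a 4096 x 4096 layer at rank 8, only about 0.4% of the weights are trained.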

Abstract

Multimodal out-of-context news is a type of misinformation in which an image is used outside of its original context. Many existing works have leveraged multimodal large language models (MLLMs) to detect out-of-context news. However, because smaller MLLMs show limited zero-shot performance, these works generally require label-rich fine-tuning and/or expensive API calls to GPT models to improve performance, which is impractical in low-resource scenarios. In contrast, we aim to improve the performance of small MLLMs in a more label-efficient and cost-effective manner. To this end, we first prompt multiple teacher MLLMs to generate both label predictions and corresponding rationales, which collectively serve as the teachers' knowledge. We then introduce a two-stage knowledge distillation framework to transfer this knowledge to a student MLLM. In Stage 1, we apply LoRA fine-tuning to the student model using all training data. In Stage 2, we further fine-tune the student model with both LoRA fine-tuning and Direct Preference Optimization (DPO) on the data points where the teachers' predictions conflict. This two-stage strategy reduces annotation costs and helps the student model uncover subtle patterns in more challenging cases. Experimental results demonstrate that our approach achieves state-of-the-art performance using less than 10% of the labeled data.
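
The split between the two stages can be illustrated with a small data-preparation sketch. Several details here are assumptions rather than the paper's confirmed procedure: each teacher is assumed to return one (label, rationale) pair per sample, Stage 1 is assumed to use one fine-tuning record per teacher output, and gold labels are assumed to be annotated only for conflict cases, which would explain the low labeling cost. In Stage 2, the chosen response is a rationale from a teacher that agrees with the gold label and the rejected response comes from a teacher that does not. All class names, function names, label strings, and toy inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TeacherOutput:
    teacher: str
    label: str        # hypothetical label names: "pristine" / "out-of-context"
    rationale: str


@dataclass
class Sample:
    sample_id: str
    outputs: List[TeacherOutput]
    gold_label: Optional[str] = None  # assumed to be annotated only for conflict cases


def has_conflict(sample: Sample) -> bool:
    """True if the teachers do not all predict the same label."""
    return len({o.label for o in sample.outputs}) > 1


def build_stage1(samples: List[Sample]) -> List[Dict[str, str]]:
    """Stage 1 (global knowledge): every sample, one fine-tuning record per teacher output."""
    return [
        {"id": s.sample_id, "target": f"{o.label}. {o.rationale}"}
        for s in samples
        for o in s.outputs
    ]


def build_stage2(samples: List[Sample]) -> List[Dict[str, str]]:
    """Stage 2 (complementary knowledge): conflict cases only, as preference pairs."""
    pairs = []
    for s in samples:
        if not has_conflict(s) or s.gold_label is None:
            continue
        agree = [o for o in s.outputs if o.label == s.gold_label]
        disagree = [o for o in s.outputs if o.label != s.gold_label]
        for good in agree:
            for bad in disagree:
                pairs.append({
                    "id": s.sample_id,
                    "chosen": f"{good.label}. {good.rationale}",
                    "rejected": f"{bad.label}. {bad.rationale}",
                })
    return pairs


# Toy inputs: sample "a" has agreeing teachers, sample "b" has a conflict.
samples = [
    Sample("a", [TeacherOutput("T1", "pristine", "Caption matches the scene."),
                 TeacherOutput("T2", "pristine", "Same event and location.")]),
    Sample("b", [TeacherOutput("T1", "pristine", "Plausible match."),
                 TeacherOutput("T2", "out-of-context", "Caption names a different event.")],
           gold_label="out-of-context"),
]
print(len(build_stage1(samples)))  # 4
print(len(build_stage2(samples)))  # 1
```

Only the conflict subset needs a gold label in this sketch, so annotation effort scales with how often the teachers disagree rather than with the size of the training set.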

Paper Structure

This paper contains 29 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: A brief comparison between our proposed approach and previous works. Previous approaches either (a) demand label-rich fine-tuning or (b) make expensive API calls to proprietary MLLMs. In contrast, our approach requires only a few labels and does not use proprietary MLLMs.
  • Figure 2: The framework of MMKD. It consists of two steps: (i) Knowledge Acquisition, which prompts teacher MLLMs to obtain predicted labels and corresponding rationales; and (ii) Multi-Teacher Knowledge Distillation, which LoRA fine-tunes the student MLLM on the acquired teacher knowledge in two stages: Global Knowledge Learning and Complementary Knowledge Fusion.
  • Figure 3: MMKD's performance under different values of the LoRA rank $r$, the DPO weight $\alpha$, and the sensitivity parameter $\beta$.
  • Figure 4: Case 1, the student model's output without and with MMKD.
  • Figure 5: Case 2, the student model's output without and with MMKD.
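
Figure 3 varies a DPO weight $\alpha$ and a sensitivity parameter $\beta$. The sketch below shows one plausible reading, which is an assumption and not confirmed by this summary: $\beta$ is the usual DPO temperature that scales the log-probability margin, and $\alpha$ weights the DPO term against the LoRA fine-tuning (SFT) loss in Stage 2. The standard per-pair DPO loss is $-\log\sigma\big(\beta[(\log\pi_\theta(y_w) - \log\pi_{\text{ref}}(y_w)) - (\log\pi_\theta(y_l) - \log\pi_{\text{ref}}(y_l))]\big)$. The function names and the additive form of `stage2_loss` are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float) -> float:
    """Standard DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the trainable student (policy) and a frozen reference copy.
    beta controls how sharply the loss reacts to the preference margin.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def stage2_loss(sft_loss: float, dpo: float, alpha: float) -> float:
    """Hypothetical Stage 2 objective: SFT loss plus an alpha-weighted DPO term."""
    return sft_loss + alpha * dpo


# With no preference margin the loss equals log 2.
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0, beta=0.1), 4))  # 0.6931
# A student that prefers the chosen response more than the reference does gets a lower loss.
print(dpo_loss(-1.0, -5.0, -2.0, -3.0, beta=0.1) < dpo_loss(0.0, 0.0, 0.0, 0.0, beta=0.1))  # True
```

Under this reading, a larger $\beta$ makes the loss more sensitive to small margins between chosen and rejected rationales, while $\alpha$ trades off imitating teacher outputs against separating good from bad rationales on the conflict cases.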