Reasoning-Guided Claim Normalization for Noisy Multilingual Social Media Posts

Manan Sharma, Arya Suneesh, Manish Jain, Pawan Kumar Rajpoot, Prasanna Devadiga, Bharatdeep Hazarika, Ashish Shrivastava, Kishan Gurumurthy, Anshuman B Suresh, Aditya U Baliga

TL;DR

The paper tackles multilingual claim normalization for misinformation across 20 languages by fine-tuning a multilingual generator (Qwen3-14B) with LoRA and 4-bit quantization, guided by a structured 5W1H reasoning framework and retrieval-augmented few-shot prompting. It cleans the data with intra-post deduplication and token-level recall filtering, and augments training with explicit What/Who/Where/When/How/Why reasoning. The English-centric model is evaluated cross-lingually on 13 languages (and zero-shot on 7) with METEOR as the primary metric. Cross-lingual transfer is strong, especially for Romance languages, and the reasoning and retrieval components both yield substantial gains. The approach is scalable, maintains semantic coherence across languages, and can be integrated into fact-checking pipelines; future work targets broader language coverage and evaluation that generalizes across platforms.

Abstract

We address claim normalization for multilingual misinformation detection: transforming noisy social media posts into clear, verifiable statements across 20 languages. The key contribution demonstrates how systematic decomposition of posts using Who, What, Where, When, Why, and How questions enables robust cross-lingual transfer despite training exclusively on English data. Our methodology combines fine-tuning Qwen3-14B using LoRA on the provided dataset after intra-post deduplication, token-level recall filtering for semantic alignment, and retrieval-augmented few-shot learning with contextual examples during inference. Our system achieves METEOR scores ranging from 41.16 (English) to 15.21 (Marathi), ranking third on the English leaderboard and fourth for Dutch and Punjabi. The approach shows a 41.3% relative improvement in METEOR over baseline configurations and substantial gains over existing methods. Results demonstrate effective cross-lingual generalization for Romance and Germanic languages while maintaining semantic coherence across diverse linguistic structures.
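The two data-cleaning steps named in the abstract can be illustrated with a minimal sketch. The abstract does not define either step precisely, so everything here is an assumption for illustration: the helper names (`dedupe_sentences`, `token_recall`, `clean_pairs`), the sentence-splitting rule, the reading of "token-level recall" as the fraction of claim tokens that also occur in the post, and the `min_recall` threshold of 0.5 are all hypothetical rather than the authors' implementation.

```python
import re


def dedupe_sentences(post: str) -> str:
    """Drop repeated sentences within one post, keeping first occurrences.

    Assumption: "intra-post deduplication" means removing repeated
    sentences or lines inside a single post.
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", post.strip())
    seen, kept = set(), []
    for part in parts:
        key = " ".join(part.lower().split())  # case- and whitespace-insensitive key
        if key and key not in seen:
            seen.add(key)
            kept.append(part.strip())
    return " ".join(kept)


def token_recall(post: str, claim: str) -> float:
    """Fraction of claim tokens that also appear in the post (assumed definition)."""
    claim_tokens = re.findall(r"\w+", claim.lower())
    if not claim_tokens:
        return 0.0
    post_tokens = set(re.findall(r"\w+", post.lower()))
    return sum(tok in post_tokens for tok in claim_tokens) / len(claim_tokens)


def clean_pairs(pairs, min_recall=0.5):
    """Deduplicate each post, then keep pairs whose claim overlaps the post enough.

    min_recall is a hypothetical threshold, not a value taken from the paper.
    """
    cleaned = []
    for post, claim in pairs:
        post = dedupe_sentences(post)
        if token_recall(post, claim) >= min_recall:
            cleaned.append((post, claim))
    return cleaned
```

Under these assumptions, a training pair whose normalized claim shares few tokens with its post is treated as poorly aligned and dropped before fine-tuning.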

Paper Structure

This paper contains 18 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: System workflow for multilingual claim normalization using fine-tuned Qwen3-14B with 5W1H reasoning framework and retrieval-augmented few-shot prompting.
  • Figure 2: Progressive improvement in claim normalization quality across three configurations, showing the impact of 5W1H reasoning and few-shot retrieval on output coherence and conciseness. More examples appear in the appendix.
  • Figure 3: Additional examples illustrating progressive enhancement through 5W1H reasoning and retrieval.