CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Yongheng Zhang, Xu Liu, Ruoxi Zhou, Qiguang Chen, Hao Fei, Wenpeng Lu, Libo Qin
TL;DR
CCHall introduces the first benchmark for evaluating joint cross-lingual and cross-modal hallucinations in large language models, addressing a gap left by prior work that treated the two settings separately. It combines raw multimodal data with cross-modal, cross-lingual, and joint hallucination data across VQA and image captioning tasks, with translations into languages spanning multiple resource levels and human verification. The paper reports that current multimodal LLMs (MLLMs) remain far from robust on CCHall, though methods such as UniHD with external tools and multilingual prompts provide meaningful gains. By releasing open data and code, CCHall aims to drive progress in reducing joint hallucinations and improving the reliability of multimodal, multilingual LLM systems in real-world deployments.
Abstract
Investigating hallucination issues in large language models (LLMs) within cross-lingual and cross-modal scenarios can greatly advance their large-scale deployment in real-world applications. Nevertheless, current studies are limited to a single scenario, either cross-lingual or cross-modal, leaving a gap in the exploration of hallucinations in joint cross-lingual and cross-modal scenarios. Motivated by this, we introduce a novel joint Cross-lingual and Cross-modal Hallucinations benchmark (CCHall) to fill this gap. Specifically, CCHall incorporates both cross-lingual and cross-modal hallucination scenarios simultaneously, so it can be used to assess the cross-lingual and cross-modal capabilities of LLMs. Furthermore, we conduct a comprehensive evaluation on CCHall, covering both mainstream open-source and closed-source LLMs. The experimental results show that current LLMs still struggle with CCHall. We hope CCHall can serve as a valuable resource for assessing LLMs in joint cross-lingual and cross-modal scenarios.
