ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection

Janek Herrlein; Chia-Chien Hung; Goran Glavaš

TL;DR

This work introduces ANHALTEN, a German extension of the English HaDes benchmark for token-level, reference-free hallucination detection, enabling parallel evaluation and cross-lingual transfer studies. It uses a two-phase translation process to create a high-quality German dataset and investigates three cross-lingual transfer strategies (Zero-Shot, Few-Shot, and Translate-Train) within an adapter-based transfer framework on multilingual PLMs such as mBERT and XLM-R. The key findings are that longer context improves detection in German, even in the online setting where no succeeding context is available, and that few-shot transfer is the most cost-effective way to reach strong performance, with Translate-Train offering additional gains. The results advance multilingual hallucination detection research and support real-time, language-appropriate safeguards for free-form text generation. The ANHALTEN dataset is publicly available for future work.

Abstract

Research on token-level reference-free hallucination detection has predominantly focused on English, primarily due to the scarcity of robust datasets in other languages. This has hindered systematic investigations into the effectiveness of cross-lingual transfer for this important NLP application. To address this gap, we introduce ANHALTEN, a new evaluation dataset that extends the English hallucination detection dataset to German. To the best of our knowledge, this is the first work that explores cross-lingual transfer for token-level reference-free hallucination detection. ANHALTEN contains gold annotations in German that are parallel (i.e., directly comparable to the original English instances). We benchmark several prominent cross-lingual transfer approaches, demonstrating that larger context length leads to better hallucination detection in German, even without succeeding context. Importantly, we show that the sample-efficient few-shot transfer is the most effective approach in most setups. This highlights the practical benefits of minimal annotation effort in the target language for reference-free hallucination detection. Aiming to catalyze future research on cross-lingual token-level reference-free hallucination detection, we make ANHALTEN publicly available: https://github.com/janekh24/anhalten
