HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection

Maryam Bala, Amina Imam Abubakar, Abdulhamid Abubakar, Abdulkadir Shehu Bichi, Hafsa Kabir Ahmad, Sani Abdullahi Sani, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad

TL;DR

The paper tackles fine-grained detection of hallucinated spans in LLM outputs for the multilingual Mu-SHROOM shared task, focusing on English and on model-aware analysis. It trains a token-level detector by fine-tuning ModernBERT on a 400-sample synthetic dataset and by using internal model activations, an approach inspired by SAPLMA, to predict hallucination spans as both hard and soft labels. The system reports an IoU of 0.032 and a correlation score of 0.422, with precision 0.49, recall 0.54, and F1 0.43. The low IoU shows that predicted span boundaries rarely match the annotations, while the correlation score indicates a moderate positive relationship between the model's confidence and the presence of hallucinations. The results illustrate how hard span-boundary detection is and position synthetic data and model-aware methods as a baseline for future work on reliable, multilingual hallucination detection.

Abstract

This paper presents our findings on the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes (Mu-SHROOM), which focuses on identifying hallucinations and related overgeneration errors in large language models (LLMs). The shared task involves detecting the specific text spans that constitute hallucinations in outputs generated by LLMs in 14 languages. To address this task, we aim to provide a nuanced, model-aware understanding of hallucination occurrence and severity in English. We used natural language inference and fine-tuned a ModernBERT model on a synthetic dataset of 400 samples, achieving an Intersection over Union (IoU) score of 0.032 and a correlation score of 0.422. The correlation score indicates a moderately positive correlation between the model's confidence scores and the actual presence of hallucinations. The IoU score indicates that the predicted hallucination spans overlap only slightly with the ground-truth annotations. This performance is unsurprising given the intricate nature of hallucination detection: hallucinations often manifest subtly and depend on context, which makes pinpointing their exact boundaries difficult.
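The abstract reports an IoU and a correlation score without defining them here. The following is a minimal sketch of how such metrics could be computed, assuming that IoU is taken over the character indices covered by predicted and annotated spans and that the correlation is a Spearman rank correlation between predicted and annotated per-character probabilities. The function names, the span representation, and the handling of empty or constant inputs are illustrative assumptions, not the task's official scorer.

```python
from typing import List, Sequence, Set, Tuple

# Assumption: a span is (start, end) in character offsets, end exclusive.
Span = Tuple[int, int]


def span_chars(spans: List[Span]) -> Set[int]:
    """Expand spans into the set of character indices they cover."""
    return {i for start, end in spans for i in range(start, end)}


def char_iou(pred: List[Span], gold: List[Span]) -> float:
    """Intersection over Union of predicted and annotated character sets."""
    p, g = span_chars(pred), span_chars(gold)
    if not p and not g:
        # Assumption: agreeing that nothing is hallucinated counts as 1.0.
        return 1.0
    return len(p & g) / len(p | g)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        # Assumption: correlation is undefined for constant input; return 0.0.
        return 0.0
    return cov / (vx * vy) ** 0.5
```

Under these assumptions, a prediction that shifts a span by a few characters still loses much of its IoU, whereas the rank correlation only asks whether higher predicted probabilities line up with more strongly annotated characters. This is one way a system can show a moderate correlation alongside a very low IoU.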

Paper Structure

This paper contains 11 sections.