CheckIfExist: Detecting Citation Hallucinations in the Era of AI-Generated Content
Diletta Abbonato
TL;DR
CheckIfExist tackles the growing problem of citation hallucinations in AI-assisted writing by providing real-time, multi-source reference validation against CrossRef, Semantic Scholar, and OpenAlex. It uses a cascading verification pipeline that combines Levenshtein-based string similarity with author-name matching to compute a composite confidence score. String similarity is normalized as $\text{similarity}(a,b) = 1 - \frac{\text{lev}(a,b)}{\max(|a|,|b|)}$, and the field-level scores are combined as $\text{confidence} = \frac{S_{\text{title}}+S_{\text{author}}+S_{\text{journal}}+S_{\text{year}}}{4} + \beta_{ms}$, with penalties applied for mismatches and fake authors. A preprocessing step filters LaTeX commands from the input. The system supports single and batch verification and outputs APA-formatted citations together with corrected BibTeX records suitable for LaTeX workflows. Because it is released under the MIT license with unlimited free usage, the tool aims to integrate smoothly into existing scholarly tooling and help safeguard the reliability of scientific literature against AI-generated fabrications.
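The two scoring formulas can be illustrated with a minimal Python sketch. This is not CheckIfExist's actual implementation. It computes the edit distance $\text{lev}(a,b)$ with the standard dynamic-programming recurrence, normalizes it by the longer string, and averages four field scores. The parameter names `multi_source_bonus` (standing in for $\beta_{ms}$) and `penalty` are hypothetical, as is the clamping of the result to $[0, 1]$. The source does not state penalty magnitudes.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 - lev(a, b) / max(|a|, |b|); two empty strings count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def confidence(s_title: float, s_author: float, s_journal: float, s_year: float,
               multi_source_bonus: float = 0.0, penalty: float = 0.0) -> float:
    """Mean of four field scores plus a bonus, minus a penalty.

    Hypothetical: bonus/penalty values and clamping to [0, 1] are assumptions.
    """
    base = (s_title + s_author + s_journal + s_year) / 4
    return max(0.0, min(1.0, base + multi_source_bonus - penalty))
```

For example, `similarity("kitten", "sitting")` gives an edit distance of 3 over a maximum length of 7, or about 0.571. In practice, strings would likely be case-folded and cleaned before comparison, because raw Levenshtein distance counts case differences as edits.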
Abstract
The proliferation of large language models (LLMs) in academic workflows has introduced unprecedented challenges to bibliographic integrity, particularly through reference hallucination, i.e., the generation of plausible but non-existent citations. Recent investigations have documented AI-hallucinated citations even in papers accepted at premier machine learning conferences such as NeurIPS and ICLR, underscoring the urgent need for automated verification. This paper presents "CheckIfExist", an open-source web-based tool that verifies bibliographic references immediately by validating them against multiple scholarly databases: CrossRef, Semantic Scholar, and OpenAlex. Existing reference management tools help organize bibliographies, but they do not validate citation authenticity in real time. Commercial hallucination detection services, though increasingly available, often impose restrictive usage limits on their free tiers or require substantial subscription fees. The proposed tool fills this gap with a cascading validation architecture that uses string similarity algorithms to compute multi-dimensional match confidence scores, giving instant feedback on whether a reference is authentic. Through a unified interface, the system supports both single-reference verification and batch processing of BibTeX entries, returning validated APA citations and exportable BibTeX records within seconds.
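The cascading lookup and the LaTeX filtering mentioned in the TL;DR can be sketched together as follows. This is a minimal illustration under stated assumptions, not the tool's code. Each source is modeled as a hypothetical lookup function that returns a candidate record (a dict with a `"title"` key) or `None`. Sources are tried in order, and the first candidate whose title score reaches `threshold` ends the cascade. Two substitutions are made plainly here. First, Python's standard `difflib.SequenceMatcher.ratio()` stands in for the paper's Levenshtein-based score. Second, `strip_latex` is a deliberately simple regex cleaner that removes command names, braces, and `$` signs.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Optional

Record = dict  # hypothetical candidate record, e.g. {"title": ..., "year": ...}
Lookup = Callable[[str], Optional[Record]]


def strip_latex(text: str) -> str:
    """Remove LaTeX command names, grouping braces, and math delimiters."""
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)  # e.g. \textit, \emph
    text = re.sub(r"[{}$]", "", text)           # braces and $ signs
    return " ".join(text.split())               # collapse leftover whitespace


def title_score(a: str, b: str) -> float:
    """Stand-in similarity: difflib ratio on cleaned, case-folded titles."""
    return SequenceMatcher(None, strip_latex(a).casefold(),
                           strip_latex(b).casefold()).ratio()


def cascade_verify(title: str, sources: list[tuple[str, Lookup]],
                   threshold: float = 0.9) -> Optional[tuple[str, Record, float]]:
    """Return (source_name, record, score) for the first source whose candidate
    reaches `threshold`; return None if no source does (possible hallucination)."""
    for name, lookup in sources:
        record = lookup(title)
        if record is None:
            continue  # no candidate here: fall through to the next source
        score = title_score(title, record["title"])
        if score >= threshold:
            return name, record, score  # later sources are never queried
    return None
```

Ordering the sources cheapest or most authoritative first means most genuine references stop the cascade early. A reference that matches nowhere returns `None` and would be flagged for manual review. Real CrossRef, Semantic Scholar, or OpenAlex clients would replace the stub lookups, and their query parameters are not specified in this section.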
