SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning
Sebastian Haan
TL;DR
SemanticCite introduces a full-text, AI-powered citation verification framework that moves beyond abstract-level checks by combining a hybrid retrieval pipeline with a four-class taxonomy (Supported, Partially Supported, Unsupported, Uncertain) and evidence-based reasoning. It demonstrates that fine-tuned lightweight models (Qwen3 variants) can perform comparably to large commercial systems at significantly lower computational cost, while providing transparent explanations and ranked textual evidence. The work contributes a 1,111-citation dataset spanning eight disciplines, open-source software, and an end-to-end pipeline that includes a Streamlit interface for practical deployment. The approach targets scalable, interpretable improvements in research integrity, peer review efficiency, and quality control of AI-generated content, with clear paths toward multilingual, multimodal, and multi-reference extensions.
Abstract
Effective scientific communication depends on accurate citations that validate sources and guide readers to supporting evidence. Yet academic literature faces mounting challenges: semantic citation errors that misrepresent sources, AI-generated hallucinated references, and traditional citation formats that point to entire papers without indicating which sections substantiate specific claims. We introduce SemanticCite, an AI-powered system that verifies citation accuracy through full-text source analysis while providing rich contextual information via detailed reasoning and relevant text snippets. Our approach combines multiple retrieval methods with a four-class classification system (Supported, Partially Supported, Unsupported, Uncertain) that captures nuanced claim-source relationships and enables appropriate remedial actions for different error types. Our experiments show that fine-tuned lightweight language models achieve performance comparable to large commercial systems with significantly lower computational requirements, making large-scale citation verification practically feasible. The system provides transparent, evidence-based explanations that support user understanding and trust. We contribute a comprehensive dataset of over 1,000 citations with detailed alignments, functional classifications, semantic annotations, and bibliometric metadata across eight disciplines, alongside fine-tuned models and the complete verification framework as open-source software. SemanticCite addresses critical challenges in research integrity through scalable citation verification, streamlined peer review, and quality control for AI-generated content, providing an open-source foundation for maintaining citation accuracy at scale.
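To make the output format concrete, the following is a minimal Python sketch of how a single verification result could be represented: one of the four classes, a free-text reasoning string, and evidence snippets ranked by relevance. It is an illustration only, not the SemanticCite implementation. The names `Verdict`, `Evidence`, `VerificationResult`, and `SUGGESTED_ACTION` are hypothetical, the mapping from class to remedial action is an assumption (the abstract states only that the classes enable such actions), and the example claim, snippets, and scores are invented for demonstration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    # The four classes named in the abstract.
    SUPPORTED = "Supported"
    PARTIALLY_SUPPORTED = "Partially Supported"
    UNSUPPORTED = "Unsupported"
    UNCERTAIN = "Uncertain"


# Hypothetical follow-up actions (an assumption, not taken from the paper).
SUGGESTED_ACTION = {
    Verdict.SUPPORTED: "keep citation",
    Verdict.PARTIALLY_SUPPORTED: "narrow or qualify the claim",
    Verdict.UNSUPPORTED: "replace or remove the citation",
    Verdict.UNCERTAIN: "flag for manual review",
}


@dataclass
class Evidence:
    text: str     # snippet taken from the cited source's full text
    score: float  # relevance score from a retriever (assumed: higher is better)


@dataclass
class VerificationResult:
    claim: str
    verdict: Verdict
    reasoning: str
    evidence: list[Evidence] = field(default_factory=list)

    def ranked_evidence(self) -> list[Evidence]:
        """Return snippets sorted from most to least relevant."""
        return sorted(self.evidence, key=lambda e: e.score, reverse=True)

    def suggested_action(self) -> str:
        """Look up the (hypothetical) remedial action for this verdict."""
        return SUGGESTED_ACTION[self.verdict]


# Invented example data, for illustration only.
result = VerificationResult(
    claim="Method X halves training time.",
    verdict=Verdict.PARTIALLY_SUPPORTED,
    reasoning="The source reports a speed-up, but only on one benchmark.",
    evidence=[
        Evidence("We discuss related work.", 0.12),
        Evidence("Training time dropped by 48% on benchmark A.", 0.91),
    ],
)
print(result.verdict.value, "->", result.suggested_action())
print(result.ranked_evidence()[0].text)
```

Keeping the verdict, the reasoning, and the ranked snippets in one record mirrors the abstract's emphasis on evidence-based explanations: a reader can see not only the label but also which passage of the source motivated it.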
