EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs
Davide Ettori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi
TL;DR
This work tackles hallucination and out-of-distribution (OOD) errors in large language models (LLMs) and vision-language models (VLMs). It introduces EigenTrack, which converts streaming hidden activations into compact spectral descriptors and tracks their temporal evolution with a lightweight recurrent model. Grounded in Random Matrix Theory, it uses the Marchenko-Pastur distribution as a noise baseline and the Baik-Ben Arous-Péché (BBP) phase transition to detect loss of structure in the activations. It achieves state-of-the-art AUROC across multiple models and supports early stopping to reduce generation cost. The approach generalizes well to multimodal settings and points to future work on tighter theory and adaptive feature selection.
Abstract
Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black- and grey-box methods, it needs only a single forward pass without resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
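As a rough illustration of the covariance-spectrum statistics mentioned above, the following minimal sketch computes a few descriptors from one window of hidden activations: the spectral entropy, the leading eigenvalue gap, and the number of eigenvalues above the Marchenko-Pastur upper edge (1 + sqrt(d/n))^2 for unit-variance noise. The function name `spectral_features`, the per-feature standardization, the window shape, and the synthetic data are all assumptions made for illustration; the summary does not specify EigenTrack's exact feature set or normalization, and the KL-divergence feature and the recurrent classifier are omitted here.

```python
import numpy as np

def spectral_features(acts: np.ndarray) -> dict:
    """Spectral descriptors for one window of activations.

    acts: array of shape (n_tokens, d_hidden). Hypothetical layout,
    not taken from the paper.
    """
    n, d = acts.shape
    # Standardize each hidden dimension so a unit-variance
    # Marchenko-Pastur baseline is a reasonable reference (assumption).
    x = (acts - acts.mean(axis=0)) / (acts.std(axis=0) + 1e-8)
    # Nonzero eigenvalues of the sample covariance x.T @ x / n,
    # obtained from the singular values of x.
    s = np.linalg.svd(x, compute_uv=False)
    eig = np.sort(s ** 2 / n)[::-1]
    # Spectral entropy of the normalized eigenvalue distribution.
    p = eig / eig.sum()
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    # Gap between the two largest eigenvalues.
    gap = float(eig[0] - eig[1])
    # Marchenko-Pastur upper bulk edge for aspect ratio gamma = d / n.
    gamma = d / n
    mp_upper = (1.0 + np.sqrt(gamma)) ** 2
    n_outliers = int((eig > mp_upper).sum())
    return {"entropy": entropy, "gap": gap,
            "mp_upper": float(mp_upper), "n_outliers": n_outliers}

# Synthetic comparison: pure noise vs. noise plus one shared factor.
rng = np.random.default_rng(0)
noise = rng.standard_normal((400, 100))
shared = rng.standard_normal((400, 1))
structured = noise + 2.0 * shared  # every dimension loads on one factor

f_noise = spectral_features(noise)
f_sig = spectral_features(structured)
print(f_noise)
print(f_sig)
```

In this toy setting the structured window should show a lower entropy, a larger leading gap, and at least one eigenvalue outside the Marchenko-Pastur bulk compared with the pure-noise window. A detector in the spirit of the abstract would compute such a feature vector per step and feed the resulting sequence to a recurrent classifier.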
