EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs

Davide Ettori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi

TL;DR

This work tackles hallucination and out-of-distribution errors in large language and vision-language models. It introduces EigenTrack, which converts streaming hidden activations into compact spectral descriptors and tracks their temporal evolution with a lightweight recurrent model. Grounded in Random Matrix Theory, it uses the Marchenko-Pastur baseline and BBP phase transition to detect structure loss, achieving state-of-the-art AUROC across multiple models while enabling early stopping to reduce generation cost. The approach shows strong generalization to multimodal settings and suggests promising directions for tighter theory and adaptive feature selection.

Abstract

Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black- and grey-box methods, it needs only a single forward pass without resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
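The covariance-spectrum statistics named above (entropy, eigenvalue gaps, divergence from a random baseline) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mp_density` and `spectral_features`, the histogram binning, the noise-scale estimate `sigma2`, and the synthetic "spiked" data are all assumptions made for illustration. The only borrowed facts are standard ones from Random Matrix Theory: for an n x d noise matrix with aspect ratio gamma = d/n <= 1, the covariance eigenvalues follow the Marchenko-Pastur law with upper edge sigma^2 (1 + sqrt(gamma))^2, and a sufficiently strong low-rank signal pushes the top eigenvalue past that edge (the BBP transition).

```python
import numpy as np

def mp_density(lam, gamma, sigma2):
    """Marchenko-Pastur density for aspect ratio gamma = d/n <= 1."""
    lo = sigma2 * (1.0 - np.sqrt(gamma)) ** 2
    hi = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    dens = np.zeros_like(lam, dtype=float)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    dens[inside] = np.sqrt((hi - x) * (x - lo)) / (2.0 * np.pi * sigma2 * gamma * x)
    return dens

def spectral_features(H, n_bins=30, eps=1e-8):
    """Spectral descriptors of one window H of hidden activations, shape (n, d).

    Hypothetical helper: feature definitions here are illustrative choices,
    not the definitions used in the paper.
    """
    n, d = H.shape
    if n < d:
        raise ValueError("this sketch assumes n >= d (gamma <= 1)")
    Hc = H - H.mean(axis=0, keepdims=True)        # center each feature
    sigma2 = Hc.var()                             # assumed noise-scale estimate
    gamma = d / n
    eig = np.linalg.eigvalsh(Hc.T @ Hc / n)[::-1] # descending eigenvalues
    eig = np.clip(eig, 0.0, None)

    # Spectral entropy of the normalized eigenvalue distribution.
    p = eig / eig.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))

    # Gap between the two leading eigenvalues.
    gap = eig[0] - eig[1]

    # Marchenko-Pastur upper edge under the estimated noise scale.
    mp_edge = sigma2 * (1.0 + np.sqrt(gamma)) ** 2

    # KL divergence between the eigenvalue histogram and the binned MP law.
    edges = np.linspace(0.0, max(eig[0], mp_edge), n_bins + 1)
    emp, _ = np.histogram(eig, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    ref = mp_density(centers, gamma, sigma2) * np.diff(edges)
    emp = (emp + eps) / (emp + eps).sum()         # smooth and renormalize
    ref = (ref + eps) / (ref + eps).sum()
    kl = np.sum(emp * np.log(emp / ref))

    return {"entropy": entropy, "gap": gap,
            "top_over_mp_edge": eig[0] / mp_edge, "kl_to_mp": kl}

# Synthetic illustration: pure noise versus noise plus one strong direction.
rng = np.random.default_rng(0)
noise = rng.standard_normal((2000, 100))
u = rng.standard_normal(100)
u /= np.linalg.norm(u)
spiked = noise + 5.0 * np.outer(rng.standard_normal(2000), u)

f_noise = spectral_features(noise)
f_spiked = spectral_features(spiked)
```

In a streaming setting, a function like this would be applied to a sliding window of activations at each generation step, and the resulting feature vectors fed to the recurrent classifier. The window length and the layers to read from are choices this sketch leaves open.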

Paper Structure

This paper contains 19 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: EigenTrack. Spectral signatures, including entropy, spectral gaps, and divergence from a random matrix baseline, are extracted from intermediate feed-forward layers and streamed into a recurrent spectral discrepancy detector. Tracking their temporal evolution enables early detection of failure.
  • Figure 2: Temporal evolution of spectral features over 80 generation steps on LLaMa-3B for factual and hallucinated sequences.
  • Figure 3: Heatmap (red: high, blue: low) of the most important features for each classifier on LLaMa-3B and the Hallucination dataset, computed by SHAP and normalized. It highlights how different RNNs focus on different spectral statistics.
  • Figure 4: (a) Cumulative SHAP attribution mass as a function of the number of top-ranked spectral features for different recurrent architectures on LLaMa-3B (Hallucination dataset). (b) AUROC and inference latency versus sliding-window length for GRU-based hallucination detection on LLaMa-3B (EigenTrack head latency, independent of LLM inference speed). (c) AUROC versus observed response length for GRU-based hallucination detection on LLaMa-3B.
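The cumulative attribution mass plotted in Figure 4(a) can be read as the fraction of total feature importance captured by the top-k ranked features. A minimal sketch of that computation follows, assuming per-feature importance scores are already available (for example, mean absolute SHAP values). The function names and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def cumulative_attribution_mass(importance):
    """Rank features by |importance| and return (order, cumulative mass)."""
    imp = np.abs(np.asarray(importance, dtype=float))
    order = np.argsort(imp)[::-1]            # most important feature first
    mass = np.cumsum(imp[order]) / imp.sum() # fraction covered by top-k
    return order, mass

def features_needed(mass, target=0.9):
    """Smallest k such that the top-k features reach the target mass."""
    k = int(np.searchsorted(mass, target)) + 1
    return min(k, len(mass))

# Hypothetical importance scores for four spectral features.
scores = [5.0, 1.0, 3.0, 1.0]
order, mass = cumulative_attribution_mass(scores)
k_75 = features_needed(mass, target=0.75)
```

A curve that saturates after a few features suggests that a small subset of spectral statistics carries most of the classifier's signal.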