Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Mirian Hipolito Garcia, Camille Couturier, Daniel Madrigal Diaz, Ankur Mallick, Anastasios Kyrillidis, Robert Sim, Victor Ruhle, Saravan Rajmohan
TL;DR
The paper investigates whether hidden states in large language models inherently encode domain-specific knowledge that can be used for domain-aware routing and model selection. By analyzing hidden-state activity during the prefill phase across multiple autoregressive LLMs and a DeBERTa encoder, the authors identify latent domain-related trajectories that consistently separate queries from Maths, Biomedical, Law, and Humanities domains, even under prompt variations. They demonstrate that a Hidden States Classifier, trained on these activations, can outperform semantic routing and domain-finetuned baselines, with robust performance on open-ended tasks and cross-domain generalization. The work highlights deeper-layer representations as robust signals for domain context, offering a path toward unsupervised model selection and improved interpretability in cross-domain generation scenarios. Limitations include the focus on smaller models and potential domain-trace mixing, suggesting future work to extend to larger models and broader domains.
Abstract
We study whether Large Language Models (LLMs) inherently capture domain-specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains using hidden states generated during the prefill phase. We reveal latent domain-related trajectories that indicate the model's internal recognition of query domains. We also study the robustness of these domain representations to variations in prompt styles and sources. Our approach leverages these representations for model selection, choosing the LLM that best matches the domain trace of the input query (i.e., the model with the highest performance on similar traces). Our findings show that LLMs can differentiate queries from related domains, and that the fine-tuned model is not always the most accurate. Unlike previous work, our interpretations apply to both closed and open-ended generative tasks.
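
The routing idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes synthetic vectors in place of real prefill hidden states, a nearest-centroid rule in place of the paper's Hidden States Classifier, and hypothetical model names and validation scores (`general_model`, `maths_tuned`, `bio_tuned`). The scores are placeholders chosen only to show that a domain-tuned model need not win on its own domain.

```python
import math
import random

DOMAINS = ["maths", "biomedical", "law", "humanities"]
DIM = 16  # toy hidden-state width; real models use far larger dimensions


def synthetic_hidden_state(domain_idx, rng):
    """Stand-in for a pooled prefill hidden state (assumption: synthetic data)."""
    vec = [rng.gauss(0.0, 0.3) for _ in range(DIM)]
    vec[domain_idx] += 3.0  # give each domain its own direction in the space
    return vec


def fit_centroids(states, labels):
    """Average the hidden states of each domain into one centroid per domain."""
    sums, counts = {}, {}
    for vec, label in zip(states, labels):
        acc = sums.setdefault(label, [0.0] * DIM)
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_domain(vec, centroids):
    """Return the domain whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))


def route(vec, centroids, scores):
    """Pick the model with the best validation score on the predicted domain."""
    domain = predict_domain(vec, centroids)
    best_model = max(scores, key=lambda name: scores[name][domain])
    return domain, best_model


# Build a toy training set of labelled "hidden states".
rng = random.Random(0)
train_states, train_labels = [], []
for idx, name in enumerate(DOMAINS):
    for _ in range(50):
        train_states.append(synthetic_hidden_state(idx, rng))
        train_labels.append(name)
centroids = fit_centroids(train_states, train_labels)

# Hypothetical per-domain validation accuracies for hypothetical models.
MODEL_SCORES = {
    "general_model": {"maths": 0.55, "biomedical": 0.60, "law": 0.70, "humanities": 0.72},
    "maths_tuned": {"maths": 0.68, "biomedical": 0.41, "law": 0.40, "humanities": 0.45},
    "bio_tuned": {"maths": 0.39, "biomedical": 0.58, "law": 0.42, "humanities": 0.44},
}

# Route one unseen biomedical-style query.
query_state = synthetic_hidden_state(DOMAINS.index("biomedical"), rng)
domain, model = route(query_state, centroids, MODEL_SCORES)
print(domain, model)
```

In a real setting, `synthetic_hidden_state` would be replaced by activations extracted from an LLM during prefill, and the score table by measured per-domain performance; the routing step itself would stay a lookup over the predicted domain.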
