Probing Neural Topology of Large Language Models
Yu Zheng, Yuan Yuan, Yue Zhuo, Yong Li, Gabriel Kreiman, Tomaso Poggio, Paolo Santi
TL;DR
This work introduces graph probing, a method for studying the neural topology of large language models (LLMs). It constructs dynamic connectivity graphs from token-by-token neuron activation time series and relates them to language generation performance, measured by the perplexity of a token sequence $X = (x_1, \dots, x_t)$ under the model $p_\theta$, $\mathrm{PPL}(X) = \exp\left(-\frac{1}{t} \sum_{i=1}^{t} \log p_\theta(x_i \mid x_{<i})\right)$. Using simple linear or MLP probes on flattened adjacency matrices, the authors show that neural topology universally predicts perplexity and semantic representations across model families, often outperforming activation-based probes by large margins, even when only 1% of connections are retained. Interventional experiments provide causal evidence that LLMs functionally rely on hub neurons and a stable default network. The authors also demonstrate practical applications in which topology-derived signals guide model pruning and hallucination detection, and they examine domain-specific topology and model fingerprinting. The findings show that topology carries richer information than raw activations, with implications for more efficient, reliable, and interpretable AI systems, and they open avenues for extending graph probing to larger models and multimodal architectures.
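As a worked check of the perplexity formula, the sketch below computes $\mathrm{PPL}(X)$ from per-token natural-log probabilities $\log p_\theta(x_i \mid x_{<i})$. The function name `perplexity` is hypothetical; in practice the log-probabilities would come from whichever model is being probed.

```python
import math

def perplexity(token_log_probs):
    """Compute PPL(X) = exp(-(1/t) * sum_i log p(x_i | x_<i)).

    token_log_probs: natural-log probabilities, one per token x_1..x_t.
    """
    if not token_log_probs:
        raise ValueError("perplexity needs at least one token")
    t = len(token_log_probs)
    mean_nll = -sum(token_log_probs) / t  # average negative log-likelihood
    return math.exp(mean_nll)

# A model that assigns probability 0.25 to every token is, on average,
# as uncertain as a uniform choice among 4 tokens, so its perplexity is 4.
logps = [math.log(0.25)] * 5
print(perplexity(logps))  # approximately 4.0 (up to floating-point rounding)
```

Lower perplexity means the model assigns higher probability to the observed tokens; a perfectly confident, correct model reaches the minimum value of 1.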
Abstract
Probing large language models (LLMs) has yielded valuable insights into their internal mechanisms by linking neural activations to interpretable semantics. However, how the functional co-activation of neurons gives rise to emergent model capabilities remains largely unknown, hindering a deeper understanding and safer development of LLMs. In this work, we introduce graph probing, a method for uncovering the functional connectivity of LLM neurons and relating it to language generation performance. By probing models across diverse LLM families and scales, we discover that language generation and understanding performance is universally predictable from neural topology alone, and that this predictability persists even when only 1% of neuron connections are retained. Strikingly, probing on topology outperforms probing on activations by up to 130.4% and 67.7% on perplexity and space/time semantic regression, respectively, suggesting that neural topology contains substantially richer information about LLM performance than neural activations do, and that this information can be easily extracted with simple linear or MLP probes. To explain the dependence between neural topology and language performance, we identify default networks and hub neurons in LLMs and provide causal evidence through interventional experiments on multiple benchmarks, showing that LLMs actually exploit this topological information. Further analyses suggest that graph probing can be effectively leveraged to improve the efficiency and reliability of LLMs, as shown by proof-of-concept applications in model pruning and hallucination detection. Code and data for the graph probing toolbox are available at https://github.com/DavyMorgan/llm-graph-probing.
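The graph-probing pipeline outlined in the TL;DR and abstract (neuron time series, connectivity graph, sparsification, flattened adjacency, simple probe) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the toolbox's implementation: edge weights are assumed to be Pearson correlations between neurons' activation time series over the tokens of one input, sparsification is assumed to keep the top `keep_ratio` fraction of edges by absolute weight, and the linear probe is closed-form ridge regression. All function names (`connectivity_graph`, `sparsify`, `graph_features`, `fit_linear_probe`, `predict`) are hypothetical and do not mirror the llm-graph-probing API.

```python
import numpy as np

def connectivity_graph(activations):
    """activations: (num_tokens, num_neurons); each column is one neuron's
    token-by-token time series. ASSUMPTION: edge weight = Pearson correlation."""
    adj = np.corrcoef(activations, rowvar=False)  # (num_neurons, num_neurons)
    adj = np.nan_to_num(adj)                      # constant neurons yield NaN
    np.fill_diagonal(adj, 0.0)                    # drop self-loops
    return adj

def sparsify(adj, keep_ratio=0.01):
    """ASSUMPTION: keep the strongest keep_ratio fraction of edges by |weight|."""
    iu = np.triu_indices_from(adj, k=1)
    weights = np.abs(adj[iu])
    k = max(1, int(round(keep_ratio * weights.size)))
    threshold = np.partition(weights, -k)[-k]     # k-th largest |weight|
    return np.where(np.abs(adj) >= threshold, adj, 0.0)

def graph_features(activations, keep_ratio=0.01):
    """Flatten the upper triangle of the sparsified (symmetric) graph."""
    adj = sparsify(connectivity_graph(activations), keep_ratio)
    return adj[np.triu_indices_from(adj, k=1)]

def fit_linear_probe(X, y, l2=1e-2):
    """Closed-form ridge regression with an unpenalized bias term."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + l2 * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    return X @ w + b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for real LLM activations: 8 "texts",
    # each with 64 tokens and 16 neurons.
    feats = np.stack([graph_features(rng.normal(size=(64, 16)), keep_ratio=0.1)
                      for _ in range(8)])
    print(feats.shape)  # (8, 120): 16 * 15 / 2 upper-triangle edges per text
```

In practice, each row of `X` would be `graph_features(...)` computed for one text and `y` would hold that text's measured perplexity; the random activations in the demo only exercise the shapes. The MLP probes mentioned in the abstract would replace `fit_linear_probe` with a small nonlinear regressor, which this sketch omits.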
