Demystifying Network Foundation Models
Sylee Beltiukov, Satyandra Guthula, Wenbo Guo, Walter Willinger, Arpit Gupta
TL;DR
This work tackles the challenge of understanding what Network Foundation Models (NFMs) actually learn in their latent representations, rather than just how well they perform on downstream tasks. It introduces an intrinsic evaluation framework comprising Embedding Geometry Analysis, Metric Alignment Assessment, and Causal Sensitivity Testing to probe representation quality in frozen NFMs. Across four state-of-the-art NFMs and five network datasets, the study reveals pervasive anisotropy, architecture-dependent alignment with domain metrics, and sensitivity to payload and context, highlighting limitations that are invisible to task-based evaluation. By showing that addressing representation issues can yield meaningful performance gains without architectural changes, the paper provides a principled approach to designing more generalizable and robust NFMs for self-driving network applications.
Abstract
This work presents a systematic investigation into the latent knowledge encoded within Network Foundation Models (NFMs), focusing on the analysis of hidden representations rather than on downstream task performance alone. Unlike existing efforts, we analyze the models through a three-part evaluation: Embedding Geometry Analysis to assess representation space utilization, Metric Alignment Assessment to measure correspondence with domain-expert features, and Causal Sensitivity Testing to evaluate robustness to protocol perturbations. Using five diverse network datasets spanning controlled and real-world environments, we evaluate four state-of-the-art NFMs and find that all of them exhibit significant anisotropy, inconsistent feature sensitivity patterns, an inability to separate high-level context, and payload dependency, among other properties. Our work identifies numerous limitations across all models and demonstrates that addressing them can significantly improve model performance (by up to 0.35 in $F_1$ score, without architectural changes).
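To make the notion of anisotropy concrete, the sketch below computes the mean pairwise cosine similarity of a set of embeddings, a commonly used proxy for how narrowly vectors cluster in the representation space. This is an illustrative assumption, not necessarily the measure used in the paper: the function name `mean_pairwise_cosine` and the synthetic data are hypothetical, and real embeddings would come from a frozen NFM rather than a random generator.

```python
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows.

    Values near 0 suggest an isotropic space; values near 1 suggest
    that embeddings occupy a narrow cone (high anisotropy).
    """
    # Normalize each row to unit length, guarding against zero vectors.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # Cosine similarity matrix of all row pairs.
    sims = unit @ unit.T
    n = unit.shape[0]
    # Drop the diagonal (self-similarity) before averaging.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


# Synthetic stand-ins for model embeddings (hypothetical data).
rng = np.random.default_rng(0)
isotropic = rng.standard_normal((500, 64))
# A shared offset pushes every vector toward the same direction.
anisotropic = isotropic + 5.0

iso_score = mean_pairwise_cosine(isotropic)
aniso_score = mean_pairwise_cosine(anisotropic)
print(f"isotropic: {iso_score:.3f}, anisotropic: {aniso_score:.3f}")
```

In this toy setting the shifted embeddings score far higher than the zero-centered ones, which is the kind of gap an embedding geometry analysis is meant to surface before any downstream task is run.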
