Demystifying Network Foundation Models

Sylee Beltiukov, Satyandra Guthula, Wenbo Guo, Walter Willinger, Arpit Gupta

TL;DR

This work tackles the challenge of understanding what Network Foundation Models (NFMs) actually learn in their latent representations, rather than just how well they perform on downstream tasks. It introduces an intrinsic evaluation framework comprising Embedding Geometry Analysis, Metric Alignment Assessment, and Causal Sensitivity Testing to probe representation quality in frozen NFMs. Across four state-of-the-art NFMs and five network datasets, the study reveals pervasive anisotropy, architecture-dependent alignment with domain metrics, and sensitivity to payload and context, highlighting limitations that are invisible to task-based evaluation. By showing that addressing representation issues can yield meaningful performance gains without architectural changes, the paper provides a principled approach to designing more generalizable and robust NFMs for self-driving network applications.

Abstract

This work presents a systematic investigation into the latent knowledge encoded within Network Foundation Models (NFMs), focusing on the analysis of hidden representations rather than on downstream task performance alone. Unlike existing efforts, we analyze the models through a three-part evaluation: Embedding Geometry Analysis to assess representation space utilization, Metric Alignment Assessment to measure correspondence with domain-expert features, and Causal Sensitivity Testing to evaluate robustness to protocol perturbations. Using five diverse network datasets spanning controlled and real-world environments, we evaluate four state-of-the-art NFMs and find that they all exhibit significant anisotropy, inconsistent feature sensitivity patterns, an inability to separate high-level context, and payload dependency, among other properties. Our work identifies numerous limitations across all models and demonstrates that addressing them can significantly improve model performance (by up to +0.35 $F_1$ score without architectural changes).
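
To make the notion of anisotropy concrete, the following is a minimal sketch of one common proxy: the mean pairwise cosine similarity of a set of embeddings. It is an illustration only, not necessarily the measure used in the paper; the function name `mean_pairwise_cosine` and the synthetic arrays are assumptions introduced here. Values near 0 suggest that the vectors spread out over the space, while values near 1 suggest that they are confined to a narrow cone.

```python
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero rows
    sims = unit @ unit.T
    n = unit.shape[0]
    # Remove the diagonal (self-similarity) before averaging.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


# Synthetic stand-ins for frozen-model embeddings (assumed data, not real NFM output).
rng = np.random.default_rng(0)
isotropic = rng.standard_normal((500, 64))
# A large shared offset pushes every vector in roughly the same direction.
anisotropic = isotropic + 10.0

print("isotropic:  ", mean_pairwise_cosine(isotropic))
print("anisotropic:", mean_pairwise_cosine(anisotropic))
```

In this toy setting the shifted embeddings score far higher than the zero-mean ones, which is the kind of signature an embedding geometry analysis looks for.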

Paper Structure

This paper contains 27 sections, 3 figures, and 12 tables.

Figures (3)

  • Figure 1: Visual overview of the proposed framework. Colors indicate the different stages of the analysis; dashed lines and boxes denote the perturbed network traffic path.
  • Figure 2: CDF of CKA similarity among different model embeddings and CICFlowMeter features averaged across all five datasets.
  • Figure 3: Similarity index of each CICFlowMeter feature per model. YaTC and netFound demonstrate higher similarity with well-known white-box features compared to other models.
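
Figures 2 and 3 report CKA similarity between model embeddings and CICFlowMeter features. As a rough sketch of how such a score can be computed, the snippet below implements the linear variant of Centered Kernel Alignment. Whether the paper uses the linear or a kernel variant is not stated in this section, so the choice is an assumption, as are the function name `linear_cka` and the synthetic inputs.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representations of the same n samples.

    x has shape (n, d1) and y has shape (n, d2). Hypothetical helper
    for illustration; not taken from the paper.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (self_x * self_y))


# Synthetic stand-ins (assumed data): "embeddings" and unrelated "flow features".
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((300, 16))
unrelated_features = rng.standard_normal((300, 8))
# A rotated and rescaled copy carries the same information as the original.
rotation, _ = np.linalg.qr(rng.standard_normal((16, 16)))
rotated = 3.0 * embeddings @ rotation

print("same information:", linear_cka(embeddings, rotated))
print("unrelated:       ", linear_cka(embeddings, unrelated_features))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so the rotated copy scores close to 1 while the unrelated features score close to 0. That property is what makes it usable for comparing embeddings of different widths against a fixed set of hand-crafted features.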