Time-Series Foundation Models for ISP Traffic Forecasting

Fan Liu, Behrooz Farkiani, Patrick Crowley

TL;DR

The paper addresses scalable ISP traffic forecasting by evaluating a pretrained time-series foundation model, IBM's Tiny Time Mixer (TTM), on the CESNET-TimeSeries24 dataset. It compares zero-shot and few-shot settings across multiple temporal resolutions and hierarchical aggregation levels, and shows that TTM achieves consistent accuracy, with RMSE in the $0.026$–$0.057$ range and stable $R^2$ across horizons, while CPU-only inference stays under $0.05$ s per $100$ points. The results show robust cross-level generalization, modest gains from limited fine-tuning, and negligible benefit from exogenous features. Overall, the work supports a train-once, deploy-everywhere paradigm for scalable network monitoring: training-free forecasting without specialized hardware.

Abstract

Accurate network-traffic forecasting enables proactive capacity planning and anomaly detection in Internet Service Provider (ISP) networks. Recent advances in time-series foundation models (TSFMs) have demonstrated strong zero-shot and few-shot generalization across diverse domains, yet their effectiveness for computer networking remains unexplored. This paper presents a systematic evaluation of a TSFM, IBM's Tiny Time Mixer (TTM), on the CESNET-TimeSeries24 dataset, a 40-week real-world ISP telemetry corpus. We assess TTM under zero-shot and few-shot settings across multiple forecasting horizons (hours to days), aggregation hierarchies (institutions, subnets, IPs), and temporal resolutions (10-minute and hourly). Results show that TTM achieves consistent accuracy (RMSE 0.026-0.057) and stable $R^2$ scores across horizons and context lengths, outperforming or matching fully trained deep learning baselines such as GRU and LSTM. Inference latency remains under 0.05s per 100 points on a single MacBook Pro using CPU-only computation, confirming deployability without dedicated GPU or MPS acceleration. These findings highlight the potential of pretrained TSFMs to enable scalable, efficient, and training-free forecasting for modern network monitoring and management systems.
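The accuracy figures quoted above (RMSE and $R^2$, alongside the MAE and MSE reported in the paper) can be illustrated with a minimal sketch of the standard definitions. It assumes plain Python sequences of already-scaled values; the function name `forecast_metrics` is hypothetical and is not taken from the paper's code.

```python
import math


def forecast_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; standard textbook definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    # R^2 is undefined when the target is constant (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}
```

Because RMSE is scale-dependent, a value such as 0.026 is only meaningful relative to the scaling applied to the traffic series, whereas $R^2$ is comparable across series with different magnitudes.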

Paper Structure

This paper contains 33 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of the proposed experimental methodology. The CESNET-TimeSeries24 dataset is preprocessed (scaling, windowing, and splitting into train/val/test). The TTM is used for both zero-shot inference and few-shot fine-tuning. We evaluate multiple context–horizon configurations, cross-level generalization, and the effect of exogenous features, reporting RMSE, MAE, MSE, and $R^2$ metrics.
  • Figure 2: Hourly: Context sensitivity (Institutions, $H=96$).
  • Figure 3: 10-minute: Horizon scaling ($L=1024$).
  • Figure 4: Hourly: Cross-level generalization (Institutions $\rightarrow$ Subnets/IPs, $L=1024$, $H=96$).
  • Figure 5: Hourly: Few-shot learning curve (Institutions, $L=1024$, $H=96$). Mean and median RMSE/$R^2$ shown across 10%, 30%, and 50% fine-tuning fractions.
  • ...and 2 more figures
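
The captions above refer to a context length $L$ and a forecasting horizon $H$. As a rough illustration of the windowing step mentioned in Figure 1, the sketch below slices a single series into (context, target) pairs. It assumes a simple sliding window with a configurable stride; the function name `make_windows` and the `stride` parameter are hypothetical, and the paper's actual preprocessing may differ.

```python
def make_windows(series, context_length, horizon, stride=1):
    """Slice a 1-D series into (context, target) pairs.

    Each pair holds `context_length` past points (L) followed by the
    next `horizon` points (H). Hypothetical helper for illustration only.
    """
    if context_length <= 0 or horizon <= 0 or stride <= 0:
        raise ValueError("context_length, horizon, and stride must be positive")

    windows = []
    # Last start index that still leaves room for a full context + horizon.
    last_start = len(series) - context_length - horizon
    for start in range(0, last_start + 1, stride):
        split = start + context_length
        context = series[start:split]
        target = series[split:split + horizon]
        windows.append((context, target))
    return windows


# Usage: an hourly series with L=1024 and H=96 needs at least 1120 points
# to yield a single window.
example = make_windows(list(range(1120)), context_length=1024, horizon=96)
```

A series shorter than $L + H$ yields no windows under this scheme, which is one reason long contexts such as $L=1024$ constrain which series can be evaluated.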