On optimizing Inband Telemetry systems for accurate latency-based service deployments
Nataliia Koneva, Alfonso Sánchez-Macián, José Alberto Hernández, Óscar González de Dios
TL;DR
The paper addresses the problem of obtaining accurate end-to-end latency estimates for latency-critical services in SDN/ZSM by sizing telemetry collection with Cochran's formula. It introduces a practical methodology to determine the number of latency measurements $n_{0}$ required to bound the error for a given threshold $D_{th}$, and demonstrates it on a two-path Milano-net example that models per-link delays as $M/M/1$ queues and includes a propagation delay of $5~\mu s/\mathrm{km}$. In a simulation of the full topology (595 paths), higher telemetry density markedly reduces false positives (FP) and false negatives (FN) when paths are selected against the $82~\mu s$ threshold: FP/FN counts fall from 73/3 at 5p to 1/1 at 2500p. The findings show that calibrated telemetry sampling is a prerequisite for reliable zero-touch networking decisions, and the paper outlines concrete steps toward a future P4/SONiC implementation and open-source replication.
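As a rough illustration of the two ingredients named above, the following minimal sketch computes a Cochran-style sample size and a per-link delay. It assumes the continuous-data form of Cochran's formula, $n_{0} = (z\,\sigma/e)^{2}$, which may differ from the exact variant used in the paper. The function names, parameter names, and choice of units (packets per second, microseconds) are hypothetical and not taken from the source.

```python
import math
from statistics import NormalDist


def cochran_sample_size(sigma, margin, confidence=0.95):
    """Samples needed so the mean estimate is within +/- margin.

    Assumes Cochran's formula for continuous data: n0 = (z * sigma / margin)^2,
    where sigma and margin share the same unit (e.g. microseconds).
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / margin) ** 2)


def mm1_link_delay_us(service_rate, arrival_rate, length_km, prop_us_per_km=5.0):
    """Mean delay of one link in microseconds.

    Queueing part: mean M/M/1 sojourn time 1 / (mu - lambda), with both
    rates given in packets per second (an assumption of this sketch).
    Propagation part: 5 us/km by default, as stated in the text.
    """
    if arrival_rate >= service_rate:
        raise ValueError("M/M/1 queue is unstable when lambda >= mu")
    queueing_us = 1e6 / (service_rate - arrival_rate)
    return queueing_us + prop_us_per_km * length_km
```

A path delay would then be the sum of `mm1_link_delay_us` over its links, and `cochran_sample_size` gives an indication of how many telemetry samples are needed before comparing the estimated path delay against $D_{th}$.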
Abstract
Machine Learning and Artificial Intelligence algorithms based on collected datasets, together with the programmability and flexibility of Software Defined Networking, can provide the building blocks for so-called Zero-Touch Network and Service Management systems. Progress toward this goal, however, depends on the availability of sufficient, good-quality data collected from measurements and telemetry. This article presents a telemetry methodology for collecting accurate latency measurements, as a first step toward building intelligent control planes that make correct decisions based on precise information.
