Direct Feature Access -- Scaling Network Traffic Feature Collection to Terabit Speed
Lukas Froschauer, Jonatan Langlet, Andreas Kassler
TL;DR
The paper tackles the challenge of collecting fine-grained network telemetry in real time at terabit speeds for ML-based analysis. It introduces Direct Feature Access (DFA), a system that extracts flow features directly in P4-programmable switches and streams the resulting feature vectors to GPU memory via RDMA and GPUDirect, bypassing the ML server's CPU. Implemented on Intel Tofino switches and NVIDIA A100 GPUs, DFA extracts and delivers over 31 million feature vectors per second and supports 524,000 flows within sub-20 ms monitoring periods on a single port. By eliminating traditional control-plane bottlenecks, this approach enables scalable, low-latency, ML-driven traffic analysis at terabit speeds for real-time network monitoring and security.
Abstract
Real-time traffic monitoring is critical for network operators to ensure performance, security, and visibility, especially as encryption becomes the norm. AI and ML have emerged as powerful tools to create deeper insights from network traffic, but collecting the fine-grained features needed at terabit speeds remains a major bottleneck. We introduce Direct Feature Access (DFA): a high-speed telemetry system that extracts flow features at line rate using P4-programmable data planes, and delivers them directly to GPUs via RDMA and GPUDirect, completely bypassing the ML server's CPU. DFA enables feature enrichment and immediate inference on GPUs, eliminating traditional control plane bottlenecks and dramatically reducing latency. We implement DFA on Intel Tofino switches and NVIDIA A100 GPUs, achieving extraction and delivery of over 31 million feature vectors per second, supporting 524,000 flows within sub-20 ms monitoring periods, on a single port. DFA unlocks scalable, real-time, ML-driven traffic analysis at terabit speeds, pushing the frontier of what is possible for next-generation network monitoring.
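As a rough consistency check on the figures above, the following sketch relates the flow count, the monitoring period, and the reported delivery rate. It assumes that each flow produces exactly one feature vector per monitoring period; the abstract does not state this, and the function names are hypothetical helpers introduced only for illustration.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# ASSUMPTION (not stated in the abstract): every monitored flow emits
# exactly one feature vector per monitoring period.

FLOWS = 524_000                   # flows on a single port (from the abstract)
PERIOD_S = 0.020                  # 20 ms, upper bound of "sub-20 ms" (from the abstract)
PEAK_VECTORS_PER_S = 31_000_000   # "over 31 million" vectors/s (from the abstract)


def required_rate(flows: int, period_s: float) -> float:
    """Vectors per second needed to report every flow once per period."""
    return flows / period_s


def min_period_ms(flows: int, vectors_per_s: float) -> float:
    """Shortest period (in ms) in which all flows can be reported once."""
    return flows / vectors_per_s * 1000.0


rate = required_rate(FLOWS, PERIOD_S)
period = min_period_ms(FLOWS, PEAK_VECTORS_PER_S)

print(f"Rate needed for a 20 ms period: {rate / 1e6:.1f} M vectors/s")  # 26.2
print(f"Shortest period at 31 M vectors/s: {period:.1f} ms")            # 16.9
```

Under this assumption, a 20 ms period requires about 26.2 million vectors per second, which is below the reported 31 million, and at the reported rate all 524,000 flows fit in roughly 16.9 ms. Both results are consistent with the "sub-20 ms" claim.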
