PLUME: Building a Network-Native Foundation Model for Wireless Traces via Protocol-Aware Tokenization

Swadhin Pradhan, Shazal Irshad, Jerome Henry

Abstract

Foundation models succeed when they learn in the native structure of a modality, whether morphology-respecting tokens in language or pixels in vision. Wireless packet traces deserve the same treatment: meaning emerges from layered headers, typed fields, timing gaps, and cross-packet state machines, not flat strings. We present Plume (Protocol Language Understanding Model for Exchanges), a compact 140M-parameter foundation model for 802.11 traces that learns from structured PDML dissections. A protocol-aware tokenizer splits along the dissector field tree, emits gap tokens for timing, and normalizes identifiers, yielding 6.2x shorter sequences than BPE with higher per-token information density. Trained on a curated corpus, Plume achieves 74-97% next-packet token accuracy across five real-world failure categories and AUROC >= 0.99 for zero-shot anomaly detection. On the same prediction task, frontier LLMs (Claude Opus 4.6, GPT-5.4) score comparably when given identical protocol context, yet Plume does so with more than 600x fewer parameters. It fits on a single GPU at effectively zero marginal cost compared with cloud API pricing, enabling on-prem, privacy-preserving root cause analysis.
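
The three tokenizer behaviors named in the abstract (splitting along the dissector field tree, emitting gap tokens for timing, and normalizing identifiers) can be illustrated with a minimal sketch. This is not the authors' implementation: the PDML fragment is hand-written and simplified, and the token spellings (`<PKT>`, `<GAP_...>`, `MAC_n`), the log-scale gap buckets, and the helper names `gap_token` and `tokenize` are all assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written, simplified PDML fragment (real dissections carry many more
# fields and attributes). Two frames exchanged between two stations.
PDML = """<pdml>
  <packet>
    <proto name="frame">
      <field name="frame.time_delta" show="0.000000"/>
    </proto>
    <proto name="wlan">
      <field name="wlan.fc.type_subtype" show="0x000b"/>
      <field name="wlan.sa" show="aa:bb:cc:00:00:01"/>
      <field name="wlan.da" show="aa:bb:cc:00:00:02"/>
    </proto>
  </packet>
  <packet>
    <proto name="frame">
      <field name="frame.time_delta" show="0.002300"/>
    </proto>
    <proto name="wlan">
      <field name="wlan.fc.type_subtype" show="0x000b"/>
      <field name="wlan.sa" show="aa:bb:cc:00:00:02"/>
      <field name="wlan.da" show="aa:bb:cc:00:00:01"/>
    </proto>
  </packet>
</pdml>"""

# Assumption: only these two fields are treated as identifiers to normalize.
ADDRESS_FIELDS = {"wlan.sa", "wlan.da"}


def gap_token(delta_seconds):
    """Map an inter-packet gap to a coarse order-of-magnitude token (assumed scheme)."""
    micros = delta_seconds * 1e6
    if micros < 1:
        return "<GAP_0>"
    return f"<GAP_1e{math.floor(math.log10(micros))}us>"


def tokenize(pdml_text):
    """Emit one token per dissector field instead of splitting raw text."""
    root = ET.fromstring(pdml_text)
    aliases = {}  # concrete MAC address -> stable alias, in order of first appearance
    tokens = []
    for packet in root.iter("packet"):
        tokens.append("<PKT>")
        for field in packet.iter("field"):
            name = field.get("name")
            value = field.get("show", "")
            if name == "frame.time_delta":
                tokens.append(gap_token(float(value)))
            elif name in ADDRESS_FIELDS:
                alias = aliases.setdefault(value, f"MAC_{len(aliases)}")
                tokens.append(f"{name}={alias}")
            else:
                tokens.append(f"{name}={value}")
    return tokens


tokens = tokenize(PDML)
print(tokens)
```

In this sketch each packet becomes five tokens, and the two stations appear as `MAC_0` and `MAC_1` regardless of their concrete addresses, so the same exchange pattern maps to the same token sequence across captures.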

Paper Structure (29 sections, 13 figures, 12 tables)

Figures (13)

  • Figure 1: HDBSCAN-based curation. (a) Long-tail cluster sizes; beacon-dominated clusters contain thousands of near-identical frames. (b) Frames group by protocol function, validating clustering-based deduplication.
  • Figure 2: Average tokens per packet. Plume's protocol-aware tokenizer yields $6.2\times$ shorter sequences than BPE and $16.2\times$ shorter than byte-level.
  • Figure 3: Token accuracy by category for Small (140M), Medium (225M), and Large (450M). Same vocabulary and depth; differences reflect model width.
  • Figure 4: Per-field accuracy by category (left) and 10 best/worst fields (right). Addresses and frame control are near-perfect; timing and rare fields are hardest.
  • Figure 5: Prediction accuracy vs. context length. Accuracy saturates by 2-3 packets, matching typical 802.11 exchange length.
  • ...and 8 more figures