LLMcap: Large Language Model for Unsupervised PCAP Failure Detection

Lukasz Tulczyjew; Kinan Jarrah; Charles Abondo; Dina Bennett; Nathanael Weill

TL;DR

This paper tackles scalable PCAP failure detection in telecom networks under limited labeled data. It introduces LLMcap, a self-supervised, masked language modeling pipeline that learns PCAP grammar from unannotated data and supports failure localization through chunk-level predictions aggregated to PCAPs. Key contributions include a dual PCAP representation approach (text vs key-value dictionaries), a Failure Detection Algorithm combining thresholding and an unsupervised Elliptic Envelope, and extensive analysis of generalization to external services and execution time. The approach enables efficient, edge-friendly failure detection and offers interpretability by tagging high-signal chunks for root-cause analysis, with future work expanding service coverage and real-time deployment capabilities.
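The chunk-to-PCAP aggregation mentioned above can be illustrated with a minimal sketch. It covers only a thresholding step; the Elliptic Envelope stage is left out. The function names, the chunk size, the threshold value, and the rule "a PCAP is flagged if any chunk is flagged" are all assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean


def chunk_scores(token_nlls, chunk_size):
    """Split per-token loss values into fixed-size chunks and average each chunk."""
    chunks = [
        token_nlls[i:i + chunk_size]
        for i in range(0, len(token_nlls), chunk_size)
    ]
    return [mean(chunk) for chunk in chunks]


def detect_failure(token_nlls, chunk_size=4, threshold=2.0):
    """Flag chunks whose mean loss exceeds the threshold, then aggregate.

    Assumption: a PCAP counts as failed when at least one chunk is flagged.
    The flagged chunk indices are returned so they can be inspected later.
    """
    scores = chunk_scores(token_nlls, chunk_size)
    flagged = [i for i, score in enumerate(scores) if score > threshold]
    return {
        "is_failure": bool(flagged),
        "flagged_chunks": flagged,
        "scores": scores,
    }


# Hypothetical per-token losses: the second chunk is poorly predicted.
result = detect_failure([0.1] * 4 + [3.0] * 4)
print(result["is_failure"], result["flagged_chunks"])
```

Returning the flagged chunk indices alongside the PCAP-level decision is what would let an engineer jump to the high-signal regions of a capture, in the spirit of the localization described in the summary.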

Abstract

The integration of advanced technologies into telecommunication networks complicates troubleshooting, posing challenges for manual error identification in Packet Capture (PCAP) data. This manual approach, requiring substantial resources, becomes impractical at larger scales. Machine learning (ML) methods offer alternatives, but the scarcity of labeled data limits accuracy. In this study, we propose a self-supervised, large language model-based (LLMcap) method for PCAP failure detection. LLMcap leverages language-learning abilities and employs masked language modeling to learn grammar, context, and structure. Tested rigorously on various PCAPs, it demonstrates high accuracy despite the absence of labeled data during training, presenting a promising solution for efficient network analysis.

Index Terms: Network troubleshooting, Packet Capture Analysis, Self-Supervised Learning, Large Language Model, Network Quality of Service, Network Performance.
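The masked language modeling objective named in the abstract can be sketched as follows. This is a generic illustration of token masking, not the paper's preprocessing pipeline: the `mask_tokens` helper, the 15% default rate (the value conventionally used for BERT-style models), and the sample field names are assumptions.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Replace a random subset of tokens with a mask token.

    Returns the masked sequence and a parallel list of labels holding the
    original token at masked positions and None elsewhere. A model trained
    on such pairs learns to predict the hidden tokens from their context.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Hypothetical tokens from a text rendering of one packet.
tokens = "protocol: SIP method: INVITE status: 200".split()
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=0)
print(masked)
```

Because training needs only the original tokens as targets, no failure labels are required, which is what makes the approach self-supervised.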

Paper Structure (22 sections, 2 equations, 4 figures, 5 tables)

Introduction
Related Work
Problem Formulation
Proposed System/Model
Data Collection
PCAP Parser and Preprocessor
Parsing
Data Sanitization
Chunking
Masking
Large Language Model Training
Model Selection: DistilBERT and Knowledge Distillation
Loss Function and Training Configuration
Input Token Masking and Shuffling
Experimental Setup
...and 7 more sections

Figures (4)

Figure 1: The overall strategy for training and inference using LLMcap.
Figure 2: Example of the formatted original input and its masked counterpart used for MLM training of the LLM.
Figure 3: The visual representation of two measures, i.e., the number of misclassifications NOM-K and mean NLL-K loss, for ground-truth labels (top), and predictions of our method (bottom).
Figure 4: Listing shows an example of output from LLMcap. In this case, the model found 2 sections of SIP-related errors.
