Seqnature: Extracting Network Fingerprints from Packet Sequences

Janus Varmarken; Rahmadi Trimananda; Athina Markopoulou

TL;DR

Seqnature addresses the problem of identifying applications and events from network traffic by unifying fingerprint extraction in a single framework that operates on packet sequences. It uses a two-phase workflow: a preprocessing phase that extracts TCP stream information from raw traffic samples, and an iterative fingerprint refinement phase that clusters packet sequences to produce seqnatures representing consistently occurring packet sequences. The paper implements five fingerprinting techniques, drawn from data-exchange and endpoint-based perspectives, and applies them to two public datasets (FingerprinTV and PingPong) to compare prevalence and distinctiveness, including a false-positive analysis. The results corroborate prior findings that endpoint information alone is insufficient to distinguish individual events on IoT devices, and also show that smart TV app fingerprints based exclusively on endpoint information are less distinct than previously reported. Seqnature thus provides an extensible platform for evaluating and designing fingerprinting methods, with potential implications for privacy, security, and network protocol design.

Abstract

This paper proposes a general network fingerprinting framework, Seqnature, that uses packet sequences as its basic data unit and that makes it simple to implement any fingerprinting technique that can be formulated as a problem of identifying packet exchanges that consistently occur when the fingerprinted event is triggered. We demonstrate the versatility of Seqnature by using it to implement five different fingerprinting techniques, as special cases of the framework, which broadly fall into two categories: (i) fingerprinting techniques that consider features of each individual packet in a packet sequence, e.g., size and direction; and (ii) fingerprinting techniques that only consider stream-wide features, specifically what Internet endpoints are contacted. We illustrate how Seqnature facilitates comparisons of the relative performance of different fingerprinting techniques by applying the five fingerprinting techniques to datasets from the literature. The results confirm findings in prior work, for example that endpoint information alone is insufficient to differentiate between individual events on Internet of Things devices, but also show that smart TV app fingerprints based exclusively on endpoint information are not as distinct as previously reported.
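
To make the two categories in the abstract concrete, here is a minimal sketch of how the same TCP stream could be reduced to per-packet features (size and direction) or to a stream-wide feature (the contacted endpoint). The `Packet` and `Stream` types, the function names, and the example hostname are hypothetical and chosen only for illustration; they are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Packet:
    size: int        # packet size in bytes (illustrative)
    outbound: bool   # True if sent by the device, False if received


@dataclass(frozen=True)
class Stream:
    endpoint: str                 # hypothetical: hostname of the remote endpoint
    packets: Tuple[Packet, ...]   # packets in the order they were observed


def size_direction_view(stream: Stream) -> Tuple[Tuple[int, str], ...]:
    """Category (i): features of each individual packet."""
    return tuple((p.size, "out" if p.outbound else "in") for p in stream.packets)


def endpoint_size_direction_view(stream: Stream):
    """Category (i), with the endpoint attached to the packet features."""
    return (stream.endpoint, size_direction_view(stream))


def endpoint_view(stream: Stream) -> str:
    """Category (ii): a stream-wide feature only."""
    return stream.endpoint


# Two hypothetical events on the same device that contact the same server
# but exchange different packets.
event_a = Stream("api.example.com", (Packet(120, True), Packet(900, False)))
event_b = Stream("api.example.com", (Packet(64, True), Packet(64, False)))

# The endpoint view cannot tell the events apart; the per-packet view can.
same_endpoint = endpoint_view(event_a) == endpoint_view(event_b)
same_exchange = size_direction_view(event_a) == size_direction_view(event_b)
```

In this toy case the endpoint view is identical for both events while the size-and-direction view differs, which is the kind of situation behind the abstract's remark that endpoint information alone may not separate individual events.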

Paper Structure (20 sections, 6 figures, 2 tables)

Introduction
Related Work
Fingerprinting Framework
Preprocessing
Fingerprint Refinement
Representations of the Seqnature
Fingerprint Matching
Fingerprinting Techniques
Fingerprints Based on Data Exchanges
Size and Direction
Endpoint, Size, and Direction
Fingerprints Based on Endpoints
Datasets
FingerprinTV: Smart TV Apps
PingPong: Events on IoT Devices
...and 5 more sections

Figures (6)

Figure 1: Overview of Seqnature. To fingerprint event $e$, Seqnature is provided with $T$ samples of the network traffic that occurred immediately after $e$ was triggered. Seqnature has two phases: a preprocessing phase (Section "Preprocessing") that extracts TCP stream information from the raw traffic samples, and an iterative fingerprint refinement phase (Section "Fingerprint Refinement") that identifies packet sequences that co-occur with $e$.
Figure 2: Example of how packet sequences of length $n=4$ are formed from the first $P=20$ packets of a TCP stream.
Figure 3: Example clustering of packet sequences of length $n=4$ extracted from $T=3$ different traffic samples. The color of packets in a packet sequence denotes what traffic sample the packet sequence stems from. For example, all packet sequences with orange packets stem from the same traffic sample. The number of packet sequences in a cluster may vary across clusters (and can be greater than $T$), but all packet sequences across all clusters will always be of the same length $n$, as $n$ only changes between each fingerprint refinement iteration.
Figure 4: Example clusterings for two successive fingerprint refinement iterations ($n=4$ and $n=3$). When $n$ decreases, at least one cluster containing shorter versions of packet sequences that have already been included in the seqnature will be formed: in this example, clusters $c_2$ and $c_3$ both (exclusively) consist of shorter versions of the packet sequences in cluster $c_1$.
Figure 5: Example of a seqnature that comprises two clusters, represented in complete form and in summary form. Subscript notation: first value is the cluster number, second value is the packet sequence number (within the cluster), and third value is the packet index (within the packet sequence).
...and 1 more figure
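
The captions for Figures 2 to 4 can be illustrated with a small sketch. It assumes that packet sequences are contiguous windows of length $n$ taken with stride 1 over the first $P$ packets of a stream, and it replaces the paper's clustering step with exact-match grouping, keeping only sequences that appear in every traffic sample. Both choices are simplifying assumptions made here, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Packet = Tuple[int, str]             # (size, direction), illustrative features
PacketSequence = Tuple[Packet, ...]


def packet_sequences(stream: Sequence[Packet], n: int, P: int = 20) -> List[PacketSequence]:
    """Contiguous windows of length n over the first P packets (assumed stride 1)."""
    head = list(stream[:P])
    return [tuple(head[i:i + n]) for i in range(len(head) - n + 1)]


def consistent_sequences(samples: List[List[Sequence[Packet]]], n: int, P: int = 20) -> Set[PacketSequence]:
    """Sequences of length n that occur in every traffic sample.

    Exact-match grouping is a stand-in for clustering, not the paper's method.
    Each sample is a list of streams; each stream is a list of packets.
    """
    seen_in: Dict[PacketSequence, Set[int]] = defaultdict(set)
    for sample_idx, sample in enumerate(samples):
        for stream in sample:
            for seq in packet_sequences(stream, n, P):
                seen_in[seq].add(sample_idx)
    return {seq for seq, idxs in seen_in.items() if len(idxs) == len(samples)}


# A stream with 20 packets yields 20 - 4 + 1 = 17 windows of length 4.
windows = packet_sequences([(i, "out") for i in range(20)], n=4)

# T = 3 samples share a 4-packet exchange, followed by one varying packet.
common = [(100, "out"), (200, "in"), (300, "out"), (400, "in")]
samples = [[common + [(k, "out")]] for k in (1, 2, 3)]

at_n4 = consistent_sequences(samples, n=4)   # the shared 4-packet exchange
at_n3 = consistent_sequences(samples, n=3)   # its two 3-packet sub-windows
```

When $n$ drops from 4 to 3, the shared exchange reappears as two shorter sequences, similar to how clusters $c_2$ and $c_3$ in Figure 4 consist of shorter versions of the sequences in $c_1$.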

Theorems & Definitions (6)

Definition 1.1
Definition 4.1
Definition 4.2
Definition 4.3
Definition 4.4
Definition 4.5
