Positional-Unigram Byte Models for Generalized TLS Fingerprinting

Hector A. Valdez; Sean McPherson

TL;DR

The paper addresses the robustness of TLS fingerprinting against cipher stunting, a tactic in which adversaries randomize client hello fields to evade detection. It introduces positional-unigram byte models trained on labeled client hello samples and identifies the most likely client application for an observed client hello by maximizing the mean log-likelihood, $\hat{l} = \arg\max_{l \in L} \frac{1}{K} \sum_{i=1}^{K} \log\left(p_{i,x_i}^{(l)}\right)$; the models can be updated on the fly. Evaluation on an internal dataset with 48,906 unique samples across 121 classes shows that the approach is robust to cipher stunting and outperforms JA3, which degrades catastrophically under perturbations; JA3 bytes offer only modest gains in some settings. The results suggest this method as a practical complement to JA3 for network defense, enabling more resilient fingerprinting without relying on header information or side-channel data. The work also highlights the potential for extending to higher-order n-grams and integrating server hello data in future analyses.

Abstract

We use positional-unigram byte models along with maximum likelihood for generalized TLS fingerprinting and empirically show that this approach is robust to cipher stunting. Our approach creates a set of positional-unigram byte models from client hello messages. Each positional-unigram byte model is a statistical model of the TLS client hello traffic created by a client application or process. To fingerprint a TLS connection, we take its client hello and compute its likelihood under each statistical model. The statistical model that maximizes the likelihood function gives the predicted client application for the given client hello. Our data-driven approach does not use side-channel information and can be updated on the fly. We experimentally validate our method on an internal dataset and show that it is robust to cipher stunting by tracking an unbiased $f_{1}$ score as we synthetically increase randomization.
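The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fit_model`, `mean_log_likelihood`, `predict`), the add-one smoothing, the zero-padding of short messages, the toy value `K = 8`, and the synthetic byte strings are all assumptions made for illustration. Each model is a table of per-position byte probabilities $p_{i,b}^{(l)}$, and prediction picks the label with the highest mean log-likelihood.

```python
import math

K = 8        # number of byte positions modeled (toy value, an assumption)
ALPHA = 1.0  # add-one smoothing so unseen bytes keep nonzero probability (assumption)


def _pad(hello: bytes, k: int) -> bytes:
    # Truncate or zero-pad to exactly k bytes (padding scheme is an assumption).
    return hello[:k].ljust(k, b"\x00")


def fit_model(samples, k=K, alpha=ALPHA):
    """Build a k x 256 table of smoothed per-position byte probabilities."""
    counts = [[alpha] * 256 for _ in range(k)]
    for sample in samples:
        for i, byte in enumerate(_pad(sample, k)):
            counts[i][byte] += 1
    total = len(samples) + 256 * alpha  # every row has the same total mass
    return [[c / total for c in row] for row in counts]


def mean_log_likelihood(model, hello):
    """Average of log p_{i, x_i} over the modeled positions."""
    k = len(model)
    padded = _pad(hello, k)
    return sum(math.log(model[i][b]) for i, b in enumerate(padded)) / k


def predict(models, hello):
    """Return the label whose model maximizes the mean log-likelihood."""
    return max(models, key=lambda label: mean_log_likelihood(models[label], hello))


# Synthetic stand-ins for labeled client hello prefixes (not real traffic).
train = {
    "browser": [bytes([22, 3, 1, 0, 200, 1, 0, 0]), bytes([22, 3, 1, 0, 202, 1, 0, 0])],
    "adware": [bytes([22, 3, 3, 0, 90, 1, 9, 9]), bytes([22, 3, 3, 0, 91, 1, 9, 9])],
}
models = {label: fit_model(samples) for label, samples in train.items()}

# A query with one byte perturbed relative to the "browser" training samples.
query = bytes([22, 3, 1, 0, 201, 1, 0, 0])
predicted = predict(models, query)
```

Because the score is averaged over positions, a single randomized byte lowers the likelihood only slightly, whereas a hash-based fingerprint would change entirely; this is the intuition behind the robustness claim, shown here only on toy data.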

Paper Structure (39 sections, 14 equations, 8 figures, 3 tables, 2 algorithms)


Introduction
Background
TLS Handshake
Client Hello
List of Cipher Suites
List of Compression Methods
List of Extensions
Packet Data
Hexadecimal
Hexadecimal to Decimal
The JA3 Method
Discarded Information
Collisions
Lack of Robustness
Positional-Unigram Byte Models from Client Hello Messages
...and 24 more sections

Figures (8)

Figure 1: TLS handshake between Client and Server. From Husak2015.
Figure 2: Client hello message viewed in Wireshark. Fixed fields are enclosed in solid rectangles and variable fields in dotted rectangles. This particular client hello sent a list of 19 cipher suites to the Server. The Server will subsequently cross-reference this list against its own list and select the most secure cipher suite.
Figure 3: Client hello message in hexadecimal form.
Figure 4: Top line of client hello packet converted from hexadecimal to decimal.
Figure 5: Visualization of a $J_{Adware}$ positional-unigram byte model built from 3,024 Adware client hello messages. Only the first 128 of 448 positions are displayed.
...and 3 more figures
