Positional-Unigram Byte Models for Generalized TLS Fingerprinting
Hector A. Valdez, Sean McPherson
TL;DR
The paper tackles TLS fingerprinting robustness against cipher stunting, a tactic where adversaries randomize client hello fields to evade detection. It introduces positional-unigram byte models trained on labeled client hello samples and uses maximum likelihood, via the mean log-likelihood $\hat{l} = \arg\max_{l \in L} \frac{1}{K} \sum_{i=1}^{K} \log\left(p_{i,x_i}^{(l)}\right)$, to identify the most likely client application from the observed client hello, updating models on-the-fly. Evaluation on an internal dataset with 48,906 unique samples across 121 classes shows that the approach is robust to cipher stunting and outperforms JA3, which catastrophically degrades under perturbations; JA3 bytes offer only modest gains in some settings. The results suggest this method as a practical complement to JA3 for network defense, enabling more resilient fingerprinting without relying on header information or side-channel data. The work also highlights the potential for extending to higher-order n-grams and integrating server hello data in future analyses.
Abstract
We use positional-unigram byte models along with maximum likelihood for generalized TLS fingerprinting and empirically show that this approach is robust to cipher stunting. Our approach creates a set of positional-unigram byte models from client hello messages. Each positional-unigram byte model is a statistical model of the TLS client hello traffic created by a client application or process. To fingerprint a TLS connection, we take its client hello and compute its likelihood under each statistical model. The statistical model that maximizes the likelihood function identifies the predicted client application for the given client hello. Our data-driven approach does not use side-channel information and can be updated on-the-fly. We experimentally validate our method on an internal dataset and show that it is robust to cipher stunting by tracking an unbiased $f_{1}$ score as we synthetically increase randomization.
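The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the `max_len` cap on modeled positions, and the add-`alpha` (Laplace) smoothing used to avoid taking the logarithm of zero are all hypothetical choices that the text does not specify. Each label keeps per-position byte counts, and a client hello is assigned to the label with the highest mean log-likelihood over its $K$ bytes.

```python
import math


class PositionalUnigramModel:
    """Per-position byte counts for one client application label."""

    def __init__(self, max_len=512, alpha=1.0):
        self.max_len = max_len  # assumption: only the first max_len bytes are modeled
        self.alpha = alpha      # assumption: Laplace smoothing constant
        self.counts = [[0] * 256 for _ in range(max_len)]
        self.totals = [0] * max_len

    def update(self, hello: bytes) -> None:
        """Add one labeled client hello; usable for on-the-fly updates."""
        for i, b in enumerate(hello[: self.max_len]):
            self.counts[i][b] += 1
            self.totals[i] += 1

    def mean_log_likelihood(self, hello: bytes) -> float:
        """Return (1/K) * sum_i log p_{i, x_i} for the observed bytes."""
        x = hello[: self.max_len]
        if not x:
            return float("-inf")
        total = 0.0
        for i, b in enumerate(x):
            p = (self.counts[i][b] + self.alpha) / (self.totals[i] + 256 * self.alpha)
            total += math.log(p)
        return total / len(x)


class Fingerprinter:
    """Holds one positional-unigram model per label and predicts by arg max."""

    def __init__(self, max_len=512, alpha=1.0):
        self.max_len = max_len
        self.alpha = alpha
        self.models = {}

    def update(self, label: str, hello: bytes) -> None:
        if label not in self.models:
            self.models[label] = PositionalUnigramModel(self.max_len, self.alpha)
        self.models[label].update(hello)

    def predict(self, hello: bytes) -> str:
        if not self.models:
            raise ValueError("no models have been trained")
        return max(self.models, key=lambda l: self.models[l].mean_log_likelihood(hello))


# Toy byte strings standing in for client hellos (not real captures).
fp = Fingerprinter()
fp.update("app_a", b"\x16\x03\x01\xc0\x2b\xc0\x2f")
fp.update("app_a", b"\x16\x03\x01\xc0\x2f\xc0\x2b")
fp.update("app_b", b"\x16\x03\x03\x13\x01\x13\x02")
fp.update("app_b", b"\x16\x03\x03\x13\x02\x13\x01")

# A perturbed sample: mostly app_a bytes with the tail swapped for app_b values.
print(fp.predict(b"\x16\x03\x01\xc0\x2b\x13\x01"))
```

Because every position contributes independently to the score, randomizing a few fields lowers the likelihood under the true model without zeroing it, whereas an exact-match hash over the same fields would change completely. Keeping raw counts rather than normalized probabilities is what lets `update` be called at any time without retraining.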
