Fingerprinting web servers through Transformer-encoded HTTP response headers

Patrick Darwinkel

TL;DR

This work presents a Transformer-based approach to web server fingerprinting that encodes HTTP response status lines and headers into dense embeddings. It builds a large, labeled dataset from 4.77 million domains and uses a RoBERTa-based encoder with PCA to produce 2048-dimensional features per domain, which are fed to a feed-forward network (MLP) and a Random Forest for server-type and server-version classification. The classifiers reach macro F1 scores of 0.94 (Random Forest) and 0.96 (MLP) on the five most popular origin web servers, and the MLP reaches a weighted F1 of 0.55 on 347 major type and minor version pairs, indicating that status lines carry strong, exploitable fingerprints. The study also analyzes test-case importance and outlines limitations and future work, including complete header usage, universality across ports and protocols, and deeper methodological analyses. Overall, the approach shows promise as a flexible alternative to traditional rule-based fingerprinting, with potential practical impact on vulnerability assessment and incident response, while highlighting areas to address for robust real-world deployment.
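The summary reports both macro and weighted F1, and the two averages respond differently to class imbalance. The sketch below is a minimal, standard-library illustration of that difference; the function names and the toy labels are assumptions made for this example and are not taken from the paper's data or code.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true samples
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Toy, made-up labels: a frequent class predicted well, a rare class less so.
y_true = ["nginx"] * 8 + ["Apache"] * 2
y_pred = ["nginx"] * 8 + ["nginx", "Apache"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

On this toy input the weighted score is higher than the macro score, because the well-predicted frequent class dominates the weighted average while the macro average gives the rare class equal weight.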

Abstract

We explored leveraging state-of-the-art deep learning, big data, and natural language processing to enhance the detection of vulnerable web server versions. Focusing on improving accuracy and specificity over rule-based systems, we sent various ambiguous and non-standard HTTP requests to 4.77 million domains and captured the HTTP response status lines. We represented these status lines by training a BPE tokenizer and a RoBERTa encoder with unsupervised masked language modeling. We then reduced the dimensionality of the encoded response lines and concatenated them to represent each domain's web server. A Random Forest and a multilayer perceptron (MLP) classified these web servers and achieved macro F1-scores of 0.94 and 0.96, respectively, on detecting the five most popular origin web servers. The MLP achieved a weighted F1-score of 0.55 on classifying 347 major type and minor version pairs. Analysis indicates that our test cases are meaningful discriminants of web server types. Our approach demonstrates promise as a powerful and flexible alternative to rule-based systems.
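The data-collection idea described above (send a probe, keep the status line, join the per-probe representations into one vector per domain) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the three probes, and the sample responses are hypothetical and are not the paper's actual test cases or collector code, and the learned encoder and PCA step are omitted.

```python
def build_request(method, target, version, host):
    """Serialize a raw request without validating it, so that
    non-standard probes can be expressed as well."""
    return f"{method} {target} {version}\r\nHost: {host}\r\n\r\n".encode("ascii")


def status_line(raw_response):
    """Return the first line of a raw response, or "" if nothing came back."""
    lines = raw_response.splitlines()
    return lines[0].decode("latin-1").strip() if lines else ""


def concatenate(vectors):
    """Join per-probe vectors, in a fixed probe order, into one feature vector."""
    return [x for vector in vectors for x in vector]


# Hypothetical probes (assumptions): one ordinary request and two odd ones.
PROBES = [
    ("HEAD", "/", "HTTP/1.1"),
    ("GET", "/", "HTTP/9.9"),   # unsupported protocol version
    ("FOO", "/", "HTTP/1.1"),   # unknown method
]

requests = [build_request(m, t, v, "example.com") for m, t, v in PROBES]

# Made-up responses standing in for what a server might return to each probe.
responses = [
    b"HTTP/1.1 200 OK\r\nServer: nginx\r\n\r\n",
    b"HTTP/1.1 505 HTTP Version Not Supported\r\n\r\n",
    b"",  # the server closed the connection without answering
]
status_lines = [status_line(r) for r in responses]
```

In the pipeline described in the abstract, each captured status line would then be encoded and reduced before concatenation; `concatenate` only shows the final joining step on already-numeric vectors.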

Paper Structure (92 sections, 11 figures, 4 tables)

Introduction
Research questions
Background and related work
Theory of web server fingerprinting
Changes in the distribution of the World Wide Web
Detection of malicious HTTP client requests
Detection of malicious web servers
Strategies on document representation
Data and Material
Test cases
Testing HTTP/2 support
Collector algorithm
Overall architecture
Resolving and locating websites
Running the tests
...and 77 more sections

Figures (11)

Figure 1: Example of a raw, plain-text HTTP HEAD request.
Figure 2: Example of the raw, plain-text HTTP response to the HEAD request from Figure 1. The original request was sent over a port 80 TCP socket connection.
Figure 3: 3-dimensional t-distributed Stochastic Neighbor Embedding (t-SNE) of 10,000 random samples, colored by major server type.
Figure 4: An example of an HTTP response header by an Apache web server. Courtesy of the Open Web Application Security Project.
Figure 5: An example of an HTTP response header by an nginx web server. Courtesy of the Open Web Application Security Project.
...and 6 more figures
