Fingerprinting web servers through Transformer-encoded HTTP response headers

Patrick Darwinkel

TL;DR

This work presents a Transformer-based approach to web server fingerprinting that encodes HTTP response status lines into dense embeddings. It builds a large labeled dataset from 4.77 million domains and uses a RoBERTa-based encoder with PCA to produce 2048-dimensional features per domain. These features feed a feed-forward network (MLP) and a Random Forest for server-type and server-version classification. The classifiers reach macro F1-scores of 0.94 (Random Forest) and 0.96 (MLP) on the five most popular origin web servers, and the MLP reaches a weighted F1-score of 0.55 on 347 pairs of server type and minor version, indicating that status lines carry strong, exploitable fingerprints. The study also analyzes the importance of individual test cases and outlines limitations and extensive future work, including use of complete response headers, universality across ports and protocols, and deeper methodological analyses. Overall, the results suggest that NLP-style representations are a promising alternative to traditional rule-based fingerprinting for web servers, with potential practical impact on vulnerability assessment and incident response, while leaving open issues to address for robust real-world deployment.
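The summary above mentions PCA-reduced RoBERTa encodings that end up as a 2048-dimensional vector per domain. The following is a minimal NumPy sketch of that feature-construction step, written under stated assumptions: each domain has one embedding vector per test case (how RoBERTa outputs are pooled into one vector is not specified here), PCA is fitted separately for each test case, and the split of 16 test cases × 128 components that yields 2048 dimensions is hypothetical, as are the names `fit_pca` and `domain_features`. Random vectors stand in for real embeddings.

```python
from __future__ import annotations

import numpy as np


def fit_pca(x: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit a k-component PCA to the rows of x via SVD; return (mean, components)."""
    mean = x.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def domain_features(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Reduce each test case's embeddings to k dims, then concatenate per domain.

    embeddings has shape (n_domains, n_tests, d): one encoded status line per
    (domain, test case). The result has shape (n_domains, n_tests * k).
    """
    parts = []
    for t in range(embeddings.shape[1]):
        x = embeddings[:, t, :]
        mean, components = fit_pca(x, k)  # hypothetical: one PCA per test case
        parts.append((x - mean) @ components.T)  # project onto the top-k axes
    return np.concatenate(parts, axis=1)


# Random stand-ins for 768-dimensional RoBERTa embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16, 768))
features = domain_features(embeddings, k=128)
print(features.shape)  # (200, 2048)
```

Fitting one PCA per test case keeps each probe's share of the final vector fixed in size; a single PCA shared across all encoded response lines would be an equally plausible reading of the summary.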

Abstract

We explored leveraging state-of-the-art deep learning, big data, and natural language processing to enhance the detection of vulnerable web server versions. Aiming for greater accuracy and specificity than rule-based systems, we sent various ambiguous and non-standard HTTP requests to 4.77 million domains and captured the HTTP response status lines. We represented these status lines by training a byte-pair encoding (BPE) tokenizer and a RoBERTa encoder with unsupervised masked language modeling. We then reduced the dimensionality of the encoded response lines and concatenated them to represent each domain's web server. A Random Forest and a multilayer perceptron (MLP) classified these web servers, achieving macro F1-scores of 0.94 and 0.96, respectively, on detecting the five most popular origin web servers. The MLP achieved a weighted F1-score of 0.55 on classifying 347 pairs of major server type and minor version. Analysis indicates that our test cases are meaningful discriminants of web server types. Our approach shows promise as a powerful and flexible alternative to rule-based systems.
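The abstract states that a BPE tokenizer was trained on the captured status lines. To show what BPE training does, here is a simplified, character-level sketch in plain Python that repeatedly merges the most frequent adjacent symbol pair. It is an illustration under assumptions, not the paper's tokenizer: RoBERTa uses byte-level BPE, and the vocabulary size, training corpus, and library are not given here. Splitting on whitespace, the names `merge_pair`, `learn_bpe`, and `tokenize`, and the four example status lines are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every left-to-right occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(lines: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges BPE merge rules from whitespace-split lines."""
    # Word frequencies; each word starts as a sequence of single characters.
    words: Counter = Counter()
    for line in lines:
        for word in line.split():
            words[tuple(word)] += 1
    merges = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        merges.append(best)
        merged: Counter = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges to a word in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


status_lines = [
    "HTTP/1.1 200 OK",
    "HTTP/1.1 400 Bad Request",
    "HTTP/1.0 400 Bad Request",
    "HTTP/1.1 200 OK",
]
merges = learn_bpe(status_lines, num_merges=10)
print(tokenize("HTTP/1.1", merges))  # ['HTTP/1.1']
print(tokenize("HTTP/2.0", merges))  # ['HTTP/', '2', '.', '0']
```

With only ten merges, the frequent `HTTP/1.1` collapses into one symbol while the unseen `HTTP/2.0` falls back to smaller pieces, which is how a learned vocabulary compresses recurring status-line fragments yet can still encode unusual ones.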

Paper Structure

This paper contains 92 sections, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: Example of a raw, plain-text HTTP HEAD request.
  • Figure 2: Example of the raw, plain-text HTTP response to the HEAD request from Figure 1. The original request was sent over a port 80 TCP socket connection.
  • Figure 3: 3-dimensional t-distributed Stochastic Neighbor Embedding (t-SNE) of 10,000 random samples, colored by major server type.
  • Figure 4: An example of an HTTP response header by an Apache web server. Courtesy of the Open Web Application Security Project.
  • Figure 5: An example of an HTTP response header by an nginx web server. Courtesy of the Open Web Application Security Project.
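Figures 1 and 2 show a raw HEAD request and the plain-text response it drew over a port 80 TCP connection, and the abstract describes sending ambiguous and non-standard requests and keeping each response's status line. A minimal standard-library sketch of such a probe follows. The study's actual test cases are not listed in this summary, so the `HTTP/9.9` variant, the `Connection: close` header, and the helper names `build_request`, `parse_status_line`, and `probe` are illustrative assumptions. Only the offline parts run in the example; `probe` needs network access.

```python
import socket


def build_request(method: str, target: str, version: str, host: str) -> bytes:
    """Assemble a raw HTTP/1.x request; method and version are sent verbatim."""
    lines = [f"{method} {target} {version}", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw: bytes) -> str:
    """Return the status line, i.e. everything before the first CRLF."""
    return raw.split(b"\r\n", 1)[0].decode("latin-1")


def probe(host: str, request: bytes, port: int = 80, timeout: float = 5.0) -> str:
    """Send a raw request over a TCP socket and return the response status line."""
    buffer = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # Read until the first CRLF arrives or the server closes the connection.
        while b"\r\n" not in buffer:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buffer += chunk
    return parse_status_line(buffer)


# A standard HEAD request (compare Figure 1) and a hypothetical non-standard
# variant that claims an unsupported protocol version.
head = build_request("HEAD", "/", "HTTP/1.1", "example.com")
odd = build_request("HEAD", "/", "HTTP/9.9", "example.com")
print(parse_status_line(b"HTTP/1.1 200 OK\r\nServer: nginx\r\n\r\n"))  # HTTP/1.1 200 OK
# probe("example.com", odd)  # requires network access
```

Decoding as Latin-1 maps every possible byte to a character, so the parser cannot fail when a server answers a malformed request with unusual bytes.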