Hidden Web Caches Discovery

Matteo Golinelli; Bruno Crispo

TL;DR

The paper tackles the problem of detecting cached HTTP responses when cache status headers are missing or unreliable by introducing a timing-analysis method that leverages HTTP/2 multiplexing and cache busting. By comparing timing differences between paired requests organized into Randomized and Fixed groups, and applying a $t$-test with a $p$-value threshold of $0.01$, the approach identifies cached versus origin responses without header signals. Large-scale measurements on the Tranco Top 50k reveal a $5.8\%$ prevalence of hidden caches that do not advertise their status, and subsequent WCD analyses show that $1{,}020$ of these caches are vulnerable to Web Cache Deception, with manual case studies confirming potential data leakage. The method achieves an observed accuracy of $89.6\%$ in preliminary validation and provides a practical, header-agnostic tool for cache detection and security assessment, including open-source availability for widespread use.
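The decision step described above can be illustrated with a minimal sketch. It assumes each pair yields one timing difference (arrival time of the freshly cache-busted response minus arrival time of its partner, in seconds), that the test is one-sided, and that the p-value can be approximated with a normal distribution to stay within the standard library. None of these details are confirmed by the text, and the names `welch_t` and `looks_cached` are hypothetical. A fuller implementation would use Student's $t$ distribution, for example through `scipy.stats.ttest_ind`.

```python
import math
from statistics import NormalDist, mean, variance

P_THRESHOLD = 0.01  # p-value threshold quoted in the text


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Degenerate case: no spread in either sample.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def looks_cached(randomized_deltas, fixed_deltas, alpha=P_THRESHOLD):
    """Flag a cache when Fixed-group deltas are significantly larger.

    ASSUMPTION: delta = t(fresh cache-buster) - t(partner request), so a
    cached partner in the Fixed group pushes the deltas above zero, while
    Randomized-group deltas scatter around zero.
    """
    t = welch_t(fixed_deltas, randomized_deltas)
    # ASSUMPTION: one-sided test with a normal approximation to the
    # t distribution; this is a simplification, not the paper's code.
    p_value = 1.0 - NormalDist().cdf(t)
    return p_value < alpha, p_value
```

With Randomized-group deltas scattered around zero and Fixed-group deltas clustered near 90 ms, `looks_cached` returns `True` as its first element; when both groups scatter around zero, it returns `False`.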

Abstract

Web caches play a crucial role in web performance and scalability. However, detecting cached responses is challenging when web servers do not reliably communicate the cache status through standardized headers. This paper presents a novel methodology for cache detection using timing analysis. Our approach eliminates the dependency on cache status headers, making it applicable to any web server. The methodology relies on sending paired requests using HTTP multiplexing functionality and makes heavy use of cache-busting to control the origin of the responses. By measuring the time it takes to receive responses from paired requests, we can determine if a response is cached or not. In each pair, one request is cache-busted to force retrieval from the origin server, while the other request is not and might be served from the cache, if present. A faster response time for the non-cache-busted request compared to the cache-busted one suggests the first one is coming from the cache. We implemented this approach in a tool and achieved an estimated accuracy of 89.6% compared to state-of-the-art methods based on cache status headers. Leveraging our cache detection approach, we conducted a large-scale experiment on the Tranco Top 50k websites. We identified a significant presence of hidden caches (5.8%) that do not advertise themselves through headers. Additionally, we employed our methodology to detect Web Cache Deception (WCD) vulnerabilities in these hidden caches. We discovered that 1,020 of them are susceptible to WCD vulnerabilities, potentially leaking sensitive data. Our findings demonstrate the effectiveness of our timing analysis methodology for cache discovery and highlight the importance of a tool that does not rely on cache status headers reported by the cache itself.

Paper Structure (31 sections, 1 figure, 4 tables, 1 algorithm)

Introduction
Contributions
Background
Web caches and reverse proxies
Content Delivery Networks
Cache Key
Cache status headers
Web Cache Deception
HTTP/2
Timing Attacks
Related Works
Timing Attacks
Web Cache Attacks
Research Goals
Research Questions
...and 16 more sections

Figures (1)

Figure 1: Overview of our cache detection methodology. Note that, for the Fixed group, we perform a request with fixed cache-busters before collecting the time measurements, so that the response should already be stored in the cache. We see that, in the Randomized group, all requests are forwarded to the origin server, and their order of arrival back at the client is inconsistent. For the Fixed group, instead, the response to the request with a fixed cache-buster is directly issued by the web cache, and will therefore consistently arrive at the client first and faster.
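The two request groups in the caption can be sketched as URL construction. This is only an illustration: it assumes the cache-buster is a query parameter, and the parameter name `cb` and the helpers `with_cache_buster`, `randomized_pair`, and `fixed_pair` are hypothetical names that do not come from the paper.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CB_PARAM = "cb"  # hypothetical cache-buster parameter name


def with_cache_buster(url, value, param=CB_PARAM):
    """Return url with the cache-buster query parameter set to value."""
    parts = urlsplit(url)
    # Drop any existing cache-buster, keep all other parameters in order.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def random_buster():
    """Fresh value that a cache cannot have stored yet."""
    return secrets.token_hex(8)


def randomized_pair(url):
    """Randomized group: both requests carry fresh cache-busters,
    so both should be forwarded to the origin server."""
    return (with_cache_buster(url, random_buster()),
            with_cache_buster(url, random_buster()))


def fixed_pair(url, fixed_value):
    """Fixed group: one request reuses a cache-buster that was requested
    beforehand (so a cache may already hold it); the other is fresh."""
    return (with_cache_buster(url, fixed_value),
            with_cache_buster(url, random_buster()))
```

Before timing the Fixed group, the URL built with `fixed_value` would be requested once so that a cache, if present, has the response stored, as the caption notes. The two URLs of each pair would then be sent together over one multiplexed connection.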
