LLM-Assisted Web Measurements

Simone Bozzolan, Stefano Calzavara, Lorenzo Cazzaro

TL;DR

This work addresses the lack of labeled data in web measurement datasets by using Large Language Models (LLMs) to automatically classify websites for targeted studies. It builds a curated benchmark across three tasks (governmental website detection, country attribution, and website categorization) and evaluates three LLMs (Gemini 2.5 Flash, Llama 4, Gemma 3) with URL-only and URL+screenshot prompts. The results show strong classification performance, with Gemini often outperforming self-hosted models. LLM-assisted web measurements also reproduce key privacy and security findings from prior work, such as SSO privacy risks and governmental tracking patterns. The study provides datasets, prompts, and a reproducible workflow that enables scalable, semantics-aware web measurement research, with practical implications for privacy, governance, and measurement methodology.

Abstract

Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites commonly used as measurement targets are unlabeled and lack semantic information about the nature of the sites they include. This limitation makes targeted measurements challenging, as researchers often need to rely on ad-hoc techniques to bias their datasets toward specific categories of interest. In this paper, we investigate the use of Large Language Models (LLMs) as a means to enable targeted web measurement studies through their semantic understanding capabilities. Building on prior literature, we identify key website classification tasks relevant to web measurements and construct datasets to systematically evaluate the performance of different LLMs on these tasks. Our results demonstrate that LLMs may achieve strong performance across multiple classification scenarios. We then conduct LLM-assisted web measurement studies inspired by prior work and rigorously assess the validity of the resulting research inferences. Our results demonstrate that LLMs can serve as a practical tool for analyzing security and privacy trends on the Web.

Paper Structure

This paper contains 33 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Percentage of websites with minimal scope by category, comparing SiteAdvisor and Gemini classification.
  • Figure 2: Percentage of websites with third-party trackers by country in the three datasets.
  • Figure 3: Percentage of websites with minimal scope by category for the two most popular IDPs.