Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale

Elisa Tsai; Neal Mangaokar; Boyuan Zheng; Haizhong Zheng; Atul Prakash

TL;DR

This work addresses unfavorable financial terms in the terms and conditions (T&Cs) of online shopping websites. It introduces three components: TermMiner, a scalable pipeline that collects and clusters T&Cs; ShopTC-100K, a large dataset of English terms accompanied by a four-category, 22-type taxonomy of unfavorable financial terms; and TermLens, a detector built on GPT-4o that achieves $TPR = 96.6\%$ and $F1 = 82.5\%$ in zero-shot evaluation and $F1 = 94.6\%$ after fine-tuning. Large-scale deployment finds that $42.06\%$ of shopping websites in the Tranco top 100K contain at least one unfavorable financial term. Such terms are more prevalent on less popular sites, and post-purchase terms are the most common category. The results demonstrate that automated, large-scale detection of harmful T&Cs is feasible and expose gaps in current defenses, underscoring the need for stronger consumer protections and ongoing monitoring. The authors release open-source tooling to enable longitudinal studies and broader analyses of terms and conditions in e-commerce.

Abstract

Terms and conditions for online shopping websites often contain terms that can have significant financial consequences for customers. Despite their impact, there is currently no comprehensive understanding of the types and potential risks associated with unfavorable financial terms. Furthermore, there are no publicly available detection systems or datasets to systematically identify or mitigate these terms. In this paper, we take the first steps toward solving this problem with three key contributions. First, we introduce TermMiner, an automated data collection and topic modeling pipeline to understand the landscape of unfavorable financial terms. Second, we create ShopTC-100K, a dataset of terms and conditions from shopping websites in the Tranco top 100K list, comprising 1.8 million terms from 8,251 websites. Consequently, we develop a taxonomy of 22 types from 4 categories of unfavorable financial terms, spanning purchase, post-purchase, account termination, and legal aspects. Third, we build TermLens, an automated detector that uses Large Language Models (LLMs) to identify unfavorable financial terms. Fine-tuned on an annotated dataset, TermLens achieves an F1 score of 94.6% and a false positive rate of 2.3% using GPT-4o. When applied to shopping websites from the Tranco top 100K, we find that 42.06% of these sites contain at least one unfavorable financial term, with such terms being more prevalent on less popular websites. Case studies further highlight the financial risks and customer dissatisfaction associated with unfavorable financial terms, as well as the limitations of existing ecosystem defenses.
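
For readers less familiar with the reported metrics, the following minimal sketch shows how a true positive rate (TPR), false positive rate (FPR), and F1 score are conventionally derived from the confusion counts of a binary detector. It is not the authors' evaluation code: the function name `detection_metrics` and the counts in the usage example are hypothetical and chosen only for illustration, not taken from the paper.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion counts.

    tp: unfavorable terms correctly flagged
    fp: benign terms incorrectly flagged
    tn: benign terms correctly left unflagged
    fn: unfavorable terms that were missed
    """
    # TPR (recall): share of truly unfavorable terms that were flagged.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: share of benign terms that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Precision: share of flagged terms that are truly unfavorable.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + tpr
    f1 = 2 * precision * tpr / denom if denom else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f1": f1}


# Hypothetical counts for illustration only (not results from the paper).
metrics = detection_metrics(tp=90, fp=10, tn=190, fn=10)
```

Because F1 combines precision and recall, a detector can report a high TPR together with a lower F1 when it flags many benign terms. That reading is consistent with the gap between the zero-shot TPR and F1 figures quoted in the TL;DR, although the underlying counts are not given there.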
