Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale

Elisa Tsai, Neal Mangaokar, Boyuan Zheng, Haizhong Zheng, Atul Prakash

TL;DR

This work addresses the problem of unfavorable financial terms in online shopping T&Cs. It introduces a scalable pipeline (TermMiner) to collect and cluster T&Cs; a large dataset of English terms (ShopTC-100K) with a four-category, 22-type taxonomy of unfavorable financial terms; and a detector (TermLens) built on GPT-4o that achieves $TPR = 96.6\%$ and $F1 = 82.5\%$ in zero-shot evaluation and $F1 = 94.6\%$ after fine-tuning. Large-scale deployment finds that $42.06\%$ of shopping websites in the Tranco top 100K contain at least one unfavorable financial term, with higher prevalence on less popular sites and post-purchase terms being the most common. The results demonstrate that automated, large-scale detection of harmful T&Cs is feasible and highlight gaps in current defenses, underscoring the need for stronger consumer protections and ongoing monitoring. The authors release open-source tooling to enable longitudinal studies and broader analyses of terms and conditions in e-commerce.

Abstract

Terms and conditions for online shopping websites often contain terms that can have significant financial consequences for customers. Despite their impact, there is currently no comprehensive understanding of the types and potential risks associated with unfavorable financial terms. Furthermore, there are no publicly available detection systems or datasets to systematically identify or mitigate these terms. In this paper, we take the first steps toward solving this problem with three key contributions. First, we introduce TermMiner, an automated data collection and topic modeling pipeline to understand the landscape of unfavorable financial terms. Second, we create ShopTC-100K, a dataset of terms and conditions from shopping websites in the Tranco top 100K list, comprising 1.8 million terms from 8,251 websites. Consequently, we develop a taxonomy of 22 types from 4 categories of unfavorable financial terms, spanning purchase, post-purchase, account termination, and legal aspects. Third, we build TermLens, an automated detector that uses Large Language Models (LLMs) to identify unfavorable financial terms. Fine-tuned on an annotated dataset, TermLens achieves an F1 score of 94.6% and a false positive rate of 2.3% using GPT-4o. When applied to shopping websites from the Tranco top 100K, we find that 42.06% of these sites contain at least one unfavorable financial term, with such terms being more prevalent on less popular websites. Case studies further highlight the financial risks and customer dissatisfaction associated with unfavorable financial terms, as well as the limitations of existing ecosystem defenses.
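
The detector is evaluated with the true positive rate (TPR), false positive rate (FPR), and F1 score. As a reminder of how these relate to a binary confusion matrix, here is a minimal Python sketch. The `Confusion` class, the function names, and the counts in the usage note are hypothetical illustrations and are not taken from the paper's evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; 'positive' means an unfavorable term."""
    tp: int  # unfavorable terms correctly flagged
    fp: int  # benign terms wrongly flagged
    tn: int  # benign terms correctly passed
    fn: int  # unfavorable terms missed


def _ratio(num: int, den: int) -> float:
    # Return 0.0 when the denominator is zero instead of raising.
    return num / den if den else 0.0


def tpr(c: Confusion) -> float:
    """True positive rate (recall): TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def fpr(c: Confusion) -> float:
    """False positive rate: FP / (FP + TN)."""
    return _ratio(c.fp, c.fp + c.tn)


def precision(c: Confusion) -> float:
    """Precision: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), tpr(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

For example, `Confusion(tp=90, fp=5, tn=95, fn=10)` (made-up counts) can be passed to `tpr`, `fpr`, and `f1` to see how a low FPR and a high F1 can coexist when false positives are rare relative to true positives.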

Paper Structure

This paper contains 25 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Unfavorable financial term example. (a) shows the payment page for Tone Fit Pro, a now-defunct website, with no mention of the subscription service. (b) displays its T&Cs, stating that customers are automatically enrolled in an $86/month Fitness App subscription with auto-renewal. (c) shows a screenshot of real-life victim complaints.
  • Figure 2: TermMiner (data collection and topic modeling pipeline). (1) Measurement module: collects shopping websites from the Tranco list and fake e-commerce website datasets, and extracts their English terms and conditions. (2) Term classification module: classifies the terms into binary categories based on a given prompt. (3) Topic modeling module: uses a t5-base Sentence Transformer and DBSCAN for clustering. Topics are derived from the clusters through a combination of manual inspection and GPT-4o, using snowball sampling (Goodman, 1961) to iteratively develop a topic template of terms.
  • Figure 3: TermLens design. (1) When the user activates the plugin, the current page URL is sent to the backend. (2) The terms and conditions are crawled and combined with the page information. (3) The pluggable LLM module analyzes the data to identify unfavorable financial terms. (4) Alerts are generated and displayed on the front end to warn users of potentially unfair financial terms.
  • Figure 4: Statistics from the large-scale measurement of unfavorable financial term detection on Tranco top 100K websites.
  • Figure 5: Extracted from the T&C of Celsius Network LLC, a now-bankrupt cryptocurrency company.
  • ...and 2 more figures
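
The topic modeling step in Figure 2 clusters term embeddings with DBSCAN. The following is a minimal, dependency-free Python sketch of density-based clustering, intended only to illustrate the idea. It is not the authors' implementation: the toy 3-dimensional vectors stand in for sentence embeddings, and the cosine distance metric, `eps`, and `min_samples` values are assumptions chosen for the example. In practice one would more likely use an existing library implementation of DBSCAN on real embeddings.

```python
import math
from typing import List, Sequence

NOISE = -1       # label for points that belong to no cluster
UNVISITED = -2   # internal marker used during the scan


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep expanding
        cluster += 1
    return labels


# Toy stand-ins for term embeddings: two tight groups and one outlier.
toy_embeddings = [
    (1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.95, 0.05, 0.0),
    (0.0, 1.0, 0.0), (0.0, 0.9, 0.1), (0.0, 0.95, 0.05),
    (0.0, 0.0, 1.0),
]
toy_labels = dbscan(toy_embeddings, eps=0.05, min_samples=2)
```

Each resulting cluster would then be inspected and named as a candidate topic, while points labeled `NOISE` are left unassigned. DBSCAN suits this kind of exploratory use because the number of clusters does not need to be fixed in advance.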