Surveillance Capitalism Revealed: Tracing The Hidden World Of Web Data Collection
Antony Seabra de Medeiros, Luiz Afonso Glatzl Junior, Sergio Lifschitz
TL;DR
The paper examines how personal data is transferred during web navigation and search, using network traffic analysis to show how various entities track and harvest users' digital footprints. It identifies the specific data types exchanged between users and web services, presents concrete evidence of data harvesting practices, and proposes strategies for stronger data protection and transparency.
Abstract
This study investigates the mechanisms of Surveillance Capitalism, focusing on the transfer of personal data during web navigation and search. By analyzing network traffic, we show how various entities track and harvest users' digital footprints. We identify the specific data types exchanged between users and web services and emphasize the sophisticated algorithms involved in these processes. We present concrete evidence of data harvesting practices and propose strategies for enhancing data protection and transparency. Our findings highlight the need for robust data protection frameworks and ethical data usage to address privacy concerns in the digital age.
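To make the idea of network traffic analysis concrete, the following is a minimal sketch of one way captured request URLs could be inspected for data sent to third parties. It is not the authors' tooling: the function names `base_domain` and `third_party_parameters` are hypothetical, the input is assumed to be a plain list of request URLs (for example, exported from a browser's network panel), and the "last two labels" rule for grouping hosts is a deliberate simplification that misclassifies suffixes such as `co.uk`.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def base_domain(host: str) -> str:
    """Naive grouping key: the last two labels of the host name.

    Assumption for illustration only; a real analysis would consult
    the Public Suffix List instead.
    """
    return ".".join(host.lower().split(".")[-2:])


def third_party_parameters(page_url: str, request_urls: list[str]) -> dict[str, set[str]]:
    """Map each third-party domain to the query parameter names sent to it."""
    first_party = base_domain(urlsplit(page_url).hostname or "")
    seen: dict[str, set[str]] = defaultdict(set)
    for url in request_urls:
        parts = urlsplit(url)
        domain = base_domain(parts.hostname or "")
        # Skip malformed URLs and requests that stay with the visited site.
        if not domain or domain == first_party:
            continue
        params = seen[domain]  # registers the domain even if it has no parameters
        params.update(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
    return dict(seen)
```

Calling `third_party_parameters` with the visited page URL and the list of captured request URLs returns, per third-party domain, the names of the query parameters that were transmitted. Parameter names that recur across unrelated domains (for instance, a shared identifier) are a natural starting point for a closer look at what is being exchanged.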
