Guarding Digital Privacy: Exploring User Profiling and Security Enhancements

Rishika Kohli, Shaifu Gupta, Manoj Singh Gaur

TL;DR

The paper tackles the dual challenge of enabling personalized services through user profiling while safeguarding privacy. It surveys profiling methods, data flows, and the roles of data brokers, then empirically tests PII leakage in Android apps from India and presents an enhanced ML framework that uses decision trees and neural networks with LIME explanations, achieving a peak accuracy of $75.01\%$ with a training time of $3.62$ seconds for neural networks. Its contributions include a detailed taxonomy of profiling processes, a data-broker analysis (Acxiom and Oracle), and a case-study-derived ML approach for PII detection in network traffic, underscored by an explainable-AI evaluation. The work demonstrates tangible privacy risks in real-world app traffic and offers concrete, interpretable methods and future directions for privacy-preserving profiling and stronger digital-security measures. Overall, the paper advances understanding of profiling pipelines, exposes practical vulnerabilities, and proposes an actionable framework for detecting and explaining sensitive data exposure in mobile ecosystems.

Abstract

User profiling, the practice of collecting user information for personalized recommendations, has become widespread, driving progress in technology. However, this growth poses a threat to user privacy, as devices often collect sensitive data without their owners' awareness. This article aims to consolidate knowledge on user profiling, exploring various approaches and associated challenges. Through the lens of two companies sharing user data and an analysis of 18 popular Android applications in India across various categories, including $\textit{Social, Education, Entertainment, Travel, Shopping and Others}$, the article unveils privacy vulnerabilities. Further, the article proposes an enhanced machine learning framework, employing decision trees and neural networks, that improves on state-of-the-art classifiers in detecting personal information exposure. By leveraging the XAI (explainable artificial intelligence) algorithm LIME (Local Interpretable Model-agnostic Explanations), the framework enhances interpretability, which is crucial for reliably identifying sensitive data. Results demonstrate a noteworthy performance boost, achieving $75.01\%$ accuracy with a reduced training time of $3.62$ seconds for neural networks. The paper concludes by suggesting research directions to strengthen digital security measures.

Paper Structure

This paper contains 36 sections, 5 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Taxonomy of user profiling
  • Figure 2: Identifiers used to track users across devices and platforms
  • Figure 3: Source: Disconnect.me. This figure illustrates the presence of trackers on the ICICI bank website. Colored domains signify tracking sites, while gray domains may also track users. Left part: domains informed when the user opens the website without logging in. Right part: domains informed when the user has logged in. (Accessed: October 2021)
  • Figure 4: Some high-impact data breaches reported between 2022 and 2024. [r153, r130, r154, r155, new4, bb1]
  • Figure 5: Data capturing framework
  • ...and 4 more figures