Utilizing Large Language Models to Detect Privacy Leaks in Mini-App Code

Liming Jiang

TL;DR

This paper explores the feasibility of using Large Language Models (LLMs) to detect privacy leaks in WeChat Mini Programs, a prominent class of mini-apps embedded in messaging platforms. It proposes a multimodal analysis framework that combines text, images, and other content found in Mini Program code to identify potentially sensitive information, supported by a data-collection and preprocessing pipeline, model fine-tuning, and multimodal fusion. The evaluation analyzes an online code example with a GPT-based system and reports precision, recall, and F1 on a held-out validation set to gauge detection performance, complemented by qualitative error analysis. The work highlights both the potential of LLMs to enhance privacy monitoring in mini-app ecosystems and the need for careful handling of biases and dynamic content, together with human-in-the-loop validation, to achieve reliable, scalable privacy protection in real-world deployments.

Abstract

Mini-applications, commonly referred to as mini-apps, are compact software programs embedded within larger applications or platforms that offer targeted functionality without separate installation. Typically web-based or cloud-hosted, mini-apps streamline the user experience by providing focused services through web browsers or mobile apps. Their simplicity, speed, and ease of integration make them valuable additions to messaging platforms, social media networks, e-commerce sites, and other digital environments. WeChat Mini Programs, a prominent feature of China's leading messaging app, exemplify this trend, offering users a wide array of services without additional downloads. By leveraging WeChat's extensive user base and payment infrastructure, Mini Programs facilitate efficient transactions and bridge online and offline experiences, and they have significantly shaped China's digital landscape. This paper investigates the potential of employing Large Language Models (LLMs) to detect privacy breaches within WeChat Mini Programs. Given the widespread use of Mini Programs and growing concerns about data privacy, this research seeks to determine whether LLMs can effectively identify instances of privacy leakage within this ecosystem. Through analysis and experimentation, we aim to assess how effectively LLMs can safeguard user privacy and security within the WeChat Mini Program environment, thereby contributing to a more secure digital landscape.

Paper Structure (12 sections, 1 figure, 1 table)