Is Your Private Information Logged? An Empirical Study on Android App Logs
Zhiyuan Chen, Soham Sanjay Deo, Poorna Chander Reddy Puttaparthi, Vanessa Nava-Camal, Yiming Tang, Xueling Zhang, Weiyi Shang
TL;DR
This study examines privacy risks in Android app logs through an empirical investigation along three dimensions: real-world developers' concerns, the prevalence of privacy leaks in a purpose-built Android log dataset, and the characteristics of the leaking logs. The authors built a public dataset from 83 apps and later updated it with the latest versions of 78 of them. They identified 610 privacy-leak points in the original data and 651 in the updated data, and categorized the leaks by data structure and logging context. They found five categories of developer concerns, the largest of which focuses on replacing sensitive information. Most leaks originate from app-internal logs rather than third-party libraries, although third-party libraries can propagate leaks. Leaks often involve complex data structures, especially JSON, and many occur at runtime due to higher logging levels. Together, these findings underscore the need for privacy-aware logging practices and for tooling that detects and mitigates leaks.
Abstract
With the rapid growth of mobile apps, users' concerns about their privacy have become increasingly prominent. Android app logs are a crucial resource: they help developers debug and monitor the status of Android apps, and they contain a wealth of software system information. Previous studies have recognized privacy leaks in software logs and in Android apps as significant issues, but they have not provided a comprehensive view of privacy leaks in Android app logs. In this study, we build a comprehensive dataset of Android app logs and conduct an empirical study to analyze the status and severity of privacy leaks in those logs. Our study comprises three aspects: (1) understanding real-world developers' concerns regarding privacy issues related to software logs; (2) studying privacy leaks in Android app logs; and (3) investigating the characteristics of privacy-leaking Android app logs and analyzing the reasons behind them. Our study reveals five categories of concerns that real-world developers have about privacy issues in software logs. It also shows that privacy leaks are prevalent in Android app logs, with the majority stemming from developers' unawareness of such leaks. Additionally, our study offers developers suggestions for keeping private information from being logged.
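
As an illustration of the "replacing sensitive information" concern and of leaks through JSON structures, the following is a minimal sketch of masking sensitive fields in a nested JSON-like payload before it is logged. It is not the authors' tooling: the key list `SENSITIVE_KEYS`, the `MASK` placeholder, and the helper names `redact` and `log_payload` are all hypothetical and chosen only for illustration.

```python
import json
import logging

# Hypothetical key list for illustration only; it is not taken from the study.
SENSITIVE_KEYS = {"email", "password", "phone", "location", "device_id"}
MASK = "<redacted>"


def redact(value):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        # Mask the whole value when the key name is considered sensitive;
        # otherwise recurse so nested objects are also checked.
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars are returned unchanged.
    return value


def log_payload(logger, payload):
    """Serialize a payload for logging only after masking sensitive fields."""
    safe = json.dumps(redact(payload), sort_keys=True)
    logger.debug("payload=%s", safe)
    return safe
```

A key-name filter of this kind only catches fields whose names are known in advance; private values embedded in free-form strings would need a different check.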
