
An Empirical Study of Code Obfuscation Practices in the Google Play Store

Akila Niroshan, Suranga Seneviratne, Aruna Seneviratne

TL;DR

This work presents a large-scale, longitudinal study of code obfuscation in the Google Play Store. It builds a three-part ML-based detection framework that identifies whether an APK is obfuscated, which tools were used, and which obfuscation techniques were applied. Applying this framework to over 548,000 APKs from two major snapshots (2016–2018 and 2021–2023), the authors find a 13% rise in obfuscation overall, with ProGuard and Allatori as the dominant tools and identifier renaming as the most pervasive technique. The study also finds that obfuscation is more common among top-ranked apps and in gaming genres, notably Casino apps, and that top developers increasingly apply obfuscation across their app portfolios. These findings have practical implications for developers, security analysts, and app-store regulators: they point to the need for robust deobfuscation methods and for policies that balance code protection with transparency. Overall, this first large-scale, multi-tool obfuscation analysis offers insight into how code protection practices have evolved in a major app ecosystem.

Abstract

The Android ecosystem is vulnerable to issues such as app repackaging, counterfeiting, and piracy, which threaten both developers and users. To mitigate these risks, developers often employ code obfuscation techniques. Although obfuscation is effective in protecting legitimate applications, it also hinders security investigations because it is often exploited for malicious purposes. It is therefore important to understand code obfuscation practices in Android apps. In this paper, we analyze over 500,000 Android APKs from Google Play, spanning an eight-year period, to investigate the evolution and prevalence of code obfuscation techniques. First, we propose a set of classifiers to detect obfuscated code, the tools used, and the techniques applied, and then conduct a longitudinal analysis to identify trends. Our results show a 13% increase in obfuscation from 2016 to 2023, with ProGuard and Allatori as the most commonly used tools. We also show that obfuscation is more prevalent in top-ranked apps and in gaming genres such as Casino apps. To our knowledge, this is the first large-scale study of obfuscation adoption in the Google Play Store, providing insights for developers and security analysts.
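The TL;DR names identifier renaming as the most pervasive obfuscation technique. As a rough illustration of what a renaming signal can look like, the sketch below computes the share of identifiers that match the short lowercase names (a, b, aa, ...) that ProGuard-style renaming commonly produces. This is a hypothetical feature, not the authors' classifier: the function names `short_identifier_ratio` and `looks_renamed`, the 1-2 letter pattern, and the 0.5 threshold are all assumptions, and extracting identifiers from a DEX file is out of scope here.

```python
import re

# Hypothetical pattern: ProGuard-style renaming commonly yields names such as
# "a", "b", "aa". Matching 1-2 lowercase letters is an assumed heuristic,
# NOT the classifier used in the paper.
SHORT_NAME = re.compile(r"[a-z]{1,2}")


def short_identifier_ratio(identifiers):
    """Return the fraction of non-empty identifiers that look machine-renamed."""
    names = [n for n in identifiers if n]
    if not names:
        return 0.0
    hits = sum(1 for n in names if SHORT_NAME.fullmatch(n))
    return hits / len(names)


def looks_renamed(identifiers, threshold=0.5):
    """Flag a set of identifiers as likely renamed (threshold is an assumption)."""
    return short_identifier_ratio(identifiers) >= threshold


if __name__ == "__main__":
    sample = ["a", "b", "onCreate", "aa"]
    print(short_identifier_ratio(sample), looks_renamed(sample))
```

For example, `short_identifier_ratio(["a", "b", "onCreate", "aa"])` returns 0.75, so `looks_renamed` flags it. A single hand-picked threshold like this is brittle and can misfire on legitimately short names; combining many such features in a trained classifier, as the study's ML-based framework does, is the more robust design.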

Paper Structure

This paper contains 39 sections, 3 equations, 8 figures, and 11 tables.

Figures (8)

  • Figure 1: Overall experiment process: Training and testing and large-scale investigation.
  • Figure 2: Percentage of obfuscated apps by year
  • Figure 3: Yearly obfuscation tool usage
  • Figure 4: Yearly obfuscation technique usage
  • Figure 5: Obfuscated app percentage by genre (overall)
  • ...and 3 more figures