Anonymizing Test Data in Android: Does It Hurt?

Elena Masserini, Davide Ginelli, Daniela Micucci, Daniela Briola, Leonardo Mariani

TL;DR

This work tackles the privacy risk of field failure data in Android apps by empirically evaluating how privacy-preserving techniques affect failure reproduction. It analyzes generalization, suppression, and perturbation methods across 19 input-dependent bugs from 17 open-source apps, using a large set of anonymization configurations and an Espresso-based reproduction workflow. The study finds that SCD Local Suppression markedly improves string reproduction with minimal information disclosure, while numeric data can be effectively anonymized via Local Suppression or Noise Addition, with trade-offs in effort and disclosure. The results provide practical guidance for selecting and configuring privacy-preserving techniques to balance privacy with reproducibility, and the authors release their dataset and tools for further reuse.
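The three families of techniques named above can be illustrated with a minimal sketch. This is a generic illustration and not the authors' tooling or configurations: the function names, the `*` mask, the bucket width, and the uniform noise distribution are all assumptions made for this example.

```python
import random
from typing import Tuple


def suppress(value: str, mask: str = "*") -> str:
    """Suppression (assumed variant): hide every character, keep only the length."""
    return mask * len(value)


def generalize_number(value: float, bucket: int = 10) -> Tuple[int, int]:
    """Generalization (assumed variant): report the enclosing range, not the exact value."""
    low = int(value // bucket) * bucket
    return (low, low + bucket)


def add_noise(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation (assumed variant): shift the value by a uniform offset in [-scale, scale]."""
    return value + rng.uniform(-scale, scale)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(suppress("alice@example.com"))
    print(generalize_number(42))
    print(add_noise(42.0, 5.0, rng))
```

Each function trades information for privacy in a different way: the suppressed string reveals only its length, the generalized number reveals only its range, and the perturbed number stays within a known distance of the original. Whether the anonymized input still triggers a given failure depends on which of these properties the bug is sensitive to.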

Abstract

Failure data collected from the field (e.g., failure traces, bug reports, and memory dumps) represent an invaluable source of information for developers who need to reproduce and analyze failures. Unfortunately, field data may include sensitive information and thus cannot be collected indiscriminately. Privacy-preserving techniques can address this problem by anonymizing data and reducing the risk of disclosing personal information. However, collecting anonymized information may harm reproducibility; that is, the anonymized data may not allow the reproduction of a failure observed in the field. In this paper, we present an empirical investigation of the impact of privacy-preserving techniques on the reproducibility of failures. In particular, we study how five privacy-preserving techniques may impact reproducibility for 19 bugs in 17 Android applications. Results provide insights on how to select and configure privacy-preserving techniques.

Paper Structure (17 sections, 1 figure, 8 tables)