Testing Updated Apps by Adapting Learned Models

Chanh-Duc Ngo, Fabrizio Pastore, Lionel Briand

TL;DR

This work tackles the inefficiency of regression testing for frequently updated mobile Apps by reusing models learned from prior App versions. The authors introduce CALM, which incrementally adapts an App model across versions using static-dynamic analysis, RCVDiff-based GUI differences, and runtime DSTG adaptation, along with layout guards, probabilistic action sequences, backward-equivalent state detection, and online/offline refinement. Empirical results on 52 App versions show that CALM generally achieves higher coverage of updated methods and instructions and substantially reduces test oracle cost (fewer outputs to inspect) compared with ATUA and several state-of-the-art tools, especially for small updates. The findings indicate CALM’s practical impact: faster, more reliable testing of updated features under a bounded manual verification burden, while still identifying functional faults more effectively than key baselines. Overall, CALM demonstrates that reusing and adapting learned models across App versions can significantly improve the efficiency and effectiveness of update testing in real-world Android Apps.

Abstract

Although App updates are frequent and software engineers would like to verify updated features only, automated testing techniques verify entire Apps and thus waste resources. We present Continuous Adaptation of Learned Models (CALM), an automated App testing approach that efficiently tests App updates by adapting App models learned when automatically testing previous App versions. CALM focuses on functional testing. Since functional correctness can mainly be verified through the visual inspection of App screens, CALM minimizes the number of App screens that software testers must inspect while maximizing the percentage of updated methods and instructions exercised. Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches, for the same maximum number of App screens to be visually inspected. Further, in common update scenarios, where only a small fraction of methods are updated, CALM outperforms all competing approaches sooner and by a larger margin.
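
The evaluation criterion in the abstract (the proportion of updated methods exercised for a given maximum number of screens to inspect) can be illustrated with a minimal Python sketch. This is not CALM's implementation: the function name `updated_coverage_at_budget`, the data layout, and the simplification that each test action yields exactly one screen to inspect are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def updated_coverage_at_budget(
    actions: Iterable[Tuple[str, Set[str]]],
    updated_methods: Set[str],
    max_screens: int,
) -> float:
    """Fraction of updated methods exercised within an inspection budget.

    Hypothetical helper: `actions` is an ordered sequence of
    (screen_id, methods_executed) pairs, and each action is assumed to
    produce exactly one screen that a tester would have to inspect.
    """
    if not updated_methods:
        return 0.0
    covered: Set[str] = set()
    for index, (_screen_id, methods) in enumerate(actions):
        if index >= max_screens:
            break  # inspection budget exhausted
        # Only methods changed in the update count toward coverage.
        covered |= methods & updated_methods
    return len(covered) / len(updated_methods)


# Illustrative (made-up) data: three actions, four updated methods.
example_actions = [
    ("login", {"A.f", "B.g"}),
    ("settings", {"C.h"}),
    ("about", {"A.f"}),
]
example_updated = {"A.f", "C.h", "D.k", "E.m"}

coverage_one_screen = updated_coverage_at_budget(example_actions, example_updated, 1)
coverage_two_screens = updated_coverage_at_budget(example_actions, example_updated, 2)
```

Under this simplified metric, two tools can be compared at the same `max_screens`: the one with the higher value exercises more of the update for the same manual inspection effort.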

Paper Structure (34 sections, 10 equations, 18 figures, 7 tables)


Figures (18)

  • Figure 1: App Model Metamodel. Colors are used to group classes belonging to a specific metamodel component: GSTG (orange, top), DSTG (green, middle), EWTG (light blue, bottom). Classes in red are specific to CALM.
  • Figure 2: CALM App testing process
  • Figure 3: Example of action output provided to the end-user
  • Figure 4: An example of RCVDiff Model of EWTGs belonging to two App versions.
  • Figure 5: Illustration of how the DSTG of the Base App model is adapted in the Updated App model according to the RCVDiff model in Figure 4
  • ...and 13 more figures