On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction

Doriane Olewicki, Sarra Habchi, Mathieu Nayrolles, Mojtaba Faramarzi, Sarath Chandar, Bram Adams

TL;DR

An empirical evaluation of lifelong learning (LL) for industrial use cases at Ubisoft shows that, in practice, LL at least matches the F1-score of traditional retraining from scratch, demonstrating the potential of LL setups in industry.

Abstract

Software analytics tools that use machine learning (ML) models to, for example, predict the risk of a code change are now well established. However, as the goals of a project shift, and as developers and their habits change, the performance of these models tends to degrade (drift) over time. Current practice typically retrains a new model from scratch on a large updated dataset once performance decay is observed, which incurs a computational cost. There is also no continuity between models, since the previous model is discarded and ignored when the new one is trained. Although the literature has shown interest in online learning approaches, these have rarely been integrated and evaluated in industrial environments. This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft, assessing both the performance and the required computational effort in comparison to the retraining-from-scratch approaches commonly used in industry. LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model with new data. To avoid so-called "catastrophic forgetting" of important older data points, we adopt a replay buffer of older data, which still allows us to drastically reduce the size of the overall training dataset, and hence model training time.
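To make the incremental-learner-plus-replay-buffer idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the setup used in the paper: the `OnlineLogReg` model, the `lifelong_updates` and `make_batches` helpers, the synthetic data, and the exact meaning given to `it_win` and `rb_win` (loosely named after the $ITWin$ and $RBWin$ window sizes in Figure 1) are all hypothetical.

```python
import math
import random
from collections import deque


class OnlineLogReg:
    """Tiny SGD logistic regression; a stand-in for any incremental learner."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, samples, epochs=5):
        # Continue from the current weights instead of re-initializing them.
        for _ in range(epochs):
            for x, y in samples:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


def lifelong_updates(batches, n_features, it_win=3, rb_win=3, seed=0):
    """Train once on the first it_win batches, then update batch by batch.

    Returns the model and, per update, a pair
    (LL training-set size, size a retrain-from-scratch would need).
    """
    rng = random.Random(seed)
    model = OnlineLogReg(n_features)
    initial = [s for batch in batches[:it_win] for s in batch]
    model.partial_fit(initial)
    # Assumed replay policy: keep only the rb_win most recent batches.
    replay = deque(batches[:it_win], maxlen=rb_win)
    seen = len(initial)
    sizes = []
    for new_batch in batches[it_win:]:
        # Each update sees the new batch plus the replayed older data.
        train = list(new_batch) + [s for batch in replay for s in batch]
        rng.shuffle(train)
        model.partial_fit(train)
        replay.append(new_batch)  # the oldest batch drops out automatically
        seen += len(new_batch)
        sizes.append((len(train), seen))
    return model, sizes


def make_batches(n_batches=8, batch_size=20, seed=1):
    """Synthetic one-feature data: label is 1 when the feature is positive."""
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        batches.append([([x], 1 if x > 0 else 0) for x in xs])
    return batches


model, sizes = lifelong_updates(make_batches(), n_features=1)
print(sizes)
```

In this toy run, every LL update trains on 80 samples (one new batch plus three replayed batches), whereas retraining from scratch on all data seen so far would need 80, 100, 120, 140, and then 160 samples. That growing gap is the source of the training-time savings the abstract refers to.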

Paper Structure (23 sections, 3 figures, 8 tables, 1 algorithm)

This paper contains 23 sections, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: LL setup with a validation set, an initial training set, and a replay buffer, each of window size 3 ($VWin=ITWin=RBWin=3$).
  • Figure 2: Relative coefficient $coef_{rel}$ of training setups compared to the LL setup updates.
  • Figure 3: Discretized feature importance evolution for Risk_1.