When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems

Gilberto Recupito; Giammaria Giordano; Filomena Ferrucci; Dario Di Nucci; Fabio Palomba

TL;DR

This paper analyzes over 400,000 commits from 337 ML-enabled projects and uses CodeSmile, a novel detector for ML-specific code smells (ML-CSs), in a large-scale empirical study of how ML-CSs emerge and evolve: how prevalent they are, how they are introduced and removed, and how long they survive.
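To make the idea of an ML-specific code smell detector more concrete, here is a minimal sketch of a static, AST-based check for a single smell: comparing a value against NaN with `==` or `!=`, which is always False (or always True) because NaN never compares equal to anything, so `isna()` or `np.isnan` should be used instead. This is an illustrative assumption, not CodeSmile's implementation. The names `SmellInstance`, `NaNComparisonDetector`, and `detect` are hypothetical, and the check is a heuristic that only matches attributes named `nan` (such as `np.nan` or `math.nan`).

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class SmellInstance:
    """One detected smell occurrence (hypothetical record format)."""
    smell: str
    filename: str
    line: int


def _is_nan_attribute(node: ast.AST) -> bool:
    # Heuristic: matches any attribute access named `nan`, e.g. `np.nan`, `numpy.nan`, `math.nan`.
    return isinstance(node, ast.Attribute) and node.attr == "nan"


class NaNComparisonDetector(ast.NodeVisitor):
    """Flags `x == np.nan` and `x != np.nan`; NaN never compares equal to anything."""

    def __init__(self, filename: str) -> None:
        self.filename = filename
        self.found: list[SmellInstance] = []

    def visit_Compare(self, node: ast.Compare) -> None:
        # A chained comparison `a == b != c` has ops [Eq, NotEq] over operands [a, b, c].
        operands = [node.left, *node.comparators]
        for op, left, right in zip(node.ops, operands, operands[1:]):
            if isinstance(op, (ast.Eq, ast.NotEq)) and (
                _is_nan_attribute(left) or _is_nan_attribute(right)
            ):
                self.found.append(
                    SmellInstance("NaN equivalence comparison", self.filename, node.lineno)
                )
                break  # report each comparison expression at most once
        self.generic_visit(node)


def detect(source: str, filename: str = "<string>") -> list[SmellInstance]:
    """Parse `source` without executing it and return every flagged comparison."""
    tree = ast.parse(source, filename=filename)
    detector = NaNComparisonDetector(filename)
    detector.visit(tree)
    return detector.found


if __name__ == "__main__":
    snippet = 'import numpy as np\nmask = df["age"] == np.nan\nok = df["age"].isna()\n'
    for smell in detect(snippet):
        print(smell)
```

Running the script reports one instance on line 2 and leaves the idiomatic `isna()` call alone. Because the check works on the syntax tree, it never imports or runs the analyzed project, which is what makes this style of detection practical when scanning many historical commits.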

Abstract

Context. The adoption of Machine Learning (ML)-enabled systems is steadily increasing. Nevertheless, there is a shortage of ML-specific quality assurance approaches, possibly because of limited knowledge of how quality-related concerns emerge and evolve in ML-enabled systems.

Objective. We aim to investigate the emergence and evolution of a specific type of quality-related concern, namely ML-specific code smells, i.e., sub-optimal implementation solutions applied to ML pipelines that may significantly decrease both the quality and the maintainability of ML-enabled systems. More specifically, we present a plan to study ML-specific code smells by empirically analyzing (i) their prevalence in real ML-enabled systems, (ii) how they are introduced and removed, and (iii) their survivability.

Method. We will conduct an exploratory study, mining a large dataset of ML-enabled systems and analyzing over 400k commits from 337 projects. We will track and inspect the introduction and evolution of ML-specific code smells through CodeSmile, a novel ML-specific code smell detector that we will build to enable this investigation.
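Questions (ii) and (iii) depend on knowing, for each smell instance, the commit that introduced it, the commit that removed it (if any), and how long it survived. The sketch below assumes a linear commit history in which each snapshot is the set of smell identifiers reported after one commit, and that identifiers stay stable across commits (in practice, matching instances across refactorings and shifting line numbers is the hard part). It diffs consecutive snapshots to find introductions and removals, treats smells still present at the last analyzed commit as right-censored, and estimates survival with a Kaplan-Meier estimator. The function names are hypothetical, and Kaplan-Meier is one common survival-analysis choice, not a method confirmed by the study.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Hashable, Optional, Sequence


@dataclass
class SmellHistory:
    smell_id: Hashable
    introduced_at: int          # index of the commit where the smell first appears
    removed_at: Optional[int]   # index of the first commit where it is gone; None if never removed

    @property
    def censored(self) -> bool:
        return self.removed_at is None


def track_lifecycles(snapshots: Sequence[set]) -> list[SmellHistory]:
    """snapshots[i] is the set of smell ids present after commit i (linear history assumed)."""
    open_smells: dict = {}
    histories: list[SmellHistory] = []
    previous: set = set()
    for i, current in enumerate(snapshots):
        for s in current - previous:  # smell introduced by commit i
            open_smells[s] = i
        for s in previous - current:  # smell removed by commit i
            histories.append(SmellHistory(s, open_smells.pop(s), i))
        previous = current
    # Smells still alive at the last analyzed commit are right-censored.
    histories.extend(SmellHistory(s, start, None) for s, start in open_smells.items())
    return histories


def lifetimes(histories: list[SmellHistory], n_commits: int) -> list[tuple[int, bool]]:
    """(duration in commits, removal observed) pairs; censored smells run to the end of history."""
    pairs = []
    for h in histories:
        end = h.removed_at if h.removed_at is not None else n_commits
        pairs.append((end - h.introduced_at, h.removed_at is not None))
    return pairs


def kaplan_meier(pairs: list[tuple[int, bool]]) -> list[tuple[int, float]]:
    """Return (t, S(t)) at every duration t where at least one removal was observed."""
    survival = 1.0
    curve = []
    for t in sorted({d for d, observed in pairs if observed}):
        at_risk = sum(1 for d, _ in pairs if d >= t)  # censored at t still counts as at risk
        removed = sum(1 for d, observed in pairs if d == t and observed)
        survival *= 1 - removed / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    snaps = [{"A"}, {"A", "B"}, {"B"}, {"B", "C"}, {"C"}]
    hist = track_lifecycles(snaps)
    print(kaplan_meier(lifetimes(hist, len(snaps))))
```

On the five-commit toy history above, smell A lives from commit 0 to 2, B from commit 1 to 4, and C is still open at the end, so the estimated survival drops to 2/3 at a lifetime of two commits and to 0 at three. Measuring lifetimes in commits keeps the sketch simple; wall-clock durations would only require replacing commit indices with timestamps.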

Paper Structure (23 sections, 1 figure, 4 tables)

Introduction
Background and Related Work
Background
Related Work
Research Method
Goal and Research Questions
Dataset Description and Projects Selection
Data Extraction
ML-Specific Code Smell Detection
Commit Data Extraction
Data Analysis
RQ0: How prevalent are ML-specific code smells in ML-enabled systems?
RQ1: When are ML-specific code smells introduced in ML-enabled systems?
RQ2: What tasks were performed when the ML-specific code smells were introduced?
RQ3: When and how are ML-specific code smells removed in ML-enabled systems?
...and 8 more sections

Figures (1)

Figure 1: The process designed for the study.