Unraveling Code Clone Dynamics in Deep Learning Frameworks

Maram Assi, Safwat Hassan, Ying Zou

TL;DR

The paper studies how code clones evolve in DL frameworks through an empirical study of nine popular Python-based frameworks. It analyzes clones on two timescales, long-term trends over releases and short-term patterns within a release, and adds a cross-framework, file-level clone analysis. The authors identify four long-term cloning trends, show that short-term patterns influence long-term trajectories, and find two categories of cross-framework clones, functional and architectural adaptation, which point to opportunities for collaboration and standardized practices. Methodologically, they combine NiCad and SourcererCC clone detection with dynamic time warping and time-series clustering to derive insights for maintainability and code reuse in DL ecosystems. The work provides replication data and discusses implications for clone management, abstract-class design, and cross-framework standardization, with potential extensions to additional languages and broader DL tooling.
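
To make the dynamic time warping step mentioned above more concrete, here is a minimal sketch of a DTW distance between two clone-size time series. It is an illustration under assumptions, not the authors' implementation: the function name `dtw_distance`, the absolute-difference cost, and the sample series are all hypothetical, and the paper's actual distance settings and clustering configuration are not reproduced here.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Uses the absolute difference as the local cost (an assumption made for
    this sketch) and runs in O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Invented clone-size series, one value per snapshot within a release.
steady = [10, 10, 11, 10, 10]
steady_shifted = [10, 10, 10, 11, 10, 10]  # same shape, bump occurs one step later
ascending = [10, 12, 14, 16, 18]

d_same_shape = dtw_distance(steady, steady_shifted)
d_different_shape = dtw_distance(steady, ascending)
```

Because DTW allows one series to stretch relative to the other, the two "steady" series align closely even though their lengths differ, while the "ascending" series stays far from both. Pairwise distances of this kind can then be passed to a time-series clustering step to group series with similar shapes.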

Abstract

Deep Learning (DL) frameworks play a critical role in advancing artificial intelligence, and their rapid growth underscores the need for a comprehensive understanding of software quality and maintainability. DL frameworks, like other systems, are prone to code clones. Code clones refer to identical or highly similar source code fragments within the same project or even across different projects. Code cloning can have positive and negative implications for software development, influencing maintenance, readability, and bug propagation. In this paper, we aim to address the knowledge gap concerning the evolutionary dimension of code clones in DL frameworks and the extent of code reuse across these frameworks. We empirically analyze code clones in nine popular DL frameworks, i.e., TensorFlow, Paddle, PyTorch, Aesara, Ray, MXNet, Keras, Jax, and BentoML, to investigate (1) the characteristics of the long-term code cloning evolution over releases in each framework, (2) the short-term, i.e., within-release, code cloning patterns and their influence on the long-term trends, and (3) the file-level code clones within the DL frameworks. Our findings reveal that DL frameworks adopt four distinct cloning trends and that these trends present some common and some distinct characteristics. For instance, bug-fixing activities persistently happen in clones irrespective of the clone evolutionary trend but occur more in the "Serpentine" trend. Moreover, the within-release investigation demonstrates that short-term code cloning practices impact long-term cloning trends. The cross-framework code clone investigation reveals the presence of functional and architectural adaptation file-level code clones across the nine studied frameworks. We provide insights that foster robust clone practices and collaborative maintenance in the development of DL frameworks.

Paper Structure (30 sections, 3 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: An overview of our approach for analyzing code cloning in DL frameworks.
  • Figure 3: Subset of the within-release time series for the "Steady", "Descending", and "Ascending" patterns. Each time series represents the evolution of clone size within a particular release of a DL framework.
  • Figure 4: The bug-proneness evolution in cloned code over releases.
  • Figure 5: Bug-fixing commits distribution by "thin clones" and "thick clones".
  • Figure 6: The evolution of code clone community size over releases.
  • ...and 1 more figure