AI-Specific Code Smells: From Specification to Detection

Brahim Mahmoudi, Naouel Moha, Quentin Stiévenart, Florent Avellaneda

TL;DR

AI-specific code smells are recurring patterns in AI-based software that traditional linters miss. SpecDetect4AI uses a declarative DSL to specify 22 AI-specific smells and a static-analysis pipeline to detect them at scale, with an emphasis on extensibility and fast per-project analysis. On 826 AI-based systems totaling 20M lines of code, it achieves 88.66% precision and 88.89% recall, outperforming CodeSmile, mlpylint, and LLM baselines while also running faster. The work includes an extensibility study and open replication artifacts, which supports its practical use for building more reliable AI software.

Abstract

The rise of Artificial Intelligence (AI) is reshaping how software systems are developed and maintained. However, AI-based systems give rise to new software issues that existing detection tools often miss. Among these, we focus on AI-specific code smells, recurring patterns in the code that may indicate deeper problems such as unreproducibility, silent failures, or poor model generalization. We introduce SpecDetect4AI, a tool-based approach for the specification and detection of these code smells at scale. This approach combines a high-level declarative Domain-Specific Language (DSL) for rule specification with an extensible static analysis tool that interprets and detects these rules for AI-based systems. We specified 22 AI-specific code smells and evaluated SpecDetect4AI on 826 AI-based systems (20M lines of code), achieving a precision of 88.66% and a recall of 88.89%, outperforming other existing detection tools. Our results show that SpecDetect4AI supports the specification and detection of AI-specific code smells through dedicated rules and can effectively analyze large AI-based systems, demonstrating both efficiency and extensibility (SUS 81.7/100).

Paper Structure

This paper contains 25 sections, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of SpecDetect4AI’s three-step approach for turning informal AI-specific code smell descriptions into executable detection rules.
  • Figure 2: Per-rule $F_1$-scores on the intersection of supported smells (darker = higher). “NA” marks rules outside an approach’s coverage.
  • Figure 3: Execution time vs. system size (values above the 95th percentile are capped).
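
For readers relating the $F_1$-scores of Figure 2 to the headline numbers, recall that $F_1$ is the harmonic mean of precision $P$ and recall $R$. Plugging in the aggregate values from the abstract gives a rough overall figure. This value is derived here for illustration and is not quoted from the paper, whose own aggregate may be computed differently (for example, averaged per rule):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.8866 \times 0.8889}{0.8866 + 0.8889}
    \approx 0.8877
```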