GEML: A Grammar-based Evolutionary Machine Learning Approach for Design-Pattern Detection

Rafael Barbudo, Aurora Ramírez, Francisco Servant, José Raúl Romero

TL;DR

This work introduces GEML, a grammar-based evolutionary machine learning approach for automatic design pattern detection (DPD). It combines associative classification with grammar-guided genetic programming to learn readable, rule-based detectors whose syntax is defined by a context-free grammar, so that a detector can be learned for each pattern without extensive parameter tuning. In parameter studies and cross-pattern experiments on DPB and P-Mart, GEML shows competitive accuracy and robustness, outperforming some ML and non-ML DPD methods while remaining interpretable. An accompanying demonstration tool illustrates practical deployment, customization for new patterns, and potential integration into development workflows. The approach offers a scalable, human-readable alternative for DP detection that can adapt to an organization's coding practices.

Abstract

Design patterns (DPs) are recognised as a good practice in software development. However, the lack of appropriate documentation often hampers traceability, and their benefits are blurred among thousands of lines of code. Automatic methods for DP detection have become relevant but are usually based on the rigid analysis of either software metrics or specific properties of the source code. We propose GEML, a novel detection approach based on evolutionary machine learning using software properties of diverse nature. Firstly, GEML makes use of an evolutionary algorithm to extract those characteristics that better describe the DP, formulated in terms of human-readable rules, whose syntax is conformant with a context-free grammar. Secondly, a rule-based classifier is built to predict whether new code contains a hidden DP implementation. GEML has been validated over five DPs taken from a public repository recurrently adopted by machine learning studies. Then, we increase this number up to 15 diverse DPs, showing its effectiveness and robustness in terms of detection capability. An initial parameter study served to tune a parameter setup whose performance guarantees the general applicability of this approach without the need to adjust complex parameters to a specific pattern. Finally, a demonstration tool is also provided.
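
To make the two-phase idea in the abstract more concrete, the following is a minimal Python sketch of the second phase only: a rule-based classifier built from human-readable IF-THEN rules. It is an illustration under stated assumptions, not GEML's implementation. The feature names (`private_ctor`, `static_self_field`), the toy training set, the `Rule` class, and the "highest-confidence matching rule wins" policy are all hypothetical. The rules are written by hand here, whereas GEML learns them with a grammar-guided evolutionary algorithm that this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, int]
Example = Tuple[Features, bool]  # (code properties, implements the pattern?)
Condition = Callable[[Features], bool]


@dataclass
class Rule:
    """IF all conditions hold THEN label (hypothetical rule shape)."""
    description: str
    conditions: List[Condition]
    label: bool        # True = predicted to implement the pattern
    confidence: float  # share of covered training examples carrying `label`

    def matches(self, x: Features) -> bool:
        return all(cond(x) for cond in self.conditions)


def rule_confidence(conditions: List[Condition], label: bool,
                    examples: List[Example]) -> float:
    """Confidence as used in association rule mining: P(label | antecedent)."""
    covered = [y for x, y in examples if all(c(x) for c in conditions)]
    if not covered:
        return 0.0
    return sum(1 for y in covered if y == label) / len(covered)


def make_rule(description: str, conditions: List[Condition], label: bool,
              examples: List[Example]) -> Rule:
    conf = rule_confidence(conditions, label, examples)
    return Rule(description, conditions, label, conf)


def classify(rules: List[Rule], x: Features, default: bool = False) -> bool:
    """Predict with the most confident matching rule (an assumed policy)."""
    matching = [r for r in rules if r.matches(x)]
    if not matching:
        return default
    return max(matching, key=lambda r: r.confidence).label


# Toy training data with made-up properties of candidate classes.
train: List[Example] = [
    ({"private_ctor": 1, "static_self_field": 1}, True),
    ({"private_ctor": 1, "static_self_field": 0}, False),
    ({"private_ctor": 0, "static_self_field": 0}, False),
    ({"private_ctor": 1, "static_self_field": 1}, True),
]

# Hand-written stand-ins for rules that an evolutionary search would produce.
rule_a = make_rule("IF private_ctor = 1 THEN pattern",
                   [lambda x: x["private_ctor"] == 1], True, train)
rule_b = make_rule("IF static_self_field = 0 THEN not pattern",
                   [lambda x: x["static_self_field"] == 0], False, train)
rules = [rule_a, rule_b]

for rule in rules:
    print(f"{rule.description}  (confidence {rule.confidence:.2f})")
print(classify(rules, {"private_ctor": 1, "static_self_field": 0}))
```

Because each rule stays a readable condition paired with a measured confidence, a developer can inspect why a piece of code was flagged, which is the interpretability benefit the abstract emphasises.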

Paper Structure (31 sections, 2 equations, 10 figures, 18 tables, 2 algorithms)

Figures (10)

  • Figure 1: Two-phased model for design pattern detection
  • Figure 2: Example of correspondence between genotype and phenotype for an illustrative individual
  • Figure 3: Grammar used by the G3P4DPD algorithm
  • Figure 4: Example of the crossover operator
  • Figure 5: Examples of the mutation operators
  • ...and 5 more figures