NiaAutoARM: Automated generation and evaluation of Association Rule Mining pipelines
Uroš Mlakar, Iztok Fister, Iztok Fister
TL;DR
This work introduces NiaAutoARM, an AutoML method that automatically constructs full Numerical Association Rule Mining pipelines. It frames ARM pipeline design as a continuous optimization problem: an outer population-based meta-heuristic selects the inner nature-inspired (NI) algorithm, its hyper-parameters, the preprocessing steps, the metrics, and the metric weights. Each pipeline is encoded as a real-valued vector and decoded via a Gamma mapping. The outer fitness is a surrogate that combines ARM rule quality metrics, and the inner objective is a weighted linear combination of metrics. The approach is evaluated on ten UC Irvine datasets using two outer optimizers, Particle Swarm Optimization (PSO) and Differential Evolution (DE), and is compared against the state-of-the-art VARDE method. The results show robust performance: weight adaptation and multiple preprocessing steps provide gains in some cases, and NiaAutoARM outperforms VARDE on several datasets. The findings suggest that automatic ARM pipeline construction can reduce the burden on experts while delivering competitive ARM pipelines, albeit at substantial computational cost. Future extensions include more NI algorithms, richer preprocessing, parallelization, and multi-objective optimization for trade-off analysis.
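The encoding idea summarized above can be illustrated with a minimal sketch. It is not the authors' Gamma mapping: the component catalogues, the vector layout, the 0.5 selection threshold, the hyper-parameter ranges, and the normalization by the weight sum are all assumptions made for illustration, and the names `decode`, `pick`, `scale`, and `rule_fitness` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical component catalogues; names are illustrative assumptions.
ALGORITHMS = ["PSO", "DE", "GA"]
PREPROCESSING = ["min_max_scaling", "z_score", "discretization"]
METRICS = ["support", "confidence", "coverage", "amplitude"]


@dataclass
class Pipeline:
    algorithm: str
    population_size: int
    max_evaluations: int
    preprocessing: list
    metric_weights: dict


def scale(gene, low, high):
    """Map a gene in [0, 1] linearly onto the integer range [low, high]."""
    return low + int(round(gene * (high - low)))


def pick(gene, options):
    """Map a gene in [0, 1] onto one entry of a discrete list."""
    index = min(int(gene * len(options)), len(options) - 1)
    return options[index]


def decode(vector):
    """Decode a real-valued vector in [0, 1]^D into a pipeline description.

    Assumed layout: [algorithm, population size, evaluation budget,
    one gene per preprocessing step, one gene per metric,
    one weight per metric].
    """
    n_pre, n_met = len(PREPROCESSING), len(METRICS)
    expected = 3 + n_pre + 2 * n_met
    if len(vector) != expected:
        raise ValueError(f"expected {expected} genes, got {len(vector)}")

    pre_genes = vector[3:3 + n_pre]
    met_genes = vector[3 + n_pre:3 + n_pre + n_met]
    weight_genes = vector[3 + n_pre + n_met:]

    # A component is switched on when its gene exceeds 0.5 (assumed threshold).
    preprocessing = [p for p, g in zip(PREPROCESSING, pre_genes) if g > 0.5]
    metric_weights = {
        m: w for m, g, w in zip(METRICS, met_genes, weight_genes) if g > 0.5
    }
    if not metric_weights:
        # Keep at least one metric so that the inner objective is defined.
        best = max(range(n_met), key=lambda i: met_genes[i])
        metric_weights = {METRICS[best]: weight_genes[best]}

    return Pipeline(
        algorithm=pick(vector[0], ALGORITHMS),
        population_size=scale(vector[1], 10, 30),
        max_evaluations=scale(vector[2], 2000, 10000),
        preprocessing=preprocessing,
        metric_weights=metric_weights,
    )


def rule_fitness(metric_values, metric_weights):
    """Inner objective: weighted linear combination of the selected metrics,
    normalized by the weight sum (the normalization is an assumption)."""
    total = sum(metric_weights.values())
    if total == 0:
        return 0.0
    return sum(w * metric_values[m] for m, w in metric_weights.items()) / total


vector = [0.0, 0.5, 1.0, 0.9, 0.1, 0.2, 0.8, 0.7, 0.1, 0.3, 1.0, 0.5, 0.2, 0.9]
pipeline = decode(vector)
score = rule_fitness(
    {"support": 0.4, "confidence": 0.8, "coverage": 0.6, "amplitude": 0.1},
    pipeline.metric_weights,
)
```

Because every pipeline choice is derived from a gene in [0, 1], any continuous optimizer such as PSO or DE can search the space of pipelines without knowing what the genes mean; only the decoding step is specific to ARM.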
Abstract
Numerical Association Rule Mining, which handles numerical and categorical attributes concurrently, is beneficial for discovering associations in datasets that contain both types of features. The process is not easy, since it involves several processing steps that run sequentially and together form a pipeline, e.g., preprocessing, algorithm selection, hyper-parameter optimization, and the definition of metrics for evaluating the quality of association rules. In this paper, we propose NiaAutoARM, a novel Automated Machine Learning method that automatically constructs full association rule mining pipelines using stochastic population-based meta-heuristics. Along with the theoretical description of the proposed method, we present a comprehensive experimental evaluation.
