Apriori_Goal algorithm for constructing association rules for a database with a given classification
Vladimir Billig
TL;DR
The paper addresses mining association rules in relational databases with a predefined target parameter. It introduces Apriori_Goal, which generates rules of the form $X \Rightarrow Goal_k$ and supports negative rules. The focus shifts from frequency alone to confidence and correlation through five criteria (f_g, f_all, confidence, correlation, and quality), and efficiency comes from a binary-attribute encoding combined with per-target data partitioning. The paper proves key properties, including anti-monotonicity of frequency and monotonicity of correlation, and reports practical performance on a medical dataset, highlighting the ability to discover high-confidence, rare, or negative rules. The approach offers a scalable, parallelizable framework for real-world databases with target parameters, enabling more targeted and interpretable rule discovery for decision support.
Abstract
An efficient Apriori_Goal algorithm is proposed for constructing association rules in a relational database with a predefined classification. The target parameter of the database specifies a finite number of goals $Goal_k$, for each of which the algorithm constructs association rules of the form $X \Rightarrow Goal_k$. The quality of the generated rules is characterized by five criteria: two represent rule frequency, two reflect rule reliability, and the fifth is a weighted sum of these four criteria. The algorithm first generates rules with single-attribute premises for which the correlation criterion between the premise $X$ and the conclusion $Goal_k$ exceeds a specified threshold. Rules with extended premises are then built using the anti-monotonicity of the rule frequency criteria and the monotonicity of the rule reliability criteria. As premises are extended, the rules tend to decrease in frequency while increasing in reliability. The article proves several statements that justify the rule construction process. The algorithm can construct both high-frequency rules and rare rules that occur infrequently but have high reliability. It also generates negative rules, in which the premise and the conclusion are negatively correlated; such rules can be valuable in practical applications for filtering out undesired goals. The efficiency of the algorithm rests on two factors: the method of encoding the database and its partitioning into subsets linked to the target parameter. Time complexity estimates for rule construction are provided using a medical database as an example.
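To make the five criteria and the encoding idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the formulas, so everything here is an assumption: f_g is taken as the share of a goal's rows that satisfy the premise, f_all as the share of all rows that satisfy both premise and goal, confidence as the usual conditional frequency, correlation as the phi coefficient (a stand-in for the paper's own measure), and quality as an equally weighted sum. The toy rows, the names `encode`, `count`, and `criteria`, and the bitmask layout are all hypothetical.

```python
from math import sqrt

# Toy database (hypothetical): (binary attributes that hold, goal label).
ROWS = [
    ({"a", "b"}, "g1"),
    ({"a"}, "g1"),
    ({"a", "b"}, "g1"),
    ({"b"}, "g2"),
    ({"a", "b"}, "g2"),
    ({"c"}, "g2"),
]

def encode(rows):
    """Partition rows by goal; per goal, map attribute -> bitmask over that partition."""
    sizes, masks = {}, {}
    for attrs, goal in rows:
        i = sizes.get(goal, 0)              # row position inside its goal partition
        sizes[goal] = i + 1
        for a in attrs:
            goal_masks = masks.setdefault(goal, {})
            goal_masks[a] = goal_masks.get(a, 0) | (1 << i)
    return sizes, masks

def count(premise, goal, sizes, masks):
    """Number of rows in the goal partition where every premise attribute holds."""
    acc = (1 << sizes[goal]) - 1            # start with all rows of the partition
    for a in premise:
        acc &= masks.get(goal, {}).get(a, 0)
    return bin(acc).count("1")

def criteria(premise, goal, sizes, masks, weights=(0.25, 0.25, 0.25, 0.25)):
    """Assumed definitions of the five criteria for the rule premise => goal."""
    n = sum(sizes.values())
    n_g = sizes[goal]
    n_xg = count(premise, goal, sizes, masks)
    n_x = sum(count(premise, g, sizes, masks) for g in sizes)
    f_g = n_xg / n_g                        # frequency inside the goal partition
    f_all = n_xg / n                        # frequency over the whole database
    confidence = n_xg / n_x if n_x else 0.0
    denom = sqrt(n_x * (n - n_x) * n_g * (n - n_g))
    correlation = (n * n_xg - n_x * n_g) / denom if denom else 0.0  # phi coefficient
    values = (f_g, f_all, confidence, correlation)
    quality = sum(w * v for w, v in zip(weights, values))
    return {"f_g": f_g, "f_all": f_all, "confidence": confidence,
            "correlation": correlation, "quality": quality}

SIZES, MASKS = encode(ROWS)
for goal in ("g1", "g2"):
    print(goal, criteria({"a"}, goal, SIZES, MASKS))
```

On this toy data the premise `{"a"}` is positively correlated with `g1` and negatively correlated with `g2`, which is the kind of negative rule the abstract mentions. Because each goal has its own partition, the counts for different goals are independent of one another, which suggests how the work could be split across goals.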
