Safe RuleFit: Learning Optimal Sparse Rule Model by Meta Safe Screening

Hiroki Kato, Hiroyuki Hanada, Ichiro Takeuchi

TL;DR

This work addresses learning a sparse rule model in which each rule is a binary indicator over a hyper-rectangle in the input space, so the rule dictionary is extremely large. It introduces Safe RuleFit (SRF), which combines meta safe screening (mSS) with a tree-structured rule space to safely prune large portions of the rule set while solving convex L1-regularized problems for regression and classification. It also provides a framework for regularization-path computation and an extension to group sparsity (GSRF) via sparse group LASSO penalties, with corresponding safe screening rules. Because the screening is safe, no rule that is active in the optimal solution is discarded, so SRF targets the optimal set of active rules while keeping the model interpretable and its predictive performance competitive. This makes interpretable modeling practical in high-dimensional spaces, at a substantially lower computational cost than handling the full rule dictionary directly.

Abstract

We consider the problem of learning a sparse rule model, a prediction model in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyper-rectangle in the input space. Since the number of all possible such rules is extremely large, it has been computationally intractable to select the optimal set of active rules. In this paper, to solve this difficulty for learning the optimal sparse rule model, we propose Safe RuleFit (SRF). Our basic idea is to develop meta safe screening (mSS), which is a non-trivial extension of well-known safe screening (SS) techniques. While SS is used for screening out one feature, mSS can be used for screening out multiple features by exploiting the inclusion-relations of hyper-rectangles in the input space. SRF provides a general framework for fitting sparse rule models for regression and classification, and it can be extended to handle more general sparse regularizations such as group regularization. We demonstrate the advantages of SRF through intensive numerical experiments.
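
To make the abstract's terms concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes a rule is stored as a pair of lower and upper bounds, and the names `rule_indicator`, `predict`, and `nested_bound` are hypothetical. The last function illustrates why inclusion relations help: if one hyper-rectangle lies inside another, its indicator vector is elementwise no larger, so a single quantity computed on the outer rule bounds the correlation score of every rule nested inside it. This is a generic monotonicity bound used here for illustration; the paper's actual mSS rule may differ.

```python
import random

def rule_indicator(x, lower, upper):
    """Return 1.0 if point x lies inside the box [lower, upper], else 0.0."""
    inside = all(lo <= v <= hi for v, lo, hi in zip(x, lower, upper))
    return 1.0 if inside else 0.0

def predict(x, rules, weights, bias=0.0):
    """Sparse rule model: bias plus a weighted sum of rule indicators."""
    return bias + sum(
        w * rule_indicator(x, lo, hi) for (lo, hi), w in zip(rules, weights)
    )

def nested_bound(g, outer):
    """Upper bound on |sum_i g_i * b_i| for every 0/1 vector b with b <= outer."""
    pos = sum(max(gi, 0.0) * a for gi, a in zip(g, outer))
    neg = sum(max(-gi, 0.0) * a for gi, a in zip(g, outer))
    return max(pos, neg)

# Hypothetical data: 50 random points in the unit square and an arbitrary
# score vector g (standing in for a dual or residual vector).
random.seed(0)
X = [[random.random(), random.random()] for _ in range(50)]
g = [random.gauss(0.0, 1.0) for _ in range(50)]

outer_box = ([0.1, 0.1], [0.9, 0.9])
inner_box = ([0.3, 0.2], [0.6, 0.8])  # contained in outer_box

outer = [rule_indicator(x, *outer_box) for x in X]
inner = [rule_indicator(x, *inner_box) for x in X]
score_inner = abs(sum(gi * b for gi, b in zip(g, inner)))

# The score of the nested rule never exceeds the bound computed on the outer rule.
print(score_inner <= nested_bound(g, outer))
```

If the bound computed on an outer rule falls below a screening threshold, every rule nested inside it can be skipped without being evaluated, which is the sense in which one test screens out multiple features.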

Paper Structure

This paper contains 45 sections, 9 theorems, 60 equations, 19 figures, 7 tables, 4 algorithms.

Key Result

Lemma 1

For any $j\in[d]$ and any $k\in{\mathcal{R}}$,

Figures (19)

  • Figure 1: Illustrative example of sparse rule models. Here, a rule corresponds to a (hyper)rectangle in the input space. Among a large number of possible rules (hyper-rectangles), only a small subset of them (three rules in this example) are used in the prediction model.
  • Figure 2: Schematic formulation of meta safe screening (mSS) in the proposed SRF method.
  • Figure 3: Illustrative example of all possible rules to be considered in the case of a two-dimensional dataset represented by red crosses. Here, the vertical lines indicate members of $\bm \omega^{(1)}$, while the horizontal lines indicate members of $\bm \omega^{(2)}$. Any rectangles defined by selecting any two vertical lines (any two elements of $\bm \omega^{(1)}$) and any two horizontal lines (any two elements of $\bm \omega^{(2)}$) correspond to the rules to be considered.
  • Figure 4: Example of a closed rule (in red) with non-closed rules (in orange).
  • Figure 5: Rules with various combinations of input features, formulated as effective feature sets (EFSs). For a given set of rules, fewer distinct EFSs means fewer plots are needed to show the regions where the rules are satisfied, which makes the model easier to interpret.
  • ...and 14 more figures
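
The construction described in the Figure 3 caption can be sketched as follows. This is a hedged illustration only: the threshold values are hypothetical, the function name `enumerate_rectangles` is not from the paper, and it assumes the two lines chosen in each dimension are distinct.

```python
from itertools import combinations

def enumerate_rectangles(omega1, omega2):
    """List every rectangle formed by two vertical and two horizontal lines."""
    rules = []
    for x_lo, x_hi in combinations(sorted(omega1), 2):
        for y_lo, y_hi in combinations(sorted(omega2), 2):
            # Each rule is stored as (lower corner, upper corner).
            rules.append(((x_lo, y_lo), (x_hi, y_hi)))
    return rules

omega1 = [0.0, 1.0, 2.5, 4.0]  # hypothetical vertical line positions
omega2 = [0.0, 2.0, 3.0]       # hypothetical horizontal line positions

rules = enumerate_rectangles(omega1, omega2)
print(len(rules))  # C(4,2) * C(3,2) = 6 * 3 = 18
```

Even in two dimensions the count grows quadratically in the number of lines per axis, and the product over dimensions is what makes the full rule dictionary too large to enumerate directly.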

Theorems & Definitions (20)

  • Remark 1
  • Lemma 1
  • Lemma 2: Theorem 3 in ndiaye2015gap
  • Lemma 3
  • Theorem 1
  • Corollary 1
  • Remark 2
  • Definition 1
  • Theorem 2
  • Theorem 3
  • ...and 10 more