Safe RuleFit: Learning Optimal Sparse Rule Model by Meta Safe Screening

Hiroki Kato; Hiroyuki Hanada; Ichiro Takeuchi

TL;DR

This work tackles learning a sparse rule model in which each rule is a binary indicator over a hyper-rectangle in the input space and the dictionary of candidate rules is exponentially large. It introduces Safe RuleFit (SRF), which combines meta safe screening (mSS) with a tree-structured rule space to safely prune large portions of the rule set while solving convex L1-regularized problems for regression and classification. Further contributions are a principled framework for regularization-path computation and a scalable extension to group sparsity (GSRF) via sparse group LASSO penalties, with corresponding safe screening rules. The proposed method achieves near-optimal rule selection with strong interpretability and competitive predictive performance, while substantially reducing computational cost compared to baselines that do not exhaustively consider all rules. The approach is practically relevant for interpretable modeling in high-dimensional spaces where the full rule dictionary is too large to handle directly.

Abstract

We consider the problem of learning a sparse rule model, a prediction model in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyper-rectangle in the input space. Since the number of all possible such rules is extremely large, it has been computationally intractable to select the optimal set of active rules. In this paper, to overcome this difficulty, we propose Safe RuleFit (SRF). Our basic idea is to develop meta safe screening (mSS), which is a non-trivial extension of well-known safe screening (SS) techniques. While SS screens out one feature at a time, mSS can screen out multiple features at once by exploiting the inclusion relations among hyper-rectangles in the input space. SRF provides a general framework for fitting sparse rule models for regression and classification, and it can be extended to handle more general sparse regularizations such as group regularization. We demonstrate the advantages of SRF through extensive numerical experiments.
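The role of the inclusion relations can be illustrated with a minimal Python sketch. It is not the screening criterion of the paper: the helper names `rule_feature` and `subtree_bound`, the toy data, and the vector `theta` (standing in for some dual-like vector) are all assumptions made for illustration. The sketch relies only on one fact: if a child hyper-rectangle lies inside a parent hyper-rectangle, the child's indicator feature is pointwise no larger than the parent's, so a single quantity computed from the parent bounds the score of every rule nested inside it.

```python
import numpy as np

def rule_feature(X, lower, upper):
    """Indicator of the hyper-rectangle [lower, upper], evaluated on each row of X."""
    return np.all((X >= lower) & (X <= upper), axis=1).astype(float)

def subtree_bound(parent_feature, theta):
    """Upper bound on |z @ theta| for any indicator z with z <= parent_feature.

    Any nested rule can only drop samples from the parent's support, so its
    score is at most the sum of the positive parts (or of the negative parts)
    of theta over that support.
    """
    pos = np.sum(parent_feature * np.maximum(theta, 0.0))
    neg = np.sum(parent_feature * np.maximum(-theta, 0.0))
    return max(pos, neg)

# Toy data: four samples in two dimensions (hypothetical values).
X = np.array([[0.1, 0.2], [0.4, 0.9], [0.6, 0.5], [0.8, 0.1]])
theta = np.array([0.5, -1.0, 0.25, -0.25])  # hypothetical dual-like vector

parent = rule_feature(X, lower=np.array([0.0, 0.0]), upper=np.array([0.7, 1.0]))
child = rule_feature(X, lower=np.array([0.3, 0.4]), upper=np.array([0.7, 1.0]))

child_score = abs(child @ theta)
bound = subtree_bound(parent, theta)
print(parent, child, child_score, bound)
```

If such a bound falls below the threshold used by a screening test, every rule nested inside the parent box could be discarded without being enumerated, which is the sense in which one test can screen out multiple features at once.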
