SR4-Fit: An Interpretable and Informative Classification Algorithm Applied to Prediction of U.S. House of Representatives Elections

Shyam Sundar Murali Krishnan; Dean Frederick Hougen

TL;DR

SR4-Fit addresses the interpretability–accuracy trade-off in classification by uniting RuleFit-style rule extraction with Sparse Relaxed Regularized Regression. It optimizes a logistic-loss objective with a sparsity penalty and a coupling term between $\beta$ and $w$, learned via an alternating strategy, and produces concise, human-readable rules derived from decision-tree paths. Empirically, SR4-Fit delivers high accuracy with low variance, outperforming black-box models like Random Forest and SVM and exceeding RuleFit in interpretability across electoral and standard public datasets. The work demonstrates that interpretable rule-based models can achieve competitive performance and broad applicability beyond electoral forecasting, offering a practical tool for demographic-driven classification tasks.
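
The alternating strategy mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation, and this summary does not give the paper's exact objective. The sketch assumes a generic relaxed-sparse form, mean logistic loss in $\beta$ plus $\lambda \|w\|_1$ plus $(\kappa/2)\|\beta - w\|^2$, applied to binary features that indicate whether a rule fires. The function name `fit_relaxed_sparse_logistic`, the parameters `lam` and `kappa`, the fixed step size, and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_relaxed_sparse_logistic(X, y, lam=0.1, kappa=1.0,
                                outer=50, inner=20, lr=0.5):
    """Alternately minimise (assumed objective, not taken from the paper)
        mean_logistic_loss(beta) + lam * ||w||_1 + kappa/2 * ||beta - w||^2.
    X is a list of rows of 0/1 rule indicators, y a list of 0/1 labels."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    w = [0.0] * p
    for _ in range(outer):
        # beta-step: gradient descent on the smooth part with w held fixed.
        for _ in range(inner):
            grad = [kappa * (beta[j] - w[j]) for j in range(p)]
            for xi, yi in zip(X, y):
                resid = sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
                for j in range(p):
                    grad[j] += resid * xi[j] / n
            beta = [b - lr * g for b, g in zip(beta, grad)]
        # w-step: closed-form minimiser, soft-thresholding of beta.
        w = [soft_threshold(b, lam / kappa) for b in beta]
    return beta, w


# Toy data: rule 0 is informative, rule 1 is unrelated to the label.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
beta, w = fit_relaxed_sparse_logistic(X, y)
print("dense beta:", [round(b, 3) for b in beta])
print("sparse w:  ", [round(v, 3) for v in w])
```

In this sketch the sparse vector `w` plays the role of the rule selector: rules whose entry in `w` is exactly zero would be pruned, while `beta` stays dense and is only pulled toward `w` by the coupling term. The fixed step size is adequate for a handful of 0/1 features; a larger rule set would need a line search or a smaller step.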

Abstract

The growth of machine learning demands interpretable models for critical applications, yet most high-performing models are "black-box" systems that obscure input-output relationships. Traditional rule-based algorithms such as RuleFit, despite their simplicity, suffer from limited predictive power and from instability. This motivated our development of Sparse Relaxed Regularized Regression Rule-Fit (SR4-Fit), a novel interpretable classification algorithm that addresses these limitations while maintaining superior classification performance. Using demographic characteristics of U.S. congressional districts from the Census Bureau's American Community Survey, we demonstrate that SR4-Fit can predict House election party outcomes with unprecedented accuracy and interpretability. Our results show that, while the majority party remains the strongest predictor, SR4-Fit reveals intrinsic combinations of demographic factors that affect prediction outcomes and that could not be interpreted from black-box algorithms such as random forests. SR4-Fit surpasses both black-box models and existing interpretable rule-based algorithms such as RuleFit in accuracy, simplicity, and robustness. It generates stable, interpretable rule sets while maintaining superior predictive performance, thus addressing the traditional trade-off between model interpretability and predictive capability in electoral forecasting. To further validate SR4-Fit's performance, we also apply it to six additional publicly available classification datasets (breast cancer, Ecoli, page blocks, Pima Indians, vehicle, and yeast) and find similar results.

Paper Structure (14 sections, 6 equations, 2 figures, 7 tables, 1 algorithm)

This paper contains 14 sections, 6 equations, 2 figures, 7 tables, 1 algorithm.

Introduction
SR4-Fit Implementation
Rule Extraction
Feature Construction and Optimization Objective
Model Pruning and Rule Selection
Classification and Probability Estimation
Experiments
Dataset Background
Experimentation
Results and Discussion
Performance on Election Forecasting Data
Interpretation of SR4-Fit Rules for Election Forecasting
Result Analysis on Standard Public Datasets
Conclusions and Future Work

Figures (2)

Figure 1: Violin plot comparison of all models across different datasets for different prediction metrics.
Figure 2: Violin plot comparison of the interpretability score of all rule-based models across different demographic datasets.
