SR4-Fit: An Interpretable and Informative Classification Algorithm Applied to Prediction of U.S. House of Representatives Elections
Shyam Sundar Murali Krishnan, Dean Frederick Hougen
TL;DR
SR4-Fit addresses the interpretability–accuracy trade-off in classification by combining RuleFit-style rule extraction with Sparse Relaxed Regularized Regression (SR3). It optimizes a logistic-loss objective with a sparsity penalty and a coupling term between $\beta$ and $w$, learned via an alternating strategy, and produces concise, human-readable rules derived from decision-tree paths. Empirically, SR4-Fit delivers high accuracy with low variance, outperforming black-box models such as Random Forest and SVM and exceeding RuleFit in interpretability across electoral and standard public datasets. The work shows that interpretable rule-based models can achieve competitive performance and apply well beyond electoral forecasting, offering a practical tool for demographic-driven classification tasks.
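The summary names the ingredients but not the exact objective. The sketch below assumes an SR3-style formulation: mean logistic loss in $\beta$, an $\ell_1$ penalty on an auxiliary vector $w$, and a quadratic coupling $\|\beta - w\|^2/(2\nu)$, minimized by alternating a gradient-descent $\beta$-step with a soft-thresholding $w$-step. Hand-written threshold conjunctions stand in for rules extracted from decision-tree paths. The function names (`rule_matrix`, `sr3_logistic`, `describe`), the rule format, the unpenalized intercept, and the parameters `lam`, `nu`, `lr` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical rule format: a conjunction of (feature_index, op, threshold)
# conditions, like the splits along one root-to-node decision-tree path.
def rule_matrix(X, rules):
    """Return an (n_samples, n_rules) 0/1 matrix; an entry is 1 if the rule fires."""
    R = np.ones((X.shape[0], len(rules)))
    for j, rule in enumerate(rules):
        for feat, op, thr in rule:
            fires = (X[:, feat] <= thr) if op == "<=" else (X[:, feat] > thr)
            R[:, j] *= fires
    return R

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1: shrinks toward 0 and zeroes small entries.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sr3_logistic(Z, y, lam=0.1, nu=1.0, outer=300, inner=30, lr=0.5):
    """Alternating minimization of an ASSUMED SR3-style objective
        mean_logistic_loss(beta) + lam * ||w[1:]||_1 + ||beta - w||^2 / (2 * nu).
    Column 0 of Z is an intercept (assumed unpenalized); y is in {0, 1}.
    lr = 0.5 is stable for a few 0/1 columns; more columns may need a smaller lr.
    """
    n, p = Z.shape
    beta, w = np.zeros(p), np.zeros(p)
    for _ in range(outer):
        # beta-step: gradient descent on the smooth part with w held fixed.
        for _ in range(inner):
            prob = 1.0 / (1.0 + np.exp(-(Z @ beta)))
            grad = Z.T @ (prob - y) / n + (beta - w) / nu
            beta = beta - lr * grad
        # w-step: the exact minimizer in w is soft-thresholding of beta.
        w = soft_threshold(beta, lam * nu)
        w[0] = beta[0]  # do not shrink the intercept
    return beta, w

def describe(rule):
    # Render a rule as a human-readable condition string.
    return " AND ".join(f"x{f} {op} {t}" for f, op, t in rule)

# Usage on toy data: the label depends on feature 0 only.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = (X[:, 0] > 0.5).astype(float)
rules = [
    [(0, ">", 0.5)],                  # matches the label rule
    [(1, "<=", 0.3)],                 # unrelated to the label
    [(0, "<=", 0.5), (1, ">", 0.7)],  # two-condition path that adds little once rule 0 is in
]
Z = np.column_stack([np.ones(len(X)), rule_matrix(X, rules)])
beta, w = sr3_logistic(Z, y)
for rule, coef in zip(rules, w[1:]):
    if coef != 0:
        print(f"{coef:+.2f}  IF {describe(rule)}")
```

On this toy data the soft-thresholding step should drive the two uninformative rules to exactly zero in $w$, leaving one readable rule on feature 0, while $\beta$ stays dense. Placing the $\ell_1$ penalty on $w$ rather than $\beta$ follows the original SR3 formulation; SR4-Fit's exact choice, rule extraction, and pruning may differ.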
Abstract
The growth of machine learning demands interpretable models for critical applications, yet most high-performing models are "black-box" systems that obscure input-output relationships, while traditional rule-based algorithms such as RuleFit, despite their simplicity, suffer from limited predictive power and instability. This motivated our development of Sparse Relaxed Regularized Regression Rule-Fit (SR4-Fit), a novel interpretable classification algorithm that addresses these limitations while maintaining superior classification performance. Using demographic characteristics of U.S. congressional districts from the Census Bureau's American Community Survey, we demonstrate that SR4-Fit can predict House election party outcomes with unprecedented accuracy and interpretability. Our results show that while the majority party remains the strongest predictor, SR4-Fit reveals intrinsic combinations of demographic factors that affect prediction outcomes and that cannot be interpreted from black-box algorithms such as random forests. SR4-Fit surpasses both black-box models and existing interpretable rule-based algorithms such as RuleFit in accuracy, simplicity, and robustness, generating stable, interpretable rule sets and thereby addressing the traditional trade-off between model interpretability and predictive capability in electoral forecasting. To further validate SR4-Fit's performance, we also apply it to six additional publicly available classification datasets (breast cancer, Ecoli, page blocks, Pima Indians, vehicle, and yeast) and find similar results.
