Interpretability-by-Design with Accurate Locally Additive Models and Conditional Feature Effects
Vasilis Gkolemis, Loukas Kavouras, Dimitrios Kyriakopoulos, Konstantinos Tsopelas, Dimitrios Rontogiannis, Giuseppe Casalicchio, Theodore Dalamagas, Christos Diou
TL;DR
The paper introduces Conditionally Additive Local Models (CALMs), which bridge GAM interpretability and GA$^2$M accuracy by learning, for each feature, region-specific univariate shape functions conditioned on the features it interacts with. A three-step distillation pipeline (reference model, heterogeneity-driven region partitioning, and region-aware backfitting) identifies homogeneous regions where features act additively while still capturing interactions. The authors prove population-level optimality for Step 3 under fixed regions and formalize interpretability properties, including local contributions, regional sensitivity, and global monotonicity. Empirically, CALMs outperform GAM baselines and match or exceed GA$^2$Ms across 25 real-world datasets with far fewer interactions and competitive runtimes, demonstrating a favorable accuracy–interpretability trade-off for practical use.
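One way to write down the model class described here is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $c$ is an intercept, $d$ the number of features, $\mathbf{x}_{-j}$ the features other than $x_j$, and $\mathcal{R}_{j,1}, \dots, \mathcal{R}_{j,K_j}$ a partition of the space of $\mathbf{x}_{-j}$ into $K_j$ threshold-defined regions for feature $j$.

\[
f(\mathbf{x}) \;=\; c \;+\; \sum_{j=1}^{d} \sum_{k=1}^{K_j} \mathbb{1}\!\left[\mathbf{x}_{-j} \in \mathcal{R}_{j,k}\right] f_{j,k}(x_j)
\]

Because the regions for each feature form a partition, exactly one shape function $f_{j,k}$ is active per feature at any input, so the prediction is still a sum of $d$ univariate terms. Setting $K_j = 1$ for every $j$ recovers an ordinary GAM.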
Abstract
Generalized additive models (GAMs) offer interpretability through independent univariate feature effects but underfit when interactions are present in the data. GA$^2$Ms add selected pairwise interactions, which improves accuracy but sacrifices interpretability and limits model auditing. We propose \emph{Conditionally Additive Local Models} (CALMs), a new model class that balances the interpretability of GAMs with the accuracy of GA$^2$Ms. CALMs allow multiple univariate shape functions per feature, each active in a different region of the input space. These regions are defined independently for each feature as simple logical conditions (thresholds) on the features it interacts with. As a result, effects remain locally additive while varying across subregions to capture interactions. We further propose a principled distillation-based training pipeline that identifies homogeneous regions with limited interactions and fits interpretable shape functions via region-aware backfitting. Experiments on diverse classification and regression tasks show that CALMs consistently outperform GAMs and achieve accuracy comparable to that of GA$^2$Ms. Overall, CALMs offer a compelling trade-off between predictive accuracy and interpretability.
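To make the idea of region-specific shape functions concrete, the following is a minimal sketch of backfitting with fixed regions. It is not the authors' implementation. The function name `fit_region_aware`, the piecewise-constant shape functions on quantile bins, the hand-specified regions, and the synthetic data are all assumptions chosen for illustration, and the region-finding step of the pipeline is not shown.

```python
import numpy as np


def fit_region_aware(X, y, region_fns, n_bins=10, n_iters=20):
    """Backfit one piecewise-constant shape function per (feature, region).

    region_fns[j] maps X to an integer region id per row for feature j.
    Regions are treated as fixed inputs; finding them is not shown here.
    """
    n, d = X.shape
    intercept = y.mean()
    # Quantile bin index of every value, per feature.
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    bins = np.stack(
        [np.digitize(X[:, j], np.quantile(X[:, j], qs)) for j in range(d)], axis=1
    )
    region = np.stack([region_fns[j](X) for j in range(d)], axis=1)
    shapes = [np.zeros((region[:, j].max() + 1, n_bins)) for j in range(d)]
    contrib = np.zeros((n, d))  # current additive contribution of each feature

    for _ in range(n_iters):
        for j in range(d):
            # Partial residual: remove everything except feature j's effect.
            partial = y - intercept - contrib.sum(axis=1) + contrib[:, j]
            for r in range(shapes[j].shape[0]):
                for b in range(n_bins):
                    mask = (region[:, j] == r) & (bins[:, j] == b)
                    if mask.any():
                        shapes[j][r, b] = partial[mask].mean()
            contrib[:, j] = shapes[j][region[:, j], bins[:, j]]
    fitted = intercept + contrib.sum(axis=1)
    return shapes, fitted


def one_region(X):
    # A single global region: the feature behaves as in a plain GAM.
    return np.zeros(len(X), dtype=int)


def split_on_x1(X):
    # Two regions for feature 0, defined by a threshold on feature 1.
    return (X[:, 1] > 0).astype(int)


# Synthetic data where the effect of x0 flips sign depending on x1.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = X[:, 0] * np.sign(X[:, 1]) + 0.5 * X[:, 1]

_, fit_gam = fit_region_aware(X, y, [one_region, one_region])
shapes, fit_calm = fit_region_aware(X, y, [split_on_x1, one_region])

mse_gam = np.mean((y - fit_gam) ** 2)
mse_calm = np.mean((y - fit_calm) ** 2)
print(f"single-region MSE: {mse_gam:.3f}, two-region MSE: {mse_calm:.3f}")
```

In this toy setting the single-region fit cannot represent the sign flip, while giving feature 0 two shape functions (one for `x1 > 0`, one for `x1 <= 0`) captures it with univariate curves only. Each row of `shapes[0]` can be plotted as an ordinary one-dimensional effect for its region.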
