Mixed-feature Logistic Regression Robust to Distribution Shifts
Qingshi Sun, Nathan Justin, Andres Gomez, Phebe Vayanos
TL;DR
This work addresses logistic regression under distribution shift by formulating a Wasserstein distributionally robust optimization (DRO) model for mixed features that allows the likelihood of a shift to differ across features. It develops two scalable solution approaches: a cutting-plane method with a dynamic-programming oracle for detecting violated constraints, and a graph-based reformulation that maps constraint evaluation to longest-path problems on per-data-point directed acyclic graphs (DAGs). These approaches yield large runtime gains (up to ~408x) and improved predictive reliability. Calibration under shifts is achieved via a principled parameter-tuning scheme that ties perturbation costs to domain knowledge, with explicit expressions for $\gamma_j$, $\delta_\ell$, and $\epsilon$ in terms of shift probabilities and a likelihood-ratio threshold $\theta$. Empirically, on 13 UCI datasets, the method reduces average calibration error by up to 36% and increases average AUC by up to 18%, with larger improvements in worst-case metrics, demonstrating practical applicability to high-stakes domains with heterogeneous distribution shifts.
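To make the longest-path idea concrete, the following is a minimal, generic sketch of computing a longest path in a weighted DAG by dynamic programming over a topological order. It is not the paper's graph construction: the function name `longest_path_dag`, the node numbering, and the edge weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def longest_path_dag(num_nodes, edges, source, sink):
    """Return the maximum total weight of a source-to-sink path in a DAG.

    edges is a list of (u, v, weight) tuples with nodes numbered 0..num_nodes-1.
    Returns float("-inf") if the sink is unreachable from the source.
    """
    adjacency = defaultdict(list)
    in_degree = [0] * num_nodes
    for u, v, weight in edges:
        adjacency[u].append((v, weight))
        in_degree[v] += 1

    # Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges.
    queue = deque(node for node in range(num_nodes) if in_degree[node] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adjacency[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order, keeping the best value per node.
    best = [float("-inf")] * num_nodes
    best[source] = 0.0
    for u in order:
        if best[u] == float("-inf"):
            continue  # node not reachable from the source
        for v, weight in adjacency[u]:
            best[v] = max(best[v], best[u] + weight)
    return best[sink]


# Hypothetical layered graph: source 0, two layers of two nodes each, sink 5.
toy_edges = [
    (0, 1, 1.0), (0, 2, 0.5),
    (1, 3, -0.2), (1, 4, 0.7),
    (2, 3, 2.0), (2, 4, 0.1),
    (3, 5, 0.0), (4, 5, 0.3),
]
print(longest_path_dag(6, toy_edges, source=0, sink=5))
```

Because every edge is relaxed exactly once, the running time is linear in the number of nodes and edges, which is what makes a longest-path formulation attractive when one such graph must be evaluated per data point.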
Abstract
Logistic regression models are widely used in the social and behavioral sciences and in high-stakes domains because of their simplicity and interpretability. At the same time, such domains are permeated by distribution shifts, where the distribution generating the data changes between training and deployment. In this paper, we study a distributionally robust logistic regression problem that seeks the model that performs best against adversarial realizations of the data distribution drawn from a suitably constructed Wasserstein ambiguity set. Our model and solution approach differ from prior work in that we can capture settings where the likelihood of distribution shifts varies across features, significantly broadening the applicability of our model relative to the state of the art. We propose a graph-based solution approach that can be integrated into off-the-shelf optimization solvers. We evaluate the performance of our model and algorithms on numerous publicly available datasets. Our solution achieves up to a 408x speed-up relative to the state of the art. Additionally, compared to the state of the art, our model reduces average calibration error by up to 36.19% and worst-case calibration error by up to 41.70%, while increasing the average area under the ROC curve (AUC) by up to 18.02% and worst-case AUC by up to 48.37%.
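For orientation, a Wasserstein distributionally robust logistic regression problem is commonly written in the generic form below. The notation is assumed here rather than taken from the paper: $\beta$ denotes the coefficients, $y \in \{-1, +1\}$ the label, $\hat{\mathbb{P}}_N$ the empirical distribution of the $N$ training points, $W_d$ the Wasserstein distance induced by a ground metric $d$ on data points, and $\epsilon$ the radius of the ambiguity set. The mixed-feature ground metric with feature-specific perturbation costs is left unspecified as $d$.

$$
\min_{\beta} \; \sup_{\mathbb{Q} \in \mathcal{B}_\epsilon(\hat{\mathbb{P}}_N)} \; \mathbb{E}_{(x, y) \sim \mathbb{Q}} \left[ \log\left(1 + \exp(-y \, \beta^\top x)\right) \right],
\qquad
\mathcal{B}_\epsilon(\hat{\mathbb{P}}_N) = \left\{ \mathbb{Q} : W_d(\mathbb{Q}, \hat{\mathbb{P}}_N) \le \epsilon \right\}
$$

The inner supremum selects the worst-case distribution within distance $\epsilon$ of the training data, and the outer minimization picks the coefficients that perform best against it.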
