DRO: A Python Library for Distributionally Robust Optimization in Machine Learning
Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, Jose Blanchet
TL;DR
The paper introduces dro, a Python library for distributionally robust optimization (DRO) in regression and classification. It combines 14 DRO formulations with 9 backbone models, yielding 79 supported method configurations, and is compatible with scikit-learn and PyTorch. The library is built around the general DRO problem $\min_{f \in \mathcal{F}} \sup_{Q \in \mathcal{P}} \mathbb{E}_Q[\ell(f(X), Y)]$, where the ambiguity set $\mathcal{P}$ is centered at the empirical distribution, and it supports four families of distance metrics: Wasserstein distances, $f$-divergences, kernel distances, and hybrid distances. A key contribution is the modular, ML-ready software design, with acceleration techniques such as vectorization and Nyström kernel approximation that deliver 10–1000× speedups on large-scale tasks while preserving optimization fidelity. The library also provides personalized formulations (RS-WDRO, Bayesian-DRO), real-data generation and diagnostics, and standard software engineering practices (tests, typing, documentation, CI), supporting robust, scalable deployment of DRO in practice.
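To make the inner supremum concrete, the following is a minimal NumPy sketch for one special case: a KL-divergence ambiguity set of radius `eps` around the empirical distribution. It relies on the standard dual identity $\sup_{Q:\,\mathrm{KL}(Q\|\hat{P}_n)\le\epsilon}\mathbb{E}_Q[\ell] = \inf_{\lambda>0}\{\lambda\epsilon + \lambda\log\mathbb{E}_{\hat{P}_n}[e^{\ell/\lambda}]\}$ and approximates the infimum by a grid search over $\lambda$. The function name `kl_worst_case_risk`, the grid, and the sample losses are illustrative assumptions; this is not the dro library's API or its solver.

```python
import numpy as np

def kl_worst_case_risk(losses, eps, lambdas=None):
    """Approximate sup_{Q: KL(Q || P_n) <= eps} E_Q[loss] via its dual.

    Hypothetical helper for illustration only (not part of the dro API).
    The infimum over lambda is taken on a finite grid, so the result is a
    slight upper bound on the exact worst-case risk.
    """
    losses = np.asarray(losses, dtype=float)
    if lambdas is None:
        lambdas = np.logspace(-3, 3, 2000)  # assumed search grid for lambda
    # z[i, j] = loss_j / lambda_i, evaluated for all grid points at once
    z = losses[None, :] / lambdas[:, None]
    # Numerically stable log-mean-exp along the sample axis
    zmax = z.max(axis=1, keepdims=True)
    log_mean_exp = zmax[:, 0] + np.log(np.mean(np.exp(z - zmax), axis=1))
    # Dual objective: lambda * eps + lambda * log E[exp(loss / lambda)]
    dual = lambdas * (eps + log_mean_exp)
    return float(dual.min())

# Per-sample losses of some fixed model f (made-up numbers)
sample_losses = np.array([0.1, 0.5, 0.2, 2.0, 0.3])
for radius in (0.0, 0.1, 1.0):
    print(radius, kl_worst_case_risk(sample_losses, radius))
```

With `eps = 0` the value is close to the empirical mean loss, and it grows toward the maximum loss as the radius increases, which is the sense in which the ambiguity set interpolates between empirical risk minimization and worst-case optimization.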
Abstract
We introduce dro, an open-source Python library for distributionally robust optimization (DRO) for regression and classification problems. The library implements 14 DRO formulations and 9 backbone models, enabling 79 distinct DRO methods. Furthermore, dro is compatible with both scikit-learn and PyTorch. Through vectorization and optimization approximation techniques, dro reduces runtime by 10x to over 1000x compared to baseline implementations on large-scale datasets. Comprehensive documentation is available at https://python-dro.org.
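As one illustration of the kind of approximation that can produce such speedups, the sketch below shows the Nyström method, which the TL;DR names among the library's acceleration techniques. It replaces an $n \times n$ kernel matrix $K$ with the low-rank surrogate $K_{nm} K_{mm}^{-1} K_{mn}$ built from $m \ll n$ landmark points. The RBF kernel, the function names, the landmark selection, and the `jitter` regularizer are assumptions made for this example and do not describe dro's internal implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, landmarks, gamma=1.0, jitter=1e-8):
    """Return Phi with Phi @ Phi.T ~= K_nm (K_mm + jitter*I)^{-1} K_mn.

    Hypothetical helper for illustration only (not part of the dro API).
    """
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    # Eigendecomposition of the (regularized) landmark kernel matrix
    w, V = np.linalg.eigh(K_mm + jitter * np.eye(len(landmarks)))
    w = np.maximum(w, jitter)  # guard against tiny or negative eigenvalues
    # Scale eigenvector columns by w^{-1/2}: Phi = K_nm V diag(w)^{-1/2}
    return K_nm @ (V / np.sqrt(w))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
landmarks = X[rng.choice(len(X), size=20, replace=False)]  # random landmarks

Phi = nystrom_features(X, landmarks, gamma=0.5)  # shape (200, 20)
K = rbf_kernel(X, X, gamma=0.5)
rel_err = np.linalg.norm(K - Phi @ Phi.T) / np.linalg.norm(K)
print(Phi.shape, rel_err)
```

A model that is linear in `Phi` then costs time proportional to $n m$ per pass rather than $n^2$, which is the general mechanism by which such approximations reduce runtime on large datasets.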
