DRO: A Python Library for Distributionally Robust Optimization in Machine Learning

Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, Jose Blanchet

TL;DR

The paper introduces dro, a Python library for distributionally robust optimization (DRO) in regression and classification. It combines 14 DRO formulations with 9 backbone models to yield 79 method configurations, and it is compatible with scikit-learn and PyTorch. It formalizes a general DRO framework $\min_{f \in \mathcal{F}} \sup_{Q \in \mathcal{P}} \mathbb{E}_Q[\ell(f(X), Y)]$, where the ambiguity set $\mathcal{P}$ is built around the empirical distribution, and supports four families of distance metrics: Wasserstein, $f$-divergence, kernel, and hybrid distances. A key contribution is the modular, ML-ready software design with acceleration techniques such as vectorization and Nyström kernel approximation, which deliver 10× to over 1000× speedups on large-scale tasks while preserving optimization fidelity. The library also emphasizes personalization (RS-WDRO, Bayesian-DRO), data-generation and diagnostic tools, and thorough software engineering practices (tests, typing, documentation, CI), enabling robust, scalable deployment of DRO in practice.
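
To make the inner supremum in the DRO objective concrete, the sketch below evaluates it on a finite sample for one standard ambiguity set, the CVaR set $\{Q : dQ/d\hat{P} \le 1/\alpha\}$ around the empirical distribution $\hat{P}$. This is an illustrative assumption and not the dro library's API: the function name `cvar_worst_case` and the toy losses are hypothetical. For this set the supremum has a closed form, obtained by loading as much probability mass as the cap allows onto the largest losses.

```python
def cvar_worst_case(losses, alpha):
    """Worst-case mean loss over {Q : Q_i <= 1 / (alpha * n)} on n samples.

    Hypothetical helper for illustration; not part of the dro library.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(losses)
    cap = 1.0 / (alpha * n)      # largest weight Q may place on one sample
    remaining = 1.0              # probability mass still to assign
    total = 0.0
    # Greedily assign the capped mass to the largest losses first.
    for loss in sorted(losses, reverse=True):
        weight = min(cap, remaining)
        total += weight * loss
        remaining -= weight
        if remaining <= 0.0:
            break
    return total


# Toy per-sample losses (made up for the example).
losses = [0.1, 0.2, 0.4, 1.5]
empirical = sum(losses) / len(losses)         # plain empirical risk
robust = cvar_worst_case(losses, alpha=0.5)   # mean of the worst half
```

With `alpha=0.5` the worst-case value is the mean of the two largest losses (about 0.95, against an empirical mean of about 0.55), and with `alpha=1.0` the ambiguity set collapses to $\hat{P}$ and the two values coincide. A DRO method then minimizes this worst-case quantity over the model $f$ instead of the empirical mean.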

Abstract

We introduce dro, an open-source Python library for distributionally robust optimization (DRO) for regression and classification problems. The library implements 14 DRO formulations and 9 backbone models, enabling 79 distinct DRO methods. Furthermore, dro is compatible with both scikit-learn and PyTorch. Through vectorization and optimization approximation techniques, dro reduces runtime by 10x to over 1000x compared to baseline implementations on large-scale datasets. Comprehensive documentation is available at https://python-dro.org.
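
As a rough illustration of the kind of approximation the TL;DR names, the following sketch builds Nyström features for an RBF kernel with NumPy. It is a generic textbook construction under stated assumptions, not the library's implementation: the helper names `rbf_kernel` and `nystrom_features`, the choice of the first rows as landmarks, and the value of `gamma` are all hypothetical. The idea is to replace the full $n \times n$ kernel matrix $K$ with a low-rank factor $Z$ of shape $n \times m$ such that $Z Z^\top \approx K$, using only $m$ landmark points.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_features(X, landmarks, gamma, eps=1e-10):
    """Return Z with Z @ Z.T = K_nm @ inv(K_mm) @ K_mn (up to eigenvalue clipping)."""
    K_mm = rbf_kernel(landmarks, landmarks, gamma)   # m x m landmark kernel
    K_nm = rbf_kernel(X, landmarks, gamma)           # n x m cross kernel
    eigvals, eigvecs = np.linalg.eigh(K_mm)          # K_mm is symmetric PSD
    eigvals = np.maximum(eigvals, eps)               # guard against tiny eigenvalues
    return K_nm @ eigvecs / np.sqrt(eigvals)         # n x m feature matrix


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                      # synthetic inputs for the example
Z = nystrom_features(X, X[:10], gamma=0.5)        # 10 landmarks -> 10 features
```

Downstream, a kernel method can work with the $n \times m$ matrix `Z` as if it were an explicit feature map, which avoids forming or factorizing the full $n \times n$ kernel matrix when $m \ll n$.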

Paper Structure

This paper contains 31 sections, 12 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of the dro library.
  • Figure 2: Overview of the documentation website.