Optimal Regularization Under Uncertainty: Distributional Robustness and Convexity Constraints
Oscar Leong, Eliza O'Reilly, Yong Sheng Soh
TL;DR
The paper develops a distributionally robust optimization framework for optimal regularization. The regularizer is modeled as the gauge $\|\cdot\|_K$ of a star body $K$ with normalization $\mathrm{vol}(K)=1$, and the problem studied is $\min_K\{\max_{d_W(Q,P)\leq \epsilon} \mathbb{E}_Q[\|\mathbf{x}\|_K]\}$, where $P$ is the data distribution and $Q$ ranges over distributions within Wasserstein distance $\epsilon$ of it. The authors derive a convex-duality reformulation that eliminates the inner maximization, analyze how the robustness parameter $\epsilon$ and the Wasserstein cost shape the regularizer (including a Lipschitz penalty $\epsilon\,\mathrm{Lip}(K)$ in the Wasserstein-1 case), and prove existence of minimizers for $\epsilon>0$. They also address enforcing convexity of the optimal regularizer, providing finite-dimensional convex programs in $\mathbb{R}^2$ and several numerical examples that connect distributional shifts to regularizer geometry. Extensions to critic-based regularizers and alternative proofs illustrate the framework’s flexibility and potential for robust deployment across inverse problems. Overall, the work offers both theoretical foundations and practical computational tools for designing regularizers that are reliable under model uncertainty and convexity constraints.
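To see where a penalty of the form $\epsilon\,\mathrm{Lip}(K)$ can come from, here is a short sketch based on the standard Kantorovich-Rubinstein inequality. It assumes that $d_W$ is the Wasserstein-1 distance $W_1$ and that $\mathrm{Lip}(K)$ denotes the Lipschitz constant of the gauge $\|\cdot\|_K$. It is not taken from the paper's proof and only establishes an upper bound. For any $Q$ with $W_1(Q,P)\leq\epsilon$,

$$\mathbb{E}_Q[\|\mathbf{x}\|_K] - \mathbb{E}_P[\|\mathbf{x}\|_K] \;\leq\; \mathrm{Lip}(K)\, W_1(Q,P) \;\leq\; \epsilon\,\mathrm{Lip}(K),$$

so that

$$\max_{W_1(Q,P)\leq \epsilon} \mathbb{E}_Q[\|\mathbf{x}\|_K] \;\leq\; \mathbb{E}_P[\|\mathbf{x}\|_K] + \epsilon\,\mathrm{Lip}(K).$$

Under this bound, the outer minimization over $K$ trades the fit to $P$ against the Lipschitz constant of the gauge, with $\epsilon$ setting the weight of the trade-off. The conditions under which the bound is tight are left to the paper's full statement.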
Abstract
Regularization is a central tool for addressing ill-posedness in inverse problems and statistical estimation, with the choice of a suitable penalty often determining the reliability and interpretability of downstream solutions. While recent work has characterized optimal regularizers for well-specified data distributions, practical deployments are often complicated by distributional uncertainty and the need to enforce structural constraints such as convexity. In this paper, we introduce a framework for distributionally robust optimal regularization, which identifies regularizers that remain effective under perturbations of the data distribution. Our approach leverages convex duality to reformulate the underlying distributionally robust optimization problem, eliminating the inner maximization and yielding formulations that are amenable to numerical computation. We show how the resulting robust regularizers interpolate between memorization of the training distribution and uniform priors, providing insights into their behavior as robustness parameters vary. For example, we show how certain ambiguity sets, such as those based on the Wasserstein-1 distance, naturally induce regularity in the optimal regularizer by promoting regularizers with smaller Lipschitz constants. We further investigate the setting where regularizers are required to be convex, formulating a convex program for their computation and illustrating their stability with respect to distributional shifts. Taken together, our results provide both theoretical and computational foundations for designing regularizers that are reliable under model uncertainty and structurally constrained for robust deployment.
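The interpolation described above can be illustrated numerically with a minimal sketch. It makes several simplifying assumptions that do not come from the paper: the star body is restricted to axis-aligned ellipses of unit area in $\mathbb{R}^2$, the robust objective is taken to be the penalized form $\mathbb{E}_P[\|\mathbf{x}\|_K] + \epsilon\,\mathrm{Lip}(K)$ associated with the Wasserstein-1 case, and the data are synthetic Gaussian samples. All function names (`ellipse_matrix`, `gauge`, `lipschitz`, `robust_objective`, `best_ratio`) are hypothetical.

```python
import numpy as np

def ellipse_matrix(ratio, theta=0.0):
    """Shape matrix A of K = {x : x^T A x <= 1}, an ellipse of area 1.

    The semi-axes satisfy a = ratio * b and pi * a * b = 1.
    """
    b = 1.0 / np.sqrt(np.pi * ratio)
    a = ratio * b
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    D = np.diag([1.0 / a**2, 1.0 / b**2])
    return R @ D @ R.T

def gauge(A, X):
    """Gauge ||x||_K = sqrt(x^T A x) for each row x of X."""
    return np.sqrt(np.einsum("ni,ij,nj->n", X, A, X))

def lipschitz(A):
    """Euclidean Lipschitz constant of the gauge: sqrt of the largest eigenvalue of A."""
    return np.sqrt(np.linalg.eigvalsh(A)[-1])

def robust_objective(A, X, eps):
    """Empirical mean of the gauge plus the assumed penalty eps * Lip(K)."""
    return gauge(A, X).mean() + eps * lipschitz(A)

def best_ratio(X, eps, ratios):
    """Grid search over axis ratios (hypothetical helper, axis-aligned ellipses only)."""
    values = [robust_objective(ellipse_matrix(r), X, eps) for r in ratios]
    return ratios[int(np.argmin(values))]

# Synthetic data stretched along the first coordinate (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])
ratios = np.linspace(1.0, 6.0, 51)

for eps in (0.0, 0.5, 10.0):
    print(f"eps={eps}: best axis ratio = {best_ratio(X, eps, ratios):.1f}")
```

In this restricted family, a small $\epsilon$ selects an elongated ellipse adapted to the data, while a large $\epsilon$ pushes the selection toward the disc, which is the unit-area ellipse with the smallest Lipschitz constant. This mirrors, in a much simpler setting, the interpolation between fitting the training distribution and a uniform prior.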
