Knowledge-Guided Wasserstein Distributionally Robust Optimization
Zitao Wang, Ziyuan Wang, Molei Liu, Nian Si
TL;DR
Knowledge-Guided Wasserstein Distributionally Robust Optimization (KG-WDRO) addresses the conservativeness of standard WDRO in transfer learning by restricting transport along directions informed by prior (source) knowledge. The authors establish a theoretical equivalence between KG-WDRO and shrinkage-based transfer estimators under collinear similarity, and provide tractable dual reformulations that support strong- and weak-transfer variants and multi-source extensions, including Mahalanobis-type metrics. The framework covers linear regression and binary classification under several loss functions and cost relaxations, and simulations show superior performance in small-sample, multi-site, and high-dimensional settings. The work unifies several transfer-learning strategies under a distributionally robust perspective and provides practical mechanisms to adjust for scaling and cross-source differences, with potential for data-driven hyperparameter tuning.
Abstract
Transfer learning is a popular strategy to leverage external knowledge and improve statistical efficiency, particularly when the target sample is limited. We propose a novel knowledge-guided Wasserstein Distributionally Robust Optimization (KG-WDRO) framework that adaptively incorporates multiple sources of external knowledge to overcome the conservativeness of vanilla WDRO, which often results in overly pessimistic shrinkage toward zero. Our method constructs smaller Wasserstein ambiguity sets by controlling the transportation along directions informed by the source knowledge. This strategy can alleviate perturbations on the predictive projection of the covariates and protect against information loss. Theoretically, we establish the equivalence between our WDRO formulation and knowledge-guided shrinkage estimation based on collinear similarity, which ensures tractability and gives the feasible set a geometric interpretation. This also reveals a novel and general interpretation of recent shrinkage-based transfer learning approaches from the perspective of distributional robustness. In addition, our framework can adjust for scaling differences in the regression models between the source and target, and can accommodate general types of regularization such as lasso and ridge. Extensive simulations demonstrate the superior performance and adaptivity of KG-WDRO in enhancing small-sample transfer learning.
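The contrast between shrinkage toward zero and knowledge-guided shrinkage can be illustrated with a small numerical sketch. This is not the KG-WDRO estimator itself: it is a simplified squared-penalty (ridge-style) analogue, assumed here only for illustration, in which the penalty is the squared distance from the coefficient vector to the line spanned by a source coefficient vector `theta`. Because that distance is minimized over the scaling factor, the penalty is insensitive to a rescaling between source and target. The function names, the penalty weight `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def orthogonal_projector(theta):
    """Projector onto the subspace orthogonal to the source coefficients theta."""
    theta = np.asarray(theta, dtype=float)
    return np.eye(theta.size) - np.outer(theta, theta) / (theta @ theta)

def knowledge_guided_ridge(X, y, theta, lam):
    """Minimize ||y - X b||^2 / n + lam * min_k ||b - k * theta||^2.

    The inner minimum over k equals b' P b, with P the projector above,
    so the first-order condition is (X'X / n + lam * P) b = X'y / n.
    Illustrative analogue only, not the paper's estimator.
    """
    n = X.shape[0]
    P = orthogonal_projector(theta)
    return np.linalg.solve(X.T @ X / n + lam * P, X.T @ y / n)

def plain_ridge(X, y, lam):
    """Standard ridge: shrinks every direction of b toward zero."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

# Hypothetical small-sample setting: the target coefficients are a
# rescaled copy of the source coefficients.
rng = np.random.default_rng(0)
n, d = 12, 5
theta = np.array([1.0, -2.0, 0.5, 0.0, 1.5])   # assumed source knowledge
beta_true = 0.8 * theta
X = rng.normal(size=(n, d))
y = X @ beta_true + 0.5 * rng.normal(size=n)

# With a heavy penalty, plain ridge collapses to zero, while the
# knowledge-guided version collapses onto the line spanned by theta.
beta_kg = knowledge_guided_ridge(X, y, theta, lam=1e6)
beta_ridge = plain_ridge(X, y, lam=1e6)
beta_ols = knowledge_guided_ridge(X, y, theta, lam=0.0)
```

In this sketch, a large `lam` plays the role of a large ambiguity set: the plain penalty discards all signal, whereas the guided penalty still fits the scale of `theta` to the target data.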
