Learning from Conditional Distributions via Dual Embeddings
Bo Dai, Niao He, Yunpeng Pan, Byron Boots, Le Song
TL;DR
The paper addresses learning from conditional distributions p(z|x) when only a few samples, in the extreme case a single one, are available from each conditional distribution. Using Fenchel duality, it recasts the problem as a saddle-point problem that depends only on the joint distribution p(x,z,y). It introduces Embedding-SGD, a kernel-based algorithm that jointly optimizes a primal function and a dual function in RKHSs, processes one sample at a time, and has a theoretical sample complexity of O(1/ε^2). The framework covers learning with invariance and policy evaluation in reinforcement learning, and extends to predictions about stochastic processes such as hitting times. In experiments on invariance learning and policy evaluation, the method performs better and more robustly than existing algorithms, and the framework admits extensions such as random features and neural-dual structures.
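The saddle-point form mentioned above can be sketched as follows. The symbols $\ell$ (a loss that is convex in its second argument), $\ell^*$ (its Fenchel conjugate in that argument), and $u$ (the dual function) are notation assumed here for illustration. The starting objective is

$$\min_f \; L(f) = \mathbb{E}_{x,y}\Big[\ell\big(y,\ \mathbb{E}_{z|x}[f(z,x)]\big)\Big].$$

For a closed convex $\ell(y,\cdot)$, Fenchel duality gives $\ell(y,v) = \max_u \{\, u v - \ell^*(y,u) \,\}$. Substituting $v = \mathbb{E}_{z|x}[f(z,x)]$ and, under suitable regularity conditions, exchanging the maximization with the expectation by letting $u$ be a function of $(x,y)$ yields

$$\min_f \max_{u(\cdot,\cdot)} \; \mathbb{E}_{x,y,z}\big[f(z,x)\,u(x,y)\big] - \mathbb{E}_{x,y}\big[\ell^*\big(y, u(x,y)\big)\big].$$

The first term is linear in $f(z,x)$, so the inner conditional expectation no longer sits inside a nonlinear function, and a single triple $(x,y,z)$ gives an unbiased estimate of the objective.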
Abstract
Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$. These learning problems become very challenging when we have only limited samples or, in the extreme case, only one sample from each conditional distribution. Commonly used approaches either assume that $z$ is independent of $x$, or require an overwhelmingly large number of samples from each conditional distribution. To address these challenges, we propose a novel approach that employs a new min-max reformulation of the problem of learning from conditional distributions. With this reformulation, we only need to deal with the joint distribution $p(z,x)$. We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments on both synthetic and real-world datasets show that the proposed approach can significantly improve over the existing algorithms.
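To illustrate why a single sample per conditional distribution is difficult, and why a min-max form helps, here is a minimal Python sketch. It assumes a squared loss $\ell(y,v) = \frac{1}{2}(v-y)^2$ and a two-point toy distribution for $z$ given one fixed $x$. The loss choice, the numbers, and all function names are hypothetical and are not taken from the paper.

```python
def squared_loss(y, v):
    """Assumed loss l(y, v) = 0.5 * (v - y)^2, where v plays the role of E[f(z) | x]."""
    return 0.5 * (v - y) ** 2


def conjugate(y, u):
    """Fenchel conjugate of v -> squared_loss(y, v), which is u*y + 0.5*u^2."""
    return u * y + 0.5 * u ** 2


def dual_objective(y, v, u):
    """Expression inside the max over u: u*v - l*(y, u). It is linear in v."""
    return u * v - conjugate(y, u)


def f(z):
    """Hypothetical primal function; the identity keeps the arithmetic simple."""
    return z


# Toy p(z|x) for one fixed x: z is 0 or 2 with equal probability; target y = 1.
z_values = [0.0, 2.0]
z_probs = [0.5, 0.5]
y = 1.0

# Loss evaluated at the true conditional mean E[f(z) | x] = 1.
mean_f = sum(p * f(z) for z, p in zip(z_values, z_probs))
true_loss = squared_loss(y, mean_f)

# Plugging single z samples into the loss and averaging is biased,
# because the loss is nonlinear in its second argument.
naive_loss = sum(p * squared_loss(y, f(z)) for z, p in zip(z_values, z_probs))

# The dual objective is linear in f(z), so single-sample plug-ins
# average to the exact value for any fixed dual variable u.
u = 0.3
dual_from_samples = sum(p * dual_objective(y, f(z), u) for z, p in zip(z_values, z_probs))
dual_exact = dual_objective(y, mean_f, u)

# At the maximizer u* = E[f(z) | x] - y the dual objective recovers the loss.
u_star = mean_f - y
dual_at_optimum = dual_objective(y, mean_f, u_star)

print(true_loss, naive_loss, dual_from_samples, dual_exact, dual_at_optimum)
```

In this toy setting the naive single-sample average overstates the loss, while the dual expression averaged over single samples matches its exact value. That is the property the reformulation relies on when only the joint distribution is sampled.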
