Deterministic and Stochastic Frank-Wolfe Recursion on Probability Spaces

Di Yu; Shane G. Henderson; Raghu Pasupathy

TL;DR

This work develops deterministic and stochastic Frank-Wolfe (FW) recursions for optimization over probability measures, using the influence function as the central first-order object. A key result is that the FW subproblem admits a closed-form solution: a Dirac measure at a minimizer of the influence function, which enables efficient particle-update schemes. The deterministic method (dFW) achieves $O(k^{-1})$ convergence for convex, $L$-smooth objectives. The stochastic method (sFW) attains $O(k^{-1})$ almost surely and in expectation for smooth convex objectives, and $O(k^{-1/2})$ in Frank-Wolfe gap for smooth nonconvex objectives. A fixed-step, fixed-sample variant converges exponentially to $\varepsilon$-optimality, and a central limit theorem is provided for the observed objective values. The paper also presents a broad set of examples (calibration, optimal design, P-means, neural networks, CRE, Gaussian deconvolution) that illustrate the computation and behavior of the influence function and the proposed recursions, and it discusses connections to particle-based methods and potential future extensions.
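
The closed-form subproblem and the resulting update can be written out as a short sketch. The notation here is chosen for illustration and may differ from the paper's: $h_{\mu_k}$ denotes the influence function at the incumbent $\mu_k$, $x_k^*$ one of its minimizers (assumed to exist on the compact set $\mathscr{X}$), and $\eta_k \in [0,1]$ a step size whose rule is not specified in this summary.

$$
\min_{\nu \in \mathscr{P}(\mathscr{X})} \int_{\mathscr{X}} h_{\mu_k}(x)\,\nu(dx) \;=\; \min_{x \in \mathscr{X}} h_{\mu_k}(x), \qquad \text{attained at } \nu = \delta_{x_k^*}, \quad x_k^* \in \arg\min_{x \in \mathscr{X}} h_{\mu_k}(x),
$$

$$
\mu_{k+1} \;=\; (1-\eta_k)\,\mu_k + \eta_k\,\delta_{x_k^*}.
$$

The first identity holds because the integral of $h_{\mu_k}$ against any probability measure is an average of its values, so it cannot fall below the minimum value, and the Dirac measure at a minimizer achieves that value exactly.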

Abstract

Motivated by applications in emergency response and experimental design, we consider smooth stochastic optimization problems over probability measures supported on compact subsets of the Euclidean space. With the influence function as the variational object, we construct a deterministic Frank-Wolfe (dFW) recursion for probability spaces, made especially possible by a lemma that identifies a "closed-form" solution to the infinite-dimensional Frank-Wolfe sub-problem. Each iterate in dFW is expressed as a convex combination of the incumbent iterate and a Dirac measure concentrating on the minimum of the influence function at the incumbent iterate. To address common application contexts that have access only to Monte Carlo observations of the objective and influence function, we construct a stochastic Frank-Wolfe (sFW) variation that generates a random sequence of probability measures constructed using minima of increasingly accurate estimates of the influence function. We demonstrate that sFW's optimality gap sequence exhibits $O(k^{-1})$ iteration complexity almost surely and in expectation for smooth convex objectives, and $O(k^{-1/2})$ (in Frank-Wolfe gap) for smooth non-convex objectives. Furthermore, we show that an easy-to-implement fixed-step, fixed-sample version of sFW exhibits exponential convergence to $\varepsilon$-optimality. We end with a central limit theorem on the observed objective values at the sequence of generated random measures. To further intuition, we include several illustrative examples with exact influence function calculations.
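
To make the dFW update concrete, here is a minimal Python sketch on a toy problem that is not taken from the paper. Everything in it is an assumption made for illustration: the objective $J(\mu) = \tfrac{1}{2}\big(\int x\,\mu(dx) - m\big)^2$ on $\mathscr{X} = [0,1]$, its influence function $h_\mu(x) = (\bar{\mu} - m)(x - \bar{\mu})$ with $\bar{\mu}$ the mean of $\mu$, the step size $\eta_k = 2/(k+2)$, the finite candidate set used to minimize $h_\mu$, and the function names `influence`, `objective`, and `dfw`.

```python
import numpy as np


def influence(atoms, weights, x, target):
    """Influence function of the toy objective at mu = sum_i weights[i] * delta_{atoms[i]}."""
    mean = float(np.dot(weights, atoms))
    return (mean - target) * (x - mean)


def objective(atoms, weights, target):
    """Toy convex objective: half the squared gap between the mean of mu and the target."""
    mean = float(np.dot(weights, atoms))
    return 0.5 * (mean - target) ** 2


def dfw(target, n_iters=200, candidates=None):
    """Deterministic Frank-Wolfe sketch: mix the incumbent with a Dirac at argmin of h."""
    if candidates is None:
        # Illustrative assumption: search for the minimizer of h over a finite candidate set.
        candidates = np.linspace(0.0, 1.0, 101)
    atoms = np.array([0.0])    # start from the Dirac measure at 0
    weights = np.array([1.0])
    history = []
    for k in range(n_iters):
        history.append(objective(atoms, weights, target))
        h = influence(atoms, weights, candidates, target)
        x_star = candidates[np.argmin(h)]           # closed-form FW direction: delta at x_star
        eta = 2.0 / (k + 2.0)                       # assumed step-size rule
        weights = np.append((1.0 - eta) * weights, eta)  # convex combination of measures
        atoms = np.append(atoms, x_star)
    return atoms, weights, history


if __name__ == "__main__":
    atoms, weights, history = dfw(target=0.3)
    print("final objective:", history[-1])
    print("total mass:", weights.sum())
```

Each iteration adds one atom and rescales the existing weights, so the iterate remains a finitely supported probability measure throughout. Because this toy influence function is linear in $x$, its minimizer always lies at an endpoint of $[0,1]$.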

Paper Structure (22 sections, 12 theorems, 128 equations, 2 figures, 2 algorithms)

INTRODUCTION.
Summary and Contribution
Paper Organization
LITERATURE AND PERSPECTIVE.
The Influence Function.
Why not simply "grid and optimize"?
PRELIMINARIES.
Definitions.
Basic Properties
EXAMPLES.
Calibration.
Optimal Response Time.
Optimal Experimental Design
P-means Problem
Neural Networks with a Single Hidden Layer.
...and 7 more sections

Key Result

Lemma 1

Suppose $J: \mathscr{P}(\mathscr{X}) \to \mathbb{R}$ is convex, and the von Mises derivative exists at $\mu^* \in \mathscr{P}(\mathscr{X})$ along any "direction" $\nu - \mu^*$ where $\nu \in \mathscr{P}(\mathscr{X})$. The influence function $h_{\mu^*}$ at $\mu^*$ is as defined in Definition 3. Then, $\mu^*$ ...

Figures (2)

Figure 1: fcFW Results for Gaussian Deconvolution (Discrete Case, $n=1500$). (a) Comparison of the recovered density $\mu_{fcFW} * N(0,1)$ (blue) with the population density $\mu_a * N(0,1)$ (shaded). (b) The influence function at $\mu_{fcFW}$ is non-negative, verifying global optimality as stated in Lemma 1.
Figure 2: Objective values $J(\mu_k)$ of fcFW over 4000 iterations for the continuous case $\mu_b$ with $d=10$. The shaded region represents the standard deviation over 10 independent trials.

Theorems & Definitions (31)

Remark 1
Definition 1: Measure, Signed Measure, Probability Measure
Definition 2: Support
Definition 3: Influence function and von Mises Derivative
Definition 4: Gâteaux, Fréchet and Hadamard Derivatives
Definition 5: L-Smooth
Lemma 1: Conditions for Optimality
proof
Lemma 2: Support of Optimal Measure
proof
...and 21 more
