
Stochastic Algorithms for Large-Scale Composite Optimization: the Case of Single-Shot X-FEL Imaging

D. Russell Luke, Steffen Schultze, Helmut Grubmüller

Abstract

We apply a recently developed framework for analyzing the convergence of stochastic algorithms to large-scale nonconvex composite optimization in general, and to nonconvex likelihood maximization in particular. Our theory is demonstrated on a stochastic gradient descent algorithm for determining the electron density of a molecule from random samples of its scattering amplitude. Numerical results on an idealized synthetic example provide a proof of concept. This opens the door to a broad range of algorithmic possibilities and provides a basis for evaluating and comparing different strategies. While this case study is very specific, its structure transfers easily to many problems of current interest, particularly in machine learning.

Paper Structure

This paper contains 11 sections, 2 theorems, 61 equations, 4 figures, 1 algorithm.

Key Result

Proposition 1

Let $G\subset \mathbb{R}^n$ be compact, let $T_i:\,G\rightarrow G$ be continuous for all $i\in \{1,2,\dots, M_m\}$, and let $\xi$ and $\xi_{k}$ ($k\in\mathbb{N}$) be i.i.d. random variables taking values in $\{1, 2,\dots, M_m\}$. Define the Markov transport discrepancy $\Psi:\,\mathscr{P}_2(G)\rightarrow\dots$ […] Assume furthermore: […] Then for any $\mu_0\in \mathscr{P}_2(G)$ the distributions $(\mu_k)$ of the it…
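
The setting of Proposition 1 can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes the iterates are generated by the random function iteration $x_{k+1}=T_{\xi_k}(x_k)$, and everything concrete in it is hypothetical, namely the compact set $G=[-1,1]$, the maps $T_i$ (a gradient step on $\tfrac12(x-c_i)^2$ followed by clipping to $G$), the uniform law for $\xi$, and all function names. Running many independent chains gives an empirical stand-in for the distribution $\mu_k$.

```python
import random


def make_maps(centers, step=0.5):
    """Build continuous self-maps T_i of G = [-1, 1] (hypothetical example maps).

    Each T_i takes one gradient step on f_i(x) = 0.5 * (x - c_i)**2 and then
    clips the result back into G, so T_i maps G into G.
    """
    def make(c):
        def T(x):
            y = x - step * (x - c)
            return max(-1.0, min(1.0, y))
        return T
    return [make(c) for c in centers]


def random_function_iteration(maps, x0, n_iter, rng):
    """Iterate x_{k+1} = T_{xi_k}(x_k) with xi_k drawn i.i.d. (uniformly, by assumption)."""
    x = x0
    for _ in range(n_iter):
        T = maps[rng.randrange(len(maps))]
        x = T(x)
    return x


def empirical_distribution(maps, x0, n_iter, n_chains, seed=0):
    """Sample n_chains independent copies of x_{n_iter}; an empirical proxy for mu_k."""
    rng = random.Random(seed)
    return [random_function_iteration(maps, x0, n_iter, rng) for _ in range(n_chains)]


def mean_var(samples):
    """Mean and (population) variance of a list of floats."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var


if __name__ == "__main__":
    maps = make_maps([-0.5, 0.0, 0.5])
    samples = empirical_distribution(maps, x0=1.0, n_iter=50, n_chains=2000, seed=1)
    print(mean_var(samples))
```

In this toy case the individual chains never settle on a single point; it is the distribution of the iterates that stabilizes, which is the kind of convergence the proposition addresses.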

Figures (4)

  • Figure 1: Deterministic recovery. Algorithm \ref{algo:rfi} with $m=M=10{,}000$, $t=0.1$. (a) The true electron density. (b) The computed electron density at the last iterate using (a) as the starting point. (c) The iterate differences.
  • Figure 2: Random initialization.
  • Figure 3: Random recovery. Algorithm \ref{algo:rfi} with $T_i$ given by \ref{e:SDq} for $q=10$ and $t_j=0.1$ for all $j$. The $M=10{,}000$ images are sampled with $|I_i|=m$ for (a) $m=100$, (b) $m=500$, (c) $m=1000$, and (d) $m=5000$. Shown are the computed average electron densities at iteration $k=5000$. The compute times for the first $1000$ iterations ($1000$ outer iterations, each with $10$ inner iterations) were $88$ seconds for $m=100$, $107$ seconds for $m=500$, $139$ seconds for $m=1000$, and $418$ seconds for $m=5000$. In comparison, the deterministic example with $10{,}000$ iterations and no sampling required $494$ seconds.
  • Figure 4: Convergence behavior of the mean (a) and variance (b) of the iterates. Algorithm \ref{algo:rfi} with $T_i$ given by \ref{e:SDq} for $q=10$ and $t_j=0.1$ for all $j$. The $M=10{,}000$ images are sampled with $|I_i|=m$ for $m=100, 500, 1000$, and $5000$.
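
The sampling scheme described in the captions (batches of size $m$ drawn from $M$ data, with $q$ inner steepest-descent steps of length $t$ per outer iteration) can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's algorithm: the scalar least-squares objective, the synthetic data, and all function names are hypothetical, and the precise form of the maps $T_i$ in the paper is not reproduced here.

```python
import random


def batch_gradient(x, batch):
    """Gradient of f_I(x) = (1/|I|) * sum_j 0.5 * (x - d_j)**2 over the batch I."""
    return sum(x - d for d in batch) / len(batch)


def sampled_descent_map(x, data, m, q, t, rng):
    """One outer iteration: draw a batch of size m without replacement,
    then take q steepest-descent steps of length t on the batch objective."""
    batch = rng.sample(data, m)
    for _ in range(q):
        x = x - t * batch_gradient(x, batch)
    return x


def run(data, m, n_outer, q=10, t=0.1, x0=0.0, seed=0):
    """Return the list of outer iterates x_1, ..., x_{n_outer}."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_outer):
        x = sampled_descent_map(x, data, m, q, t, rng)
        path.append(x)
    return path


def tail_mean_var(path, burn_in):
    """Mean and (population) variance of the iterates after a burn-in period."""
    tail = path[burn_in:]
    mu = sum(tail) / len(tail)
    var = sum((x - mu) ** 2 for x in tail) / len(tail)
    return mu, var


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.gauss(2.0, 1.0) for _ in range(1000)]  # synthetic data, M = 1000
    for m in (10, 100, 500):
        mu, var = tail_mean_var(run(data, m, n_outer=300), burn_in=50)
        print(f"m={m:4d}  mean={mu:.4f}  variance={var:.6f}")
```

With $m=M$ every batch is the full data set and the iteration reduces to deterministic steepest descent; with smaller $m$ the iterates fluctuate around the minimizer, and in this toy setting the fluctuation shrinks as $m$ grows.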

Theorems & Definitions (6)

  • Proposition 1: convergence rates, Theorem 2.6, HerLukStu23b
  • Corollary 2: Corollary 2.7, HerLukStu23b
  • Example 1: steepest descent mappings of smooth functions are a$\alpha$-fne
  • Example 2: incorporating nonsmooth functions
  • Remark 3: modelling detector arrays
  • Remark 4: convergence of Algorithm \ref{algo:rfi} for X-FEL imaging