Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging

Chung-En Tsai; Hao-Chung Cheng; Yen-Huan Li

TL;DR

This work addresses convex minimization of the expected logarithmic loss over the set of density matrices ($\mathscr{D}_d$) and over the probability simplex, where the loss is neither Lipschitz continuous nor smooth. It introduces the $B$-sample LB-SDA method, which combines a logarithmic barrier with stochastic dual averaging, and proves a non-asymptotic bound on the optimization error of $\tilde{O}\left( \dfrac{d}{t}+\sqrt{\dfrac{d}{B t}}\right)$ after $t$ iterations. The resulting time complexities are $\tilde{O}(d^2/\varepsilon^2)$ in the classical setup and $\tilde{O}(d^3/\varepsilon^2)$ in the quantum setup (when $B=d$). The analysis rests on three ingredients: a refined regret bound for the logarithmic loss, a self-concordance-based view of smoothness, and a local-norm online-to-batch conversion that gives tighter control of the stochastic gradients. Experiments on Poisson inverse problems and maximum-likelihood (ML) quantum state tomography show that LB-SDA outperforms existing methods that come with explicit complexity guarantees, suggesting that the method scales in practice to high dimensions and large data sets. The results provide non-asymptotic guarantees for optimization over density matrices, with potential impact on quantum state tomography and related problems such as approximating PSD matrix permanents.
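
As a quick sanity check on how the stated rate translates into an iteration count, the following worked calculation sets $B=d$. It is an illustration based only on the bound quoted above, not a derivation taken from the paper, and it ignores the logarithmic factors hidden in $\tilde{O}$.

$$
\left.\frac{d}{t}+\sqrt{\frac{d}{Bt}}\,\right|_{B=d}
=\frac{d}{t}+\frac{1}{\sqrt{t}}
\le \frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon
\quad\text{whenever}\quad
t\ \ge\ \max\left\{\frac{2d}{\varepsilon},\ \frac{4}{\varepsilon^{2}}\right\}.
$$

So, up to logarithmic factors, $t=\tilde{O}(d/\varepsilon+1/\varepsilon^{2})$ iterations suffice, and the $1/\varepsilon^{2}$ term dominates once $\varepsilon\le 1/d$.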

Abstract

Consider the problem of minimizing an expected logarithmic loss over either the probability simplex or the set of quantum density matrices. This problem includes tasks such as solving the Poisson inverse problem, computing the maximum-likelihood estimate for quantum state tomography, and approximating positive semi-definite matrix permanents with the currently tightest approximation ratio. Although the optimization problem is convex, standard iteration complexity guarantees for first-order methods do not directly apply due to the absence of Lipschitz continuity and smoothness in the loss function. In this work, we propose a stochastic first-order algorithm named $B$-sample stochastic dual averaging with the logarithmic barrier. For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\tilde{O}(d^2/\varepsilon^2)$ time, matching the state of the art, where $d$ denotes the dimension. When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time. This improves on the time complexities of existing stochastic first-order methods by a factor of $d^{\omega-2}$ and those of batch methods by a factor of $d^2$, where $\omega$ denotes the matrix multiplication exponent. Numerical experiments demonstrate that empirically, our algorithm outperforms existing methods with explicit complexity guarantees.
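
To make the idea of dual averaging with a logarithmic barrier concrete, here is a minimal Python sketch for the probability-simplex case. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the fixed step size `eta`, the uniform sampling of rows of a nonnegative matrix `A`, and the use of the averaged iterate as output are all choices made for this example. The loss is the empirical logarithmic loss, the mean of $-\log\langle a_i, x\rangle$ over the rows $a_i$ of `A`.

```python
import numpy as np

def log_loss(x, A):
    """Empirical logarithmic loss: mean of -log<a_i, x> over the rows a_i of A."""
    return float(np.mean(-np.log(A @ x)))

def barrier_da_step(grad_sum, eta):
    """Minimize eta*<grad_sum, x> - sum_i log(x_i) over the probability simplex.

    Stationarity gives x_i = 1 / (eta*grad_sum_i + lam); the multiplier lam is
    found by bisection so that the entries sum to one.
    """
    c = eta * grad_sum
    d = c.size
    # At lam = 1 - min(c) the entries sum to at least 1; at lam = d - min(c)
    # every entry is at most 1/d, so the sum is at most 1.
    lo, hi = 1.0 - c.min(), d - c.min()
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (c + lam)) > 1.0:
            lo = lam
        else:
            hi = lam
    x = 1.0 / (c + 0.5 * (lo + hi))
    return x / x.sum()  # remove the residual bisection error

def barrier_dual_averaging(A, B, eta, iters, seed=0):
    """Illustrative B-sample dual averaging with a log barrier (assumed form)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.full(d, 1.0 / d)      # barrier minimizer when no gradients are seen
    grad_sum = np.zeros(d)
    x_avg = np.zeros(d)
    for t in range(1, iters + 1):
        x_avg += (x - x_avg) / t                 # running average of iterates
        batch = A[rng.integers(0, n, size=B)]    # B rows sampled uniformly
        # Gradient of -log<a, x> is -a / <a, x>; average it over the batch.
        g = -(batch / (batch @ x)[:, None]).mean(axis=0)
        grad_sum += g
        x = barrier_da_step(grad_sum, eta)
    return x_avg

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.uniform(0.1, 1.0, size=(200, 5))
    A[:, 0] *= 5.0  # make the first coordinate clearly favorable
    x_hat = barrier_dual_averaging(A, B=5, eta=0.5, iters=200)
    print("uniform loss:", log_loss(np.full(5, 0.2), A))
    print("averaged-iterate loss:", log_loss(x_hat, A))
```

The barrier keeps every iterate strictly inside the simplex, so $\langle a_i, x\rangle$ stays positive and the gradient is always defined, which is the practical reason a barrier regularizer is a natural fit for a loss that blows up at the boundary.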

Paper Structure (32 sections, 16 theorems, 69 equations, 4 figures, 1 table, 3 algorithms)

Introduction
Contributions
Technical Breakthroughs
Notations
Related Work
Applications
Kelly's Criterion
Poisson Inverse Problem
ML Quantum State Tomography
PSD Matrix Permanents
Characterizations of Logarithmic Loss
"Lipschitz Continuity"
"Smoothness"
Algorithms and Convergence Guarantees
Algorithm
...and 17 more sections

Key Result

Lemma 1

For $\rho\in\mathbb{H}^d_{++}$ and $X\in\mathbb{H}^d$, the local norm and its dual norm associated with $h$ are given by
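
The displayed formula of the lemma is missing from this listing. As a hedged reconstruction, assume that $h$ is the logarithmic barrier $h(\rho) = -\log\det\rho$ (this listing does not define $h$, so this is an assumption) and use the standard definition of the local norm, $\|X\|_\rho = \sqrt{D^2 h(\rho)[X,X]}$. Since $D^2 h(\rho)[X,X] = \operatorname{tr}(\rho^{-1}X\rho^{-1}X)$ for this choice of $h$, one obtains

$$
\|X\|_{\rho} = \sqrt{\operatorname{tr}\left(\rho^{-1}X\rho^{-1}X\right)} = \left\|\rho^{-1/2}X\rho^{-1/2}\right\|_F,
\qquad
\|X\|_{\rho,*} = \left\|\rho^{1/2}X\rho^{1/2}\right\|_F,
$$

where $\|\cdot\|_F$ is the Frobenius norm. The dual formula follows by substituting $Y=\rho^{1/2}Z\rho^{1/2}$ in $\sup\{\operatorname{tr}(YX) : \|Y\|_\rho\le 1\}$, which turns the constraint into $\|Z\|_F\le 1$. The paper's own statement may use different notation.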

Figures (4)

Figure 1: Performances of all algorithms in Table 1, SPDHG, and EMD with line search for solving the Poisson inverse problem.
Figure 2: Performances of all algorithms in Table 1, iMLE, diluted iMLE, and EMD with line search for computing the ML estimate for quantum state tomography.
Figure 3: Performances of all algorithms in Table 1, SPDHG, and EMD with line search for solving 20 randomly generated Poisson inverse problem instances. For each algorithm, the solid line represents its average error, and the shaded region indicates the 95% confidence interval.
Figure 4: Performances of all algorithms in Table 1, iMLE, diluted iMLE, and EMD with line search for computing the ML estimate for quantum state tomography.

Theorems & Definitions (23)

Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Theorem 6
Corollary 7
Remark 8
Definition 9: Self-concordance
Theorem 10: Theorem 5.1.5 of Nesterov2018a
...and 13 more
