Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging
Chung-En Tsai, Hao-Chung Cheng, Yen-Huan Li
TL;DR
This work addresses convex optimization of the expected logarithmic loss over the set of density matrices ($\mathscr{D}_d$) and over the probability simplex, where the loss is neither Lipschitz continuous nor smooth. It introduces the $B$-sample LB-SDA method, which combines stochastic dual averaging with a logarithmic-barrier regularizer. It also proves non-asymptotic convergence guarantees on the objective of order $\tilde{O}\left( \dfrac{d}{t}+\sqrt{\dfrac{d}{B t}}\right)$, which yield concrete time complexities of $\tilde{O}(d^2/\varepsilon^2)$ in the classical setting and $\tilde{O}(d^3/\varepsilon^2)$ in the quantum setting (when $B=d$). The analysis rests on three ingredients: a refined regret bound for the logarithmic loss, a smoothness perspective based on self-concordance, and an online-to-batch conversion in local norms, which together allow tighter control of the stochastic gradients. Empirical results on Poisson inverse problems and maximum-likelihood (ML) quantum state tomography show that LB-SDA can outperform existing methods with explicit complexity guarantees and scales in practice to high dimensions and large data sets. These results advance scalable, non-asymptotic optimization of objectives over density matrices and other positive semi-definite (PSD) domains, with potential impact on quantum tomography and related PSD-relaxation problems.
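To make the update concrete, here is a minimal Python/NumPy sketch of $B$-sample stochastic dual averaging with the logarithmic barrier on the probability simplex, for the loss $f(x) = \mathbb{E}_a[-\log \langle a, x \rangle]$ with nonnegative sample vectors $a$. It assumes the standard dual-averaging form $x_{t+1} = \arg\min_{x} \{\langle \sum_{s \le t} g_s, x\rangle - \eta_t^{-1} \sum_i \log x_i\}$, where each $g_s$ is a mini-batch gradient averaged over $B$ samples, and it returns the average iterate. The step size $\eta_t = \eta_0/\sqrt{t}$ and the names `barrier_step` and `lb_sda` are hypothetical choices for illustration, not the paper's exact schedule or code.

```python
import numpy as np


def barrier_step(G, eta, iters=60):
    """Solve argmin_{x in simplex} <G, x> - (1/eta) * sum(log x).

    Stationarity gives x_i = 1 / (eta * (G_i + lam)); the scalar lam is found
    by bisection so that the coordinates sum to one.
    """
    d = G.size
    # At lam = lo the coordinate with the smallest G_i equals 1, so the sum is >= 1;
    # at lam = hi every coordinate is <= 1/d, so the sum is <= 1.
    lo, hi = 1.0 / eta - G.min(), d / eta - G.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (eta * (G + lam))) > 1.0:
            lo = lam
        else:
            hi = lam
    x = 1.0 / (eta * (G + 0.5 * (lo + hi)))
    return x / x.sum()  # absorb the tiny residual bisection error


def lb_sda(sample_a, d, T, B, eta0=1.0, seed=0):
    """B-sample stochastic dual averaging with the log barrier (sketch).

    Minimizes f(x) = E_a[-log <a, x>] over the simplex, where sample_a(rng, B)
    returns B i.i.d. nonnegative vectors as a (B, d) array.
    """
    rng = np.random.default_rng(seed)
    G = np.zeros(d)              # running sum of mini-batch gradients
    x = np.full(d, 1.0 / d)      # x_1 = minimizer of the barrier on the simplex
    x_avg = np.zeros(d)
    for t in range(1, T + 1):
        A = sample_a(rng, B)
        G += (-A / (A @ x)[:, None]).mean(axis=0)  # mini-batch gradient at x_t
        x_avg += (x - x_avg) / t                    # running mean of x_1..x_t
        eta = eta0 / np.sqrt(t)                     # hypothetical step-size schedule
        x = barrier_step(G, eta)                    # x_{t+1}
    return x_avg


if __name__ == "__main__":
    # Toy instance: a is the one-hot vector e_i with probability p_i, so
    # f(x) = -sum_i p_i log x_i and its minimizer is x* = p.
    p = np.array([0.6, 0.3, 0.1])
    sample = lambda rng, B: np.eye(3)[rng.choice(3, size=B, p=p)]
    print(np.round(lb_sda(sample, d=3, T=2000, B=10), 3))
```

On the toy instance the exact minimizer is $p$, so the printed average iterate should move toward $p$ as `T` grows. Finding the multiplier by bisection keeps the barrier step at $O(d)$ work per bisection iteration, and the barrier keeps every iterate strictly inside the simplex, so $\langle a, x \rangle$ stays positive.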
Abstract
Consider the problem of minimizing an expected logarithmic loss over either the probability simplex or the set of quantum density matrices. This problem includes tasks such as solving the Poisson inverse problem, computing the maximum-likelihood estimate for quantum state tomography, and approximating positive semi-definite matrix permanents with the currently tightest approximation ratio. Although the optimization problem is convex, standard iteration-complexity guarantees for first-order methods do not directly apply because the loss function is neither Lipschitz continuous nor smooth. In this work, we propose a stochastic first-order algorithm named $B$-sample stochastic dual averaging with the logarithmic barrier. For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\tilde{O}(d^2/\varepsilon^2)$ time, matching the state of the art, where $d$ denotes the dimension. When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time. This improves on the time complexities of existing stochastic first-order methods by a factor of $d^{\omega-2}$ and on those of batch methods by a factor of $d^2$, where $\omega$ denotes the matrix multiplication exponent. Numerical experiments show that our algorithm empirically outperforms existing methods with explicit complexity guarantees.
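For the quantum state tomography setting, the barrier step carries over from vectors to density matrices. The sketch below assumes the loss $f(\rho) = \mathbb{E}_A[-\log \operatorname{tr}(A\rho)]$ with positive semi-definite $A$, whose gradient is $-A/\operatorname{tr}(A\rho)$. It also assumes the regularizer $-\log\det\rho$, so the minimizer $\rho = (\eta(G + \lambda I))^{-1}$ shares the eigenvectors of the accumulated gradient $G$, and one eigendecomposition reduces the step to a scalar bisection on the eigenvalues. The names `density_barrier_step` and `log_loss_grad` are hypothetical, and this is an illustration under those assumptions rather than the authors' implementation.

```python
import numpy as np


def density_barrier_step(G, eta, iters=60):
    """Solve argmin over density matrices of tr(G rho) - (1/eta) * log det(rho).

    Stationarity gives rho = (eta * (G + lam * I))^{-1}, which shares G's
    eigenvectors, so only the eigenvalues need a bisection on lam.
    """
    g, U = np.linalg.eigh(G)  # G is Hermitian: G = U diag(g) U^*
    d = g.size
    # The trace of rho is >= 1 at lam = lo and <= 1 at lam = hi.
    lo, hi = 1.0 / eta - g.min(), d / eta - g.min()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (eta * (g + lam))) > 1.0:
            lo = lam
        else:
            hi = lam
    w = 1.0 / (eta * (g + 0.5 * (lo + hi)))
    w /= w.sum()                  # eigenvalues of rho, summing to one
    return (U * w) @ U.conj().T   # U diag(w) U^*


def log_loss_grad(A_batch, rho):
    """Average of -A_b / tr(A_b rho) over a batch A_batch of shape (B, d, d)."""
    probs = np.einsum("bij,ji->b", A_batch, rho).real  # tr(A_b rho) for each b
    return -(A_batch / probs[:, None, None]).mean(axis=0)
```

Each call to `density_barrier_step` performs one Hermitian eigendecomposition, which costs $O(d^3)$ time with standard dense solvers; plugging these two functions into the dual-averaging loop in place of their simplex counterparts gives the matrix version of the sketch.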
