A Bregman-Kaczmarz method for nonlinear systems of equations

Robert Gower, Dirk A. Lorenz, Maximilian Winkler

TL;DR

The paper solves constrained nonlinear systems $f(x)=0$ by sampling a single component per iteration and performing a Bregman projection onto its local linearization, unifying the nonlinear Kaczmarz and sparse Kaczmarz methods under a stochastic mirror-descent viewpoint. Introducing a distance-generating function $\varphi$ yields two updates, NBK (exact Bregman projection) and rNBK (relaxed projection), whose special cases recover the nonlinear Kaczmarz method and entropy-based methods on the simplex. The authors prove two convergence results: (i) for nonnegative star-convex (or affine) component functions and strongly convex $\varphi$, the Bregman distance to the solution set decreases and the iterates converge almost surely, with sublinear rate bounds; (ii) under the local tangential cone condition, local convergence with explicit rates, which can become linear under favorable conditioning. Numerical experiments on sparse quadratic equations, linear systems on the probability simplex, and the left stochastic decomposition show that NBK and its relaxed variant can outperform methods with the same memory requirements in high dimensions, especially when the Bregman distance is matched to the problem structure. The work advances stochastic first-order methods for structured nonlinear problems and opens avenues for incorporating interpolation, simplex constraints, and adaptive step sizes into stochastic mirror-descent frameworks.
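
The updates summarized above can be sketched in standard notation. This is a hedged reconstruction from the description here, not a quotation of the paper: the symbols $H_k$, $x_k^*$, $t_k$, $\sigma$, $\varphi^*$ and the dual norm $\|\cdot\|_*$ are assumptions chosen for illustration, with $\varphi$ taken to be $\sigma$-strongly convex and $i_k$ the sampled index.

```latex
% Bregman distance for a subgradient x^* \in \partial\varphi(x)
D_\varphi^{x^*}(x,y) = \varphi(y) - \varphi(x) - \langle x^*, y - x\rangle

% Solution set of the linearized (Newton) equation for the sampled component
H_k = \{\, x : f_{i_k}(x_k) + \langle \nabla f_{i_k}(x_k),\, x - x_k \rangle = 0 \,\}

% NBK: exact Bregman projection onto H_k
x_{k+1} = \operatorname*{argmin}_{x \in H_k} D_\varphi^{x_k^*}(x_k, x)

% rNBK: relaxed step in the dual variable, mapped back through \nabla\varphi^*
x_{k+1}^* = x_k^* - t_k \nabla f_{i_k}(x_k), \qquad
t_k = \sigma\, \frac{f_{i_k}(x_k)}{\|\nabla f_{i_k}(x_k)\|_*^2}, \qquad
x_{k+1} = \nabla\varphi^*(x_{k+1}^*)
```

Under these assumptions, choosing $\varphi = \tfrac12\|x\|_2^2$ gives $\sigma = 1$, $x_k^* = x_k$ and $\nabla\varphi^* = \mathrm{id}$, so both updates reduce to the nonlinear Kaczmarz step $x_{k+1} = x_k - \frac{f_{i_k}(x_k)}{\|\nabla f_{i_k}(x_k)\|_2^2}\nabla f_{i_k}(x_k)$.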

Abstract

We propose a new randomized method for solving systems of nonlinear equations, which can find sparse solutions or solutions under certain simple constraints. The scheme only takes gradients of component functions and uses Bregman projections onto the solution space of a Newton equation. In the special case of Euclidean projections, the method is known as the nonlinear Kaczmarz method. Furthermore, if the component functions are nonnegative, we are in the setting of optimization under the interpolation assumption and the method reduces to SGD with the recently proposed stochastic Polyak step size. For general Bregman projections, our method is a stochastic mirror descent with a novel adaptive step size. We prove that in the convex setting each iteration of our method results in a smaller Bregman distance to exact solutions as compared to the standard Polyak step. Our generalization to Bregman projections comes at the price that a convex one-dimensional optimization problem needs to be solved in each iteration. This can typically be done with globalized Newton iterations. Convergence is proved in two classical settings of nonlinearity: for convex nonnegative functions and locally for functions which fulfill the tangential cone condition. Finally, we show examples in which the proposed method outperforms similar methods with the same memory requirements.
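
To make the Euclidean special case concrete, here is a minimal Python sketch of the nonlinear Kaczmarz step described in the abstract: sample one component, then project the current iterate onto the solution set of its linearized (Newton) equation. The function names, the toy system, and the uniform sampling rule are assumptions made for illustration; they are not taken from the paper, and the general Bregman variants are not implemented here.

```python
import numpy as np

def nonlinear_kaczmarz_step(x, f_i, grad_f_i):
    """Euclidean projection of x onto {y : f_i(x) + <grad f_i(x), y - x> = 0}."""
    g = grad_f_i(x)
    g_norm_sq = float(g @ g)
    if g_norm_sq == 0.0:
        # Linearization is degenerate; leave the iterate unchanged.
        return x.copy()
    return x - (f_i(x) / g_norm_sq) * g

def nonlinear_kaczmarz(x0, fs, grads, n_iter=1000, seed=0):
    """Repeat the step with a uniformly sampled component (an assumed sampling rule)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        i = rng.integers(len(fs))
        x = nonlinear_kaczmarz_step(x, fs[i], grads[i])
    return x

# Hypothetical toy system: unit circle intersected with the line x[0] = x[1].
fs = [
    lambda x: x[0] ** 2 + x[1] ** 2 - 1.0,
    lambda x: x[0] - x[1],
]
grads = [
    lambda x: np.array([2.0 * x[0], 2.0 * x[1]]),
    lambda x: np.array([1.0, -1.0]),
]

x0 = np.array([1.0, 0.5])
x1 = nonlinear_kaczmarz_step(x0, fs[0], grads[0])

# After one step, the linearized equation of the sampled component holds at x1.
newton_residual = fs[0](x0) + grads[0](x0) @ (x1 - x0)

x_final = nonlinear_kaczmarz(x0, fs, grads)
```

Each step touches only one component function and its gradient, which is why the memory footprint matches that of SGD. When the component is nonnegative, the step length `f_i(x) / ||grad f_i(x)||^2` is the stochastic Polyak step size mentioned in the abstract.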

Paper Structure

This paper contains 16 sections, 21 theorems, 121 equations, 8 figures, 4 algorithms.

Key Result

Lemma 2.2

A point $z\in E$ is the Bregman projection of $x$ onto $E$ with respect to $\varphi$ and $x^*\in\partial\varphi(x)$ if and only if there exists $z^*\in\partial \varphi(z)$ such that one of the following conditions is fulfilled:

Figures (8)

  • Figure 1: Experiment with quadratic equations, $(n,d) = (1000,500)$, $\hat{x}$ with $50$ nonzero entries, 20 random repeats. Left: plot of residual $\|f(x_k)\|_2$, right: plot of distance to solution $\hat{x}$, both over computation time. Thick line shows median over all trials, light area is between min and max, darker area indicates 25th and 75th quantile.
  • Figure 2: Experiment with quadratic equations, $(n,d) = (50,100)$, $\hat{x}$ with $5$ nonzero entries, $50$ random repeats, plot of residual $\|f(x_k)\|_2$ against computation time. Left: $\lambda=2$, right: $\lambda=5$. Thick line shows median over all trials, light area is between min and max, darker area indicates 25th and 75th quantile.
  • Figure 3: Experiment with linear equations on the probability simplex, plot of relative residuals averaged over 50 random examples against iterations ($k$) and computation time. Left column: $A\sim\mathcal{N}(0,1)^{500\times 200}$, right column: $A\sim\mathcal{N}(0,1)^{200\times 500}$. Thick line shows median over all trials, light area is between min and max, darker area indicates 25th and 75th quantile.
  • Figure 4: Experiment with linear equations on the probability simplex, plot of relative residuals averaged over 50 random examples against iterations ($k$) and computation time. Left column: $A\sim\mathcal{U}([0,1])^{200\times 500}$, right column: $A\sim\mathcal{U}([0.9,1])^{200\times 500}$. Thick line shows median over all trials, light area is between min and max, darker area indicates 25th and 75th quantile.
  • Figure 5: Experiment with linear equations on the probability simplex, plot of relative residuals averaged over 50 random examples against computation time. In both examples, $A\sim\mathcal{U}([0,1])^{200\times 500}$. Left: $\epsilon=10^{-9}$, right: $\epsilon=10^{-5}$ in NBK method. Thick line shows median over all trials, light area is between min and max, darker area indicates 25th and 75th quantile.
  • ...and 3 more figures

Theorems & Definitions (48)

  • Definition 2.1
  • Lemma 2.2: LSW14
  • Proposition 2.3
  • proof
  • Remark 3.1: Choice of $\sigma$ in the NBK algorithm and the relaxed NBK algorithm
  • Example 3.2
  • Example 3.3
  • Example 3.4
  • Example 3.5
  • Proposition 4.1
  • ...and 38 more