Two-stage stochastic algorithm for solving large-scale (non)-convex separable optimization problems under affine constraints

Benjamin Dubois-Taine, Laurent Pfeiffer, Nadia Oudjane, Adrien Seguret, Francis Bach

TL;DR

The paper tackles large-scale, separable optimization with affine coupling constraints, addressing the prohibitive cost of computing Fenchel conjugates at every iteration. It introduces a two-stage method: a stochastic dual subgradient stage to rapidly approximate the dual optimum, followed by a block-coordinate Frank-Wolfe stage to derive primal solutions from the dual information. In the convex setting, the method achieves an overall conjugate-evaluation complexity of $O\left(\frac{1}{\varepsilon^2}+\frac{N}{\varepsilon^{2/3}}\right)$, significantly beating the $O(\frac{N}{\varepsilon^2})$ baseline. The authors extend the framework to nonconvex component functions, leveraging Shapley-Folkman type bounds and Carathéodory decompositions to preserve convergence guarantees and provide practical reconstruction schemes. Numerical experiments on large-scale data, including EV charging, demonstrate substantial empirical improvements and confirm the predicted convergence behavior.
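
As a rough illustration of the gap between the two bounds, the following back-of-the-envelope computation plugs in $N = 10^6$ and $\varepsilon = 10^{-3}$. These values are assumptions chosen for illustration only, and the constants hidden by the $O(\cdot)$ notation are ignored, so only the orders of magnitude are meaningful.

$$\frac{N}{\varepsilon^2} = 10^{12}, \qquad \frac{1}{\varepsilon^2} + \frac{N}{\varepsilon^{2/3}} = 10^{6} + 10^{6} \cdot 10^{2} \approx 1.01 \times 10^{8}.$$

Under these assumptions, the two-stage bound is smaller than the baseline by a factor of roughly $10^4$.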

Abstract

We consider nonsmooth optimization problems under affine constraints, where the objective consists of the average of the component functions of a large number $N$ of agents, and we only assume access to the Fenchel conjugate of the component functions. The algorithm of choice for solving such problems is the dual subgradient method, also known as dual decomposition, which requires $O(\frac{1}{\varepsilon^2})$ iterations to reach $\varepsilon$-optimality in the convex case. However, each iteration requires computing the Fenchel conjugate of each of the $N$ agents, leading to a complexity $O(\frac{N}{\varepsilon^2})$ which might be prohibitive in practical applications. To overcome this, we propose a two-stage algorithm, combining a stochastic subgradient algorithm on the dual problem, followed by a block-coordinate Frank-Wolfe algorithm to obtain primal solutions. The resulting algorithm requires only $O(\frac{1}{\varepsilon^2} + \frac{N}{\varepsilon^{2/3}})$ calls to Fenchel conjugates to obtain an $\varepsilon$-optimal primal solution in expectation in the convex case. We extend our results to nonconvex component functions and show that our method still applies and gets (almost) the same convergence rate, this time only to an approximate primal solution, recovering the classical duality gap bounds usually obtained using the Shapley-Folkman theorem.
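
To make the first stage concrete, here is a minimal sketch of a stochastic dual subgradient loop on a toy problem. It is not the authors' implementation. The toy problem (minimize $\frac{1}{N}\sum_i c_i x_i$ over $x_i \in [0, 1]$ subject to $\frac{1}{N}\sum_i x_i \le b$), the step size $1/\sqrt{k+1}$, the uniform iterate averaging, and the names `conjugate_oracle` and `stochastic_dual_subgradient` are all assumptions made for illustration. The second stage (block-coordinate Frank-Wolfe) is not shown.

```python
import random


def conjugate_oracle(c_i, lam):
    """Hypothetical per-agent oracle for the toy problem.

    Returns a minimizer of (c_i + lam) * x over x in [0, 1], which stands in
    for one Fenchel-conjugate evaluation.
    """
    return 1.0 if c_i + lam < 0 else 0.0


def stochastic_dual_subgradient(c, b, num_iters, step0=1.0, seed=0):
    """Projected stochastic supergradient ascent on the dual of the toy problem.

    Each iteration makes a single oracle call (one sampled agent) instead of
    N calls. Returns the uniform average of the dual iterates.
    """
    rng = random.Random(seed)
    lam = 0.0      # dual variable of the coupling constraint, kept >= 0
    lam_avg = 0.0  # running average of the iterates
    for k in range(num_iters):
        i = rng.randrange(len(c))          # sample one agent uniformly
        x_i = conjugate_oracle(c[i], lam)  # one oracle call
        g = x_i - b                        # unbiased supergradient estimate
        step = step0 / (k + 1) ** 0.5      # assumed step-size schedule
        lam = max(0.0, lam + step * g)     # ascent step, projected onto lam >= 0
        lam_avg += (lam - lam_avg) / (k + 1)
    return lam_avg


# Toy data (assumed): costs c_i = -(i + 1) / N, budget b = 0.3.
N = 1000
c = [-(i + 1) / N for i in range(N)]
lam_hat = stochastic_dual_subgradient(c, b=0.3, num_iters=20000)
print(lam_hat)
```

For this toy instance the dual maximizer sits at about 0.7, the multiplier at which exactly 30% of the agents choose $x_i = 1$, so the averaged iterate should land near that value. The run uses 20,000 oracle calls in total, whereas a deterministic dual subgradient method would use $N = 1000$ calls per iteration.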

Paper Structure (36 sections, 29 theorems, 195 equations, 2 figures, 5 algorithms)

Key Result

Proposition 2.1

Suppose that for all $i=1, \dots, N$, there exists some $x_i \in X_i$ such that $\frac{1}{N} \sum_{i=1}^N A_i x_i < b$. Then Assumption \ref{ass:existence-dual-maximizer} (existence of a dual maximizer) holds.

Figures (2)

  • Figure 1: Convergence of the deterministic subgradient algorithm (red) and of our two-stage algorithm (stochastic dual subgradient in blue, BCFW in green). Top left: convergence of the dual objective. Top right: same as top left in log-log scale. Middle left: bidual gap. Some curves are missing because the corresponding bidual iterates have a strictly negative bidual gap, which is due to the violation of the coupling constraint. Middle right: violation of the coupling constraint by the bidual iterates. Bottom left: sum of the bidual gap and of the coupling constraint violation. We see that our two-stage algorithm significantly outperforms the deterministic dual subgradient algorithm. Bottom right: same as bottom left in log-log scale.
  • Figure 2: Left. Performance of our two-stage algorithm in the bidual (blue-green circles) and primal (black crosses) compared with the performance of the dual subgradient in the bidual (red circles) and in the primal (red crosses). Right. Same plot in logarithmic x-scale with the additional orange curve being the inverse of the square root of the number of oracle calls. We see that (i) the solution to the nonconvex primal problem is almost as good as the solution to the bidual convex problem, as is expected from the duality gap bounds derived in Theorem \ref{thm:nonconvex-2-stage-convergence-nonconvex-domains} and (ii) the convergence is of the order of the inverse of the square root of the number of oracle calls, as is expected from Theorems \ref{thm:convergence-2-stage-algorithm} and \ref{thm:nonconvex-2-stage-convergence-nonconvex-domains}.

Theorems & Definitions (35)

  • Proposition 2.1
  • Definition
  • Definition
  • Proposition 2.2
  • Proposition 2.3: Bounded iterates
  • Proposition 2.4
  • Proposition 2.5
  • Remark
  • Proposition 3.1
  • Proposition 3.2
  • ...and 25 more