Two-stage stochastic algorithm for solving large-scale (non)-convex separable optimization problems under affine constraints

Benjamin Dubois-Taine; Laurent Pfeiffer; Nadia Oudjane; Adrien Seguret; Francis Bach

TL;DR

The paper tackles large-scale separable optimization with affine coupling constraints, addressing the prohibitive cost of computing the Fenchel conjugates of all $N$ component functions at every iteration. It introduces a two-stage method: a stochastic dual subgradient stage that rapidly approximates the dual optimum, followed by a block-coordinate Frank-Wolfe stage that recovers primal solutions from the dual information. In the convex setting, the method requires $O\left(\frac{1}{\varepsilon^2}+\frac{N}{\varepsilon^{2/3}}\right)$ conjugate evaluations in total, significantly improving on the $O(\frac{N}{\varepsilon^2})$ cost of the deterministic dual subgradient baseline. The authors extend the framework to nonconvex component functions, leveraging Shapley-Folkman-type bounds and Carathéodory decompositions to preserve convergence guarantees and provide practical reconstruction schemes. Numerical experiments on large-scale data, including EV charging, show substantial improvements over the deterministic dual subgradient method and confirm the predicted convergence behavior.
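To make the gap between the two complexity bounds concrete, the short Python sketch below plugs numbers into both expressions. The hidden constants of both $O(\cdot)$ bounds are set to 1, which is an assumption: the bounds only fix growth rates, so the printed ratios are indicative, not predictions of actual run times. The function names are hypothetical.

```python
def baseline_calls(N: int, eps: float) -> float:
    """Conjugate evaluations of the deterministic dual subgradient method, N / eps^2."""
    return N / eps**2


def two_stage_calls(N: int, eps: float) -> float:
    """Conjugate evaluations of the two-stage method, 1/eps^2 + N / eps^(2/3)."""
    return 1.0 / eps**2 + N / eps ** (2.0 / 3.0)


if __name__ == "__main__":
    # Hidden constants in both O(.) bounds are set to 1 (an assumption).
    for N, eps in [(1_000, 1e-2), (10_000, 1e-2), (10_000, 1e-3)]:
        base, two = baseline_calls(N, eps), two_stage_calls(N, eps)
        print(f"N={N:>6}, eps={eps:g}: baseline {base:.2e}, "
              f"two-stage {two:.2e}, ratio {base / two:.0f}")
```

For $N = 10^4$ and $\varepsilon = 10^{-2}$ this gives about $2.3 \times 10^5$ calls for the two-stage method against $10^8$ for the baseline. The $1/\varepsilon^2$ term only dominates the two-stage count when $N < \varepsilon^{-4/3}$, i.e. for relatively few agents.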

Abstract

We consider nonsmooth optimization problems under affine constraints, where the objective consists of the average of the component functions of a large number $N$ of agents, and we only assume access to the Fenchel conjugate of the component functions. The algorithm of choice for solving such problems is the dual subgradient method, also known as dual decomposition, which requires $O(\frac{1}{\varepsilon^2})$ iterations to reach $\varepsilon$-optimality in the convex case. However, each iteration requires computing the Fenchel conjugate of each of the $N$ agents, leading to a complexity $O(\frac{N}{\varepsilon^2})$, which might be prohibitive in practical applications. To overcome this, we propose a two-stage algorithm consisting of a stochastic subgradient algorithm on the dual problem followed by a block-coordinate Frank-Wolfe algorithm to obtain primal solutions. The resulting algorithm requires only $O(\frac{1}{\varepsilon^2} + \frac{N}{\varepsilon^{2/3}})$ calls to Fenchel conjugates to obtain an $\varepsilon$-optimal primal solution in expectation in the convex case. We extend our results to nonconvex component functions and show that our method still applies and achieves (almost) the same convergence rate, this time only to an approximate primal solution, recovering the classical duality gap bounds usually obtained using the Shapley-Folkman theorem.
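The first stage can be illustrated with a minimal Python sketch of projected stochastic supergradient ascent on the dual. It rests on several assumptions not taken from the paper: each agent's domain $X_i$ is a finite set of candidate points (a nonconvex case) with a cost attached to each point, the coupling constraint is $\frac{1}{N}\sum_i A_i x_i \le b$, the step size is $\eta_0/\sqrt{k+1}$, and the averaged dual iterate is returned. The best response of agent $i$ to a price $\lambda$ is one oracle call, equivalent to one conjugate evaluation $f_i^*(-A_i^\top \lambda)$. The second stage (block-coordinate Frank-Wolfe) is not shown. All function names and the toy instance are hypothetical.

```python
import numpy as np


def best_response(costs_i, cands_i, A_i, lam):
    """Oracle for agent i: minimize f_i(x) + lam^T A_i x over the finite set X_i.

    With f_i extended by +infinity outside X_i, the optimal value equals
    -f_i^*(-A_i^T lam), so each call plays the role of one conjugate evaluation.
    """
    vals = costs_i + cands_i @ (A_i.T @ lam)
    k = int(np.argmin(vals))
    return cands_i[k], float(vals[k])


def dual_value(costs, cands, A, b, lam):
    """Dual function D(lam) = (1/N) sum_i min_x [f_i(x) + lam^T A_i x] - lam^T b.

    Costs N oracle calls; used here only to monitor progress.
    """
    N = len(costs)
    total = sum(best_response(costs[i], cands[i], A[i], lam)[1] for i in range(N))
    return total / N - float(lam @ b)


def stochastic_dual_subgradient(costs, cands, A, b, n_iter, step0=1.0, seed=0):
    """Projected stochastic supergradient ascent on the concave dual over lam >= 0.

    Each iteration samples one agent uniformly, so it costs a single oracle call.
    Returns the running average of the dual iterates.
    """
    rng = np.random.default_rng(seed)
    N = len(costs)
    lam = np.zeros_like(b, dtype=float)
    lam_avg = np.zeros_like(lam)
    for k in range(n_iter):
        i = int(rng.integers(N))
        x_i, _ = best_response(costs[i], cands[i], A[i], lam)
        g = A[i] @ x_i - b  # unbiased estimate of a supergradient of D at lam
        lam = np.maximum(0.0, lam + step0 / np.sqrt(k + 1) * g)
        lam_avg += (lam - lam_avg) / (k + 1)
    return lam_avg


if __name__ == "__main__":
    # Hypothetical toy instance: each agent picks one of 4 points in [0, 1]^2
    # (including the origin), earns reward x_1 + x_2, and shares the budget b.
    rng = np.random.default_rng(1)
    N, d = 200, 2
    cands = [np.vstack([np.zeros(d), rng.uniform(0.0, 1.0, (3, d))]) for _ in range(N)]
    costs = [-C.sum(axis=1) for C in cands]
    A = [np.eye(d) for _ in range(N)]
    b = 0.3 * np.ones(d)
    lam = stochastic_dual_subgradient(costs, cands, A, b, n_iter=5000)
    print("averaged dual iterate:", lam)
    print("D(0)   =", dual_value(costs, cands, A, b, np.zeros(d)))
    print("D(lam) =", dual_value(costs, cands, A, b, lam))
```

The stochastic loop touches one agent per iteration, whereas the deterministic dual subgradient method would call `best_response` for all $N$ agents at every step; this is the source of the $1/\varepsilon^2$ versus $N/\varepsilon^2$ difference in the first stage.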

Paper Structure (36 sections, 29 theorems, 195 equations, 2 figures, 5 algorithms)

Introduction
Background and related work
Convex case
Nonconvex case
Roadmap of paper and contributions
Preliminaries
Dual
Dual subgradient
Dual convergence
Primal convergence
Total complexity
The two-stage algorithm
Stage 1: stochastic dual subgradient
Dual convergence
Primal convergence
...and 21 more sections

Key Result

Proposition 2.1

Suppose that there exist $x_1 \in X_1, \dots, x_N \in X_N$ such that $\frac{1}{N} \sum_{i=1}^N A_i x_i < b$. Then the assumption that the dual problem admits a maximizer holds.
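The strict-feasibility condition of Proposition 2.1 can be checked numerically in simple cases. The Python sketch below offers two hypothetical helpers: `slater_slack` verifies a given candidate tuple, and `best_slack_single_constraint` gives an exact answer under the extra assumptions of a single coupling constraint ($m = 1$) and finite sets $X_i$, where the minimum of $\frac{1}{N}\sum_i a_i^\top x_i$ separates across agents. With several constraints this separation no longer holds, so a positive `slater_slack` is only a certificate.

```python
import numpy as np


def slater_slack(points, A, b):
    """Smallest component of b - (1/N) sum_i A_i x_i for a given tuple (x_1, ..., x_N).

    If every x_i lies in X_i and the returned value is positive, the tuple
    certifies the strict-feasibility condition of Proposition 2.1.
    """
    N = len(points)
    avg = sum(A_i @ x_i for A_i, x_i in zip(A, points)) / N
    return float(np.min(b - avg))


def best_slack_single_constraint(cands, a, b):
    """Exact test for one coupling constraint (m = 1) and finite sets X_i.

    With a single row, the minimum over (x_1, ..., x_N) of (1/N) sum_i a_i^T x_i
    splits into per-agent minima, so the condition holds iff the result is > 0.
    """
    N = len(cands)
    min_avg = sum(float(np.min(C @ a_i)) for C, a_i in zip(cands, a)) / N
    return float(b - min_avg)


if __name__ == "__main__":
    # Hypothetical scalar agents: X_1 = {0, 2}, X_2 = {1, 3}, a_1 = a_2 = 1.
    cands = [np.array([[0.0], [2.0]]), np.array([[1.0], [3.0]])]
    a = [np.array([1.0]), np.array([1.0])]
    print(best_slack_single_constraint(cands, a, b=1.0))  # 0.5 > 0: condition holds
    print(best_slack_single_constraint(cands, a, b=0.5))  # 0.0: strict inequality fails
    print(slater_slack([np.array([0.0]), np.array([1.0])], [np.eye(1)] * 2, np.array([1.0])))
```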

Figures (2)

Figure 1: Convergence of the deterministic dual subgradient algorithm (red) and of our two-stage algorithm (stochastic dual subgradient in blue, BCFW in green). Top left: convergence of the dual objective. Top right: same as top left in log-log scale. Middle left: bidual gap. Some curves are missing because the bidual iterates have a strictly negative bidual gap, which is possible since they violate the coupling constraint. Middle right: violation of the coupling constraint by the bidual iterates. Bottom left: sum of the bidual gap and of the coupling constraint violation. Our two-stage algorithm significantly outperforms the deterministic dual subgradient algorithm. Bottom right: same as bottom left in log-log scale.
Figure 2: Left. Performance of our two-stage algorithm in the bidual (blue-green circles) and in the primal (black crosses), compared with the performance of the dual subgradient algorithm in the bidual (red circles) and in the primal (red crosses). Right. Same plot with a logarithmic x-scale; the additional orange curve is the inverse of the square root of the number of oracle calls. We see that (i) the solution to the nonconvex primal problem is almost as good as the solution to the bidual convex problem, as expected from the duality gap bounds of our convergence theorem for nonconvex domains, and (ii) the convergence is of the order of the inverse of the square root of the number of oracle calls, as expected from our convergence theorems for the two-stage algorithm in the convex and nonconvex settings.
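The quantities plotted in the two figures can be sketched in a few lines of Python. The helper `gap_and_violation` (hypothetical) computes the gap to the optimal value, the coupling-constraint violation, and their sum, as in the bottom panels of Figure 1; measuring the violation by the Euclidean norm of the positive part of $\frac{1}{N}\sum_i A_i x_i - b$ is an assumption. The helper `loglog_slope` (also hypothetical) fits a line in log-log scale, so a slope near $-1/2$ corresponds to the $1/\sqrt{\text{oracle calls}}$ reference curve of Figure 2. The demo data are synthetic, not results from the paper.

```python
import numpy as np


def gap_and_violation(obj_value, opt_value, A, xs, b):
    """Gap, coupling-constraint violation, and their sum (cf. Figure 1).

    gap = obj_value - opt_value can be negative when the iterate violates the
    coupling constraint; the violation is the Euclidean norm of the positive
    part of (1/N) sum_i A_i x_i - b (the choice of norm is an assumption).
    """
    N = len(xs)
    residual = sum(A_i @ x_i for A_i, x_i in zip(A, xs)) / N - b
    violation = float(np.linalg.norm(np.maximum(residual, 0.0)))
    gap = float(obj_value - opt_value)
    return gap, violation, gap + violation


def loglog_slope(oracle_calls, errors):
    """Least-squares slope of log(error) versus log(oracle calls) (cf. Figure 2).

    A slope close to -0.5 matches the 1/sqrt(oracle calls) reference curve.
    """
    x = np.log(np.asarray(oracle_calls, dtype=float))
    y = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


if __name__ == "__main__":
    # Toy problem: minimize -x subject to x <= 0.5, so the optimal value is -0.5.
    # The infeasible iterate x = 0.8 has a negative gap, offset by its violation.
    print(gap_and_violation(-0.8, -0.5, [np.eye(1)], [np.array([0.8])], np.array([0.5])))
    # Synthetic errors decaying exactly like 3 / sqrt(calls): slope -0.5 up to rounding.
    calls = np.logspace(2, 6, 20)
    print(loglog_slope(calls, 3.0 / np.sqrt(calls)))
```

The toy case shows why a bidual gap curve can disappear from a log-scale plot: an iterate that violates the coupling constraint can have an objective below the optimal value.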

Theorems & Definitions (35)

Proposition 2.1
Definition
Definition
Proposition 2.2
Proposition 2.3: Bounded iterates
Proposition 2.4
Proposition 2.5
Remark
Proposition 3.1
Proposition 3.2
...and 25 more