On the Characteristics of the Conjugate Function Enabling Effective Dual Decomposition Methods

Hansi Abeynanda; Chathuranga Weeraddana; Carlo Fischione

TL;DR

The paper introduces Fixed Gradient Over Rays (FGOR), a property of the conjugate function in convex optimization, and shows that the dual function inherits it. This structure enables a simple stepsize rule that can be prepended to existing stepsize methods for dual subgradient algorithms, as well as a practical ray-based warm start that accelerates convergence, notably in distributed dual decomposition for global consensus. The paper also extends FGOR to nonconvex and stochastic settings and demonstrates, through experiments with quadratic objectives and regularized least squares on real data, substantial improvements in convergence speed and communication efficiency over state-of-the-art splitting methods. The results highlight how exploiting conjugate-domain geometry can yield practical performance gains in large-scale distributed optimization.

Abstract

We investigate a novel characteristic of the conjugate function associated with a generic convex optimization problem, which can subsequently be leveraged for efficient dual decomposition methods. In particular, under mild assumptions, we show that there is a specific region in the domain of the conjugate function such that, for any point in the region, there is always a ray originating from that point along which the gradient of the conjugate remains constant. We refer to this characteristic as fixed gradient over rays (FGOR). We further show that this characteristic is inherited by the corresponding dual function. We then provide a thorough exposition of the application of the FGOR characteristic to dual subgradient methods. More importantly, we leverage FGOR to devise a simple stepsize rule that can be prepended to state-of-the-art stepsize methods, making them more efficient. Furthermore, we investigate how the FGOR characteristic can be used when solving the global consensus problem, a prevalent formulation in diverse application domains. We show that FGOR can be exploited not only to expedite the convergence of dual decomposition methods but also to reduce the communication overhead. FGOR is extended to nonconvex formulations, and its advantages in stochastic optimization are demonstrated. Numerical experiments using quadratic objectives and regularized least squares regression with real datasets are conducted. The results show that FGOR can significantly improve the performance of existing stepsize methods and, on average, outperform state-of-the-art splitting methods in terms of both convergence behavior and communication efficiency.
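The constant-gradient-over-a-ray behaviour described in the abstract can be seen in a small numerical sketch. The example below is a hypothetical one-dimensional toy problem and is not taken from the paper: it assumes $f(y)=\tfrac{1}{2}y^2$ restricted to the interval $[-1,1]$, for which the conjugate's gradient is the clipped value of $\nu$. The function name `conjugate_and_grad`, the starting point `nu0`, and the direction `eta` are all illustrative choices.

```python
import numpy as np

def conjugate_and_grad(nu, lo=-1.0, hi=1.0):
    """Conjugate of f(y) = 0.5*y**2 restricted to [lo, hi], with its gradient.

    f*(nu) = max over y in [lo, hi] of (nu*y - 0.5*y**2).
    The maximizer is clip(nu, lo, hi), which is also the gradient of f* at nu.
    """
    y_star = np.clip(nu, lo, hi)
    return nu * y_star - 0.5 * y_star**2, y_star

# Assumed toy ray: starts at nu0 = 1.5 and extends along eta = +1.
nu0, eta = 1.5, 1.0
alphas = np.linspace(0.0, 10.0, 6)
values, ray_grads = conjugate_and_grad(nu0 + alphas * eta)

# For contrast, points where the interval constraint is inactive.
_, inner_grads = conjugate_and_grad(np.array([-0.5, 0.0, 0.5]))

print("gradients along the ray:", ray_grads)
print("gradients at inner points:", inner_grads)
```

In this toy setting the gradient equals 1 at every point of the ray while the conjugate itself keeps growing linearly, whereas at the inner points the gradient changes with $\nu$. This only illustrates the idea on an assumed problem; the paper's general conditions and the set $\mathcal{V}$ are defined in the full text.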

Paper Structure

This paper contains 33 sections, 11 theorems, 55 equations, 10 figures, 1 table, and 3 algorithms.

Introduction
Related Work
Our Contribution
Notation
Organization of the Paper
FGOR Characteristic of the Conjugate
FGOR Characteristic of the Dual Function
On the Application of FGOR on Dual Subgradient Algorithm
Formalizing the FGOR on Dual Subgradient
Determining a Ray Ensuring (18)
Algorithm
Global Consensus Problem
The Standard Dual Decomposition Method
Using FGOR Characteristic of $g$ on Algorithm \ref{Alg:Dual-Decomposition-Algorithm}
Fast Convergence with FGOR
...and 18 more sections

Key Result

Lemma 1

Let Then $\boldsymbol{\nu}_0\in\texttt{int}~\mathcal{V} \iff \exists~\mathbf{y}\in\texttt{int}~\mathcal{Y}~ \hbox{s.t.,}~\boldsymbol{\nu}_0\in\partial f(\mathbf{y})$.

Figures (10)

Figure 1: An illustration of Proposition \ref{Prop:flat-region}. \ref{subfig:Conjugate-of-f1} Conjugate $f^*_1$ of $f_1+\delta_{\mathcal{Y}_1}$. \ref{subfig:Conjugate-of-f2} Conjugate $f^*_2$ of $f_2+\delta_{\mathcal{Y}_2}$. \ref{subfig:LevelSets-of-f1} Level sets of $f^*_1$ and associated set $\mathcal{V}$: $\nabla f_1^*$ remains constant over each ray, where $\mathcal{R}(\mathbf{v}_1)$, $\mathcal{R}(\mathbf{v}_2)$, and $\mathcal{R}(\mathbf{v}_3)$ originate at $\mathbf{v}_1=[3 \ 0.5]^{\hbox{\scriptsize T}}$ extending along $\boldsymbol{\eta}_1=[0 \ 1]^{\hbox{\scriptsize T}}$, $\mathbf{v}_2=[0.3 \ -4.2]^{\hbox{\scriptsize T}}$ extending along $\boldsymbol{\eta}_2=[0 \ -1]^{\hbox{\scriptsize T}}$, and $\mathbf{v}_3=[4.5 \ -4.5]^{\hbox{\scriptsize T}}$ extending along $\boldsymbol{\eta}_3=[1 \ -1]^{\hbox{\scriptsize T}}$, respectively. \ref{subfig:LevelSets-of-f2} Level sets of $f^*_2$ and associated set $\mathcal{V}$: $\nabla f_2^*$ remains constant over each ray, where $\mathcal{R}(\mathbf{v}_1)$, $\mathcal{R}(\mathbf{v}_2)$, and $\mathcal{R}(\mathbf{v}_3)$ originate at $\mathbf{v}_1=[-3 \ 1.8]^{\hbox{\scriptsize T}}$ extending along $\boldsymbol{\eta}_1=[-5 \ 3]^{\hbox{\scriptsize T}}$, $\mathbf{v}_2=[3.2 \ 3.2]^{\hbox{\scriptsize T}}$ extending along $\boldsymbol{\eta}_2=[1 \ 1]^{\hbox{\scriptsize T}}$, and $\mathbf{v}_3=[0.5 \ -3.2]^{\hbox{\scriptsize T}}$ extending along $\boldsymbol{\eta}_3=[0 \ -1]^{\hbox{\scriptsize T}}$, respectively.
Figure 2: An illustration of Corollary \ref{Prop:flat-region-dual-function}. \ref{subfig:example-dual-function-restrictions} Level sets of $f^*_2$ and associated set $\mathcal{V}$: ${\mathcal{R}(\mathbf{v}_1){\subseteq}R(\mathbf{A}_1^{\hbox{\scriptsize T}})}$ and ${\mathcal{R}(\mathbf{v}_2){\subseteq} R(\mathbf{A}_2^{\hbox{\scriptsize T}})}$. \ref{subfig:example-dual-functions} Dual functions of problem \ref{eq:Ex:Assumption:dual-flat-assump} when $\mathbf{A}=\mathbf{A}_1$ and $\mathbf{A}=\mathbf{A}_2$: the FGOR region of $g_1$ is $\mathcal{F}_1=\mathcal{F}_{11}\cup\mathcal{F}_{12}$, and $\mathcal{F}_1^\star$ is the region in which the dual optimal solution $\lambda_1^\star=0$ of $g_1$ resides. The points $\lambda^{(0)}=-10$, $\lambda^{(1)}=-8$, and $\lambda^{(2)}=-6$ of the subgradient method \ref{eq:Lambda-Update} result from the constant stepsizes taken in the FGOR region $\mathcal{F}_{11}$.
Figure 3: An illustration of condition \ref{eq:condition-dual-function}: the condition is satisfied only over the ray $\{\boldsymbol{\lambda}_1+\alpha\boldsymbol{\mu}_1 \ | \ \alpha\geq 0\}$. \ref{subfig:grad-g-parallel-to-line} Level curves of a dual function $g~{:{\rm I\!R}^3\to{\rm I\!R}}$ and two rays with the FGOR characteristic. \ref{subfig:grad-g-not-parallel-to-line-plane-view} Corresponding plan view.
Figure 4: An illustration of a ray ${\mathcal{R}}^{\star}(\mathbf{\bar{y}})$ [cf. \ref{eq:FGOR-Ray}] associated with problem \ref{eq:Ex:Assumption:dual-flat-assump}, where $\mathbf{A}=[-0.05 \ 0.03]$, $\mathbf{b}=\mathbf{0}$, $\mathbf{\bar{y}}=[-2 \ 2]^\text{T}$, and $\beta=1$. The sets $N_{\mathcal{Y}_2}(\mathbf{\bar{y}})$ and $\partial f_2(\mathbf{\bar{y}})+N_{\mathcal{Y}_2}(\mathbf{\bar{y}})$ are depicted by the regions shaded in pink and green, respectively. \ref{subfig:feasible-problem1-1} ${\mathcal{R}}^{\star}(\mathbf{\bar{y}})$ given by solving \ref{eq:find-ray-direction-eta---}: $\hat{\boldsymbol{\nu}}=[-3.25 \ 1.95]^\text{T}$, $\hat{\boldsymbol{\eta}}=[-0.008 \ 0.0048]^\text{T}$, $\hat{\boldsymbol{\lambda}}=65$, $\hat{\boldsymbol{\mu}}=0.16$. \ref{subfig:feasible-problem1-2} ${\mathcal{R}}^{\star}(\mathbf{\bar{y}})$ given by solving \ref{eq:find-ray-direction-eta-2}: $\boldsymbol{\tilde{\eta}}=[-0.008 \ 0.0048]^\text{T}$, $\boldsymbol{\tilde{\mu}}=0.16$. Then $\hat{\boldsymbol{\nu}}=c\boldsymbol{\tilde{\eta}}$, $\hat{\boldsymbol{\eta}}=\boldsymbol{\tilde{\eta}}$, $\hat{\boldsymbol{\lambda}}=c\boldsymbol{\tilde{\mu}}$, $\hat{\boldsymbol{\mu}}=\boldsymbol{\tilde{\mu}}$, where $c=500$, cf. \ref{eq:Solution-for-Relaxed-Problem}.
Figure 5: Comparisons with Algorithm \ref{Alg:Dual-Decomposition-Algorithm}: Case 1: $\mathbf{y}^{\star}\in\texttt{int}~\mathcal{Y}$. \ref{subfig:Case1-9} TS: $m=4$ and $n=10$. \ref{subfig:Case1-10} TS: $m=100$ and $n=10$. \ref{subfig:Case1-11} SD and Ref. Yura_2020: $m=4$ and $n=10$. \ref{subfig:Case1-12} SD and Ref. Yura_2020: $m=100$ and $n=10$.
...and 5 more figures

Theorems & Definitions (34)

Lemma 1
Example 1: An Illustration of the Set $\mathcal{V}$
Remark 1
Remark 2
Proposition 1
Example 2: An Illustration of Proposition \ref{Prop:flat-region}
Example 3: An Illustration of Assumption \ref{Assumption:dual-flat-assump}
Corollary 1
Example 4: An Illustration of Corollary \ref{Prop:flat-region-dual-function}
Remark 3
...and 24 more
