Hessian-Free Distributed Bilevel Optimization via Penalization with Time-Scale Separation

Youcheng Niu, Jinming Xu, Ying Sun, Li Chai, Jiming Chen

TL;DR

The paper proposes AHEAD, a loopless distributed algorithm that uses multiple-timescale updates to solve the distributed bilevel optimization (DBO) problem asymptotically without Hessian computation. Its analysis reveals a clear dependence of convergence performance on node heterogeneity, penalty parameters, and network connectivity.

Abstract

This paper considers a class of distributed bilevel optimization (DBO) problems with a coupled inner-level subproblem. Existing approaches typically rely on hypergradient estimations involving computationally expensive Hessian evaluation. To address this, we approximate the DBO problem as a minimax problem by properly designing a penalty term that enforces both the constraint imposed by the inner-level subproblem and the consensus among the decision variables of agents. Moreover, we propose a loopless distributed algorithm, AHEAD, that employs multiple-timescale updates to solve the approximate problem asymptotically without requiring Hessian computation. Theoretically, we establish sharp convergence rates for nonconvex-strongly-convex settings and for distributed minimax problems as special cases. Our analysis reveals a clear dependence of convergence performance on node heterogeneity, penalty parameters, and network connectivity, with a weaker assumption on heterogeneity that only requires bounded gradients at the optimum. Numerical experiments corroborate our theoretical results.
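The penalty idea in the abstract can be illustrated on a toy single-node problem. The sketch below is not the AHEAD algorithm: it omits the agents, the consensus penalty, and the network entirely. It assumes a value-function-style penalty $L_\lambda(x, y, z) = f(x, y) + \lambda\,(g(x, y) - g(x, z))$, minimized over $(x, y)$ and maximized over $z$, which is one common way to obtain a Hessian-free minimax surrogate and may differ from the paper's exact construction. The quadratic objectives, the function name `penalty_three_timescale`, the numeric step sizes, and the pairing of $\alpha$, $\beta$, $\gamma$ with $x$, $y$, $z$ are all illustrative assumptions.

```python
def penalty_three_timescale(a=2.0, b=0.5, lam=10.0,
                            alpha=0.01, beta=0.05, gamma=0.5, iters=5000):
    """Single-loop first-order updates on a penalized bilevel toy problem.

    Hypothetical toy objectives (not taken from the paper):
        f(x, y) = 0.5 * (y - a)**2       outer level
        g(x, y) = 0.5 * (y - x - b)**2   inner level, strongly convex in y
    The inner solution is y*(x) = x + b, so the bilevel solution is x* = a - b.
    Here L_{f,1} = 1 and mu_g = 1, so lam = 10 satisfies lam > 2 * L_{f,1} / mu_g.
    """
    x = y = z = 0.0
    for _ in range(iters):
        # Only first-order partial derivatives are used; no Hessian of g appears.
        dg_dx_at_y = -(y - x - b)
        dg_dx_at_z = -(z - x - b)
        grad_x = lam * (dg_dx_at_y - dg_dx_at_z)  # f does not depend on x here
        grad_y = (y - a) + lam * (y - x - b)      # descent direction for y
        grad_z = z - x - b                        # z tracks argmin_z g(x, z)

        # All three variables move in the same iteration (no inner loop),
        # with x on the slowest time scale because alpha is the smallest step.
        x, y, z = x - alpha * grad_x, y - beta * grad_y, z - gamma * grad_z
    return x, y, z


x_hat, y_hat, z_hat = penalty_three_timescale()
print(x_hat, y_hat, z_hat)
```

On this toy problem the iterates approach $x^\star = a - b$ and $y = z = a$. The penalized stationary point coincides with the bilevel solution here only because $f$ does not depend on $x$; in general the penalty introduces a bias that shrinks as $\lambda$ grows.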

Paper Structure

This paper contains 21 sections, 11 theorems, 57 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Consider the sequence $\{x_i^k, y_i^k, z_i^k\}$ generated by Algorithm 1. Suppose Assumptions ASS-outer-level to ASS-heterogeneity hold. If the penalty parameter $\lambda$ satisfies $\lambda>\frac{2L_{f,1}}{\mu_g}$ and the step sizes $\alpha$, $\beta$, $\gamma$ respectively satisfy ... then, for any total number of iterations $K$, we have ... where $\mathcal{C}(\cdot)$ and $\mathcal{B}(\cdot)$ represent ...

Figures (4)

  • Figure 1: The performance of the proposed AHEAD algorithm: (a) The losses $f$, $g$ and the constraint tolerance with respect to the number of iterations; (b) The optimality gap and consensus errors with respect to the number of iterations; (c) and (d) The trajectories of the individual state $(x_i^k, y_i^k)$ and the average state $(\bar{x}^k, \bar{y}^k)$ within the contours of the outer- and inner-level objectives, respectively.
  • Figure 2: The performance of the proposed AHEAD algorithm under networks with different connectivity levels, where high, medium, and low connectivity correspond to spectral gaps of $\rho = 0.274$, $\rho = 0.644$, and $\rho = 0.923$, respectively.
  • Figure 3: The performance of the proposed AHEAD algorithm under different node heterogeneity: (a) Data distribution; (b) Testing accuracy.
  • Figure 4: Performance comparison of MA-DSBO, SLDBO, and the proposed AHEAD algorithm: (a) Testing accuracy; (b) Training loss.
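
The connectivity values quoted for Figure 2 can be related to a mixing matrix with a short sketch. This excerpt does not define $\rho$; the code assumes $\rho = \lVert W - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top \rVert_2$ for a symmetric, doubly stochastic mixing matrix $W$, which is consistent with smaller $\rho$ meaning higher connectivity in the caption. The helper names, the ring topology, and the edge weight `w` are hypothetical and do not reproduce the networks used in the paper.

```python
import math


def ring_mixing_matrix(n, w=1.0 / 3.0):
    """Symmetric doubly stochastic mixing matrix for a ring of n >= 3 nodes."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 - 2.0 * w
        W[i][(i + 1) % n] += w
        W[i][(i - 1) % n] += w
    return W


def connectivity_factor(W, iters=500):
    """Estimate rho = ||W - (1/n) 1 1^T||_2 by power iteration (symmetric W)."""
    n = len(W)
    M = [[W[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    v = [math.sin(1.0 + i) for i in range(n)]  # fixed, generic starting vector
    norm = 0.0
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            return 0.0  # W already averages exactly in one step
        v = [c / norm for c in u]
    return norm


n = 8
rho_ring = connectivity_factor(ring_mixing_matrix(n))
rho_complete = connectivity_factor([[1.0 / n] * n for _ in range(n)])
print(rho_ring, rho_complete)
```

Under this assumed definition, a fully connected network with uniform weights gives $\rho = 0$, while a sparse ring gives a value closer to 1, matching the ordering of the high, medium, and low connectivity settings in the caption.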

Theorems & Definitions (30)

  • Remark 1: Hessian-free property
  • Definition 1: $\epsilon$-stationary point
  • Definition 2: Node heterogeneity
  • Remark 2: Weaker assumption on heterogeneity
  • Remark 3: Loopless structure
  • Remark 4: Time-scale separation
  • Theorem 1
  • proof
  • Remark 5: Effect of the penalty parameter
  • Corollary 1
  • ...and 20 more