Unlocking TriLevel Learning with Level-Wise Zeroth Order Constraints: Distributed Algorithms and Provable Non-Asymptotic Convergence

Yang Jiao, Kai Yang, Chengtao Jian

TL;DR

This work tackles distributed trilevel optimization with level-wise zeroth-order constraints, addressing the absence of gradient information in practical, privacy-preserving settings. It introduces DTZO, a gradient-free framework that builds cascaded zeroth-order polynomial approximations through zeroth-order cuts and a consensus-based distributed algorithm, accompanied by non-asymptotic convergence guarantees to an $ε$-stationary point. Theoretical results quantify iteration and communication complexities and reveal a tunable trade-off via a cascade-refinement horizon parameter $T_1$. Empirically, DTZO demonstrates superior performance on black-box trilevel learning with LLMs and on robust hyperparameter optimization tasks, validating effectiveness, scalability, and robustness to smoothing choices.

Abstract

Trilevel learning (TLL) has found diverse applications in machine learning, ranging from robust hyperparameter optimization to domain adaptation. However, existing research primarily focuses on scenarios where first-order information is available at each level, which is inadequate in many situations involving zeroth-order constraints, such as when black-box models are employed. Moreover, in trilevel learning, data may be distributed across various nodes, necessitating strategies that address TLL problems without centralizing data on servers, in order to preserve data privacy. To this end, this work proposes DTZO, an effective distributed trilevel zeroth-order learning framework that addresses TLL problems with level-wise zeroth-order constraints in a distributed manner. DTZO is versatile and can be adapted to a wide range of (grey-box) TLL problems with partial zeroth-order constraints. In DTZO, the cascaded polynomial approximation is constructed without relying on gradients or sub-gradients by leveraging a novel cut, the zeroth-order cut. Furthermore, we provide a non-asymptotic convergence rate analysis for DTZO in reaching an $ε$-stationary point. Extensive experiments validate the performance of DTZO, e.g., it achieves up to approximately a 40% improvement in performance.
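To make the notion of "zeroth-order information" in the abstract concrete, the sketch below estimates a gradient from function values alone, using the generic two-point Gaussian-smoothing estimator. It is a textbook illustration, not DTZO's zeroth-order cut construction; the names `zo_gradient`, `black_box`, `mu`, and `num_samples` are hypothetical and do not come from the paper.

```python
import random


def zo_gradient(f, x, mu=1e-4, num_samples=5000, seed=0):
    """Estimate the gradient of f at x using only function evaluations.

    Generic two-point Gaussian-smoothing estimator (an illustrative
    assumption, not the paper's method):
        g ~ mean over u of [(f(x + mu * u) - f(x)) / mu] * u,  u ~ N(0, I)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    d = len(x)
    fx = f(x)  # one baseline query, reused for every random direction
    grad = [0.0] * d
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        # Finite difference along the random direction u.
        scale = (f(x_pert) - fx) / mu
        for i in range(d):
            grad[i] += scale * u[i] / num_samples
    return grad


def black_box(x):
    """Hypothetical black-box objective; only its output values are queried."""
    return sum(xi * xi for xi in x)


# The true gradient at this point is [2.0, -4.0, 1.0]; the estimate is noisy
# and tightens as num_samples grows.
estimate = zo_gradient(black_box, [1.0, -2.0, 0.5])
print(estimate)
```

The smoothing radius `mu` trades bias against numerical noise, and `num_samples` trades query cost against variance. These knobs are what "smoothing choices" refers to in zeroth-order methods generally.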

Paper Structure

This paper contains 39 sections, 4 theorems, 126 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

The original feasible region of constraint $\phi_{\rm{in}}( \{{\boldsymbol{x}_{3,j}}\}, \boldsymbol{z}_1, {\boldsymbol{z}_2}', \boldsymbol{z}_3) = 0$ is a subset of the feasible region formed by inner layer zeroth order cuts, i.e., $P_{\rm{in}}^{t+1} = \left\{ h_l^{\rm{in}}( \{{\boldsymbol{x}_{3,j}}

Figures (7)

  • Figure 1: Comparison of ASR and ACC between the proposed DTZO and the state-of-the-art distributed bilevel zeroth-order learning method FedRZO$_{\rm{bl}}$ (Qiu et al., 2023).
  • Figure 2: Adjusting $T_1$ flexibly controls the trade-off between performance and complexity (results on the USPS dataset).
  • Figure 3: Training time (1000 communication rounds) with and without removing inactive cuts.
  • Figure 4: Test loss of the proposed DTZO under various settings of $T_1$ (results on the USPS dataset).
  • Figure 5: Test loss on AS (adversarial samples) of DTZO under various settings of $T_1$.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Proposition 1
  • Proposition 2
  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2