Beyond Convexity: Proximal-Perturbed Lagrangian Methods for Efficient Functional Constrained Optimization

Sang Bin Moon, Jong Gwang Kim, Ashish Chandra, Christopher Brinton, Abolfazl Hashemi

TL;DR

This paper develops a primal-dual algorithmic framework built upon a novel form of the Lagrangian function, termed the {\em Proximal-Perturbed Augmented Lagrangian}, which enables the development of simple first-order algorithms that converge to a stationary solution under mild conditions.

Abstract

Non-convex functional constrained optimization problems have gained substantial attention in machine learning and data science, addressing broad requirements that typically go beyond purely performance-centric objectives. An influential class of algorithms for functional constrained problems is the class of primal-dual methods, which has been extensively analyzed for convex problems. Nonetheless, the investigation of their efficacy for non-convex problems is under-explored. This paper develops a primal-dual algorithmic framework for solving such non-convex problems. This framework is built upon a novel form of the Lagrangian function, termed the {\em Proximal-Perturbed Augmented Lagrangian}, which enables the development of simple first-order algorithms that converge to a stationary solution under mild conditions. Notably, we study this framework under both non-smoothness and smoothness of the constraint function and provide three key contributions: (i) a simple algorithm that does not require the continuous adjustment of the penalty parameter; (ii) a non-asymptotic iteration complexity of $\widetilde{\mathcal{O}}(1/\epsilon^2)$; and (iii) extensive experimental results demonstrating the effectiveness of the proposed framework in terms of computational cost and performance, outperforming related approaches that use regularization (penalization) techniques and/or standard Lagrangian relaxation across diverse non-convex problems.
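To make the primal-dual idea in the abstract concrete, here is a minimal illustrative sketch of a first-order primal-dual (gradient descent-ascent) loop on an augmented Lagrangian with a fixed penalty parameter, echoing contribution (i). This is not the paper's Proximal-Perturbed Augmented Lagrangian algorithm; the function names, step sizes, and toy problem below are all assumptions chosen for illustration.

```python
# Illustrative sketch only (NOT the authors' PLADA/PPALA algorithm):
# a plain augmented-Lagrangian gradient descent-ascent loop for
#     min f(x)  s.t.  g(x) <= 0,
# using a FIXED penalty beta, i.e. no continuous penalty adjustment.

def primal_dual_al(f_grad, g, g_grad, x0, beta=10.0, eta=0.01, rho=0.01,
                   iters=20000):
    """Run gradient descent on x and ascent on the multiplier lam for
    L(x, lam) = f(x) + (beta/2) * max(0, g(x) + lam/beta)^2 - lam^2/(2*beta).
    """
    x, lam = x0, 0.0
    for _ in range(iters):
        # d/dx of the augmented Lagrangian: f'(x) + max(0, lam + beta*g(x)) * g'(x)
        t = max(0.0, lam + beta * g(x))
        x -= eta * (f_grad(x) + t * g_grad(x))
        # ascent step on the multiplier, projected onto lam >= 0
        lam = max(0.0, lam + rho * g(x))
    return x, lam

# Toy problem: min (x - 2)^2  s.t.  x - 1 <= 0.
# The KKT point is x* = 1 with multiplier lam* = 2.
x_star, lam_star = primal_dual_al(
    f_grad=lambda x: 2.0 * (x - 2.0),
    g=lambda x: x - 1.0,
    g_grad=lambda x: 1.0,
    x0=0.0,
)
```

With the penalty `beta` held fixed, the multiplier ascent alone drives the iterates to the KKT point of the toy problem; the paper's framework is designed so that such fixed-penalty behavior holds under much weaker (non-convex) assumptions than this convex sketch.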

Paper Structure

This paper contains 45 sections, 26 theorems, 146 equations, 8 figures, 2 tables, 2 algorithms.

Key Result

Lemma 11

Let $\{\mathbf{w}_{k}\}$ be the sequence generated by Algorithm alg:plada, and let $\{\mathbf{p}_k := (\mathbf{x}_k,\mathbf{u}_{k},\mathbf{z}_{k})\}$ be the primal sequence. Under Assumptions assumption_kkt, assumption_lipschitz_f, assumption_bounded_domain and assumption_bounded_subgrad, where $\boldsymbol{\zeta}_{\mathbf{p}}^{k+1} := (\boldsymbol{\zeta}_{\mathbf{x}}^{k+1}, \ldots)$ …

Figures (8)

  • Figure 1: Comparison of the performance of PLADA, IPP-ConEx, IPP-SSG and SSG on the logistic loss equation (\ref{eq:logistic_loss}) with the demographic parity (DP) constraint equation (\ref{eq:demographic_parity}). The results are presented in terms of their loss values, constraint violation and near stationarity (from top to bottom) on the Adult, Bank and COMPAS datasets (from left to right) with respect to CPU time in seconds.
  • Figure 2: Comparison of the performance of PLADA, IPP-ConEx, IPP-SSG and SSG on the logistic loss objective (\ref{eq:logistic_loss}) and the equalized odds (EO) constraint (\ref{eq:equalized_odds}) with respect to CPU time.
  • Figure 3: Comparison of the validation performance of PLADA and the method of Narasimhan et al. (narasimhan2020approximate) on the intersectional group fairness equation (\ref{eq:intersectional_fairness}) versus epochs.
  • Figure 4: Performance comparison of PPALA and GDPA on the Fashion-MNIST and CIFAR10 datasets in terms of attaining stationarity and feasibility. PPALA provides a consistent reduction of the stationarity and feasibility gaps, in line with our theoretical expectations, whereas GDPA reduces the feasibility gap at a slower rate on both datasets in our neural network setting.
  • Figure 5: Comparison of the performance of PLADA with different values of $\alpha$ on the logistic loss objective with the demographic parity (DP) constraint on the Adult dataset. The results show that the performance of PLADA is not sensitive to the value of $\alpha$ ($\beta=0.1$ is fixed).
  • ...and 3 more figures

Theorems & Definitions (58)

  • Definition 1: The KKT point
  • Definition 2: $\epsilon$-KKT solution, Definition 2 of lu2022single
  • Definition 9: Definition 8.3 of rockafellar2009variational
  • Definition 10
  • Lemma 11: Primal Stationarity
  • Remark 12
  • Lemma 13: Primal Feasibility
  • Lemma 14: Dual feasibility
  • Lemma 15: Complementary slackness
  • Theorem 16: Convergence to a KKT Point
  • ...and 48 more