Global and local approaches for the minimization of a sum of pointwise minima of convex functions

Guillaume Van Dessel, François Glineur

TL;DR

This work tackles the nonconvex, nonsmooth problem of minimizing a sum of pointwise minima of convex functions (SMC), an extension of clipped convex formulations that is NP-hard in general. It introduces three reformulations (global MICP, local MICP, and global BIC) that exploit convex substructures and provide rigorous but scalable approaches. The authors develop relaxed alternating minimization (r-AM), a family of local-search methods that interpolate between AM and exploration-based updates, and prove that accumulation points are critical for SMC, with practical convergence guarantees. Empirical results on piecewise-linear regression and restricted facility location show that r-AM variants frequently outperform standard AM and DC-based baselines, while solving the MICP on a neighborhood of a candidate can either certify its local optimality or improve it. The paper also demonstrates how perspective and big-M formulations can be combined with problem-specific knowledge to enable local optimality certification, and discusses future directions, including smoothing to enable second-order methods.

Abstract

Numerous machine learning and industrial problems can be modeled as the minimization of a sum of $N$ so-called clipped convex functions (SCC), i.e. each term of the sum is the pointwise minimum between a constant and a convex function. In this work, we extend this framework to capture more problems of interest. Specifically, we allow each term of the sum to be a pointwise minimum of an arbitrary number of convex functions, called components, turning the objective into a sum of pointwise minima of convex functions (SMC). Problem (SCC) is NP-hard, which motivates scalable local heuristics. In this spirit, one can express (SMC) objectives as the difference between two convex functions, so that (DC) algorithms can be applied to compute critical points of the problem. Our approach relies instead on a bi-convex reformulation of the problem. From there, we derive a family of local methods, dubbed relaxed alternating minimization (r-AM) methods, that include classical alternating minimization (AM) as a special case. We prove that every accumulation point of r-AM is critical. In addition, we show the empirical superiority of r-AM, compared to traditional AM and (DC) approaches, on piecewise-linear regression and restricted facility location problems. Under mild assumptions, (SCC) can be cast as a mixed-integer convex program (MICP) using perspective functions. This approach can be generalized to (SMC) but introduces many copies of the primal variable. In contrast, we suggest a compact big-M based (MICP) equivalent formulation of (SMC), free of these extra variables. Finally, we showcase practical examples where solving our (MICP), restricted to a neighbourhood of a given candidate (i.e. the output iterate of a local method), will either certify the candidate's optimality on that neighbourhood or provide a new, strictly better point from which to restart the local method.
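
To make the (SCC) setting and the role of local heuristics concrete, here is a minimal sketch of plain alternating minimization on a toy one-dimensional instance with clipped quadratics, $F(x) = \sum_s \min\{(x - a_s)^2, c\}$. It follows one common reading of AM for this problem class (pick the active component of each term, then minimize the resulting convex sum). It is not the authors' implementation and does not include the relaxation that defines r-AM. The data `a`, the clipping level `c`, and the function names are hypothetical and chosen for illustration only.

```python
def scc_objective(x, a, c):
    """Sum of clipped quadratics: sum_s min{(x - a_s)^2, c}."""
    return sum(min((x - a_s) ** 2, c) for a_s in a)


def alternating_minimization(a, c, x0, max_iter=100):
    """Plain AM sketch: alternate between selecting active components and
    minimizing the sum of the selected convex components."""
    x = float(x0)
    for _ in range(max_iter):
        # Selection step: keep the terms whose quadratic (not the constant c)
        # attains the pointwise minimum at the current x.
        selected = [a_s for a_s in a if (x - a_s) ** 2 <= c]
        if not selected:
            # Every term is clipped, so the objective is locally constant.
            break
        # Minimization step: the sum of the selected quadratics is minimized
        # by their mean; clipped terms contribute the constant c.
        x_new = sum(selected) / len(selected)
        if x_new == x:
            break
        x = x_new
    return x


# Hypothetical data: four clustered points and one outlier.
a = [0.0, 0.1, -0.1, 0.2, 10.0]
c = 1.0

x_star = alternating_minimization(a, c, x0=0.5)
x_stuck = alternating_minimization(a, c, x0=10.0)
print(x_star, scc_objective(x_star, a, c))
print(x_stuck, scc_objective(x_stuck, a, c))
```

Started at `x0=0.5`, the sketch settles on the mean of the four clustered points and clips the outlier. Started at `x0=10.0`, it stays at the outlier with a strictly worse objective value. The result of a local method therefore depends on its starting point, which is the kind of situation where a neighbourhood certificate or a better restart point is useful.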

Paper Structure

This paper contains 22 sections, 9 theorems, 92 equations, 9 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1.1

Let $N \in \mathbb{N}$ be fixed and let the number of components $n_s\in \mathbb{N}$ be chosen for every $s \in[N]$. For any $d \geq N$, let $\bar{h} : \mathbb{R}^d \to \mathbb{R}$ be a proper convex function. The function $\mathcal{W}$ defined for every $x \in \mathbb{R}^d$ by [...] is fully-active, i.e., for every $\sigma \in \bigtimes_{s=1}^{N}\, [n_s]$, $\sigma$ is the unique selection leading to [...]

Figures (9)

  • Figure 1: $F(x_1,x_2) = \min\{(x_1-3)^2+\frac{1}{3}(x_2+3)^2,(x_1+3)^2+\frac{1}{6}x_2^2,15\}+\min\{(x_2-2 x_1 + 1)^2,|x_1+2|\}$
  • Figure 2: value functions | $C=0 \rightarrow$ sum of maximums, $C=1 \rightarrow$ sum of minimums
  • Figure 3: A first glimpse of relaxed Alternating Minimization (r-AM) methods
  • Figure 4: Piecewise-Linear Regression: (time-value) and (value distribution) plots
  • Figure 5: Restricted Facility Location: (value distribution) $\Lambda =0$ (left) and $\Lambda=10$ (right)
  • ...and 4 more figures

Theorems & Definitions (32)

  • Remark 1
  • Proposition 1.1: Fully-active instance of the minimization problem
  • Remark 2
  • Remark 3
  • Example 1.2
  • Definition 1.3: Critical point
  • Definition 1.4: Local optimality
  • Definition 1.5: $\rho$-active sets
  • Proposition 1.6
  • Definition 1.7: Standard simplex
  • ...and 22 more