Difference of Submodular Minimization via DC Programming

Marwa El Halabi, George Orfanides, Tim Hoheisel

TL;DR

This work studies minimizing $F(X)=G(X)-H(X)$ where $G,H$ are normalized submodular functions, reframing DS minimization as a DC program. It develops DC Algorithm (DCA) and Complete DC Algorithm (CDCA) variants, proving convergence and local-minimality guarantees, including stronger guarantees for CDCA and practical rounding-regularization adaptations. By linking DS convergence properties to DC theory, the authors achieve $O(1/k)$ objective convergence and strong local minima under CDCA, with efficient approximate solutions to concave subproblems via Frank-Wolfe. Empirically, the proposed methods outperform baselines on speech corpus selection and feature selection, demonstrating the practicality and effectiveness of the DC programming approach to DS minimization.
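The reformulation mentioned above can be written out as follows. This is a sketch of the standard Lovász-extension argument, not a quotation from the paper: the symbols $g_L$, $h_L$, $d$, $x^k$, and $y^k$ are assumed notation introduced here for illustration, and the paper's own DCA and CDCA variants may add regularization or rounding steps not shown.

```latex
% g_L, h_L: Lovász extensions of G and H (convex because G and H are submodular).
% The Lovász extension of F = G - H is g_L - h_L, and its minimum over the unit
% cube is attained at an indicator vector, so the two problems share a minimum.
\min_{X \subseteq V} \; G(X) - H(X)
  \;=\; \min_{x \in [0,1]^d} \; g_L(x) - h_L(x),
\qquad d = |V|.

% Generic DCA step applied to this DC program:
% linearize the concave part -h_L at x^k, then solve the resulting convex problem.
y^k \in \partial h_L(x^k), \qquad
x^{k+1} \in \operatorname*{argmin}_{x \in [0,1]^d} \; g_L(x) - \langle y^k, x \rangle .
```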

Abstract

Minimizing the difference of two submodular (DS) functions is a problem that naturally occurs in various machine learning problems. Although it is well known that a DS problem can be equivalently formulated as the minimization of the difference of two convex (DC) functions, existing algorithms do not fully exploit this connection. A classical algorithm for DC problems is called the DC algorithm (DCA). We introduce variants of DCA and its complete form (CDCA) that we apply to the DC program corresponding to DS minimization. We extend existing convergence properties of DCA, and connect them to convergence properties on the DS problem. Our results on DCA match the theoretical guarantees satisfied by existing DS algorithms, while providing a more complete characterization of convergence properties. In the case of CDCA, we obtain a stronger local minimality guarantee. Our numerical results show that our proposed algorithms outperform existing baselines on two applications: speech corpus selection and feature selection.
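To make the DS-to-DC connection concrete, here is a minimal toy sketch of a DCA-style loop on a tiny ground set. It is not the authors' algorithm: the function names (`greedy_subgradient`, `lovasz_extension`, `dca_ds`), the toy functions `G` and `H`, and the brute-force solver for the convex subproblem are all assumptions made for illustration.

```python
import math
from itertools import combinations


def greedy_subgradient(H, x):
    """Subgradient of the Lovász extension of H at x via the greedy algorithm."""
    d = len(x)
    # Visit coordinates in decreasing order of x (stable sort breaks ties by index).
    order = sorted(range(d), key=lambda i: -x[i])
    y = [0.0] * d
    prefix = set()
    prev = H(frozenset())
    for i in order:
        prefix.add(i)
        cur = H(frozenset(prefix))
        y[i] = cur - prev  # marginal gain of adding element i to the prefix
        prev = cur
    return y


def lovasz_extension(F, x):
    """Lovász extension of a normalized set function F evaluated at x."""
    y = greedy_subgradient(F, x)
    return sum(yi * xi for yi, xi in zip(y, x))


def all_subsets(d):
    """Yield every subset of {0, ..., d-1} (exponential; toy sizes only)."""
    for r in range(d + 1):
        for c in combinations(range(d), r):
            yield frozenset(c)


def dca_ds(G, H, d, X0=frozenset(), max_iter=50):
    """Toy DCA-style loop for min_X G(X) - H(X) with G, H submodular."""
    X = X0
    for _ in range(max_iter):
        x = [1.0 if i in X else 0.0 for i in range(d)]
        # Linearize H at the current point with a greedy subgradient.
        y = greedy_subgradient(H, x)
        # Convex subproblem: minimize G(S) - y(S). Brute force here (assumption);
        # a real implementation would call a submodular minimization routine.
        X_new = min(all_subsets(d), key=lambda S: G(S) - sum(y[i] for i in S))
        if G(X_new) - H(X_new) >= G(X) - H(X):
            break  # no strict decrease in the DS objective: stop
        X = X_new
    return X


# Hypothetical toy instance: G is modular, H is a concave function of cardinality.
def G(S):
    return len(S)


def H(S):
    return 2.0 * math.sqrt(len(S))


X_star = dca_ds(G, H, d=4)
print(sorted(X_star), G(X_star) - H(X_star))
```

Each iteration cannot increase $G(X)-H(X)$, because the greedy subgradient is tight at the current set and lower-bounds $H$ everywhere else. The brute-force subproblem is exponential in the ground-set size and is only meant to keep the sketch self-contained.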

Paper Structure (41 sections, 28 theorems, 48 equations, 6 figures, 1 table, 2 algorithms)

Key Result

Proposition 2.3

For a normalized set function $F$, we have:

Figures (6)

  • Figure 1: Discrete and continuous objective values (log-scale) vs iterations on speech (top) and mushroom (bottom) datasets.
  • Figure 2: Discrete and continuous objective values (log-scale) of our proposed methods for all $\rho$ values vs iterations on speech dataset.
  • Figure 3: Discrete and continuous objective values (log-scale) of our proposed methods for all $\rho$ values vs iterations on mushroom dataset.
  • Figure 4: PGM gap values (log-scale) of our proposed methods for all $\rho$ values vs iterations on speech (top two rows) and mushroom (bottom two rows) datasets.
  • Figure 5: Discrete and continuous objective values (log-scale) vs time on speech (top) and mushroom (bottom) datasets. We include separate plots for non-DCA variants for visibility.
  • ...and 1 more figure

Theorems & Definitions (59)

  • Definition 2.1
  • Definition 2.2: Lovász extension
  • Proposition 2.3
  • Proposition 2.4
  • Lemma 2.5
  • Definition 2.6
  • Proposition 2.6
  • proof
  • Theorem 3.1
  • proof : Proof sketch
  • ...and 49 more