Adaptive Matrix Sparsification and Applications to Empirical Risk Minimization

Yang P. Liu, Richard Peng, Colin Tang, Albert Weng, Junzhao Yang

TL;DR

This work develops a nearly-linear time algorithm for empirical risk minimization (ERM) with tall, dense constraint matrices by combining a robust interior point method (IPM) with an adaptive, dynamic spectral sparsifier. The key innovation is a data structure for maintaining leverage-score overestimates under adaptive row updates, enabling efficient sampling and construction of a spectral sparsifier that approximates the Hessian along the central path. The main result shows that, under constant-sized blocks $K_i$ with self-concordant barriers and a well-conditioned $A$, ERM can be solved to high accuracy in $\widetilde{O}(nd + d^6\sqrt{n})$ time, which is nearly linear in the input size when $A$ is dense and $n \ge d^{10}$. The approach blends decremental sparsification, heavy-hitter trackers, and stability analyses of the central path to achieve provable efficiency gains for ERM in the tall, dense regime.

Abstract

Consider the empirical risk minimization (ERM) problem, which is stated as follows. Let $K_1, \dots, K_m$ be compact convex sets with $K_i \subseteq \mathbb{R}^{n_i}$ for $i \in [m]$, $n = \sum_{i=1}^m n_i$, and $n_i \le C_K$ for some absolute constant $C_K$. Also, consider a matrix $A \in \mathbb{R}^{n \times d}$ and vectors $b \in \mathbb{R}^d$ and $c \in \mathbb{R}^n$. Then the ERM problem asks to find \[ \min_{\substack{x \in K_1 \times \dots \times K_m\\ A^\top x = b}} c^\top x. \] We give an algorithm to solve this to high accuracy in time $\widetilde{O}(nd + d^6\sqrt{n}) \le \widetilde{O}(nd + d^{11})$, which is nearly linear in the input size when $A$ is dense and $n \ge d^{10}$. Our result is achieved by implementing an $\widetilde{O}(\sqrt{n})$-iteration interior point method (IPM) efficiently using dynamic data structures. In this direction, our key technical advance is a new algorithm for maintaining leverage score overestimates of matrices undergoing row updates. Formally, given a matrix $A \in \mathbb{R}^{n \times d}$ undergoing $T$ batches of row updates of total size $n$, we give an algorithm that maintains leverage score overestimates of the rows of $A$ summing to $\widetilde{O}(d)$ in total time $\widetilde{O}(nd + Td^6)$. This data structure is used to sample a spectral sparsifier within a robust IPM framework to establish the main result.
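To make the terms "leverage score overestimate" and "spectral sparsifier" concrete, the following is a minimal static sketch in Python with NumPy. It is not the paper's dynamic data structure: it computes exact leverage scores from scratch via a QR factorization, and the function names `leverage_scores` and `sample_sparsifier`, the `oversample` parameter, and the choice of overestimates as twice the exact scores are all assumptions made for illustration.

```python
import numpy as np


def leverage_scores(A):
    """Exact leverage scores tau_i = a_i^T (A^T A)^{-1} a_i of the rows a_i of A.

    Assumes A has full column rank. With a thin QR factorization A = QR,
    tau_i equals the squared Euclidean norm of the i-th row of Q.
    """
    Q, _ = np.linalg.qr(A)  # reduced mode: Q has shape (n, d)
    return np.sum(Q * Q, axis=1)


def sample_sparsifier(A, overestimates, oversample, rng):
    """Independently keep row i with probability p_i and rescale it by 1/sqrt(p_i).

    p_i = min(1, oversample * overestimates[i]); `oversample` is a hypothetical
    tuning knob standing in for the logarithmic factors in the analysis.
    The expected number of kept rows is at most oversample * sum(overestimates).
    """
    p = np.minimum(1.0, oversample * overestimates)
    keep = rng.random(A.shape[0]) < p
    return A[keep] / np.sqrt(p[keep])[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 5))   # tall matrix: n = 2000, d = 5
    tau = leverage_scores(A)             # exact scores, which sum to d
    S = sample_sparsifier(A, 2.0 * tau, oversample=40.0, rng=rng)
    # Compare S^T S against A^T A in the norm induced by A^T A.
    L_inv = np.linalg.inv(np.linalg.cholesky(A.T @ A))
    eigs = np.linalg.eigvalsh(L_inv @ (S.T @ S) @ L_inv.T)
    print("rows kept:", S.shape[0], "eigenvalue range:", eigs.min(), eigs.max())
```

If the sampled matrix $S$ is a good spectral sparsifier, the printed eigenvalues of $(A^\top A)^{-1/2} S^\top S (A^\top A)^{-1/2}$ should all be close to 1. Because only overestimates are needed, the scores do not have to be recomputed exactly after every row update, which is the kind of slack a dynamic algorithm can exploit.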

Paper Structure

This paper contains 41 sections, 27 theorems, 141 equations, 1 algorithm.

Key Result

Theorem 1

There is an algorithm that takes an ERM instance of the form stated in the abstract and outputs $x$ such that $x \in K_1 \times \dots \times K_m$, $A^\top x = b$, and $c^\top x$ is within the target accuracy $\varepsilon$ of the optimum, in total time $\widetilde{O}(nd + d^6\sqrt{n})$, where $\widetilde{O}$ hides factors of $C_K$, as well as logs of $n$, $d$, $\kappa$, and $1/\varepsilon$.
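The abstract's remark that this running time is at most $\widetilde{O}(nd + d^{11})$, and nearly linear for dense $A$ once $n \ge d^{10}$, can be checked with a two-case comparison. The sketch below takes the input size to be $nd$, the number of entries of a dense $A$, which is an assumption about how input size is measured.

```latex
\[
d^6 \sqrt{n} \;\le\;
\begin{cases}
d \cdot \sqrt{n} \cdot \sqrt{n} = nd, & \text{if } \sqrt{n} \ge d^5 \ (\text{that is, } n \ge d^{10}),\\[2pt]
d^6 \cdot d^5 = d^{11}, & \text{if } \sqrt{n} < d^5 .
\end{cases}
\]
```

In either case $d^6\sqrt{n} \le nd + d^{11}$, and in the first case the whole bound $nd + d^6\sqrt{n}$ is at most $2nd$, which matches the input size up to the factors hidden by $\widetilde{O}$.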

Theorems & Definitions (58)

  • Theorem 1
  • Theorem 2
  • Lemma 2.1: Johnson–Lindenstrauss [JL84]
  • Theorem 3
  • Lemma 2.2
  • Lemma 2.3
  • Definition 3.1: Self-concordance
  • Definition 4.1
  • Definition 4.2
  • Definition 4.3: Gradient
  • ...and 48 more