
Random Subspace Cubic-Regularization Methods, with Applications to Low-Rank Functions

Coralia Cartis, Zhen Shao, Edward Tansley

TL;DR

This work develops Random Subspace ARC (R-ARC), a second-order optimization framework that operates in randomly sampled low-dimensional subspaces in order to scale cubic-regularization methods to high-dimensional problems. By projecting gradients and Hessians with a sketching matrix and minimizing a reduced cubic model, R-ARC preserves, with high probability, the optimal first-order global rate of convergence, i.e. $O(\epsilon^{-3/2})$ iterations to reach an $\epsilon$-approximate first-order critical point. The analysis also extends to second-order criticality, in both subspace and full-space settings. The paper further introduces R-ARC-D, an adaptive variant that automatically adjusts the sketch size to the Hessian's effective rank; this enables substantial dimensionality reduction for low-rank functions while retaining the same convergence guarantees. Numerical experiments on full-rank and low-rank CUTEst problems show favorable performance of R-ARC and R-ARC-D relative to full-space ARC, particularly when the problem has low effective dimensionality or when the computational budget rewards cheaper inner subproblem solves. Collectively, these results support random subspace cubic regularization as a scalable, theoretically sound approach to high-dimensional second-order optimization.
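The sketch-then-solve loop described above can be illustrated with a minimal NumPy sketch. It assumes a scaled Gaussian sketch with i.i.d. $N(0, 1/l)$ entries, a reduced model whose cubic term is $\frac{\sigma}{3}\|\hat s\|^3$ in the sketched variable, and a simple accept/reject rule with hypothetical parameters `theta` and `gamma`; the paper's exact model, acceptance test and $\sigma$ update may differ. The helper names `solve_cubic_subproblem` and `r_arc` are ours, and forming `S @ hess(x) @ S.T` explicitly is only for brevity (in practice the sketched Hessian would come from $l$ Hessian-vector products).

```python
import numpy as np


def solve_cubic_subproblem(g, H, sigma, iters=200):
    """Globally minimise m(s) = g^T s + 0.5 s^T H s + (sigma/3) ||s||^3.

    Uses the characterisation s = -(H + lam I)^{-1} g with lam = sigma ||s||
    and H + lam I positive semidefinite, found by bisection on lam.
    The degenerate 'hard case' is ignored in this sketch.
    """
    evals, Q = np.linalg.eigh(H)
    gq = Q.T @ g
    lam_lo = max(0.0, -evals[0]) + 1e-12 * (1.0 + abs(evals[0]))

    def step_norm(lam):
        return np.linalg.norm(gq / (evals + lam))

    # phi(lam) = ||s(lam)|| - lam / sigma is decreasing; bracket its root.
    lam_hi = lam_lo + 1.0
    while step_norm(lam_hi) > lam_hi / sigma:
        lam_hi *= 2.0
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if step_norm(lam) > lam / sigma:
            lam_lo = lam
        else:
            lam_hi = lam
    lam = 0.5 * (lam_lo + lam_hi)
    return -Q @ (gq / (evals + lam))


def r_arc(f, grad, hess, x0, l, sigma=1.0, theta=0.1, gamma=2.0,
          max_iter=500, tol=1e-6, seed=0):
    """Minimal random-subspace ARC loop (illustrative, not the paper's code)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        S = rng.standard_normal((l, d)) / np.sqrt(l)  # scaled Gaussian sketch
        g_hat = S @ g                                  # sketched gradient, R^l
        H_hat = S @ hess(x) @ S.T                      # sketched Hessian, l x l
        s_hat = solve_cubic_subproblem(g_hat, H_hat, sigma)
        s = S.T @ s_hat                                # lift step back to R^d
        model_dec = -(g_hat @ s_hat + 0.5 * s_hat @ H_hat @ s_hat
                      + sigma / 3.0 * np.linalg.norm(s_hat) ** 3)
        rho = (f(x) - f(x + s)) / max(model_dec, 1e-300)
        if rho >= theta:   # successful: accept step, relax regularisation
            x, sigma = x + s, max(sigma / gamma, 1e-8)
        else:              # unsuccessful: reject step, regularise more
            sigma *= gamma
    return x


# Usage on a separable test function with minimiser x* = (1, ..., 1)
def f(x):
    y = x - 1.0
    return 0.5 * y @ y + 0.25 * np.sum(y ** 4)


def grad(x):
    y = x - 1.0
    return y + y ** 3


def hess(x):
    return np.diag(1.0 + 3.0 * (x - 1.0) ** 2)


x_star = r_arc(f, grad, hess, np.zeros(20), l=5)
print(f(x_star))
```

Only $l = 5$ of the $d = 20$ directions are explored per iteration, yet the iterates still converge because a fresh subspace is drawn each time.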

Abstract

We propose and analyze random subspace variants of the second-order Adaptive Regularization using Cubics (ARC) algorithm. These methods iteratively restrict the search space to a random subspace of the parameters, constructing and minimizing a local model only within this subspace. Thus, our variants only require access to (small-dimensional) projections of first- and second-order problem derivatives and calculate a reduced step inexpensively. Under suitable assumptions, the ensuing methods maintain the optimal first- and second-order global rates of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. When applied to the latter, our adaptive variant naturally adapts the subspace size to the true rank of the function, without knowing it a priori.
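To make the low-rank setting concrete, the sketch below builds a hypothetical low-rank function $f(x) = h(Ax)$ with $A \in \mathbb{R}^{r \times d}$, whose Hessian $A^T \nabla^2 h(Ax) A$ has rank at most $r$. It forms the sketched Hessian $S \nabla^2 f(x) S^T$ from $l$ Hessian-vector products only, never the $d \times d$ Hessian, and then applies a doubling rule of our own: grow $l$ while the sketched Hessian still has full rank $l$. This rule only illustrates how a sketch size can adapt to the rank; R-ARC-D's actual update is specified in the paper's algorithm and may differ. All names (`hvp`, `sketched_hessian`, `numerical_rank`, `adapt_sketch_size`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 100, 5
A = rng.standard_normal((r, d))   # f(x) = h(A x) depends on x only via A x


def hvp(x, v):
    """Hessian-vector product of f(x) = 0.5||Ax||^2 + 0.25 sum((Ax)^4)."""
    z = A @ x
    return A.T @ ((1.0 + 3.0 * z ** 2) * (A @ v))


def sketched_hessian(x, S):
    """S H S^T computed from l Hessian-vector products (one per row of S)."""
    HSt = np.column_stack([hvp(x, row) for row in S])   # H S^T, shape d x l
    return S @ HSt


def numerical_rank(M, rtol=1e-8):
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > rtol * sv[0]))


def adapt_sketch_size(x, l0=2):
    """Illustrative rule: double l while S H S^T is still full rank."""
    l = l0
    while True:
        S = rng.standard_normal((l, d)) / np.sqrt(l)   # scaled Gaussian sketch
        rank = numerical_rank(sketched_hessian(x, S))
        if rank < l or l >= d:
            return l, rank
        l = min(2 * l, d)


x = rng.standard_normal(d) / np.sqrt(d)
l_final, rank_found = adapt_sketch_size(x)
print(l_final, rank_found)
```

Because a Gaussian sketch preserves the rank of a rank-$r$ Hessian up to $\min(l, r)$ almost surely, the first rank-deficient sketch reveals $r$ without it being supplied in advance.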

Paper Structure

This paper contains 64 sections, 38 theorems, 102 equations, 22 figures, 2 tables, and 3 algorithms.

Key Result

Lemma 1.1

Given a fixed, finite set $Y \subset \mathbb{R}^{d}$ and $\epsilon, \delta > 0$, let $S \in \mathbb{R}^{l \times d}$ be a scaled Gaussian matrix with $l = \mathcal{O}(\epsilon^{-2}\log(\frac{|Y|}{\delta}))$, where $|Y|$ denotes the cardinality of $Y$. Then, with probability at least $1 - \delta$, …
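As an informal numerical check of Lemma 1.1, the snippet below embeds a fixed random set $Y$ of $|Y| = 50$ points in $\mathbb{R}^{1000}$ with a scaled Gaussian matrix (i.i.d. $N(0, 1/l)$ entries) and reports the worst relative distortion of squared norms, $\max_{y \in Y} \left| \|Sy\|_2^2 / \|y\|_2^2 - 1 \right|$. The lemma fixes $l$ only up to a constant, so the factor `C = 8` is an arbitrary assumption, and measuring distortion in squared norms is our choice because the exact inequality in the statement above is truncated.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_points, eps, delta = 1000, 50, 0.5, 0.1
C = 8.0   # assumed constant hidden in l = O(eps^-2 log(|Y| / delta))
l = int(np.ceil(C * eps ** -2 * np.log(n_points / delta)))

Y = rng.standard_normal((n_points, d))        # fixed finite set, drawn before S
S = rng.standard_normal((l, d)) / np.sqrt(l)  # scaled Gaussian sketch

# ||S y||^2 / ||y||^2 for every y in Y (rows of Y @ S.T are S y)
ratios = np.sum((Y @ S.T) ** 2, axis=1) / np.sum(Y ** 2, axis=1)
max_distortion = np.max(np.abs(ratios - 1.0))
print(l, max_distortion)   # l is 199 for these parameters
```

Note that $l$ grows only logarithmically in $|Y|$ and does not depend on the ambient dimension $d$.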

Figures (22)

  • Figure 1: Varying the sketch dimension using scaled Gaussian matrices, comparing R-ARC vs ARC; plotting budget and time; full-rank problems
  • Figure 2: Varying the sketch dimension using scaled Gaussian matrices, comparing R-ARC-D vs ARC; plotting budget and time; full-rank problems
  • Figure 3: Varying the sketch dimension using scaled Gaussian matrices, comparing R-ARC-D vs R-ARC vs ARC; plotting budget and time; full-rank problems
  • Figure 4: Comparing Gaussian matrices with scaled Haar and scaled sampling matrices in R-ARC-D; plotting budget and time for the best $l_0$ for each matrix type; full-rank problems
  • Figure 5: Varying the sketch dimension using scaled Gaussian matrices in R-ARC; plotting budget and time; low-rank problems
  • ...and 17 more figures
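Figure 4 compares three families of sketching matrices. The constructors below are a minimal sketch of the standard definitions as we understand them, each scaled so that $\mathbb{E}[S^T S] = I_d$: scaled Gaussian ($N(0, 1/l)$ entries), scaled Haar ($\sqrt{d/l}$ times $l$ Haar-distributed orthonormal rows) and scaled sampling ($\sqrt{d/l}$ times $l$ rows of the identity). Sampling indices with replacement is an assumption, and the paper's precise definitions may differ; the helper `mc_gram` is hypothetical.

```python
import numpy as np


def scaled_gaussian(l, d, rng):
    # i.i.d. N(0, 1/l) entries, so E[S^T S] = I_d
    return rng.standard_normal((l, d)) / np.sqrt(l)


def scaled_haar(l, d, rng):
    # l orthonormal rows drawn from the Haar measure, scaled by sqrt(d/l)
    Q, R = np.linalg.qr(rng.standard_normal((d, l)))
    Q = Q * np.sign(np.diag(R))   # sign fix makes Q Haar-distributed
    return np.sqrt(d / l) * Q.T


def scaled_sampling(l, d, rng):
    # each row is sqrt(d/l) e_i^T, i uniform on {0, ..., d-1}, with replacement
    idx = rng.integers(0, d, size=l)
    S = np.zeros((l, d))
    S[np.arange(l), idx] = np.sqrt(d / l)
    return S


def mc_gram(make, l, d, rng, trials=500):
    """Monte Carlo estimate of E[S^T S]; should approach the identity."""
    acc = np.zeros((d, d))
    for _ in range(trials):
        S = make(l, d, rng)
        acc += S.T @ S
    return acc / trials


rng = np.random.default_rng(0)
for make in (scaled_gaussian, scaled_haar, scaled_sampling):
    G = mc_gram(make, 10, 50, rng)
    print(make.__name__, np.trace(G) / 50)
```

The Haar and sampling sketches are cheaper to apply or store in some settings (sampling just selects coordinates), while the Gaussian sketch is the one analysed in Lemma 1.1.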

Theorems & Definitions (80)

  • Definition 1.1: $\epsilon$-subspace embedding [10.1561/0400000060]
  • Definition 1.2: Oblivious embedding [10.1561/0400000060; 10.1109/FOCS.2006.37]
  • Definition 1.3
  • Lemma 1.1: JL Lemma [Johnson:1984aa; MR1943859]
  • Definition 1.4
  • Lemma 2.1
  • proof
  • Theorem 2.1
  • Remark 2.1
  • Definition 4.1
  • ...and 70 more
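Definition 1.1 can be probed numerically. Assuming the common squared-norm form ($S$ is an $\epsilon$-subspace embedding for $M$ if $(1-\epsilon)\|Mz\|_2^2 \le \|SMz\|_2^2 \le (1+\epsilon)\|Mz\|_2^2$ for all $z$), the smallest such $\epsilon$ can be read off the extreme singular values of $SU$, where $U$ is an orthonormal basis for the range of $M$. The function name `subspace_embedding_eps` is hypothetical.

```python
import numpy as np


def subspace_embedding_eps(S, M, rank_tol=1e-12):
    """Smallest eps with (1-eps)||Mz||^2 <= ||SMz||^2 <= (1+eps)||Mz||^2 for all z."""
    U, sv, _ = np.linalg.svd(M, full_matrices=False)
    U = U[:, sv > rank_tol * sv[0]]           # orthonormal basis of range(M)
    s = np.linalg.svd(S @ U, compute_uv=False)
    # If S has fewer rows than range(M) has dimensions, some Mz is mapped to 0.
    s_min = s.min() if s.size == U.shape[1] else 0.0
    return max(s.max() ** 2 - 1.0, 1.0 - s_min ** 2)


rng = np.random.default_rng(1)
d, r, l = 1000, 5, 200
M = rng.standard_normal((d, r))
S = rng.standard_normal((l, d)) / np.sqrt(l)   # scaled Gaussian sketch
print(subspace_embedding_eps(S, M))
```

Since $\|Mz\|_2 = \|w\|_2$ when $Mz = Uw$, the embedding bound reduces to bounding $\|SUw\|_2^2 / \|w\|_2^2$, which lies between the squared smallest and largest singular values of $SU$.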