Random Subspace Cubic-Regularization Methods, with Applications to Low-Rank Functions

Coralia Cartis; Zhen Shao; Edward Tansley

TL;DR

This work develops Random Subspace ARC (R-ARC), a second-order optimization framework that operates in randomly sampled low-dimensional subspaces to scale cubic-regularization methods to high-dimensional problems. By projecting gradients and Hessians via a sketching matrix and solving a reduced cubic model, R-ARC preserves the optimal first-order convergence rate $O(\epsilon^{-3/2})$ with high probability, and extends the analysis to second-order criticality in both subspace and full-space settings. The paper further introduces R-ARC-D, an adaptive variant that automatically adjusts the sketch size to the Hessian’s effective rank, enabling substantial dimensionality reduction for low-rank functions and retaining the same convergence guarantees. Numerical experiments on full-rank and low-rank CUTEst problems demonstrate favorable performance of R-ARC and R-ARC-D relative to full-space ARC, particularly when the problem exhibits low effective dimensionality or when computational budgets favor reduced inner problem costs. Collectively, these results support the practical viability of random subspace cubic-regularization as a scalable, theoretically sound approach for high-dimensional, second-order optimization problems.

Abstract

We propose and analyze random subspace variants of the second-order Adaptive Regularization using Cubics (ARC) algorithm. These methods iteratively restrict the search space to some random subspace of the parameters, constructing and minimizing a local model only within this subspace. Thus, our variants only require access to (small-dimensional) projections of first- and second-order problem derivatives and calculate a reduced step inexpensively. Under suitable assumptions, the ensuing methods maintain the optimal first-order, and second-order, global rates of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. When applied to the latter, our adaptive variant naturally adapts the subspace size to the true rank of the function, without knowing it a priori.
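The subspace step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the cubic term is taken as $(\sigma/3)\|S^{T}\hat{s}\|^{3}$, the reduced model is minimized by plain backtracking gradient descent, and the acceptance threshold `theta`, the update factor `gamma`, and all function names are hypothetical choices in the style of generic ARC methods.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]


def scaled_gaussian(l, d, rng):
    """l x d sketch with i.i.d. N(0, 1/l) entries."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(l)) for _ in range(d)] for _ in range(l)]


def minimise_reduced_model(S, g, H, sigma, iters=200, tol=1e-12):
    """Backtracking gradient descent on the reduced cubic model (assumed form)
    m(s) = <S g, s> + 0.5 <s, S H S^T s> + (sigma / 3) ||S^T s||^3, s in R^l."""
    St = [list(col) for col in zip(*S)]
    g_hat = matvec(S, g)              # projected gradient, length l
    H_hat = matmul(matmul(S, H), St)  # projected Hessian, l x l
    M = matmul(S, St)                 # Gram matrix: ||S^T s||^2 = <s, M s>

    def model(s):
        norm_sq = max(dot(s, matvec(M, s)), 0.0)
        return dot(g_hat, s) + 0.5 * dot(s, matvec(H_hat, s)) + sigma / 3.0 * norm_sq ** 1.5

    def model_grad(s):
        Ms = matvec(M, s)
        norm = math.sqrt(max(dot(s, Ms), 0.0))
        Hs = matvec(H_hat, s)
        return [gi + hi + sigma * norm * mi for gi, hi, mi in zip(g_hat, Hs, Ms)]

    s = [0.0] * len(S)
    for _ in range(iters):
        grad = model_grad(s)
        grad_sq = dot(grad, grad)
        if grad_sq < tol:
            break
        t, m0 = 1.0, model(s)
        while t > 1e-16:              # Armijo backtracking on the model
            trial = [si - t * gi for si, gi in zip(s, grad)]
            if model(trial) <= m0 - 0.5 * t * grad_sq:
                s = trial
                break
            t *= 0.5
        else:
            break                     # no acceptable step length found
    return s, model(s), St


def r_arc_iteration(f, grad, hess, x, sigma, l, rng, theta=0.1, gamma=2.0):
    """One random-subspace iteration: sketch, solve in R^l, lift, accept or reject."""
    S = scaled_gaussian(l, len(x), rng)   # fresh sketch at every iteration
    s_hat, m_val, St = minimise_reduced_model(S, grad(x), hess(x), sigma)
    x_trial = [xi + si for xi, si in zip(x, matvec(St, s_hat))]  # lift step to R^d
    rho = (f(x) - f(x_trial)) / (-m_val) if m_val < 0 else 0.0
    if rho >= theta:                      # successful: accept, relax regularization
        return x_trial, max(sigma / gamma, 1e-8)
    return x, sigma * gamma               # unsuccessful: keep x, regularize more


def demo(d=8, l=3, iterations=30, seed=0):
    """Run the sketch on a hypothetical diagonal quadratic test function."""
    weights = [i + 1.0 for i in range(d)]
    f = lambda x: 0.5 * sum(w * xi * xi for w, xi in zip(weights, x))
    grad = lambda x: [w * xi for w, xi in zip(weights, x)]
    hess = lambda x: [[weights[i] if i == j else 0.0 for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    x, sigma = [1.0] * d, 1.0
    f0 = f(x)
    for _ in range(iterations):
        x, sigma = r_arc_iteration(f, grad, hess, x, sigma, l, rng)
    return f0, f(x)
```

Calling `demo()` returns the initial and final objective values. Only the $l$-dimensional quantities `g_hat` and `H_hat` enter the inner solve, which is where the reduced per-iteration cost comes from. For clarity the sketch forms the full gradient and Hessian before projecting them, which a practical implementation would avoid.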

Paper Structure (64 sections, 38 theorems, 102 equations, 22 figures, 2 tables, 3 algorithms)

Introduction
Dimensionality reduction in the parameter space
Random subspace variants of ARC
Existing literature
Random embeddings
Low-rank functions
Summary of contributions
Structure of the paper
A Generic Algorithmic Framework and its Analysis (Summary)
A probabilistic convergence result
A Random Subspace Cubically-Regularized Algorithm
Optimal Global Rate of R-ARC to First Order Critical Points
Define $N_{\epsilon}$ and true iterations based on (one-sided) subspace embedding
Some useful lemmas
Satisfying the assumptions of \ref{thm2}
...and 49 more sections

Key Result

Lemma 1.1

Given a fixed, finite set $Y \subset \mathbb{R}^{d}$ and $\epsilon, \delta > 0$, let $S \in \mathbb{R}^{l \times d}$ be a scaled Gaussian matrix with $l = \mathcal{O}(\epsilon^{-2}\log(\frac{|Y|}{\delta}))$, where $|Y|$ denotes the cardinality of $Y$. Then, with probability at least $1 - \delta$, …
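A quick numerical check can make the lemma concrete. The statement above is cut off, so the sketch below assumes the standard Johnson-Lindenstrauss conclusion: for every $y \in Y$, $\|Sy\|_2^2$ stays within a factor $1 \pm \epsilon$ of $\|y\|_2^2$. It also assumes that "scaled Gaussian" means i.i.d. $N(0, 1/l)$ entries. The dimensions, the point set, and the helper names are illustrative only.

```python
import math
import random


def scaled_gaussian(l, d, rng):
    """l x d matrix with i.i.d. N(0, 1/l) entries (assumed scaling)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(l)) for _ in range(d)] for _ in range(l)]


def squared_norm_ratios(S, points):
    """Return ||S y||^2 / ||y||^2 for each y in points."""
    ratios = []
    for y in points:
        Sy = [sum(a * b for a, b in zip(row, y)) for row in S]
        ratios.append(sum(v * v for v in Sy) / sum(v * v for v in y))
    return ratios


rng = random.Random(0)
d, l = 300, 200                                   # ambient and sketch dimensions
Y = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(5)]  # finite point set
S = scaled_gaussian(l, d, rng)
ratios = squared_norm_ratios(S, Y)
distortion = max(abs(r - 1.0) for r in ratios)    # empirical epsilon for this draw
```

Each ratio is distributed as a chi-squared variable with $l$ degrees of freedom divided by $l$, so `distortion` shrinks roughly like $\sqrt{2/l}$ as the sketch dimension grows, in line with the $\epsilon^{-2}$ dependence of $l$ in the lemma.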

Figures (22)

Figure 1: Varying the sketch dimension using scaled Gaussian matrices, comparing R-ARC vs ARC; plotting budget and time; full-rank problems
Figure 2: Varying the sketch dimension using scaled Gaussian matrices, comparing R-ARC-D vs ARC; plotting budget and time; full-rank problems
Figure 3: Varying the sketch dimension using scaled Gaussian matrices, comparing R-ARC-D vs R-ARC vs ARC; plotting budget and time; full-rank problems
Figure 4: Comparing Gaussian matrices with scaled Haar and scaled sampling matrices in R-ARC-D; plotting budget and time for the best $l_0$ for each matrix type; full-rank problems
Figure 5: Varying the sketch dimension using scaled Gaussian matrices in R-ARC; plotting budget and time; low-rank problems
...and 17 more figures

Theorems & Definitions (80)

Definition 1.1: $\epsilon$-subspace embedding (doi:10.1561/0400000060)
Definition 1.2: Oblivious embedding (doi:10.1561/0400000060; doi:10.1109/FOCS.2006.37)
Definition 1.3
Lemma 1.1: JL Lemma (Johnson:1984aa; MR1943859)
Definition 1.4
Lemma 2.1
proof
Theorem 2.1
Remark 2.1
Definition 4.1
...and 70 more
