
A Multilevel Method for Self-Concordant Minimization

Nick Tsipinakis, Panos Parpas

Abstract

The analysis of second-order optimization methods based on sub-sampling, randomization, or sketching has two serious shortcomings compared to the conventional Newton method. The first is that the analysis of the iterates has been shown to be scale-invariant only under specific assumptions on the problem structure. The second is that the fast convergence rates of second-order methods have been established only under assumptions on the input data. In this paper, we propose a randomized Newton method for self-concordant functions that addresses both shortcomings. We propose a Self-concordant Iterative-minimization-Galerkin-based Multilevel Algorithm (SIGMA) and establish its super-linear convergence rate using the theory of self-concordant functions. Our analysis is based on the connections between multigrid optimization methods and the role of coarse-grained or reduced-order models in the computation of search directions. We take advantage of the insights from the analysis to significantly improve the performance of second-order methods in machine learning applications. We report encouraging initial experiments that suggest SIGMA outperforms other state-of-the-art sub-sampled/sketched Newton methods for both medium and large-scale problems.

Paper Structure

This paper contains 29 sections, 24 theorems, 118 equations, 7 figures, 1 table, 1 algorithm.

Key Result

lemma 1

Let $f_h : \mathbb{R}^N \rightarrow \mathbb{R}$ satisfy the self-concordance assumption and suppose that assumption P also holds. Then, the matrix $\mathbf{Q}_H(\mathbf{x}_{h, k})$ is positive definite.
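
The positive definiteness claimed in the lemma can be illustrated numerically. The sketch below rests on assumptions that this summary does not state: it takes $\mathbf{Q}_H$ to be a Galerkin-reduced Hessian of the form $\mathbf{P}^T \nabla^2 f_h(\mathbf{x}) \mathbf{P}$, takes assumption P to mean that the prolongation matrix $\mathbf{P}$ has full column rank, and replaces the Hessian of $f_h$ with a random symmetric positive definite matrix. The helper names (`galerkin_hessian`, `is_positive_definite`) are hypothetical and are not taken from the paper.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    # Plain dense matrix product on lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def galerkin_hessian(H, P):
    # Assumed form of the reduced Hessian: Q_H = P^T H P.
    return matmul(transpose(P), matmul(H, P))

def is_positive_definite(M, tol=1e-12):
    # Attempt a Cholesky factorization; a non-positive pivot means M is not PD.
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

random.seed(0)
N, n = 6, 3  # fine and coarse dimensions (illustrative values)

# Stand-in for the fine-level Hessian: H = A^T A + I is symmetric positive definite.
A = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
H = matmul(transpose(A), A)
for i in range(N):
    H[i][i] += 1.0

# Full-column-rank P that selects coordinates 0, 2 and 4.
selected = [0, 2, 4]
P = [[1.0 if i == selected[j] else 0.0 for j in range(n)] for i in range(N)]

Q = galerkin_hessian(H, P)
print(is_positive_definite(Q))
```

With a coordinate-selection $\mathbf{P}$, the reduced matrix is a principal submatrix of the Hessian, so it inherits positive definiteness. If $\mathbf{P}$ loses column rank, for example when it is the zero matrix, the check fails, which is why a rank condition on $\mathbf{P}$ is needed.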

Figures (7)

  • Figure 1: From left to right, the first figure shows convergence of SIGMA for different values in $p$. The second and third figures compare the optimization methods over the $\ell_2$ and elastic-net regularized Poisson regression respectively.
  • Figure 2: Performance of various optimization methods on different datasets for the logistic regression.
  • Figure 3: Comparisons between SIGMA with Uniform, Adaptive and mixed sampling strategies.
  • Figure 4: Comparisons between SIGMA with Uniform, Adaptive and mixed sampling strategies.
  • Figure 5: Comparisons between SIGMA with Uniform, Adaptive and mixed sampling strategies.
  • ...and 2 more figures

Theorems & Definitions (53)

  • lemma 1
  • proof
  • definition 1
  • remark 1
  • lemma 2
  • lemma 3
  • remark 2
  • lemma 4
  • lemma 5
  • lemma 6
  • ...and 43 more