A resource-efficient model for deep kernel learning

Luisa D'Amore

TL;DR

The paper tackles the resource-intensive nature of deep kernel learning by introducing D$^3$L, a model-level decomposition that combines operator and network partitioning to enable parallel, scalable training. It formulates a Concatenated Tikhonov Regularization ($\mathcal{CTR}$) framework for Deep Kernel Learning (DKL) and proves that a global solution can be assembled from local TR problems via domain decomposition, leading to improved accuracy-per-parameter while reducing communication overhead. The approach is analyzed in terms of scalability, presenting a scale-up metric and time complexities, and is validated on a real HPC cluster using the DIGITS dataset, demonstrating strong and weak scaling. The work lays out a path for resource-efficient DL, with potential extensions to nonconvex losses and physics-informed neural networks (PINNs).

Abstract

According to the Hughes phenomenon, the major challenges encountered in computations with learning models come from the scale of complexity, e.g. the so-called curse of dimensionality. There are various approaches for accelerating learning computations with minimal loss of accuracy, ranging from model-level to implementation-level approaches. To the best of our knowledge, model-level approaches are rarely used in their basic form. Perhaps this is because the theoretical understanding of the mathematics behind model decomposition approaches, and thus the ability to develop mathematical improvements, has lagged behind. We describe a model-level decomposition approach that combines the decomposition of the operators with the decomposition of the network. We perform a feasibility analysis of the resulting algorithm in terms of both its accuracy and its scalability.

Paper Structure

This paper contains 13 sections, 7 theorems, 67 equations, 2 tables, 2 algorithms.

Key Result

Theorem 7

The function $f^*(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k)$ is the unique minimizer of the Hilbert space norm in $H$ among all functions $f \in H$ such that $f(x_i) = y_i$, $i = 1, \ldots, N$. The coefficients $\alpha_k$ can be calculated from the linear system $\mathbf{A}\alpha = \mathbf{y}$, where $A_{ij} = K(x_i, x_j)$, $\alpha = (\alpha_1, \ldots, \alpha_N)^T$, $\mathbf{y} = (y_1, \ldots, y_N)^T$ and $\mathbf{A} \in \Re^{N \times N}$.
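
To make the linear system in the theorem concrete, the following minimal sketch assumes the standard interpolation form of the Representer Theorem: the minimizer is a kernel expansion $\sum_k \alpha_k K(x, x_k)$ over the training points, and the coefficients solve $\mathbf{A}\alpha = \mathbf{y}$ with $A_{ij} = K(x_i, x_j)$. The Gaussian kernel, the parameter `gamma`, the toy data and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||X[i] - Z[j]||^2).

    The Gaussian kernel is an assumed choice; any symmetric
    positive-definite kernel K could be substituted.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def fit_coefficients(X, y, gamma=1.0):
    """Solve A alpha = y, where A[i, j] = K(x_i, x_j)."""
    A = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(A, y)

def evaluate(X_train, alpha, X_new, gamma=1.0):
    """Evaluate f(x) = sum_k alpha_k K(x, x_k) at each row of X_new."""
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Toy data (hypothetical): N = 8 samples of a sine wave on [0, 1].
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])

alpha = fit_coefficients(X, y, gamma=50.0)
y_hat = evaluate(X, alpha, X, gamma=50.0)

# The expansion should reproduce the data at the training points.
print(np.max(np.abs(y_hat - y)))
```

The direct solve costs $O(N^3)$ operations and needs the full $N \times N$ matrix in memory, which gives a sense of why a decomposition into smaller local problems is attractive as $N$ grows.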

Theorems & Definitions (25)

  • Definition 1: ML - Problem I
  • Definition 2: Data Estimation
  • Definition 3: Artificial neuron
  • Definition 4: DL - Problem I
  • Definition 5: RKHS
  • Definition 6: ML in RKHS - Problem II
  • Theorem 7: Representer Theorem
  • Definition 8: ML in RKHS - an inverse problem
  • Definition 9: RML - Regularized ML Problem III
  • Definition 10: RDL - Regularized DL Problems
  • ...and 15 more