Spectrally Transformed Kernel Regression

Runtian Zhai; Rattana Pukdee; Roger Jin; Maria-Florina Balcan; Pradeep Ravikumar

TL;DR

This work revisits the classical idea of spectrally transformed kernel regression (STKR), and provides a new class of general and scalable STKR estimators able to leverage unlabeled data.

Abstract

Unlabeled data is a key component of modern machine learning. In general, the role of unlabeled data is to impose a form of smoothness, usually from the similarity information encoded in a base kernel, such as the $ε$-neighbor kernel or the adjacency matrix of a graph. This work revisits the classical idea of spectrally transformed kernel regression (STKR), and provides a new class of general and scalable STKR estimators able to leverage unlabeled data. Intuitively, via spectral transformation, STKR exploits the data distribution for which unlabeled data can provide additional information. First, we show that STKR is a principled and general approach, by characterizing a universal type of "target smoothness", and proving that any sufficiently smooth function can be learned by STKR. Second, we provide scalable STKR implementations for the inductive setting and a general transformation function, while prior work is mostly limited to the transductive setting. Third, we derive statistical guarantees for two scenarios: STKR with a known polynomial transformation, and STKR with kernel PCA when the transformation is unknown. Overall, we believe that this work helps deepen our understanding of how to work with unlabeled data, and its generality makes it easier to inspire new methods.
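To make the idea of a spectral transformation concrete, here is a minimal sketch of the simplest transductive case with a polynomial transformation $s(\lambda) = \lambda^p$. It is not the paper's scalable inductive implementation. The RBF base kernel, the helper names (`rbf_gram`, `spectrally_transformed_gram`, `stkr_fit_predict`), the ridge parameter `beta`, and the choice to normalize the Gram matrix by the sample size `n` are all assumptions made for illustration.

```python
import numpy as np


def rbf_gram(X, gamma=1.0):
    """Gram matrix of an RBF base kernel (an assumed choice of base kernel)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def spectrally_transformed_gram(G, s):
    """Apply s to the eigenvalues of G / n and rebuild an n x n Gram matrix.

    G is computed on ALL points (labeled and unlabeled), which is how the
    unlabeled data enters the estimator in this sketch.
    """
    n = G.shape[0]
    evals, evecs = np.linalg.eigh(G / n)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    return n * (evecs * s(evals)) @ evecs.T


def stkr_fit_predict(G_s, labeled_idx, y, beta=1e-3):
    """Kernel ridge regression on the labeled block; predictions for all points."""
    m = len(labeled_idx)
    K_ll = G_s[np.ix_(labeled_idx, labeled_idx)]
    alpha = np.linalg.solve(K_ll + m * beta * np.eye(m), y)
    return G_s[:, labeled_idx] @ alpha


# Toy data: two Gaussian clusters, only four labeled points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 0.5, size=(30, 2)),
    rng.normal(2.0, 0.5, size=(30, 2)),
])
y_all = np.concatenate([-np.ones(30), np.ones(30)])
labeled_idx = np.array([0, 1, 30, 31])

G = rbf_gram(X, gamma=0.5)
G_s = spectrally_transformed_gram(G, lambda lam: lam**2)  # s(lambda) = lambda^2
preds = stkr_fit_predict(G_s, labeled_idx, y_all[labeled_idx])
```

With $s(\lambda) = \lambda^2$ the transformed Gram matrix equals `G @ G / n`, so the eigendecomposition could be skipped for polynomial transformations. It is kept here so that any other function `s` can be passed in.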

Paper Structure

This paper contains 33 sections, 16 theorems, 88 equations, 4 figures, 5 tables, 3 algorithms.

Introduction
Deriving STKR from Diffusion Induced Multiscale Smoothness
Diffusion Induced Multiscale Smoothness
Target Smoothness can Always be Obtained from STK: Sufficient Condition
Transform-aware: STKR with Known Polynomial Transform
Transform-agnostic: Inverse Laplacian and Kernel PCA
Experiments
Conclusion
Limitations and open problems.
Related Work
Learning with Unlabeled Data
Statistical Learning Theory on Kernel Methods
Proofs
Proof of Proposition \ref{prop:extend-lip}
Proof of Theorem \ref{claim:main}
...and 18 more sections

Key Result

Proposition 1

This $\overline{\textnormal{Lip}}_{d_{K^p}} (f)$ satisfies: $\overline{\textnormal{Lip}}_{d_{K^p}} (f) = \| f \|_{{\mathcal{H}}_{K^p}}$, $\forall f \in {\mathcal{H}}_{K^p}$.
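For readers without the full paper at hand, the notation can be read as follows. This is a sketch that assumes the usual Mercer decomposition of the base kernel $K$, with eigenvalues $\lambda_i$ and eigenfunctions $\psi_i$ orthonormal with respect to the data distribution. These symbols are assumptions here, since this summary does not define them.

$$
K(x, x') = \sum_i \lambda_i \, \psi_i(x) \psi_i(x'), \qquad
K^p(x, x') = \sum_i \lambda_i^p \, \psi_i(x) \psi_i(x'), \qquad
K_s(x, x') = \sum_i s(\lambda_i) \, \psi_i(x) \psi_i(x').
$$

Under this reading, ${\mathcal{H}}_{K^p}$ is the RKHS of the power kernel $K^p$, and the polynomial transformation $s(\lambda) = \lambda^p$ gives $K_s = K^p$.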

Figures (4)

Figure 1: Sample graph.
Figure 2: Test accuracy (%) of STKR-Prop (SP) with the polynomial transformation $s(\lambda) = \lambda^p$ for $p \in \{1,2,4,6,8\}$. The test accuracy increases significantly when $p$ is larger than $1$, illustrating the benefits of the transitivity of similarity.
Figure 3: Test accuracy of SP-Lap with different values of $\eta$. The test accuracy is fairly consistent as long as $\eta$ is not too close to 0, and gets slightly better with a larger $\eta$. All reported performances are averaged across ten random seeds.
Figure: STKR-Prop for simple $s$

Theorems & Definitions (29)

Proposition 1: Proofs in \ref{app:proofs}
Theorem 1
Theorem 2
Remark
Theorem 3
Remark
Lemma 2
Example 1: Inverse Laplacian for the inductive setting
Proposition 3
Proposition 4
...and 19 more
