End-to-end Kernel Learning via Generative Random Fourier Features

Kun Fang, Fanghui Liu, Xiaolin Huang, Jie Yang

TL;DR

This paper moves kernel learning beyond two-stage random Fourier feature (RFF) methods to an end-to-end Generative RFFs (GRFFs) framework, in which a generative network and a linear classifier are trained jointly so that the kernel distribution is learned together with the classifier. Kernel weights are sampled from the learned distribution, and training minimizes the ERM objective $\min_{\theta_G,\theta_C} \frac{1}{n} \sum_i L(C(\phi(x_i, G(\mathbf{N}))), y_i)$, where $G$ is the generator with parameters $\theta_G$, $\mathbf{N}$ is its input noise, $\phi$ is the random feature map, and $C$ is the classifier with parameters $\theta_C$. Composing multiple GRFF layers and training them progressively yields deeper feature representations and improved generalization. The paper also introduces an image-data variant with CNN-like generators that produce randomized convolution kernels, and shows that weight resampling enhances adversarial robustness against Iter.L.L. attacks. Overall, GRFFs offer a flexible, scalable alternative to fixed-kernel methods, with performance competitive with deep models, added robustness, and a principled path to learning kernels directly from data.

Abstract

Random Fourier features (RFFs) provide a promising approach to kernel learning from a spectral perspective. Current RFF-based kernel learning methods usually proceed in two stages. In the first stage, learning the optimal feature map is often formulated as a target alignment problem, which aims to align the learned kernel with a pre-defined target kernel (usually the ideal kernel). In the second stage, a linear learner is trained on the mapped random features. However, the pre-defined kernel used in target alignment is not necessarily optimal for the generalization of the linear learner. In this paper, we instead consider a one-stage process that incorporates kernel learning and the linear learner into a unified framework. Specifically, a generative network based on RFFs is devised to learn the kernel implicitly, followed by a linear classifier parameterized as a fully connected layer. The generative network and the classifier are then trained jointly by solving the empirical risk minimization (ERM) problem to reach a one-stage solution. This end-to-end scheme naturally allows deeper features, corresponding to a multi-layer structure, and shows superior generalization performance over the classical two-stage, RFF-based methods in real-world classification tasks. Moreover, inspired by the randomized resampling mechanism of the proposed method, its enhanced adversarial robustness is investigated and experimentally verified.
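
The one-stage pipeline described above (noise, generator, random Fourier features, linear classifier, ERM loss) can be illustrated with a minimal NumPy sketch of the forward pass. This is not the authors' implementation. The two-layer `tanh` generator, the Gaussian noise, the cosine/sine feature map $\phi(x) = \frac{1}{\sqrt{D}}[\cos(\Omega x), \sin(\Omega x)]$, the softmax cross-entropy loss, and all names and sizes (`generator`, `rff_features`, `erm_loss`, `init_params`, `noise_dim`, `hidden_dim`) are assumptions chosen for illustration. Joint training of both parameter sets would additionally need gradients, for example from an automatic differentiation library, which is omitted here.

```python
import numpy as np

def init_params(rng, noise_dim, hidden_dim, d, n_classes, D):
    """Hypothetical parameter shapes: an MLP generator and a linear classifier."""
    gen = {
        "W1": rng.normal(scale=0.5, size=(noise_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.5, size=(hidden_dim, d)),
        "b2": np.zeros(d),
    }
    # The classifier acts on 2*D features (cosine and sine parts).
    clf = {"W": np.zeros((2 * D, n_classes)), "b": np.zeros(n_classes)}
    return gen, clf

def generator(noise, gen):
    """G: map D noise vectors to D frequency vectors (the sampled kernel weights)."""
    hidden = np.tanh(noise @ gen["W1"] + gen["b1"])
    return hidden @ gen["W2"] + gen["b2"]          # shape (D, d)

def rff_features(X, omega):
    """phi(x, omega): cosine and sine random Fourier features, scaled by 1/sqrt(D)."""
    proj = X @ omega.T                              # shape (n, D)
    D = omega.shape[0]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(D)

def erm_loss(X, y, noise, gen, clf):
    """Empirical risk (1/n) * sum_i L(C(phi(x_i, G(N))), y_i) with cross-entropy L."""
    feats = rff_features(X, generator(noise, gen))
    logits = feats @ clf["W"] + clf["b"]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
n, d, D, n_classes, noise_dim = 32, 5, 64, 2, 8
X = rng.normal(size=(n, d))
y = rng.integers(0, n_classes, size=n)
gen_params, clf_params = init_params(rng, noise_dim, 16, d, n_classes, D)

# Drawing fresh noise on each call resamples the kernel weights.
noise = rng.normal(size=(D, noise_dim))
print(erm_loss(X, y, noise, gen_params, clf_params))
```

Because the classifier in this sketch is initialized to zero, the initial loss equals $\log 2$ for two classes regardless of the sampled weights. Calling `erm_loss` with a new `noise` draw changes the frequencies $\Omega$ while the generator parameters stay fixed, which is one plausible reading of the resampling mechanism the abstract refers to.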

Paper Structure

This paper contains 17 sections, 13 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: An illustration of the one-stage generative random Fourier features.
  • Figure 2: An illustration of the multi-layer structure of GRFFs.
  • Figure 3: An example of the training process on a binary classification task, using a two-layer structure. Loss and accuracy are recorded over training. There are evident performance leaps, i.e., a drop in loss and a step up in accuracy, when the second layer is added at the turning point of the 200th epoch. Detailed settings can be found in the experimental results section.
  • Figure 4: Performance on the synthetic data. (a) Data distribution when the dimension equals 2. (b) Misclassification errors of different methods on the test set. (c) Misclassification errors of SL-GRFF and ML-GRFF on the training and test sets.
  • Figure 5: PCA visualizations of features of SL-GRFF and ML-GRFF, corresponding to synthetic data when $d=8$ (top) and $d=24$ (bottom) respectively. The blue diamond points denote positive samples, while the red pentagram points denote negative samples.
  • ...and 4 more figures