End-to-end Kernel Learning via Generative Random Fourier Features
Kun Fang, Fanghui Liu, Xiaolin Huang, Jie Yang
TL;DR
This paper tackles kernel learning by replacing the usual two-stage random Fourier feature (RFF) pipeline with an end-to-end Generative RFFs (GRFFs) framework, in which a generative network and a linear classifier are trained jointly. Kernel weights are sampled from a distribution learned by the generator, and stacking GRFF layers yields deeper feature representations with better generalization. Training solves the empirical risk minimization (ERM) problem $\min_{\theta_G,\theta_C} \frac{1}{n} \sum_{i=1}^{n} L\big(C(\phi(x_i, G(\mathbf{N}))), y_i\big)$, where $G$ is the generator with parameters $\theta_G$, $\mathbf{N}$ is its input noise, $\phi$ is the random Fourier feature map, and $C$ is the linear classifier with parameters $\theta_C$; the approach is then extended to progressively trained, multi-layer architectures. The paper also introduces a variant for image data whose CNN-like generators produce randomized convolution kernels, and shows that resampling the weights improves adversarial robustness against iterative least-likely class (Iter.L.L.) attacks. Overall, GRFFs offer a flexible, scalable alternative to fixed-kernel methods: they perform competitively with deep models, bring robustness benefits, and provide a principled way to learn kernels directly from data.
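To make the ERM objective above concrete, here is a minimal NumPy sketch of a GRFF-style forward pass. It assumes a one-hidden-layer ReLU generator, the feature map $\phi(x, \omega) = D^{-1/2}[\cos(\omega x), \sin(\omega x)]$ with $\omega$ a $D \times d$ matrix of generated frequencies, and softmax cross-entropy as $L$. The names `generator`, `rff_features`, `logits_fn`, `erm_loss`, and `predict_resampled` are hypothetical, and the paper's actual architecture may differ. `predict_resampled` illustrates the weight-resampling idea by drawing fresh noise $\mathbf{N}$ at prediction time.

```python
import numpy as np

def generator(noise, params):
    """Hypothetical generator G: maps noise N (D x k) to D frequency vectors (D x d)."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(noise @ W1 + b1, 0.0)   # one ReLU hidden layer (assumed)
    return hidden @ W2 + b2

def rff_features(X, omega):
    """phi(x, omega) = [cos(omega x), sin(omega x)] / sqrt(D) for each row x of X (n x d)."""
    proj = X @ omega.T                          # (n, D) projections
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(omega.shape[0])

def logits_fn(X, noise, g_params, c_params):
    """C(phi(x, G(N))): a linear classifier on generated random Fourier features."""
    Wc, bc = c_params
    return rff_features(X, generator(noise, g_params)) @ Wc + bc

def erm_loss(X, y, noise, g_params, c_params):
    """(1/n) sum_i L(C(phi(x_i, G(N))), y_i) with L = softmax cross-entropy."""
    z = logits_fn(X, noise, g_params, c_params)
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def predict_resampled(X, g_params, c_params, D, k, rng):
    """Predict with freshly sampled noise N, i.e. freshly generated kernel weights."""
    noise = rng.standard_normal((D, k))
    return logits_fn(X, noise, g_params, c_params).argmax(axis=1)

# Usage on random toy data (sizes are arbitrary).
rng = np.random.default_rng(0)
n, d, D, k, h, C = 32, 5, 64, 8, 16, 3
X = rng.standard_normal((n, d))
y = rng.integers(0, C, size=n)
g_params = (0.1 * rng.standard_normal((k, h)), np.zeros(h),
            rng.standard_normal((h, d)), np.zeros(d))
c_params = (np.zeros((2 * D, C)), np.zeros(C))
loss = erm_loss(X, y, rng.standard_normal((D, k)), g_params, c_params)
print(loss)  # a zero-initialised classifier gives log(3) ≈ 1.0986
```

Each feature vector has unit norm because $\cos^2 + \sin^2 = 1$ for every frequency, so the scale of the features does not depend on what the generator outputs.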
Abstract
Random Fourier features (RFFs) offer a promising route to kernel learning in the spectral domain. Current RFFs-based kernel learning methods usually work in two stages. In the first stage, learning the optimal feature map is often formulated as a target alignment problem, which aims to align the learned kernel with a pre-defined target kernel (usually the ideal kernel). In the second stage, a linear learner is trained on the mapped random features. However, the pre-defined kernel in target alignment is not necessarily optimal for the generalization of the linear learner. Instead, in this paper, we consider a one-stage process that incorporates kernel learning and the linear learner into a unified framework. Specifically, a generative network based on RFFs is devised to implicitly learn the kernel, followed by a linear classifier parameterized as a fully-connected layer. The generative network and the classifier are then trained jointly by solving the empirical risk minimization (ERM) problem, yielding a one-stage solution. This end-to-end scheme naturally accommodates deeper features through a multi-layer structure, and it generalizes better than the classical two-stage, RFFs-based methods on real-world classification tasks. Moreover, motivated by the randomized resampling mechanism of the proposed method, we investigate its adversarial robustness and verify the improvement experimentally.
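The one-stage idea, training the generator and the classifier together on a single ERM objective instead of first aligning a kernel to a target, can be sketched with hand-written gradients. This is an illustrative assumption, not the authors' implementation: the generator is reduced to a linear map $\omega = \mathbf{N}A$, the noise $\mathbf{N}$ is held fixed during training for simplicity, and plain gradient descent with separate step sizes `lr_c` and `lr_g` (hypothetical parameters) stands in for whatever optimizer the paper uses. The function names are likewise hypothetical.

```python
import numpy as np

def softmax_xent(logits, y):
    """Mean softmax cross-entropy and its gradient with respect to the logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    rows = np.arange(len(y))
    loss = -np.log(p[rows, y]).mean()
    grad = p.copy()
    grad[rows, y] -= 1.0
    return loss, grad / len(y)

def train_one_stage(X, y, num_classes, D=64, k=8, steps=300,
                    lr_c=0.5, lr_g=0.01, seed=0):
    """Jointly fit a hypothetical linear generator omega = N @ A (theta_G) and a
    linear classifier (Wc, bc) (theta_C) by gradient descent on one ERM loss."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    N = rng.standard_normal((D, k))               # noise, held fixed here for simplicity
    A = rng.standard_normal((k, d)) / np.sqrt(k)  # theta_G
    Wc = np.zeros((2 * D, num_classes))           # theta_C: weights
    bc = np.zeros(num_classes)                    # theta_C: bias
    history = []
    for _ in range(steps):
        omega = N @ A                             # (D, d) generated frequencies
        P = X @ omega.T                           # (n, D) projections
        cosP, sinP = np.cos(P), np.sin(P)
        feats = np.hstack([cosP, sinP]) / np.sqrt(D)
        loss, dlogits = softmax_xent(feats @ Wc + bc, y)
        history.append(loss)
        # Backpropagate through the classifier C ...
        dWc = feats.T @ dlogits
        dbc = dlogits.sum(axis=0)
        dcs = dlogits @ Wc.T / np.sqrt(D)         # gradient w.r.t. [cos P, sin P]
        # ... through the cos/sin feature map ...
        dP = -sinP * dcs[:, :D] + cosP * dcs[:, D:]
        # ... and through the generator omega = N @ A.
        dA = N.T @ (dP.T @ X)
        # One joint step on theta_C and theta_G (all gradients computed above first).
        Wc -= lr_c * dWc
        bc -= lr_c * dbc
        A -= lr_g * dA
    return history

# Usage: two Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
history = train_one_stage(X, y, num_classes=2)
print(history[0], history[-1])  # the first value is log(2) ≈ 0.693 (zero classifier)
```

Because the classification loss is backpropagated through the feature map into $A$, the frequencies, and hence the implicit kernel, are shaped directly by the classifier's empirical risk rather than by alignment with a pre-defined target kernel.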
