Neural Fourier Transform: A General Approach to Equivariant Representation Learning
Masanori Koyama, Kenji Fukumizu, Kohei Hayashi, Takeru Miyato
TL;DR
Neural Fourier Transform (NFT) is a general framework for equivariant representation learning: it learns a latent linear action of a group from data, without explicit knowledge of how the group acts on the data. It extends Fourier analysis to nonlinear settings via invariant kernels and reproducing kernel Hilbert spaces (RKHS), proves existence and identifiability results, and presents three modes of NFT (U-NFT, G-NFT, g-NFT). Empirically, NFT recovers major symmetry modes in nonlinearly deformed signals and demonstrates strong out-of-distribution (OOD) generalization and novel-view capabilities on image datasets, often outperforming the standard DFT and some steerable baselines in challenging settings. Because it supports data-dependent spectral decomposition and can incorporate prior knowledge of the symmetry structure, NFT is a flexible, theoretically grounded approach to symmetry-aware learning. Optimization guarantees and scalability remain open questions.
Abstract
Symmetry learning has proven to be an effective approach for extracting the hidden structure of data, with the equivariance relation playing a central role. However, most current studies are built on architectural theory and corresponding assumptions about the form of the data. We propose Neural Fourier Transform (NFT), a general framework for learning the latent linear action of a group without assuming explicit knowledge of how the group acts on the data. We present the theoretical foundations of NFT and show that the existence of a linear equivariant feature, which has been assumed ubiquitously in equivariance learning, is equivalent to the existence of a group-invariant kernel on the data space. We also provide experimental results that demonstrate the application of NFT in typical scenarios with varying levels of knowledge about the acting group.
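
As a point of reference for the terms "linear equivariant feature" and "group-invariant kernel", the following minimal sketch checks both properties for the classical discrete Fourier transform under cyclic shifts. It is only the textbook linear case that NFT generalizes, not the NFT method itself. The function names (`dft`, `shift`, `latent_action`, `kernel`) and the sample signals are illustrative assumptions, not taken from the paper.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a length-n sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def shift(x, s):
    """Group action on data space: cyclic shift, (g_s x)[t] = x[(t - s) mod n]."""
    n = len(x)
    return [x[(t - s) % n] for t in range(n)]

def latent_action(z, s):
    """Linear (diagonal) action on feature space: the k-th coefficient
    is multiplied by exp(-2*pi*i*k*s/n)."""
    n = len(z)
    return [cmath.exp(-2j * cmath.pi * k * s / n) * z[k] for k in range(n)]

def kernel(x, y):
    """Kernel induced by the feature map: inner product of DFT features."""
    return sum(a * b.conjugate() for a, b in zip(dft(x), dft(y)))

# Arbitrary example signals and shift amount (hypothetical values).
x = [0.5, -1.0, 2.0, 0.0, 3.5, 1.25]
y = [1.0, 0.25, -0.75, 2.0, 0.0, -1.5]
s = 2

# Equivariance: transforming the shifted signal equals acting linearly
# on the transform of the original signal.
lhs = dft(shift(x, s))
rhs = latent_action(dft(x), s)
equivariance_gap = max(abs(a - b) for a, b in zip(lhs, rhs))

# Invariance: the induced kernel is unchanged when both arguments are shifted.
kernel_gap = abs(kernel(shift(x, s), shift(y, s)) - kernel(x, y))

print(equivariance_gap, kernel_gap)
```

Both gaps should vanish up to floating-point error. The kernel check illustrates only the easy direction of the equivalence stated in the abstract, in a setting where the latent action is unitary; the general statement and its converse are what the paper establishes.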
