Diffusion Representation for Asymmetric Kernels
Alvaro Almeida Gomez, Antonio Silva Neto, Jorge Zubelli
TL;DR
This work generalizes diffusion-map theory to data governed by asymmetric kernels by introducing a diffusion-representation framework that uses a tensor-product Fourier basis and the 2-D FFT to achieve scalable dimensionality reduction. It derives finite-term, interpretable representations of diffusion distances at time $t=1$ and extends them to arbitrary times, including scenarios in which the data change, together with a weak-representation extension for sets of large measure. The approach yields substantial computational savings over traditional eigenvalue methods (e.g., $O(n^2\log n)$ for the 2-D FFT versus $O(n^3)$ for the SVD) while preserving key geometric insights, as demonstrated on synthetic manifolds such as the sphere and the Möbius strip and on climate-temperature data from Brazil. The results show that asymmetric kernels can be analyzed effectively, enabling diffusion-based analysis of directed graphs and spatiotemporal data, with practical impact on detecting regions of change and revealing the intrinsic geometry of the data.
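The following is a minimal NumPy sketch of the general idea, not the authors' algorithm. It assumes the data are sampled on a uniform, periodic n × n grid, that the kernel has been row-normalized into a Markov matrix P, and that the time-1 diffusion distance is taken as the L² distance between rows of P under the uniform measure. The 2-D FFT returns the coefficients of P in the orthonormal tensor-product Fourier basis in O(n² log n) operations, and keeping only low frequencies yields a short coordinate vector per point. The function names `low_freq_mask`, `fourier_representation`, and `row_distances`, and the shifted-Gaussian test kernel, are hypothetical.

```python
import numpy as np

def low_freq_mask(n, m):
    """Select DFT indices whose integer frequency satisfies |freq| <= m."""
    k = np.arange(n)
    return np.minimum(k, n - k) <= m

def fourier_representation(P, m_x, m_u):
    """Hypothetical sketch: coordinates Psi[x, :] from a truncated
    tensor-product Fourier expansion of the kernel P[x, u]."""
    n_x, n_u = P.shape
    # a_ij: coefficients in the orthonormal tensor-product Fourier basis (2-D FFT).
    A = np.fft.fft2(P, norm="ortho")
    # Truncate in x: zero out high x-frequencies (smooths the representation).
    A = A * low_freq_mask(n_x, m_x)[:, None]
    # Truncate in u: the retained u-frequencies index the new coordinates.
    A = A[:, low_freq_mask(n_u, m_u)]
    # Psi_j(x) = sum_i a_ij phi_i(x): inverse orthonormal FFT along the x-axis.
    return np.fft.ifft(A, axis=0, norm="ortho")

def row_distances(Psi):
    """Pairwise Euclidean distances between the rows of a (complex) array."""
    diff = Psi[:, None, :] - Psi[None, :, :]
    return np.sqrt(np.sum(np.abs(diff) ** 2, axis=-1))

# Hypothetical asymmetric kernel on the circle: a Gaussian bump shifted by
# 0.3 rad, so K[x, u] != K[u, x].
n = 128
theta = 2 * np.pi * np.arange(n) / n
d = np.angle(np.exp(1j * (theta[None, :] - theta[:, None] - 0.3)))  # wrapped to (-pi, pi]
K = np.exp(-d**2 / 0.1)
P = K / K.sum(axis=1, keepdims=True)  # row-stochastic (Markov) normalization

exact = row_distances(P)  # L2 distances between rows of P (uniform measure)
Psi = fourier_representation(P, m_x=n, m_u=20)  # 41 complex coordinates per point
approx = row_distances(Psi)
```

Because the orthonormal DFT is unitary (Parseval), keeping every frequency reproduces the row distances exactly; the truncation parameters `m_x` and `m_u` trade accuracy for dimension. For a smooth kernel like this one the Fourier coefficients decay rapidly, so keeping the frequencies |l| ≤ 20 leaves only a tiny relative error.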
Abstract
We extend the diffusion-map formalism to data sets whose geometric structure is induced by an asymmetric kernel. We prove analytical convergence results for the resulting expansion and propose an algorithm that performs the dimensionality reduction. To represent this geometry we use an a priori coordinate system, which improves the computational complexity of reducing the dimensionality of the data set. Specifically, a coordinate system associated with the tensor product of Fourier bases represents the geometric structure obtained by the diffusion map; this reduces the dimensionality of the data set and exploits the speedup provided by the two-dimensional Fast Fourier Transform (2-D FFT). We compare our results with those obtained from other eigenvalue expansions and verify the efficiency of the algorithms on synthetic data as well as on real data from applications including climate-change studies.
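For comparison with eigenvalue-based expansions, the sketch below builds rank-r coordinates from a dense truncated SVD, which costs O(n³) for an n × n matrix. By the Eckart–Young theorem the truncated SVD gives the best rank-r approximation of P in the Frobenius norm, but its singular vectors must be computed from the data, whereas a Fourier basis is fixed in advance. The function names `svd_coordinates` and `row_distances` and the shifted-Gaussian kernel are hypothetical illustrations, not the paper's experiments.

```python
import numpy as np

def svd_coordinates(P, r):
    """Rank-r coordinates U_r * s_r from a dense SVD P = U diag(s) V^T (O(n^3))."""
    U, s, Vt = np.linalg.svd(P)
    # P V = U diag(s) and V is orthogonal, so distances between rows of
    # U * s equal distances between rows of P; keeping r columns truncates.
    return U[:, :r] * s[:r]

def row_distances(Psi):
    """Pairwise Euclidean distances between the rows of a real array."""
    diff = Psi[:, None, :] - Psi[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Hypothetical asymmetric Markov kernel on the circle (shifted Gaussian bump).
n = 128
theta = 2 * np.pi * np.arange(n) / n
d = np.angle(np.exp(1j * (theta[None, :] - theta[:, None] - 0.3)))  # wrapped to (-pi, pi]
K = np.exp(-d**2 / 0.1)
P = K / K.sum(axis=1, keepdims=True)  # row-stochastic (Markov) normalization

exact = row_distances(P)
approx = row_distances(svd_coordinates(P, r=41))
```

In this circulant example the singular vectors are themselves Fourier modes, so 41 SVD coordinates give essentially the same accuracy as keeping the Fourier frequencies |l| ≤ 20, at a higher cost; for kernels whose singular vectors are far from Fourier modes, a fixed Fourier basis may need more coordinates.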
