Diffusion Representation for Asymmetric Kernels

Alvaro Almeida Gomez, Antonio Silva Neto, Jorge Zubelli

TL;DR

This work generalizes diffusion-map theory to data governed by asymmetric kernels by introducing a diffusion-representation framework that employs a tensor-product Fourier basis and a 2-D FFT to achieve scalable dimensionality reduction. It derives finite-term, interpretable representations of diffusion distances for time $t=1$ and extends them to arbitrary times, including changing-data scenarios, with a weak-representation extension for large-measure sets. The approach yields substantial computational savings over traditional eigenvalue methods (e.g., $O(n^2\log n)$ versus $O(n^3)$ for SVD) while preserving key geometric insights, as demonstrated on synthetic manifolds like the sphere and Möbius strip and on climate-temperature data from Brazil. The results show that asymmetric kernels can be effectively analyzed, enabling diffusion-based analysis of directed graphs and spatiotemporal data, with practical impact on detecting regions of change and revealing intrinsic data geometry.

Abstract

We extend the diffusion-map formalism to data sets that are induced by asymmetric kernels. Analytical convergence results of the resulting expansion are proved, and an algorithm is proposed to perform the dimensionality reduction. In this work we study data sets whose geometric structure is induced by an asymmetric kernel. We use an a priori coordinate system to represent this geometry and thereby improve the computational complexity of reducing the dimensionality of data sets. A coordinate system connected to the tensor product of Fourier bases is used to represent the underlying geometric structure obtained by the diffusion map, thus reducing the dimensionality of the data set and making use of the speedup provided by the two-dimensional Fast Fourier Transform algorithm (2-D FFT). We compare our results with those obtained by other eigenvalue expansions, and verify the efficiency of the algorithms with synthetic data, as well as with real data from applications including climate change studies.
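
As a rough illustration of the idea of representing a kernel in a tensor-product Fourier basis, the sketch below builds a small asymmetric kernel matrix and computes its coefficients with a 2-D FFT. It is a minimal sketch under stated assumptions, not the paper's algorithm: the drifted Gaussian kernel, the function names `asymmetric_kernel` and `fourier_coefficients`, and the parameters `drift` and `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def asymmetric_kernel(points, drift, eps):
    """Hypothetical asymmetric kernel k(x, y) = exp(-|x + drift - y|^2 / eps).

    The drift is applied to the first argument only, so k(x, y) != k(y, x)
    in general. This kernel is an assumption, not taken from the paper.
    """
    shifted = points + drift
    diff = shifted[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / eps)


def fourier_coefficients(kernel):
    """Coefficients of the kernel matrix in the tensor-product discrete
    Fourier basis, computed with an orthonormal 2-D FFT."""
    return np.fft.fft2(kernel, norm="ortho")


rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)  # project onto the unit sphere

K = asymmetric_kernel(points, drift=np.array([0.3, 0.0, 0.0]), eps=0.5)
A = fourier_coefficients(K)

# With the orthonormal convention the transform preserves the Frobenius norm
# and is inverted exactly by ifft2.
K_back = np.fft.ifft2(A, norm="ortho").real
```

The orthonormal convention (`norm="ortho"`) is chosen here so that the coefficient matrix `A` carries the same Frobenius norm as `K`, which makes it easier to compare quantities computed in the two coordinate systems.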

Paper Structure

This paper contains 16 sections, 9 theorems, 76 equations, 16 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

Assume that the kernel $k$ satisfies the Hypothesis hipo, and that the representation of $k$ in the coordinate system (tensorbasis) is given by Eq. (fourep). If the coefficients satisfy the summability condition, then the diffusion distance at time $t=1$ has the representation form
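
The displayed formula of the theorem is not reproduced in this summary, so the sketch below should not be read as the paper's statement. It only checks a discrete Parseval-type analogue: for a row-normalized matrix $P$ with 2-D Fourier coefficients $a_{ij}$ and basis $\varphi_i(x) = e^{2\pi \mathrm{i}\, i x / n} / \sqrt{n}$, the $\ell^2$ distance between rows $x$ and $y$ equals $\big(\sum_j |\sum_i a_{ij}(\varphi_i(x) - \varphi_i(y))|^2\big)^{1/2}$. The function names and the use of a plain row distance as a stand-in for the diffusion distance at $t=1$ are assumptions.

```python
import numpy as np


def row_distance_direct(P, x, y):
    """l2 distance between rows x and y of P, computed entry by entry."""
    return np.sqrt(np.sum(np.abs(P[x] - P[y]) ** 2))


def row_distance_fourier(P, x, y):
    """Same distance, computed from the 2-D Fourier coefficients of P.

    Illustrative only: relies on the orthonormality of the discrete
    Fourier basis in the second variable (Parseval's identity).
    """
    n = P.shape[0]
    A = np.fft.fft2(P, norm="ortho")             # coefficients a_ij
    freqs = np.arange(n)
    phi_x = np.exp(2j * np.pi * freqs * x / n) / np.sqrt(n)
    phi_y = np.exp(2j * np.pi * freqs * y / n) / np.sqrt(n)
    c = (phi_x - phi_y) @ A                      # c_j = sum_i a_ij (phi_i(x) - phi_i(y))
    return np.sqrt(np.sum(np.abs(c) ** 2))


rng = np.random.default_rng(1)
M = rng.random((16, 16))                         # generic, non-symmetric positive matrix
P = M / M.sum(axis=1, keepdims=True)             # row-normalize so each row sums to 1

d_direct = row_distance_direct(P, 2, 5)
d_fourier = row_distance_fourier(P, 2, 5)
```

Under these assumptions the two values agree up to floating-point error, which is the sense in which a distance between kernel rows can be read off from finitely many Fourier coefficients.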

Figures (16)

  • Figure 4.1: Data set $X$ with $512$ random points in the sphere $S^2$.
  • Figure 4.2: Plot of the two-dimensional embedding for the data set $X$ using the eigenvector basis coefficients (a) and the Fourier basis coefficients (b). Note the scale.
  • Figure 4.3: Plot of the $L^2$ error and computational time of the first two coordinates for different $n\times n$ kernel-sizes, for the data set of random points in the sphere.
  • Figure 4.4: Data set $X$ with $300$ random points in the Möbius strip $M^2$.
  • Figure 4.5: Dimensionality reduction using the eigenvector basis and the Fourier basis.
  • ...and 11 more figures

Theorems & Definitions (18)

  • Theorem 3.1: Diffusion representation for $t=1$
  • proof
  • Lemma 3.1: Approximation lemma
  • proof
  • Theorem 3.2
  • proof
  • Lemma 3.2
  • proof
  • Lemma 3.3: Continuity
  • proof
  • ...and 8 more