Table of Contents
Fetching ...

A Linear-complexity Tensor Butterfly Algorithm for Compressing High-dimensional Oscillatory Integral Operators

P. Michael Kielstra, Tianyi Shi, Hengrui Luo, Jianliang Qian, Yang Liu

TL;DR

The paper tackles the challenge of efficiently representing and applying large-scale, high-dimensional oscillatory integral operators by introducing a tensor extension of the butterfly algorithm. It develops a Tucker-type interpolative decomposition (Tucker-ID) and a multilevel tensor butterfly framework that leverages a tensor complementary low-rank (CLR) property to achieve linear complexity in the number of discretization points, $O(n^d)$, for $d$-dimensional problems. Through rank analyses and extensive numerical experiments on Green's functions, Radon transforms, and high-dimensional DFTs, the authors demonstrate substantial speedups and memory savings over traditional matrix butterfly, QTT, Tucker-ID, FFT, and NUFFT approaches, with particular gains in high-frequency regimes and higher dimensions (up to $d=6$). They show that tensor CLR reduces tensor ranks relative to matrix CLR (e.g., for Green's functions, $r_t$ can be much smaller than $r_m$), enabling efficient linear-time contraction and, in many cases, linear-time FFT-like operations across dimensions. The work promises significant practical impact for simulations in wave propagation, tomography, and seismic imaging, while noting current limitations such as grid-based tensorization and mid-level storage as avenues for future improvement.

Abstract

This paper presents a multilevel tensor compression algorithm called tensor butterfly algorithm for efficiently representing large-scale and high-dimensional oscillatory integral operators, including Green's functions for wave equations and integral transforms such as Radon transforms and Fourier transforms. The proposed algorithm leverages a tensor extension of the so-called complementary low-rank property of existing matrix butterfly algorithms. The algorithm partitions the discretized integral operator tensor into subtensors of multiple levels, and factorizes each subtensor at the middle level as a Tucker-type interpolative decomposition, whose factor matrices are formed in a multilevel fashion. For a $d$-dimensional integral operator discretized into a $2d$-mode tensor with $n^{2d}$ entries, the overall CPU time and memory requirement scale as $O(n^d)$, in stark contrast to the $O(n^d\log n)$ requirement of existing matrix algorithms such as matrix butterfly algorithm and fast Fourier transforms (FFT), where $n$ is the number of points per direction. When comparing with other tensor algorithms such as quantized tensor train (QTT), the proposed algorithm also shows superior CPU and memory performance for tensor contraction. Remarkably, the tensor butterfly algorithm can efficiently model high-frequency Green's function interactions between two unit cubes, each spanning 512 wavelengths per direction, which represents over $512\times$ larger problem sizes than existing butterfly algorithms. On the other hand, for a problem representing 64 wavelengths per direction, which is the largest size existing algorithms can handle, our tensor butterfly algorithm exhibits 200x speedups and $30\times$ memory reduction comparing with existing ones. Moreover, the tensor butterfly algorithm also permits $O(n^d)$-complexity FFTs and Radon transforms up to $d=6$ dimensions.

A Linear-complexity Tensor Butterfly Algorithm for Compressing High-dimensional Oscillatory Integral Operators

TL;DR

The paper tackles the challenge of efficiently representing and applying large-scale, high-dimensional oscillatory integral operators by introducing a tensor extension of the butterfly algorithm. It develops a Tucker-type interpolative decomposition (Tucker-ID) and a multilevel tensor butterfly framework that leverages a tensor complementary low-rank (CLR) property to achieve linear complexity in the number of discretization points, , for -dimensional problems. Through rank analyses and extensive numerical experiments on Green's functions, Radon transforms, and high-dimensional DFTs, the authors demonstrate substantial speedups and memory savings over traditional matrix butterfly, QTT, Tucker-ID, FFT, and NUFFT approaches, with particular gains in high-frequency regimes and higher dimensions (up to ). They show that tensor CLR reduces tensor ranks relative to matrix CLR (e.g., for Green's functions, can be much smaller than ), enabling efficient linear-time contraction and, in many cases, linear-time FFT-like operations across dimensions. The work promises significant practical impact for simulations in wave propagation, tomography, and seismic imaging, while noting current limitations such as grid-based tensorization and mid-level storage as avenues for future improvement.

Abstract

This paper presents a multilevel tensor compression algorithm called tensor butterfly algorithm for efficiently representing large-scale and high-dimensional oscillatory integral operators, including Green's functions for wave equations and integral transforms such as Radon transforms and Fourier transforms. The proposed algorithm leverages a tensor extension of the so-called complementary low-rank property of existing matrix butterfly algorithms. The algorithm partitions the discretized integral operator tensor into subtensors of multiple levels, and factorizes each subtensor at the middle level as a Tucker-type interpolative decomposition, whose factor matrices are formed in a multilevel fashion. For a -dimensional integral operator discretized into a -mode tensor with entries, the overall CPU time and memory requirement scale as , in stark contrast to the requirement of existing matrix algorithms such as matrix butterfly algorithm and fast Fourier transforms (FFT), where is the number of points per direction. When comparing with other tensor algorithms such as quantized tensor train (QTT), the proposed algorithm also shows superior CPU and memory performance for tensor contraction. Remarkably, the tensor butterfly algorithm can efficiently model high-frequency Green's function interactions between two unit cubes, each spanning 512 wavelengths per direction, which represents over larger problem sizes than existing butterfly algorithms. On the other hand, for a problem representing 64 wavelengths per direction, which is the largest size existing algorithms can handle, our tensor butterfly algorithm exhibits 200x speedups and memory reduction comparing with existing ones. Moreover, the tensor butterfly algorithm also permits -complexity FFTs and Radon transforms up to dimensions.

Paper Structure

This paper contains 20 sections, 43 equations, 6 figures, 4 tables, 2 algorithms.

Figures (6)

  • Figure 1: Tensor diagrams for (a) the Tucker-ID decomposition of a 4-mode tensor, and (b) the matrix partitioner corresponding to a $2^d\times 2$ partitioning with $d=2$ used in the tensor butterfly decomposition of a 2$d$-mode tensor, such as $\mathbf{W}^{\bm{s},k}_{\bm{\tau}^{\bm{c}},\nu}_{\bm{c}}$ in \ref{['eqn:tensor_butterfly_factors_W']} for fixed $\bm{s}, \bm{\tau}, k$ and $\nu$, or $\mathbf{P}^{\bm{t},k}_{\tau,\bm{\nu}^{\bm{c}}}_{\bm{c}}$ in \ref{['eqn:tensor_butterfly_factors_P']} for fixed $\bm{t}, \bm{\nu}, k$ and $\tau$. Here, each of the row and column dimensions is connected to a partitioning node. Each partitioning node has a parent edge with an arrow pointing to the dimension to be partitioned, and several children edges connected to the parent edge. The weight of the parent edge (i.e., the number of columns or rows of the matrix) equals the sum of the weights of the children edges. (c) The tensor diagram involving blocks $\mathbf{V}^{\bm{s},k}_{\bm{t}^0,\nu}$ (in green) and blocks $\mathbf{W}^{\bm{s},k}_{\bm{\tau}^{\bm{c}},\nu}_{\bm{c}}$ (in blue) for fixed $\bm{s}$ and $k$ for the tensor butterfly decomposition of a 2$d$-mode tensor.
  • Figure 1: Helmholtz equation: Computational complexity comparison among butterfly matrix, butterfly tensor, Tucker-ID and QTT for compressing (left) a 4-way Green's function tensor for interactions between two parallel 2D plates and (right) a 6-way Green's function tensor for interactions between two 3D cubes. The geometries are discretized with 4 points per wavelength. (Top): Factor time. (Middle): Factor and apply memory. (Bottom): Apply time. The largest data points correspond to $8192$ wavelengths per direction for the 2D tests (left) and $512$ wavelengths per direction for the 3D tests (right).
  • Figure 2: (a) Tensor diagram for the tensor butterfly decomposition of $L=2$ levels of a 4-mode OIO tensor representing (b) high-frequency Green's function interactions between parallel facing 2D unit squares. Only the full connectivity regarding three middle-level node pairs is shown (the two green circles and one orange circle in (a)). The orange circle in (a) represents the core tensor $\bm{\mathcal{K}}(\overline{\bm{t}},\overline{\bm{s}})$ for a mid-level pair $(\bm{t},\bm{s})$ with $\bm{t}=(t_1,t_2)$, $\bm{s}=(s_1,s_2)$ highlighted in orange in (b).
  • Figure 2: Radon transforms: Computational complexity comparison among butterfly matrix, butterfly tensor and QTT for compressing (left) a 2D Radon transform tensor and (right) a 3D Radon transform tensor. (Top): Factor time. (Middle): Factor and apply memory. (Bottom): Apply time.
  • Figure 3: Illustration of the tensor CLR property with $L=2$ for a 4-mode tensor representing free-space Green's function interactions between parallel facing unit square plates. (a) The target and source domains are partitioned at $l=0$ (top) and $l=L^c=1$ (bottom) with a multi-set pair $(\bm{\tau},\bm{s}_{k\leftarrow \nu})$ highlighted in orange for the skeletonization along mode 4. The sizes of the nodes are $|\tau_1|=m_1$,$|\tau_2|=m_1$, $|s_1|=n_1$ and $|\nu|=n_2$. (b) Illustration of the rank of the matricization of $\bm{\mathcal{K}}(\bm{\tau},\bm{s}_{k\leftarrow \nu})$used in the matrix butterfly algorithm. Here $a$ is the radius of the sphere enclosing the target domain of physical sizes ${m_1}\delta_x\times m_2\delta_y$. $\theta\approx\frac{n_1}{n\rho_{\min}}$, $\phi\approx\frac{n_2}{n\rho_{\min}}$, and the product $\theta\phi$ represents the solid angle covered by the source domain as seen from the center of the target domain. (c) Illustration of the rank of the mode-4 unfolding of $\bm{\mathcal{K}}(\bm{\tau},\bm{s}_{k\leftarrow \nu})$used in the tensor butterfly algorithm. Here, $a'$ is the radius of the sphere enclosing the enlarged target domain. The source domain is reduced to a line segment of length ${n_2}\delta_y$.
  • ...and 1 more figures