Figaro on GPUs: Two Tables
Dorde Zivanovic
TL;DR
The paper addresses efficient QR and SVD computation over matrices defined by the natural join of two tables, whose size can far exceed that of the input tables. It extends Figaro with a GPU-accelerated pipeline that combines structure-aware symbolic Givens rotations with parallel computation to shrink the QR/SVD workload, reducing the theoretical complexity to $O((m_1+m_2)\times(n_1+n_2)^2)$, where table $i$ has $m_i$ rows and $n_i$ columns. Empirical results on NVIDIA GPUs show substantial speedups over cuSolver (up to $160\times$ for the upper triangular factor of QR and up to $31\times$ for singular values) with markedly lower memory usage. The work enables faster linear algebra on join-derived data, with practical impact on ML and analytics pipelines that run QR/SVD on join outputs.
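As a rough sanity check on where the complexity reduction comes from, consider the special case, assumed here only for illustration, in which the join is the Cartesian product of a table with $m_1$ rows and $n_1$ columns and a table with $m_2$ rows and $n_2$ columns. Using the standard $O(MN^2)$ cost of a dense QR of an $M \times N$ matrix, the asymptotic gap works out as follows:

```latex
\begin{align*}
\text{join matrix size: } & m_1 m_2 \text{ rows} \times (n_1+n_2) \text{ columns},\\
\text{dense QR of the materialized join: } & O\big(m_1 m_2\,(n_1+n_2)^2\big),\\
\text{Figaro-style QR over the inputs: } & O\big((m_1+m_2)\,(n_1+n_2)^2\big),\\
\text{ratio: } & \frac{m_1 m_2}{m_1+m_2}.
\end{align*}
```

For example, with $m_1 = m_2 = 10^4$ the join has $10^8$ rows while the inputs have $2 \times 10^4$ rows in total, a ratio of $5000$.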
Abstract
This paper introduces the implementation of Figaro-GPU, an algorithm for computing the QR and SVD decompositions of a join matrix, defined by the natural join of two tables, on GPUs. Figaro-GPU's main novelty is a GPU implementation of the Figaro algorithm \cite{olteanu2022givens, vzivanovic2022linear,olteanu2024givens}: symbolic transformations combined with GPU-parallelized computation. This yields theoretical performance improvements proportional to the ratio of the join size to the input size. In experiments with synthetic tables that compute the upper triangular matrix and the matrix of right singular vectors, Figaro-GPU outperforms the NVIDIA cuSolver library in runtime for the upper triangular matrix by a factor proportional to the gap between the join and input sizes: from $5\times$ to $150\times$ on the NVIDIA 2070 and up to $160\times$ on the NVIDIA 4080, while using up to $1000\times$ less memory than cuSolver. For computing singular values, Figaro-GPU is $2.8\times$ to $31\times$ faster than cuSolver on the NVIDIA 4080.
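The head/tail idea behind the symbolic transformations can be illustrated on a small CPU example. The sketch below is a minimal pure-Python illustration, not the paper's GPU implementation, and it assumes the join is a Cartesian product. It collapses the $m_1 m_2$ join rows into $m_1 + m_2 - 1$ rows with the same Gram matrix, using a Helmert-style orthogonal transform in place of an explicit sequence of symbolic Givens rotations, and then checks that a Givens-rotation QR of both matrices yields the same $R$. The helpers `givens_r`, `cartesian_join`, and `reduced_matrix` are hypothetical names chosen for this example.

```python
# Hypothetical illustration of a Figaro-style head/tail reduction (pure Python,
# CPU only); not the Figaro-GPU implementation. Assumes a Cartesian-product join.
import math
from itertools import product

Matrix = list[list[float]]


def givens_r(a: Matrix) -> Matrix:
    """Return the n x n upper-triangular factor R of an m x n matrix (m >= n),
    computed with Givens rotations and normalized to a non-negative diagonal."""
    r = [[float(v) for v in row] for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        # Sweep bottom-up: rotate rows (i - 1, i) so that entry (i, j) becomes 0.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h
            for k in range(j, n):  # columns < j are already zero in both rows
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
    # R is unique only up to row signs; flip rows to make the diagonal >= 0.
    return [[-v for v in row] if row[i] < 0 else row for i, row in enumerate(r[:n])]


def cartesian_join(a1: Matrix, a2: Matrix) -> Matrix:
    """Materialize the join matrix: one row [a1_i, a2_j] for every pair (i, j)."""
    return [list(r1) + list(r2) for r1, r2 in product(a1, a2)]


def reduced_matrix(a1: Matrix, a2: Matrix) -> Matrix:
    """Build an (m1 + m2 - 1) x (n1 + n2) matrix with the same Gram matrix,
    and hence the same R, as cartesian_join(a1, a2), without materializing it."""
    m1, m2, n1, n2 = len(a1), len(a2), len(a1[0]), len(a2[0])
    s2 = [sum(a2[t][c] for t in range(m2)) for c in range(n2)]
    # Heads: the m2 join rows sharing a1_i collapse into a single row.
    heads = [[math.sqrt(m2) * v for v in r1] + [v / math.sqrt(m2) for v in s2]
             for r1 in a1]
    # Tails: Helmert rows h_k (orthonormal, orthogonal to all-ones) applied to a2.
    # They are the same for every a1_i, so the m1 copies merge into one copy
    # scaled by sqrt(m1).
    tails = []
    for k in range(1, m2):
        scale = math.sqrt(m1) / math.sqrt(k * (k + 1))
        prefix = [sum(a2[t][c] for t in range(k)) for c in range(n2)]
        tails.append([0.0] * n1 + [scale * (p - k * x) for p, x in zip(prefix, a2[k])])
    return heads + tails


A1 = [[1, 2], [3, 1], [2, 5]]          # m1 = 3 rows, n1 = 2 columns
A2 = [[1, 0], [2, 3], [0, 1], [4, 2]]  # m2 = 4 rows, n2 = 2 columns

r_join = givens_r(cartesian_join(A1, A2))  # QR over 3 * 4 = 12 join rows
r_fast = givens_r(reduced_matrix(A1, A2))  # QR over 3 + 4 - 1 = 6 rows
max_diff = max(abs(x - y) for rj, rf in zip(r_join, r_fast) for x, y in zip(rj, rf))
print(f"max |R_join - R_reduced| = {max_diff:.2e}")
```

When the join matrix has full column rank, its $R$ with a positive diagonal is determined by the Gram matrix $A^\top A$, so any reduced matrix with the same Gram matrix yields the same upper triangular factor; the QR then runs over $m_1 + m_2 - 1$ rows instead of $m_1 m_2$.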
