svds-C: A Multi-Thread C Code for Computing Truncated Singular Value Decomposition

Xu Feng, Wenjian Yu, Yuyang Xie

TL;DR

This work addresses the need for fast, robust truncated SVD on large-scale matrices by re-implementing Matlab's svds in C as svds-C and leveraging multi-threading through MKL/OpenBLAS. By reworking Lanczos bidiagonalization with augmented restarting and careful memory management, svds-C achieves substantial speedups (up to $12\times$ on 16 cores) and memory reductions across Intel and AMD CPUs, while preserving accuracy ($A\approx U_k\Sigma_k V_k^\mathrm{T}$). The study demonstrates svds-C's competitiveness and robustness against other state-of-the-art truncated-SVD algorithms across diverse synthetic and real-world datasets, and releases the open-source code for broad use. The practical impact is significant for high-performance data analysis tasks (e.g., PCA, low-rank approximations) requiring reliable, scalable truncated SVD on modern hardware.

Abstract

This article presents svds-C, an open-source, high-performance C program for accurately and robustly computing truncated SVD, e.g., computing the several largest singular values and the corresponding singular vectors. We have re-implemented the algorithm of Matlab's svds in C, based on MKL or OpenBLAS and multi-thread computing, to obtain the parallel program named svds-C. Running on a shared-memory computer, svds-C consumes less time and memory than svds thanks to careful implementation of multi-thread parallelization and memory management. Numerical experiments on test cases that are synthetically generated or taken directly from real-world datasets show that svds-C runs remarkably faster than svds, with a 4.7X average and at most 12X speedup for 16-thread parallel computing on a computer with an Intel CPU, while preserving the same accuracy and consuming about half the memory. Experimental results also demonstrate that svds-C has similar advantages over svds on a computer with an AMD CPU, and that it outperforms other state-of-the-art algorithms for truncated SVD in computing time and robustness.

Paper Structure

This paper contains 14 sections, 1 theorem, 9 equations, 1 figure, 5 tables, 2 algorithms.

Key Result

Proposition 1

Suppose {$\mathbf{\hat{u}}_j,\mathbf{\hat{v}}_j,\hat{\sigma}_j$} denotes the $j$-th largest singular triplet of the bidiagonal matrix $\mathbf{T}\in \mathbb{R}^{t\times t}$ obtained with Alg. 1, and {$\mathbf{\tilde{u}}_j,\mathbf{\tilde{v}}_j,\tilde{\sigma}_j$}, ($1\le j \le t$), is the $j$-th large where $\mathbf{U}$ and $\mathbf{V}$ are the orthonormal matrices outputted by Alg. 1. Then, the com

Figures (1)

  • Figure 1: The computed singular values of test cases from svds, svds-C, lansvd [propack], PRIMME_SVDS [wu2017primme_svds], and svds in Armadillo [sanderson2016armadillo] (setting $k=100$).

Theorems & Definitions (1)

  • Proposition 1