CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)
Maksim Melnichenko, Oleg Balabanov, Riley Murray, James Demmel, Michael W. Mahoney, Piotr Luszczek
TL;DR
CQRRPT reduces the cost of QR with column pivoting (QRCP) for tall matrices by making pivot decisions on a single randomized sketch of the input and then factoring the pivoted matrix with CholeskyQR, using the sketch to precondition it. The resulting pivoted QR decomposition retains rank-revealing and stability properties under standard RandNLA assumptions, with a leading arithmetic cost of $3m n^{2}$ (plus sketch terms) and favorable communication characteristics. The authors provide rigorous RRQR and stability results, discuss practical rank estimation, and demonstrate order-of-magnitude speedups over LAPACK's GEQP3 on large tall matrices, while still producing an explicit $\mathbf{Q}$ factor. The work positions CQRRPT as a robust, scalable tool for orthogonalization in high-performance computing, with an open-source implementation in RandLAPACK and directions for future work on improved sketching and on GPU and low-precision extensions.
Abstract
This paper develops and analyzes a new algorithm for QR decomposition with column pivoting (QRCP) of rectangular matrices with many more rows than columns. The algorithm carefully combines methods from randomized numerical linear algebra to accelerate pivot decisions for the input matrix and the process of decomposing the pivoted matrix into the QR form. The source of the latter improvement is CholeskyQR with randomized preconditioning. Comprehensive analysis is provided in both exact and finite-precision arithmetic to characterize the algorithm's rank-revealing properties and its numerical stability under probabilistic assumptions on the sketching operator. An implementation of the proposed algorithm is described and made available inside the open-source RandLAPACK library, which itself relies on RandBLAS. Experiments with this implementation on an Intel Xeon Gold 6248R CPU demonstrate order-of-magnitude speedups over LAPACK's standard function for QRCP, and comparable performance to a specialized algorithm for unpivoted QR of tall matrices, which lacks the strong rank-revealing properties of the proposed method.
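The two ingredients named in the abstract, sketch-based pivot selection and CholeskyQR with randomized preconditioning, can be illustrated with a short NumPy/SciPy sketch. This is not the RandLAPACK implementation. The function name `cqrrpt_sketch`, the dense Gaussian sketching operator, the oversampling factor `gamma=2.0`, and the restriction to full-column-rank inputs (no rank estimation step) are all assumptions made for this illustration.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def cqrrpt_sketch(A, gamma=2.0, seed=None):
    """Illustrative CQRRPT-style pivoted QR of a tall matrix A (m >> n).

    Returns Q (m x n), R (n x n, upper triangular) and a pivot vector piv
    such that A[:, piv] is approximately Q @ R.
    Assumes A has full numerical column rank (hypothetical simplification).
    """
    m, n = A.shape
    d = int(np.ceil(gamma * n))  # sketch size; gamma is an assumed default
    assert n <= d <= m, "expected a tall matrix with n <= d <= m"

    # 1. Sketch: a dense Gaussian operator is used here for simplicity
    #    (assumption; other sketching operators could be substituted).
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((d, m)) / np.sqrt(d)
    A_sk = S @ A

    # 2. Pivot decisions: QRCP on the small d x n sketch only.
    _, R_sk, piv = qr(A_sk, mode="economic", pivoting=True)

    # 3. Precondition: A_pre = A[:, piv] @ inv(R_sk), via a triangular solve
    #    of R_sk^T X^T = A[:, piv]^T.
    A_pre = solve_triangular(R_sk, A[:, piv].T, trans="T", lower=False).T

    # 4. CholeskyQR on the well-conditioned A_pre.
    G = A_pre.T @ A_pre
    R_chol = np.linalg.cholesky(G).T  # upper-triangular Cholesky factor
    Q = solve_triangular(R_chol, A_pre.T, trans="T", lower=False).T

    # 5. Combine the triangular factors: A[:, piv] = Q @ (R_chol @ R_sk).
    R = R_chol @ R_sk
    return Q, R, piv
```

Calling `Q, R, piv = cqrrpt_sketch(A)` on a tall array `A` yields factors for which `A[:, piv]` and `Q @ R` should agree to roughly working precision. In this sketch the only pivoted factorization is applied to the `d x n` sketch, while the full `m x n` matrix is touched only by matrix products and triangular solves, which reflects the source of the speedup described above.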
