Low Rank Multi-Dictionary Selection at Scale

Boya Ma; Maxwell McNeil; Abram Magner; Petko Bogdanov

TL;DR

This work tackles the scalability challenge of sparse coding for 2D data using multiple dictionaries by introducing LRMDS, a method that jointly sub-selects left and right dictionary atoms and encodes with a low-rank model on the selected sub-dictionaries. It combines a greedy, alignment-based dictionary sub-selection with convex, low-rank encoding, providing theoretical guarantees that the true atoms are recoverable under mild conditions. Empirically, LRMDS achieves 3×–10× speedups and substantial improvements in representation quality over state-of-the-art baselines on synthetic and real-world datasets across diverse dictionary configurations. The approach significantly narrows the gap between scalability and accuracy in multi-dictionary sparse coding and offers a path toward extending to higher-order (tensor) data in the future.

Abstract

The sparse dictionary coding framework represents signals as a linear combination of a few predefined dictionary atoms. It has been employed for images, time series, graph signals, and, more recently, for 2-way (or 2D) spatio-temporal data using jointly temporal and spatial dictionaries. Large, over-complete dictionaries enable high-quality models but also pose scalability challenges, which are exacerbated in multi-dictionary settings. Hence, an important problem that we address in this paper is: how can multi-dictionary coding scale to large dictionaries and datasets? We propose LRMDS, a multi-dictionary atom selection technique for low-rank sparse coding. To scale to large dictionaries and datasets, it progressively selects groups of row-column atom pairs based on their alignment with the data and performs convex relaxation coding via the corresponding sub-dictionaries. We demonstrate both theoretically and experimentally that when the data has a low-rank encoding with a sparse subset of the atoms, LRMDS is able to select them with strong guarantees under mild assumptions. Furthermore, we demonstrate the scalability and quality of LRMDS on both synthetic and real-world datasets and for a range of coding dictionaries. It achieves 3× to 10× speed-ups compared to baselines, while obtaining up to two orders of magnitude improvement in representation quality on some of the real-world datasets given a fixed target number of atoms.
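The selection-then-coding loop described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the alignment score |ψ_i^T X φ_j| on unit-norm atoms, the residual-driven progression, and the nuclear-norm regularized least squares used as the convex low-rank coding step are all assumptions, and the function names (`select_atom_pairs`, `nuclear_norm_coding`, `lrmds_sketch`) are hypothetical.

```python
import numpy as np


def select_atom_pairs(R, Psi, Phi, k):
    """Return left/right atom indices of the k (left, right) pairs most aligned with R.

    Assumed alignment score: |psi_i^T R phi_j| with unit-norm atoms, i.e. the
    magnitude of the inner product between R and the rank-1 atom psi_i phi_j^T.
    """
    Psi_n = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    Phi_n = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)
    scores = np.abs(Psi_n.T @ R @ Phi_n)           # (n_left_atoms, n_right_atoms)
    top = np.argsort(scores, axis=None)[::-1][:k]  # flat indices of the k largest scores
    rows, cols = np.unravel_index(top, scores.shape)
    return np.unique(rows), np.unique(cols)


def nuclear_norm_coding(X, Psi_s, Phi_s, lam=1e-3, n_iter=200):
    """Proximal gradient for min_Z 0.5*||X - Psi_s Z Phi_s^T||_F^2 + lam*||Z||_*.

    A stand-in convex low-rank coding step; the paper's exact objective may differ.
    """
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth term's gradient.
    step = 1.0 / (np.linalg.norm(Psi_s, 2) ** 2 * np.linalg.norm(Phi_s, 2) ** 2)
    Z = np.zeros((Psi_s.shape[1], Phi_s.shape[1]))
    for _ in range(n_iter):
        grad = -Psi_s.T @ (X - Psi_s @ Z @ Phi_s.T) @ Phi_s
        U, s, Vt = np.linalg.svd(Z - step * grad, full_matrices=False)
        Z = (U * np.maximum(s - step * lam, 0.0)) @ Vt  # singular value thresholding
    return Z


def lrmds_sketch(X, Psi, Phi, k_per_step, n_steps, lam=1e-3):
    """Progressively grow left/right sub-dictionaries and re-code X on them (n_steps >= 1)."""
    rows, cols = set(), set()
    residual = X
    for _ in range(n_steps):
        # Select the pairs best aligned with what is not yet explained.
        r, c = select_atom_pairs(residual, Psi, Phi, k_per_step)
        rows.update(r.tolist())
        cols.update(c.tolist())
        I, J = sorted(rows), sorted(cols)
        # Code the full data on the current sub-dictionaries, then update the residual.
        Z = nuclear_norm_coding(X, Psi[:, I], Phi[:, J], lam)
        residual = X - Psi[:, I] @ Z @ Phi[:, J].T
    return I, J, Z
```

For example, with orthonormal dictionaries and data built from two planted atom pairs, one call with `k_per_step=2, n_steps=1` returns exactly those left and right atoms. Coding only on the selected columns is what keeps each step cheap compared with coding against the full dictionaries.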

Paper Structure (29 sections, 4 theorems, 24 equations, 8 figures, 2 tables, 1 algorithm)

Introduction
Related work
Preliminaries
Problem formulation and solutions
Problem formulation
$\text{LRMDS}$: iterative atom selection and coding
Dictionary subselection theoretical analysis
Experimental evaluation
Datasets
Experimental setup
Evaluation on synthetic data
Evaluation on real-world datasets
Theoretical guarantees validation (Theorem 1)
Ablation study: Is joint selection critical?
Acknowledgement
...and 14 more sections

Key Result

Theorem 1 (Accuracy guarantee for top-$k$ atom selection denoising)

Let $N, M, \Psi, \Phi^T, R, Q, \hat{R}, R_{reconst}$ be as defined in the paper, with clean signal matrix $R$, noise $Q$, and noisy observation $\hat{R} = R + Q$. If $\hat{k} \geq s$ and $\hat{k} = \Theta(1)$, where $s$ is the sparsity parameter of the signal matrix $R$, then $\|R - R_{reconst}\|_{F} = o(\|R\|_{F})$.
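To make the setting of Theorem 1 concrete, the sketch below plants $s = 3$ atom pairs in $R$, adds Gaussian noise $Q$, keeps the $\hat{k} = 5 \geq s$ pairs most aligned with $\hat{R}$, and reconstructs by projecting onto them. It is only a numerical illustration under assumed conditions (orthonormal square dictionaries, i.i.d. Gaussian noise, projection-based reconstruction, arbitrary sizes and coefficients); it does not reproduce the paper's proof, its exact assumptions, or its asymptotic regime, and `topk_denoise` is a hypothetical name.

```python
import numpy as np


def topk_denoise(R_hat, Psi, Phi, k_hat):
    """Keep the k_hat atom pairs most aligned with R_hat and project onto them.

    Assumes Psi and Phi have orthonormal columns, so the coefficient of the
    pair (i, j) is exactly psi_i^T R_hat phi_j.
    """
    C = Psi.T @ R_hat @ Phi
    top = np.argsort(np.abs(C), axis=None)[::-1][:k_hat]
    mask = np.zeros(C.shape, dtype=bool)
    mask[np.unravel_index(top, C.shape)] = True
    R_reconst = Psi @ np.where(mask, C, 0.0) @ Phi.T
    return R_reconst, mask


# Hypothetical experiment: s = 3 planted atom pairs plus i.i.d. Gaussian noise Q.
rng = np.random.default_rng(1)
N, M, sigma = 64, 64, 0.1
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))
Phi, _ = np.linalg.qr(rng.standard_normal((M, M)))
support = [(2, 7), (10, 3), (40, 50)]
R = sum(c * np.outer(Psi[:, i], Phi[:, j]) for c, (i, j) in zip([50.0, 40.0, 30.0], support))
Q = sigma * rng.standard_normal((N, M))
R_reconst, mask = topk_denoise(R + Q, Psi, Phi, k_hat=5)  # k_hat >= s

rel_err = np.linalg.norm(R - R_reconst) / np.linalg.norm(R)
noise_level = np.linalg.norm(Q) / np.linalg.norm(R)
print(f"relative error {rel_err:.4f} vs. relative noise {noise_level:.4f}")
```

Because only $\hat{k}$ coefficients survive, the reconstruction retains the noise in just those few directions rather than all $N \times M$, so the error measured against the clean $R$ ends up well below the noise level of $\hat{R}$.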

Figures (8)

Figure 1: (a) 2D low-rank coding example for user-product preference data. The left ($\Psi$) and right ($\Phi$) dictionaries are derived from user and product association graphs, and the goal is to encode the data sparsely via sparse and low-rank coefficient matrices $Y, W$. Our method $\text{LRMDS}$ sub-selects atoms from both dictionaries to speed up coding. (b) Comparison of competing techniques on a Road traffic dataset. Variants of $\text{LRMDS}$ outperform all baselines in both representation quality (RMSE) and running time (the best regime is the lower-left corner).
Figure 2: Comparison of competing techniques on synthetic data. (a), (b): RMSE and running time when varying the dictionaries available to each method (listed under the x axis). The total number of (left and right dictionary) atoms is given at the top of each panel. We stack increasing sets of dictionaries on the left and right, while the ground-truth atoms are selected from the full set GW+RS. (c): RMSE as a function of the number of selected atoms when multiple dictionaries are provided. (d): Running time as a function of the number of selected atoms. GW denotes the GFT and graph Haar wavelet dictionaries stacked together for the graph dimension, and RS denotes the Ramanujan and Spline dictionaries stacked for the temporal dimension (the dictionary definitions are detailed in the supplement).
Figure 3: Representation quality (RMSE) and running time of the competing methods as a function of the percentage of selected atoms on the Twitch, Wiki, Road, and Covid datasets. All methods use a GFT for $\Psi$ and a Ramanujan periodic dictionary for $\Phi$. Dictionary dimensions: Twitch: $\Psi\in \mathbb{R}^{78389 \times 78389}$, $\Phi\in \mathbb{R}^{512 \times 2230}$; Wiki: $\Psi \in \mathbb{R}^{999 \times 999}$, $\Phi \in \mathbb{R}^{792 \times 6000}$; Road: $\Psi \in \mathbb{R}^{1923 \times 1923}$, $\Phi \in \mathbb{R}^{720 \times 3044}$; Covid: $\Psi \in \mathbb{R}^{3047 \times 3047}$, $\Phi \in \mathbb{R}^{678 \times 6000}$. Note: 2D-OMP's trace on Twitch ends early because it does not scale: it fails to complete within $72$ hours when selecting more than $13\%$ of the atoms.
Figure 4: (a), (b): Empirical demonstration of the theoretical guarantee on $\text{LRMDS}$'s ability to denoise a signal. (a): "clean: LRMDS" operates on the clean matrix $R$, whereas "clean + noise" operates on the noisy signal $\hat{R}=R+Q$; for both, RMSE is measured with respect to the clean data $R$. (b): Absolute differences between the learned coefficient matrices for the clean data $(YW)_R$, the noisy data $(YW)_{R+Q}$, and pure noise $Z_Q$. (c), (d): Ablation study demonstrating the importance of jointly selecting atoms from both dictionaries. We compare $\text{LRMDS}$ to variants in which atoms are selected from the left and right dictionaries independently ($\text{LRMDS}$-1D) or randomly (RAND), measuring (c) RMSE and (d) runtime as a function of the percentage of selected atoms.
Figure 5: Maximum coefficient of the learned coefficient matrix for pure-noise data as $N$ increases.
...and 3 more figures
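Figures 2 and 3 rely on a GFT dictionary for the graph side ($\Psi$) and a Ramanujan periodic dictionary for the temporal side ($\Phi$), with definitions deferred to the supplement. The sketch below assumes common constructions: GFT atoms as eigenvectors of the combinatorial Laplacian $L = D - A$, and a periodic dictionary in which each period $q \le P_{max}$ contributes $\varphi(q)$ (Euler totient) shifted copies of the Ramanujan sum $c_q(n)$. The function names are hypothetical, and the supplement's exact choices (normalization, Laplacian variant, shift convention) may differ.

```python
import math

import numpy as np


def gft_dictionary(A):
    """GFT atoms: eigenvectors of the combinatorial Laplacian L = D - A, by ascending eigenvalue."""
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)
    return U


def ramanujan_sum(q, n):
    """Ramanujan sum c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) = 1."""
    ks = np.array([k for k in range(1, q + 1) if math.gcd(k, q) == 1])
    return np.cos(2 * np.pi * np.outer(n, ks) / q).sum(axis=1)


def ramanujan_dictionary(T, P_max):
    """Periodic dictionary of length-T atoms: phi(q) shifted copies of c_q(n) per period q."""
    n = np.arange(T)
    blocks = []
    for q in range(1, P_max + 1):
        phi_q = sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)  # Euler totient
        blocks.append(np.column_stack([ramanujan_sum(q, n - l) for l in range(phi_q)]))
    return np.hstack(blocks)
```

Stacking dictionaries, as in the GW and RS settings of Figure 2, amounts to concatenating their columns (for instance `np.hstack([gft_atoms, other_atoms])`), which enlarges the candidate pool that atom selection has to search.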

Theorems & Definitions (4)

Theorem 1: Accuracy guarantee for top-$k$ atom selection denoising
Lemma 1: Comparison of inner products with dictionary coefficients
Lemma 2: Upper bound on the maximum of correlated Gaussians
Lemma 3: Upper bound on the maximum inner product between a noise vector and a dictionary atom
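Lemmas 2 and 3 are listed here only by title. For orientation, the standard bound of this kind (not necessarily the paper's exact statement or constants) follows from Jensen's inequality and the Gaussian moment generating function and needs no independence, which is why it covers the correlated inner products between a single noise vector and many dictionary atoms:

```latex
% Z_1, ..., Z_n jointly Gaussian (possibly correlated), E[Z_i] = 0, Var(Z_i) <= \sigma^2, n >= 2.
% For any \lambda > 0:
\exp\!\Big(\lambda\,\mathbb{E}\big[\max_i Z_i\big]\Big)
  \le \mathbb{E}\Big[\max_i e^{\lambda Z_i}\Big]
  \le \sum_{i=1}^{n} \mathbb{E}\big[e^{\lambda Z_i}\big]
  \le n\, e^{\lambda^2 \sigma^2 / 2}
\quad\Longrightarrow\quad
\mathbb{E}\big[\max_i Z_i\big] \le \frac{\ln n}{\lambda} + \frac{\lambda \sigma^2}{2}.
% Choosing \lambda = \sqrt{2 \ln n} / \sigma gives
\mathbb{E}\big[\max_i Z_i\big] \le \sigma \sqrt{2 \ln n}.
% Applied to the 2n variables \pm\langle g, a_j \rangle, where g \sim \mathcal{N}(0, \sigma^2 I)
% and a_1, ..., a_n are unit-norm atoms (so each \langle g, a_j \rangle \sim \mathcal{N}(0, \sigma^2)):
\mathbb{E}\Big[\max_j \lvert \langle g, a_j \rangle \rvert\Big] \le \sigma \sqrt{2 \ln (2n)}.
```

The bound grows only logarithmically in the number of atoms, so noise rarely looks strongly aligned with any single atom even in very large dictionaries.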
