Deep greedy unfolding: Sorting out argsorting in greedy sparse recovery algorithms
Sina Mohammad-Taheri, Matthew J. Colbrook, Simone Brugiapaglia
TL;DR
This work tackles the non-differentiability of the argsort operator in greedy sparse recovery algorithms by introducing differentiable Soft-OMP and Soft-IHT via softsort, enabling gradient-based training through algorithm unrolling. The authors establish theoretical guarantees that Soft-OMP/Soft-IHT approximate their non-differentiable counterparts with error controlled by a temperature parameter $\tau$, and they develop OMP-Net and IHT-Net to learn structure-aware sparsity patterns. Empirically, Soft-OMP/Soft-IHT approximate the original methods for sufficiently small $\tau$, and the trained greedy networks outperform classical OMP/IHT in heavily undersampled regimes, demonstrating practical impact for structured sparse recovery. The framework connects model-based recovery with data-driven learning, offering a path to extracting latent sparsity structure from data and to extending the approach to other greedy algorithms and architectures.
Abstract
Gradient-based learning requires (deep) neural networks to be differentiable at all steps. This includes model-based architectures constructed by unrolling the iterations of an iterative algorithm onto the layers of a neural network, a technique known as algorithm unrolling. However, greedy sparse recovery algorithms depend on the non-differentiable argsort operator, which hinders their integration into neural networks. In this paper, we address this challenge in Orthogonal Matching Pursuit (OMP) and Iterative Hard Thresholding (IHT), two popular representative algorithms in this class. We propose permutation-based variants of these algorithms and approximate permutation matrices using "soft" permutation matrices derived from softsort, a continuous relaxation of argsort. We demonstrate -- both theoretically and numerically -- that Soft-OMP and Soft-IHT, as differentiable counterparts of OMP and IHT that are fully compatible with neural network training, effectively approximate these algorithms with a controllable degree of accuracy. This leads to the development of OMP-Net and IHT-Net, fully trainable network architectures based on Soft-OMP and Soft-IHT, respectively. Finally, by choosing weights as "structure-aware" trainable parameters, we connect our approach to structured sparse recovery and demonstrate its ability to extract latent sparsity patterns from data.
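To make the role of softsort concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard softsort relaxation, a row-wise softmax of $-|\mathrm{sort}(s)\mathbf{1}^\top - \mathbf{1}s^\top|/\tau$, and uses it to build a soft version of the "keep the $k$ largest magnitudes" step that IHT-type methods rely on. The function names `softsort` and `soft_hard_threshold`, and the mask-based thresholding itself, are illustrative assumptions; the paper's Soft-IHT and Soft-OMP may be formulated differently.

```python
import numpy as np

def softsort(s, tau=1.0):
    """Soft permutation matrix for sorting s in descending order.

    Row i is a softmax over the entries of s, peaked at the entry
    closest to the i-th largest value. Rows sum to 1; as tau -> 0 the
    matrix approaches the hard permutation matrix given by argsort
    (assuming the entries of s are distinct).
    """
    s = np.asarray(s, dtype=float)
    s_sorted = np.sort(s)[::-1]                          # descending order
    logits = -np.abs(s_sorted[:, None] - s[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def soft_hard_threshold(x, k, tau=1e-2):
    """Illustrative soft top-k thresholding (an assumption, see lead-in).

    Summing the first k rows of the soft permutation of |x| gives a
    soft 0/1 mask over the entries of x; multiplying by x keeps
    (approximately) the k largest-magnitude entries.
    """
    x = np.asarray(x, dtype=float)
    P = softsort(np.abs(x), tau)
    mask = P[:k].sum(axis=0)
    return mask * x

x = np.array([0.1, -3.0, 0.5, 2.0])
print(soft_hard_threshold(x, k=2, tau=1e-2))   # close to hard thresholding
print(soft_hard_threshold(x, k=2, tau=10.0))   # heavily smoothed
```

With a small temperature the output is numerically indistinguishable from hard thresholding, while a large temperature spreads the mask across all entries. This is the accuracy-versus-smoothness trade-off that the temperature parameter $\tau$ controls.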
