Off-the-grid learning of mixtures from a continuous dictionary

Cristina Butucea; Jean-François Delmas; Anne Dutfoy; Clément Hardy

TL;DR

The paper tackles off-the-grid learning of sparse mixtures from a continuous dictionary under Gaussian noise. It proposes a convex optimization framework that uses no discretization of the parameter space and jointly recovers the sparse amplitudes and the nonlinear dictionary parameters. The analysis rests on interpolating certificates and a Riemannian-geometry-based treatment of the parameter space. Together these yield high-probability prediction bounds and near-optimal rates, up to logarithmic factors. The results comprise a general theory with explicit certificates and conditions, instantiated in two concrete applications: Gaussian sparse spike deconvolution and a scaled exponential model. For both, the existence of certificates is proved in detail and concrete prediction guarantees are given. The work enables truly off-the-grid estimation with provable guarantees and extends super-resolution techniques beyond translation-invariant dictionaries to more general nonlinear parameterizations. This makes the framework relevant to signal processing applications where continuous dictionaries and non-convex parameter estimation arise.
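The contrast between translation-invariant and more general nonlinear dictionaries shows up in the inner products $\langle \varphi_\theta, \varphi_{\theta'} \rangle$ of unit-norm features. The sketch below assumes two illustrative feature families, which are not necessarily the paper's exact definitions or normalizations:

- a Gaussian $\varphi_\theta(t) \propto e^{-(t-\theta)^2/(2\sigma^2)}$ on $\mathbb{R}$;
- a scaled exponential $\varphi_\theta(t) = \sqrt{2\theta}\, e^{-\theta t}$ on $[0,\infty)$ with $\theta > 0$.

For the first family, the inner product depends only on $\theta - \theta'$. For the second, it depends only on the ratio $\theta'/\theta$. This suggests measuring separation in a geometry adapted to the dictionary rather than by $|\theta - \theta'|$. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(theta1, theta2, sigma=1.0):
    # <phi_theta1, phi_theta2> for unit-norm Gaussian features on R:
    # exp(-(theta1 - theta2)^2 / (4 sigma^2)), a function of the difference only.
    return np.exp(-(theta1 - theta2) ** 2 / (4 * sigma ** 2))

def scaled_exponential_feature(theta, t):
    # Assumed (hypothetical) feature sqrt(2 theta) exp(-theta t) on t >= 0;
    # it has unit L2 norm on [0, inf) for theta > 0.
    return np.sqrt(2 * theta) * np.exp(-theta * t)

def exponential_kernel(theta1, theta2):
    # Closed form of <phi_theta1, phi_theta2> = 2 sqrt(theta1 theta2) / (theta1 + theta2)
    # for the assumed exponential features: a function of the ratio theta2 / theta1 only.
    return 2 * np.sqrt(theta1 * theta2) / (theta1 + theta2)

def trapezoid(f, t):
    # Composite trapezoid rule for samples f on a uniform grid t.
    dt = t[1] - t[0]
    return dt * (np.sum(f) - 0.5 * (f[0] + f[-1]))

if __name__ == "__main__":
    t = np.linspace(0.0, 60.0, 60001)
    numeric = trapezoid(scaled_exponential_feature(1.0, t) * scaled_exponential_feature(2.0, t), t)
    print(numeric, exponential_kernel(1.0, 2.0))                       # both close to 2*sqrt(2)/3
    print(gaussian_kernel(0.0, 1.5), gaussian_kernel(2.0, 3.5))        # equal: same difference
    print(exponential_kernel(1.0, 2.0), exponential_kernel(3.0, 6.0))  # equal up to rounding: same ratio
```

Under the assumed exponential feature, write $\theta' = \theta e^{s}$. The inner product then becomes $1/\cosh(s/2)$, so closeness in this family is naturally measured on a logarithmic scale.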

Abstract

We consider a general non-linear model where the signal is a finite mixture of an unknown, possibly increasing, number of features taken from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setup. We propose an off-the-grid optimization method, that is, a method which does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. We use recent results on the geometry of off-the-grid methods to give a minimal separation condition on the true underlying non-linear parameters under which interpolating certificate functions can be constructed. Using, in addition, tail bounds for suprema of Gaussian processes, we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to $\log$ factors, similar to the rates attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify with high probability the quality of estimation for both the linear and the non-linear parameters. We develop our main results in full detail for two applications: the Gaussian spike deconvolution and the scaled exponential model.
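The sketch below illustrates the discrete-time setup. It simulates a two-spike Gaussian mixture $y_i = \sum_k \beta_k \varphi_{\theta_k}(t_i) + \text{noise}$. It then evaluates a Lasso-type criterion $\frac{1}{2n}\sum_{i=1}^n \big(y_i - \sum_k \beta_k \varphi_{\theta_k}(t_i)\big)^2 + \kappa \|\beta\|_{1}$ at arbitrary real-valued locations $\theta_k$, with no grid on the parameter space.

The following are assumptions made for this example, not the paper's definitions:

- the unit-norm Gaussian feature;
- the names `gaussian_feature`, `mixture` and `objective`;
- the penalty level `kappa`;
- the exact form of the criterion.

The code only evaluates the criterion. It does not solve the minimization over the locations, which is non-convex in $\theta$.

```python
import numpy as np

def gaussian_feature(theta, t, sigma=1.0):
    """Hypothetical unit-L2-norm Gaussian feature centred at a real location theta."""
    # The integral of exp(-(t - theta)^2 / sigma^2) over R is sigma * sqrt(pi),
    # so dividing by its square root gives unit L2 norm on the real line.
    return np.exp(-(t - theta) ** 2 / (2 * sigma ** 2)) / np.sqrt(sigma * np.sqrt(np.pi))

def mixture(beta, thetas, t, sigma=1.0):
    """Signal sum_k beta_k * phi_{theta_k}(t); thetas are real numbers, not grid indices."""
    signal = np.zeros_like(t, dtype=float)
    for b, th in zip(beta, thetas):
        signal += b * gaussian_feature(th, t, sigma)
    return signal

def objective(beta, thetas, y, t, kappa, sigma=1.0):
    """Assumed Lasso-type criterion: half the empirical squared error plus kappa * ||beta||_1."""
    residual = y - mixture(beta, thetas, t, sigma)
    return 0.5 * np.mean(residual ** 2) + kappa * np.sum(np.abs(beta))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(-10.0, 10.0, 201)   # n = 201 observation times
    beta_true = np.array([1.0, -0.5])   # linear (mixture) parameters
    theta_true = np.array([-2.3, 1.7])  # non-linear parameters, not on any grid
    y = mixture(beta_true, theta_true, t) + 0.05 * rng.standard_normal(t.size)
    print(objective(beta_true, theta_true, y, t, kappa=0.01))        # at the true parameters
    print(objective(beta_true, theta_true + 1.0, y, t, kappa=0.01))  # locations shifted by 1
```

The second printed value is larger because shifting the locations moves the model away from the data. An estimator of the kind described above would minimize such a criterion jointly over `beta` and `thetas` in the continuum, which this sketch does not attempt.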

Paper Structure (47 sections, 22 theorems, 283 equations)

Introduction
Model and method
Examples
Discrete-time models
Continuous-time models with truncated white noise or colored noise
Previous work
Contributions
Gaussian sparse spike deconvolution, see Section \ref{sec:example}.
Scaled exponential model, see Section \ref{sec:scaled}.
Main Results
Dictionary of features
Assumptions on the regularity of the features
Examples of regular features
Translation discrete-time model
Translation model with a continuum of observations
...and 32 more sections

Key Result

Theorem 2.1

Assume we observe the random element $y$ of $H_T$ under the regression model (eq:model) with unknown parameters $\beta^\star$ and $\vartheta^\star= \left ( \theta_1^\star,\cdots,\theta_K^\star\right )$ a vector with entries in $\Theta_T$, a compact interval of ${\mathbb R}$, such that: Then, there exist finite positive constants $\mathcal{C}_0$, $\mathcal{C}_1$, $\mathcal{C}_2$, $\mathcal{C}_3$

Theorems & Definitions (48)

Theorem 2.1
Remark 2.2: Comparison with the Lasso estimator
Remark 2.3: Proximity to the limit kernel
Remark 2.4: On the dimension $K$, the upper bound of the sparsity
Theorem 2.5
Remark 2.6: Again on the dimension $K$
Lemma 3.1: On the positivity of $g_T$
Proof
Remark 4.1
Lemma 4.2
...and 38 more