Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary

Cristina Butucea; Jean-François Delmas; Anne Dutfoy; Clément Hardy

TL;DR

The paper tackles simultaneous learning of multiple signals that are mixtures of a continuous dictionary of nonlinear features, addressing off-the-grid estimation of both sparse coefficients and nonlinear parameters. It introduces the Group-Nonlinear-Lasso, a regularized objective with a mixed $(\ell_1,L^p(\nu))$ penalty, and proves high-probability bounds on the prediction error through certificate functions grounded in a Riemannian geometry of the dictionary. The results encompass a general setting with an arbitrary finite measure $\nu$, and refinements for Gaussian noise and finite observation sets, including rates matching Group-Lasso in multi-task linear regression when $p=2$ and shared nonlinear parameters. The work demonstrates that simultaneous reconstruction can outperform separate, per-signal estimation when most nonlinear parameters are common across signals, and provides constructive certificates and tail bounds that underpin practical guarantees for off-the-grid learning in continuous dictionaries.

Abstract

In this paper we observe a set, possibly a continuum, of signals corrupted by noise. Each signal is a finite mixture of an unknown number of features belonging to a continuous dictionary. The continuous dictionary is parametrized by a real non-linear parameter. We shall assume that the signals share an underlying structure by assuming that each signal has its active features included in a finite and sparse set. We formulate a regularized optimization problem to estimate simultaneously the linear coefficients in the mixtures and the non-linear parameters of the features. The optimization problem is composed of a data fidelity term and a $(\ell_1,L^p)$-penalty. We call its solution the Group-Nonlinear-Lasso and provide high-probability bounds on the prediction error using certificate functions. Following recent works on the geometry of off-the-grid methods, we show that such functions can be constructed provided the parameters of the active features are pairwise separated by a constant with respect to a Riemannian metric. When the number of signals is finite and the noise is assumed Gaussian, we give refinements of our results for $p=1$ and $p=2$ using tail bounds on suprema of Gaussian and $\chi^2$ random processes. When $p=2$, our prediction error reaches the rates obtained by the Group-Lasso estimator in the multi-task linear regression model. Furthermore, for $p=2$ these prediction rates are faster than for $p=1$ when all signals share most of the non-linear parameters.
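
To make the $(\ell_1,L^p)$-penalty concrete, the following is a minimal sketch for a finite set of signals. It assumes the linear coefficients are stored as a table `B[z][k]` (signal `z`, feature `k`) and that the measure $\nu$ is given by nonnegative weights, defaulting to the counting measure. The function name `mixed_norm` and these conventions are illustrative assumptions, not the paper's notation or normalization.

```python
import math


def mixed_norm(B, p, weights=None):
    """Mixed (l1, L^p) norm of a coefficient table B[z][k].

    For each feature k, take the L^p norm of its coefficients across
    signals z (weighted by `weights`, a stand-in for the measure nu),
    then sum these norms over the features (the l1 part).
    """
    n_signals = len(B)
    n_features = len(B[0])
    if weights is None:
        # Assumption: counting measure on the finite set of signals.
        weights = [1.0] * n_signals
    total = 0.0
    for k in range(n_features):
        inner = sum(w * abs(B[z][k]) ** p for z, w in enumerate(weights))
        total += inner ** (1.0 / p)
    return total


# Two signals, two candidate features.
shared = [[3.0, 0.0], [4.0, 0.0]]    # both signals use feature 0
disjoint = [[3.0, 0.0], [0.0, 4.0]]  # each signal uses its own feature

print(mixed_norm(shared, p=2), mixed_norm(disjoint, p=2))
print(mixed_norm(shared, p=1), mixed_norm(disjoint, p=1))
```

In this toy case the $p=2$ penalty is smaller when both signals put their coefficients on the same feature, whereas the $p=1$ penalty does not distinguish the two configurations. This is only an illustration of how the penalty treats shared features, not a reproduction of the paper's bounds.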

Paper Structure (20 sections, 6 theorems, 85 equations, 2 figures)

Introduction
Model and method
Previous work
Contributions
Group-Nonlinear-Lasso vs. Group-Lasso on a grid
Organization of the paper and notation
Assumptions on the model
Regularity and non-degeneracy assumptions on the features
The kernel and its Riemannian derivatives
Kernel space and associated Riemannian metric
The kernel associated to the dictionary of features
Main results
General bound on the prediction error
Explicit bounds for Gaussian noise and finite number of signals
The case $p=2$ and $\mathcal{Z}$ finite
...and 5 more sections

Key Result

Proposition 1.3

Let $p \in (1,2]$. Assume that the function $\theta \mapsto \phi_T(\theta)$ is continuous. Then, the minimization problem (\ref{eq:generalized_lasso}) over $L^2(\mathcal{Z},\mathbb{R}^{K}) \times \Theta_{T}^K$, where $\Theta_T$ is a compact interval of ${\mathbb R}$, admits at least one solution.

Figures (2)

Figure 1: Signal in $H_T = {\mathbb R}^{T}$ with $T=100$, a mixture of two Gaussian-shaped spikes located at $\theta_1^\star = 0$ and $\theta_2^\star = 3$, with amplitudes drawn uniformly from $[-10,10]$, corrupted by i.i.d. centered Gaussian random variables with $\sigma = 0.1$.
Figure 2: Prediction error $\hat{R}_T^2={\left\lVert Y-\hat{Y} \right\rVert}_{\ell_2}^2/(n T)$ given in \ref{eq:RT2=}, with $\hat{Y}$ denoting the reconstructed signals, and number of spikes obtained with the Group-Nonlinear-Lasso and the Group-Lasso approaches. These quantities are represented as functions of the penalty parameter $\kappa$.
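
The setting described in the two captions can be sketched as follows. The spike locations, amplitude range, $T$, $\sigma$ and the formula for $\hat{R}_T^2$ come from the captions. The sampling grid on $[-10,10]$, the unit spike width, the absence of feature normalization and the helper names are assumptions made for illustration, and no estimator is implemented here.

```python
import math
import random


def gaussian_feature(theta, grid, width=1.0):
    """Gaussian-shaped spike centred at theta (unit width is an assumption)."""
    return [math.exp(-((t - theta) ** 2) / (2.0 * width ** 2)) for t in grid]


def simulate_signals(n, T=100, thetas=(0.0, 3.0), sigma=0.1, seed=0):
    """Return (noisy, clean): n mixtures of the spikes at `thetas`."""
    rng = random.Random(seed)
    # Assumption: T equally spaced sample points on [-10, 10].
    grid = [-10.0 + 20.0 * j / (T - 1) for j in range(T)]
    feats = [gaussian_feature(th, grid) for th in thetas]
    noisy, clean = [], []
    for _ in range(n):
        # Amplitudes drawn uniformly from [-10, 10], as in the caption.
        amps = [rng.uniform(-10.0, 10.0) for _ in thetas]
        signal = [sum(a * f[j] for a, f in zip(amps, feats)) for j in range(T)]
        clean.append(signal)
        noisy.append([s + rng.gauss(0.0, sigma) for s in signal])
    return noisy, clean


def prediction_error(Y, Y_hat):
    """Squared l2 distance between Y and Y_hat, divided by n * T."""
    n, T = len(Y), len(Y[0])
    sq = sum((y - yh) ** 2
             for row, row_hat in zip(Y, Y_hat)
             for y, yh in zip(row, row_hat))
    return sq / (n * T)


Y, Y_clean = simulate_signals(n=5)
# Using the noiseless signals as a stand-in for a reconstruction:
print(prediction_error(Y, Y_clean))
```

With the noiseless signals used in place of $\hat{Y}$, the printed value should be of the order of $\sigma^2 = 0.01$, which gives a rough reference level when reading an error curve as a function of $\kappa$.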

Theorems & Definitions (14)

Example 1.1: Discrete case
Example 1.2: Continuous case
Proposition 1.3
Theorem 3.1
Remark 3.2: On the choice of $\kappa$
Remark 3.3: On the dimension $K$
Corollary 3.4
Remark 3.5: Comparison to the Group-Lasso estimator
Corollary 3.6
Remark 3.7
...and 4 more
