Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary

Cristina Butucea, Jean-François Delmas, Anne Dutfoy, Clément Hardy

TL;DR

The paper tackles simultaneous learning of multiple signals that are mixtures of a continuous dictionary of nonlinear features, addressing off-the-grid estimation of both sparse coefficients and nonlinear parameters. It introduces the Group-Nonlinear-Lasso, a regularized objective with a mixed $(\ell_1,L^p(\nu))$ penalty, and proves high-probability bounds on the prediction error through certificate functions grounded in a Riemannian geometry of the dictionary. The results cover a general setting with an arbitrary finite measure $\nu$, together with refinements for Gaussian noise and finite observation sets, including rates that match those of the Group-Lasso in multi-task linear regression when $p=2$ and the nonlinear parameters are shared. The work shows that simultaneous reconstruction can outperform separate, per-signal estimation when most nonlinear parameters are common across signals, and provides constructive certificates and tail bounds that underpin practical guarantees for off-the-grid learning in continuous dictionaries.

Abstract

In this paper we observe a set, possibly a continuum, of signals corrupted by noise. Each signal is a finite mixture of an unknown number of features belonging to a continuous dictionary. The continuous dictionary is parametrized by a real non-linear parameter. We shall assume that the signals share an underlying structure by assuming that each signal has its active features included in a finite and sparse set. We formulate a regularized optimization problem to estimate simultaneously the linear coefficients in the mixtures and the non-linear parameters of the features. The optimization problem is composed of a data fidelity term and a $(\ell_1,L^p)$-penalty. We call its solution the Group-Nonlinear-Lasso and provide high probability bounds on the prediction error using certificate functions. Following recent works on the geometry of off-the-grid methods, we show that such functions can be constructed provided the parameters of the active features are pairwise separated by a constant with respect to a Riemannian metric. When the number of signals is finite and the noise is assumed Gaussian, we give refinements of our results for $p=1$ and $p=2$ using tail bounds on suprema of Gaussian and $\chi^2$ random processes. When $p=2$, our prediction error reaches the rates obtained by the Group-Lasso estimator in the multi-task linear regression model. Furthermore, for $p=2$ these prediction rates are faster than for $p=1$ when all signals share most of the non-linear parameters.
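
To make the structure of the objective concrete, here is a minimal sketch for the finite case of $n$ signals observed on $T$ points. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian-shaped dictionary, its `width`, the $1/(2T)$ scaling of the data fidelity term, and the choice of the counting measure for $\nu$ are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_features(thetas, grid, width=1.0):
    """Rows are Gaussian-shaped features centered at `thetas`, sampled on `grid`.
    Each row is scaled to unit l2 norm (an assumed normalization)."""
    centers = np.asarray(thetas, dtype=float)[:, None]
    phi = np.exp(-((grid[None, :] - centers) ** 2) / (2.0 * width ** 2))
    return phi / np.linalg.norm(phi, axis=1, keepdims=True)

def mixed_norm(B, p=2):
    """(l1, L^p) norm of the coefficient matrix B of shape (n_signals, K):
    the L^p norm across signals of each feature's coefficients, summed over
    the K features. Assumes nu is the counting measure on the signals."""
    return float(np.sum(np.sum(np.abs(B) ** p, axis=0) ** (1.0 / p)))

def group_nonlinear_lasso_objective(Y, B, thetas, grid, kappa, p=2):
    """Data fidelity term plus kappa times the mixed penalty.
    The 1/(2T) scaling of the data term is an assumption."""
    residual = Y - B @ gaussian_features(thetas, grid)
    data_fit = 0.5 * np.sum(residual ** 2) / Y.shape[1]
    return data_fit + kappa * mixed_norm(B, p)

if __name__ == "__main__":
    grid = np.linspace(-10.0, 10.0, 100)
    B = np.array([[3.0, 0.0], [4.0, 0.0]])  # only the first feature is active
    Y = B @ gaussian_features([0.0, 3.0], grid)
    print(mixed_norm(B, p=2), mixed_norm(B, p=1))
    print(group_nonlinear_lasso_objective(Y, B, [0.0, 3.0], grid, kappa=0.1))
```

With $p=2$ the penalty groups each feature's coefficients across signals, so a feature is either kept for all signals or dropped for all of them; with $p=1$ the penalty is separable across signals. Both the coefficients `B` and the parameters `thetas` are optimization variables in the estimator; this sketch only evaluates the objective at given values.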

Paper Structure

This paper contains 20 sections, 6 theorems, 85 equations, and 2 figures.

Key Result

Proposition 1.3

Let $p \in (1,2]$. Assume that the function $\theta \mapsto \phi_T(\theta)$ is continuous. Then the minimization problem \eqref{eq:generalized_lasso} over $L^2(\mathcal{Z},\mathbb{R}^{K}) \times \Theta_{T}^K$, where $\Theta_T$ is a compact interval of $\mathbb{R}$, admits at least one solution.

Figures (2)

  • Figure 1: Signal in $H_T = \mathbb{R}^{T}$ with $T=100$: a mixture of two Gaussian-shaped spikes with $\theta_1^\star = 0$ and $\theta_2^\star = 3$ and amplitudes uniformly distributed in $[-10,10]$, corrupted by i.i.d. centered Gaussian random variables with $\sigma = 0.1$.
  • Figure 2: Prediction error $\hat{R}_T^2={\left\lVert Y-\hat{Y} \right\rVert}_{\ell_2}^2/(n T)$ given in \eqref{eq:RT2=}, with $\hat{Y}$ denoting the reconstructed signals, and number of spikes obtained with the Group-Nonlinear-Lasso and the Group-Lasso approaches. These quantities are represented as functions of the penalty parameter $\kappa$.
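
The setup in the two captions can be sketched as follows: signals like the one in Figure 1 are simulated, and the prediction error $\hat{R}_T^2$ from Figure 2 is computed for a given reconstruction. This is a rough illustration only. The sampling grid on $[-10, 10]$, the unit width of the spikes, the seed, and the function names are assumptions that the captions do not specify.

```python
import numpy as np

def prediction_error(Y, Y_hat):
    """R_T^2 = ||Y - Y_hat||_{l2}^2 / (n T), for arrays of shape (n, T)."""
    n, T = Y.shape
    return float(np.sum((Y - Y_hat) ** 2) / (n * T))

def simulate_signals(n, T=100, thetas=(0.0, 3.0), sigma=0.1, seed=0):
    """Draw n noisy mixtures of Gaussian-shaped spikes located at `thetas`.
    The grid range and the unit spike width are assumptions."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-10.0, 10.0, T)
    centers = np.asarray(thetas, dtype=float)[:, None]
    features = np.exp(-((grid[None, :] - centers) ** 2) / 2.0)
    # Amplitudes uniform in [-10, 10], as in the Figure 1 caption.
    amplitudes = rng.uniform(-10.0, 10.0, size=(n, len(thetas)))
    clean = amplitudes @ features
    noisy = clean + sigma * rng.standard_normal((n, T))
    return noisy, clean

if __name__ == "__main__":
    Y, clean = simulate_signals(n=200)
    # Using the noiseless signals as the "reconstruction" gives an error
    # close to the noise variance sigma**2.
    print(prediction_error(Y, clean))
```

Plugging in the output of an actual estimator for `Y_hat`, over a range of penalty parameters $\kappa$, would give curves of the kind Figure 2 describes.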

Theorems & Definitions (14)

  • Example 1.1: Discrete case
  • Example 1.2: Continuous case
  • Proposition 1.3
  • Theorem 3.1
  • Remark 3.2: On the choice of $\kappa$
  • Remark 3.3: On the dimension $K$
  • Corollary 3.4
  • Remark 3.5: Comparison to the Group-Lasso estimator
  • Corollary 3.6
  • Remark 3.7
  • ...and 4 more