Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary
Cristina Butucea, Jean-François Delmas, Anne Dutfoy, Clément Hardy
TL;DR
The paper tackles simultaneous learning of multiple signals, each a mixture of nonlinear features drawn from a continuous dictionary, and addresses off-the-grid estimation of both the sparse linear coefficients and the nonlinear parameters. It introduces the Group-Nonlinear-Lasso, the solution of a regularized problem with a mixed $(\ell_1,L^p(\nu))$ penalty, and proves high-probability bounds on the prediction error through certificate functions grounded in a Riemannian geometry of the dictionary. The results cover a general setting with an arbitrary finite measure $\nu$, with refinements for Gaussian noise and finite observation sets, including rates matching those of the Group-Lasso in multi-task linear regression when $p=2$ and the nonlinear parameters are shared. The work shows that simultaneous reconstruction can outperform separate, per-signal estimation when most nonlinear parameters are common across signals, and it provides constructive certificates and tail bounds that underpin the guarantees for off-the-grid learning in continuous dictionaries.
Abstract
In this paper we observe a set, possibly a continuum, of signals corrupted by noise. Each signal is a finite mixture of an unknown number of features belonging to a continuous dictionary. The continuous dictionary is parametrized by a real non-linear parameter. We shall assume that the signals share an underlying structure by assuming that each signal has its active features included in a finite and sparse set. We formulate a regularized optimization problem to estimate simultaneously the linear coefficients in the mixtures and the non-linear parameters of the features. The optimization problem is composed of a data fidelity term and a $(\ell_1,L^p)$-penalty. We call its solution the Group-Nonlinear-Lasso and provide high probability bounds on the prediction error using certificate functions. Following recent works on the geometry of off-the-grid methods, we show that such functions can be constructed provided the parameters of the active features are pairwise separated by a constant with respect to a Riemannian metric. When the number of signals is finite and the noise is assumed Gaussian, we give refinements of our results for $p=1$ and $p=2$ using tail bounds on suprema of Gaussian and $\chi^2$ random processes. When $p=2$, our prediction error reaches the rates obtained by the Group-Lasso estimator in the multi-task linear regression model. Furthermore, for $p=2$ these prediction rates are faster than for $p=1$ when all signals share most of the non-linear parameters.
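To make the $(\ell_1,L^p)$-penalty concrete, the following is a minimal sketch for a finite number of signals. It is not the paper's implementation: the function name `mixed_l1_lp_penalty`, the coefficient matrix `B` (one row per candidate feature, one column per signal), and the choice of $\nu$ as the normalised counting measure are all assumptions made here for illustration. The penalty sums, over features, the $L^p(\nu)$ norm of that feature's coefficients across signals.

```python
import numpy as np

def mixed_l1_lp_penalty(B, p=2.0, weights=None):
    """Mixed (l1, L^p) penalty of a coefficient matrix (illustrative sketch).

    B       : array of shape (K, n), K candidate features and n signals.
    p       : exponent of the inner L^p norm taken across signals.
    weights : masses of the measure nu on the n signals; defaults to the
              normalised counting measure 1/n (an assumption of this sketch).
    """
    B = np.asarray(B, dtype=float)
    K, n = B.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)
    # Inner L^p(nu) norm of each row (one feature across all signals).
    row_norms = ((np.abs(B) ** p) @ weights) ** (1.0 / p)
    # Outer l1 norm: sum over features.
    return float(row_norms.sum())

# Same coefficient values, arranged in two ways.
shared = np.array([[1.0, 1.0],
                   [0.0, 0.0]])    # both signals use feature 0
disjoint = np.array([[1.0, 0.0],
                     [0.0, 1.0]])  # each signal uses its own feature

for p in (1.0, 2.0):
    print(p, mixed_l1_lp_penalty(shared, p), mixed_l1_lp_penalty(disjoint, p))
```

In this toy example the $p=1$ penalty takes the same value for both arrangements, whereas the $p=2$ penalty is smaller when the two signals share a feature (1 versus roughly 1.41). This is consistent with the abstract's remark that $p=2$ is favourable when signals share most of their non-linear parameters, although the sketch illustrates only the penalty and says nothing about the prediction rates themselves.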
