Analysing Multi-Task Regression via Random Matrix Theory with Application to Time Series Forecasting

Romain Ilbert, Malik Tiomoko, Cosme Louart, Ambroise Odonnat, Vasilii Feofanov, Themis Palpanas, Ievgen Redko

TL;DR

A novel theoretical framework for multi-task regression applies random matrix theory to provide precise performance estimates under high-dimensional, non-Gaussian data distributions, offering a robust foundation for hyperparameter optimization in multi-task regression scenarios.

Abstract

In this paper, we introduce a novel theoretical framework for multi-task regression, applying random matrix theory to provide precise performance estimates under high-dimensional, non-Gaussian data distributions. We formulate a multi-task optimization problem as a regularization technique that enables single-task models to leverage multi-task learning information. We derive a closed-form solution for multi-task optimization in the context of linear models. Our analysis links multi-task learning performance to model statistics such as raw data covariances, signal-generating hyperplanes, noise levels, and the size and number of datasets. Finally, we propose consistent estimates of the training and testing errors, offering a robust foundation for hyperparameter optimization in multi-task regression scenarios. Experimental validation on synthetic and real-world datasets, in regression and multivariate time series forecasting, demonstrates improvements over univariate models when our method is incorporated into the training loss to leverage multivariate information.
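
The idea of regularizing single-task linear models toward shared multi-task information can be illustrated with a generic "shared plus task-specific" ridge regression, which also admits a closed-form solution. The sketch below is not the paper's formulation or its error estimators: the decomposition `w_t = w0 + v_t`, the penalty weights `lam` and `gamma`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_multitask_ridge(Xs, ys, lam, gamma):
    """Closed-form fit of w_t = w0 + v_t for T linear regression tasks.

    Minimizes  sum_t ||y_t - X_t (w0 + v_t)||^2
               + (1/lam) ||w0||^2 + (1/gamma) sum_t ||v_t||^2.
    Returns an array of shape (T, d) whose row t is w_t.
    (Illustrative objective, assumed here; not taken from the paper.)
    """
    T = len(Xs)
    d = Xs[0].shape[1]
    blocks = []
    for t, X in enumerate(Xs):
        # Columns [0, d) act on the shared w0; block t+1 acts on v_t.
        Z = np.zeros((X.shape[0], d * (T + 1)))
        Z[:, :d] = X
        Z[:, d * (t + 1):d * (t + 2)] = X
        blocks.append(Z)
    Z = np.vstack(blocks)
    y = np.concatenate(ys)
    # Generalized ridge: a different penalty on w0 than on the v_t blocks.
    penalty = np.concatenate([np.full(d, 1.0 / lam), np.full(d * T, 1.0 / gamma)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    w0, V = theta[:d], theta[d:].reshape(T, d)
    return w0 + V  # broadcasting adds w0 to every task-specific row


def sweep_test_mse(lams, gamma=1.0, d=20, n_train=15, n_test=200, seed=0):
    """Empirical test MSE over a grid of lam on two related synthetic tasks."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=d)
    truths = [shared + 0.1 * rng.normal(size=d) for _ in range(2)]

    def sample(n, w):
        X = rng.normal(size=(n, d))
        return X, X @ w + 0.1 * rng.normal(size=n)

    train = [sample(n_train, w) for w in truths]
    test = [sample(n_test, w) for w in truths]
    Xs, ys = zip(*train)
    curve = []
    for lam in lams:
        W = fit_multitask_ridge(Xs, ys, lam, gamma)
        mse = np.mean([np.mean((Xt @ W[t] - yt) ** 2)
                       for t, (Xt, yt) in enumerate(test)])
        curve.append(float(mse))
    return curve


lams = [1e-3, 1e-1, 1.0, 10.0]
for lam, mse in zip(lams, sweep_test_mse(lams)):
    print(f"lam={lam:g}  test MSE={mse:.4f}")
```

In this toy objective, a very small `lam` suppresses the shared component so that each task reduces to an independent ridge regression, while a larger `lam` lets the tasks pool information through `w0`. The sweep selects nothing on its own; it only exposes the empirical test-error curve that a theoretical risk estimate would be compared against.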

Paper Structure (56 sections, 7 theorems, 90 equations, 3 figures, 3 tables)

Key Result

Theorem 1

Assuming that the training data vectors ${\mathbf{x}}_i^{(t)}$ and the test data vectors ${\mathbf{x}}^{(t)}$ are concentrated random vectors, and given the growth rate assumption, it follows that:

Figures (3)

  • Figure 1: Test loss contributions ${\mathbf{D}}_{IL}$, ${\mathbf{C}}_{MTL}$, ${\mathbf{N}}_{NT}$ across three sample size regimes. Test risk exhibits decreasing, increasing, or convex shapes based on the regime. Optimal values of $\lambda$ from theory are marked.
  • Figure 2: Empirical and theoretical training and test MSE as functions of the parameter $\lambda$ for different values of $\alpha$. The smooth curves represent the theoretical predictions, while the curves of the same color show the corresponding empirical results; the empirical observations match the theoretical predictions.
  • Figure 3: Theoretical vs. empirical MSE as a function of the regularization parameter $\lambda$. The close fit between the theoretical and empirical results underscores the robustness of the theory under varying assumptions, as well as the accuracy of the suggested estimates. The first two channels are taken as the two tasks, with $d=144$; $95$ samples are used for training and $42$ for testing.

Theorems & Definitions (10)

  • Definition 1: Concentrated random vector ${\mathbf{x}}_i^{(t)}$
  • Theorem 1: Asymptotic training risk
  • Theorem 2: Asymptotic test risk
  • Lemma 1: Deterministic equivalents for $\tilde{{\mathbf{Q}}}$, $\tilde{{\mathbf{Q}}}{\mathbf{M}}\tilde{{\mathbf{Q}}}$ and ${\mathbf{Q}}^2$ for any ${\mathbf{M}}\in \mathbb{R}^{n\times n}$
  • Theorem 3: louart2021spectral, Theorem 0.9.
  • Lemma 2: louart2021spectral, Lemmas 4.2, 4.6
  • Theorem 4
  • proof
  • Theorem 5
  • proof