Cross-Learning from Scarce Data via Multi-Task Constrained Optimization

Leopoldo Agorio, Juan Cerviño, Miguel Calvo-Fullana, Alejandro Ribeiro, Juan Andrés Bazerque

TL;DR

The paper targets data-scarce supervised learning by proposing cross-learning, a constrained multi-task framework that jointly estimates task-specific parameters while enforcing controlled similarity across tasks under a deterministic-parameter regime. It introduces both parametric-constraint and coupled-outputs formulations, and provides theoretical guarantees (under Gaussian data) that there exists a centrality level $\epsilon$ yielding lower mean-squared error than fully separate or fully shared models. To solve these problems, the authors develop ADMM-based and primal-dual algorithms, and validate the approach on real data: COVID-19 SIR model fitting across countries and Office-Home image classification, where cross-learning improves peak prediction accuracy and classification performance relative to baselines. The work demonstrates that sharing information across related tasks can reduce data requirements and improve reliability, with broad applicability to domains like epidemiology and computer vision, while allowing task-specificity to be preserved through tunable similarity constraints.

Abstract

A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires the dataset to be large enough to be representative of the underlying distribution of the source. When data is limited, the learned models fail to generalize to cases not seen during training. This paper introduces a multi-task cross-learning framework to overcome data scarcity by jointly estimating deterministic parameters across multiple, related tasks. We formulate this joint estimation as a constrained optimization problem, where the constraints dictate the resulting similarity between the parameters of the different models, allowing the estimated parameters to differ across tasks while still combining information from multiple data sources. This framework enables knowledge transfer from tasks with abundant data to those with scarce data, leading to more accurate and reliable parameter estimates and providing a solution for scenarios where parameter inference from limited data is critical. We provide theoretical guarantees in a controlled framework with Gaussian data, and show the efficiency of our cross-learning method in applications with real data including image classification and propagation of infectious diseases.

Paper Structure

This paper contains 14 sections, 10 theorems, 68 equations, 11 figures, 2 algorithms.

Key Result

Proposition 1

The cross-learning estimator, for an $\epsilon$ approaching zero, achieves a strictly lower mean squared error than the consensus estimator.

Figures (11)

  • Figure 1: Intuitive interpretation of the symmetry of $\hat{u}_t=(\hat{\theta}_t-\hat{\theta}_{c})/\|\hat{\theta}_t-\hat{\theta}_{c}\|$ used in step \ref{eq:ptofs} of the proof of Proposition \ref{prop:centralized_vs_cl}. The vector $\hat{\theta}_t-\hat{\theta}_c$ admits two representations, one as $\theta_t^\star-\theta_c^\star+v_t$ and a second one as $r_t \hat{u}_t$, with $r_t= \|\hat{\theta}_t-\hat{\theta}_c\|$, which leads to the change of variables $v_t=r_t \hat{u}_t-(\theta_t^\star-\theta_c^\star)$. Furthermore, the vectors $\hat{u}_t$ come in pairs. That is, for each blue point representing a vector $\hat{u}_t$, there is a purple mirror image across $\theta_t^\star-\theta_c^\star$ such that its inner product with $\theta_t^\star-\theta_c^\star$ has the same magnitude but opposite sign.
  • Figure 2: Geometric argument for the proof of Proposition \ref{prop:independent_vs_cl_estimator}. The generators $\theta_t^\star$ are inside the ball $\mathcal{B}(\theta_g^\dag, \epsilon)$. The separate estimates $\hat{\theta}_t$ are projected onto $\mathcal{B}(\theta_g^\dag, \epsilon)$, resulting in the cross-learning estimator $\theta_t^\dag$. Points $\hat{\theta}_t$ in the red region will not be changed, so that $\hat{\theta}_t=\theta_t^\dag$, and points in the blue region will be projected to the surface, where the error to $\theta_t^\star$ will be lower.
  • Figure 3: Mean squared error of the cross-learning estimate as a function of $\epsilon$. The value of $\epsilon$ interpolates between consensus ($\epsilon=0$) and separate estimation ($\epsilon=\infty$). We present the case with small variance ($\sigma=1$), wherein the separate estimator outperforms the consensus one, and the case with large variance ($\sigma=2$), in which consensus is better than separate estimation. In either case, the plots exhibit a value of $\epsilon\in(0,\infty)$ for which cross-learning outperforms both the consensus and separate estimators.
  • Figure 4: Infected population over time for $T=14$ different countries in the OWID dataset. Countries exhibit different dynamics for the evolution of their infected populations.
  • Figure 5: SIR model predictions for ARG using the separate $(\beta=0.1481,\gamma=0.0)$, cross-learning $(\beta=0.6608,\gamma=0.4388)$, and consensus $(\beta=0.3497,\gamma=0.2802)$ estimators. Filled and hollow black dots represent the training and test datasets, respectively. Cross-learning predicts the peak of infections more accurately: its error in the number of cases is $0.07\%$ and it finds the exact day on which the peak occurs, whereas the separate and consensus estimators incur errors of $2474\%$ and $76.38\%$ and time lags of $108$ and $8$ days, respectively.
  • ...and 6 more figures
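
The projection step described in the caption of Figure 2, and the interpolation in $\epsilon$ described in the caption of Figure 3, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `project_onto_ball` and the sample vectors are hypothetical, and the ball center is passed in as a fixed placeholder, whereas in the paper the center $\theta_g^\dag$ is obtained jointly with the task parameters.

```python
import math

def project_onto_ball(theta_hat, center, eps):
    """Euclidean projection of theta_hat onto the closed ball B(center, eps).

    Assumption: `center` is given. In the paper the center is part of the
    solution of the cross-learning problem, not a fixed input.
    """
    diff = [a - c for a, c in zip(theta_hat, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= eps:
        # Point already inside the ball: it is left unchanged.
        return list(theta_hat)
    # Point outside the ball: pull it back to the surface along the ray
    # from the center.
    scale = eps / dist
    return [c + scale * d for c, d in zip(center, diff)]

# Hypothetical separate estimates for two tasks and a placeholder center.
center = [0.0, 0.0]
inside = project_onto_ball([0.5, 0.0], center, 1.0)   # stays where it is
outside = project_onto_ball([3.0, 4.0], center, 1.0)  # moved to the surface

# The two extremes of the centrality level:
consensus_like = project_onto_ball([3.0, 4.0], center, 0.0)          # collapses to the center
separate_like = project_onto_ball([3.0, 4.0], center, float("inf"))  # unchanged
```

With a radius of zero every estimate collapses to the center, which mirrors consensus, and with an infinite radius every estimate is returned unchanged, which mirrors separate estimation. Intermediate radii only move the estimates that fall outside the ball.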

Theorems & Definitions (20)

  • Example 1
  • Proposition 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Theorem 1
  • proof
  • ...and 10 more