AdaPRL: Adaptive Pairwise Regression Learning with Uncertainty Estimation for Universal Regression Tasks

Fuhang Liang, Rucong Xu, Deng Lin

TL;DR

AdaPRL addresses two weaknesses of point-wise deep regression: it ignores relations between samples, and it is sensitive to aleatoric uncertainty. The method combines adaptive pairwise regression losses with a deep probabilistic auxiliary network that estimates per-sample uncertainty. A confidence matrix weights the informative pairs, and the final loss is $L_{AdaPRL}=L_{reg}+\alpha L_{CPRL}$. The approach extends to multi-task and multivariate time series settings through task-wise pairwise losses and sparse confidence variants. It shows empirical gains across eight real-world datasets, and a large-scale online A/B test shows a revenue uplift. The results demonstrate improved accuracy and ranking, robustness to noise and data scarcity, and better interpretability through uncertainty quantification, with no loss of inference-time efficiency.
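The loss composition $L_{AdaPRL}=L_{reg}+\alpha L_{CPRL}$ can be illustrated with a minimal NumPy sketch. Only the combined form comes from the summary above; everything else here is an assumption made for illustration: mean squared error as $L_{reg}$, squared error on pairwise differences as the pairwise term, and a confidence matrix that down-weights pairs with high combined uncertainty. The function name `adaprl_style_loss` is hypothetical, and `sigma` stands in for the per-sample uncertainty that the auxiliary probabilistic network would estimate.

```python
import numpy as np


def adaprl_style_loss(y_true, y_pred, sigma, alpha=0.5):
    """Illustrative loss of the form L_reg + alpha * L_CPRL (not the paper's exact definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # Point-wise term (assumed here to be plain MSE).
    l_reg = np.mean((y_pred - y_true) ** 2)

    # Pairwise differences: diff[i, j] = v[i] - v[j] for every pair in the batch.
    diff_true = y_true[:, None] - y_true[None, :]
    diff_pred = y_pred[:, None] - y_pred[None, :]
    pair_err = (diff_pred - diff_true) ** 2

    # Assumed confidence matrix: close to 1 when both samples have low
    # uncertainty, smaller when either sample is uncertain.
    conf = 1.0 / (1.0 + sigma[:, None] ** 2 + sigma[None, :] ** 2)
    np.fill_diagonal(conf, 0.0)  # a sample is never paired with itself

    # Confidence-weighted average of the pairwise errors (needs at least 2 samples).
    l_cprl = np.sum(conf * pair_err) / np.sum(conf)

    return l_reg + alpha * l_cprl, l_reg, l_cprl


if __name__ == "__main__":
    total, l_reg, l_cprl = adaprl_style_loss(
        y_true=[1.0, 2.0, 3.0],
        y_pred=[1.0, 2.0, 4.0],
        sigma=[0.0, 0.0, 0.0],
    )
    print(total, l_reg, l_cprl)
```

In this sketch a constant offset in the predictions leaves the pairwise term at zero, since it depends only on differences between samples, while the point-wise term still penalizes the offset. The pairwise term is computed only during training, which is in line with the summary's statement that inference-time efficiency is preserved.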

Abstract

Current deep regression models usually learn in a point-wise way that treats each sample as an independent input, neglecting the relative ordering among data points. Consequently, the regression model can overlook the data's interrelationships, potentially resulting in suboptimal performance. Moreover, aleatoric uncertainty in the training data may drive the model to capture non-generalizable patterns, contributing to increased overfitting. To address these issues, we propose a novel adaptive pairwise learning framework for regression tasks (AdaPRL), which leverages the relative differences between data points and integrates with deep probabilistic models to quantify the uncertainty associated with the predictions. Additionally, we adapt AdaPRL for applications in multi-task learning and multivariate time series forecasting. Extensive experiments on several real-world regression datasets, covering recommendation systems, age prediction, time series forecasting, natural language understanding, finance, and industry, show that AdaPRL is compatible with different backbone networks in various tasks and achieves state-of-the-art performance on the vast majority of tasks without extra inference cost. These results highlight its potential to enhance prediction accuracy and ranking ability, increase generalization capability, improve robustness to noisy data, improve resilience to reduced data, and enhance interpretability. Experiments also show that AdaPRL can be seamlessly incorporated into recently proposed regression frameworks to gain performance improvements.

Paper Structure (69 sections, 19 equations, 11 figures, 16 tables, 1 algorithm)

Figures (11)

  • Figure 1: Overall performance of AdaPRL compared with different regression losses on selected metrics, across datasets from multiple domains.
  • Figure 2: Influence of the hyperparameter alpha on the Movielens-1M and KuaiRec datasets. The upper subplot shows the evaluation metrics on Movielens-1M, and the lower one shows the experimental results on KuaiRec.
  • Figure 3: Influence of the hyperparameter alpha on the ETT dataset. The upper subplot shows the evaluation metrics with iTransformer as the backbone network, and the lower one shows the experimental results with Minusformer as the backbone network.
  • Figure 4: Average performance of AdaPRL at different levels of sparsity on the ETT dataset subset with a forecasting horizon of 720 time steps.
  • Figure 5: Performance comparison of AdaPRL and L2 loss on the MSE metric at different levels of label noise in the training data on the KuaiRec dataset.
  • ...and 6 more figures