Deep sub-ensembles meets quantile regression: uncertainty-aware imputation for time series

Ying Liu, Peng Cui, Wenbo Hu, Richang Hong

TL;DR

This paper tackles uncertainty-aware imputation for time series with substantial missing data. It introduces Quantile Sub-Ensembles (QSE), a non-generative framework that ensembles quantile-regression task networks sharing a single trunk and integrates them into a BiLSTM for probabilistic imputation. Across five real-world datasets, QSE delivers strong deterministic accuracy (MAE) and reliable uncertainty estimates (CRPS) in most settings, at substantially lower computational cost than diffusion-based methods. The approach offers a practical path to robust, real-time imputation in systems where data reliability and efficiency are critical.

Abstract

Real-world time series data often exhibits substantial missing values, posing challenges for advanced analysis. A common approach to addressing this issue is imputation, where the primary challenge lies in determining the appropriate values to fill in. While previous deep learning methods have proven effective for time series imputation, they often produce overconfident imputations, which poses a potentially overlooked risk to the reliability of the intelligent system. Diffusion methods are proficient in estimating probability distributions but face challenges under a high missing rate and are, moreover, computationally expensive due to the nature of the generative model framework. In this paper, we propose Quantile Sub-Ensembles, a novel method that estimates uncertainty with ensembles of quantile-regression-based task networks, and incorporate Quantile Sub-Ensembles into a non-generative time series imputation method. Our method not only produces accurate and reliable imputations, but also remains computationally efficient due to its non-generative framework. We conduct extensive experiments on five real-world datasets, and the results demonstrate superior performance in both deterministic and probabilistic imputation compared to baselines across most experimental settings. The code is available at https://github.com/yingliu-coder/QSE.
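The quantile-regression task networks mentioned in the abstract are typically trained with the pinball (quantile) loss. The following is a minimal sketch of that standard loss in plain Python, not the authors' implementation; the function names `pinball_loss` and `mean_pinball_loss` are illustrative assumptions, and the exact loss used in the paper is not shown in this excerpt.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one observation at quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball_loss(y_true, y_pred, q):
    """Average pinball loss over paired sequences of targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = sum(pinball_loss(t, p, q) for t, p in zip(y_true, y_pred))
    return total / len(y_true)
```

At q = 0.5 the loss reduces to half the absolute error, so a head trained at that level targets the median. Levels above 0.5 penalize under-prediction more heavily, which pushes the prediction toward the upper part of the distribution.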

Paper Structure (19 sections, 16 equations, 6 figures, 6 tables, 1 algorithm)

This paper contains 19 sections, 16 equations, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A conceptual comparison between Deep Ensembles and Deep Sub-Ensembles with $N$ ensemble members. In (a), multiple trunk networks, denoted $\mathcal{F}_i$, are independently trained on sample data $\mathcal{S}_i$ from dataset $\mathcal{D}$. In contrast, in (b) a single trunk network, $\mathcal{F}_{share}$, is shared across all ensemble members, and $\mathcal{K}_i$ denotes the task network of the $i$-th of the $N$ ensemble members. Both methods combine the ensemble predictions to generate outputs that reflect uncertainty.
  • Figure 2: The framework of Quantile Sub-Ensembles. We stack the trunk network $\mathcal{F}_{share}$ and $N$ parallel task networks $\{\mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_N\}$ with the same architecture, corresponding to $N$ ensemble members. The trunk network $\mathcal{F}_{share}$ functions as a shared feature extractor that learns temporal patterns from the input time series. This design reduces redundant parameterization and improves efficiency. Subsequently, the task networks learn quantiles $\{q_1, q_2, \ldots, q_N\}$ with randomly initialized parameters $\{\theta_1, \theta_2, \ldots, \theta_N\}$. The imputations of all quantiles are then combined to obtain a reliable result.
  • Figure 3: Overview of the model framework. The multivariate time series $X$ with missing values is fed to the complement layer, which fills in the missing values. Next, the output of the complement layer $x_t^{co}$ and the previous hidden states $h_{t-1}^f, h_{t+1}^b$ are delivered to the feature-based estimation layer to capture the feature relationship. Then $\hat{z}_t$ is combined with the temporal decay factor $\gamma_t$, which gradually diminishes history information over time, as the input to $N$ task networks corresponding to $N$ ensemble members. Meanwhile, the quantile regression loss $L$ is computed to update the parameters through backpropagation. Finally, the imputed time series of all ensemble members are combined and fed into the BiLSTM with cell states $c_{t-1}^f, c_{t+1}^b$.
  • Figure 4: Performance of CRPS and MAE for three sets of quantile levels
  • Figure 5: MAE and CRPS of the model with and without the L1 and L2 losses on three datasets at a 50% missing rate.
  • ...and 1 more figure
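The structure described in the captions of Figures 1(b) and 2, one shared trunk evaluated once and $N$ task heads with separately initialized parameters, can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' model: the heads are untrained linear maps, `shared_trunk`, `make_head`, `sub_ensemble_predict`, and `combine` are hypothetical names, and summarizing the heads by their median and min/max range is an assumed choice, since this excerpt does not specify how the quantile imputations are combined.

```python
import random


def make_head(rng, in_dim):
    """Create one hypothetical linear task head with its own random weights."""
    weights = [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
    bias = rng.uniform(-1.0, 1.0)
    return weights, bias


def shared_trunk(x):
    """Toy stand-in for the shared feature extractor F_share."""
    # Illustrative features only: the raw value and its square.
    return [x, x * x]


def sub_ensemble_predict(x, heads):
    """Run the trunk once, then evaluate every task head on the same features."""
    features = shared_trunk(x)
    outputs = []
    for weights, bias in heads:
        outputs.append(sum(w * f for w, f in zip(weights, features)) + bias)
    return outputs


def combine(outputs):
    """Summarize head outputs by their median and their min/max range (assumed rule)."""
    ordered = sorted(outputs)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    return {"point": median, "lower": ordered[0], "upper": ordered[-1]}


rng = random.Random(0)  # fixed seed so the toy run is repeatable
heads = [make_head(rng, in_dim=2) for _ in range(5)]  # N = 5 ensemble members
outputs = sub_ensemble_predict(0.3, heads)
summary = combine(outputs)
```

Because the trunk runs once per input regardless of $N$, adding ensemble members only adds the cost of the extra heads, which is the efficiency argument the Figure 2 caption makes.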