RDIS: Random Drop Imputation with Self-Training for Incomplete Time Series Data

Tae-Min Choi, Ji-Su Kang, Jong-Hwan Kim

TL;DR

RDIS is a novel training method for time-series data imputation models. It generates extra missing values by applying a random drop to the observed values in incomplete data, and it uses self-training with pseudo values to exploit the original missing values.

Abstract

Time-series data with missing values are commonly encountered in many fields, such as healthcare, meteorology, and robotics. Imputation aims to fill the missing values with valid values. Most imputation methods train their models implicitly because missing values have no ground truth. In this paper, we propose Random Drop Imputation with Self-training (RDIS), a novel training method for time-series data imputation models. In RDIS, we generate extra missing values by applying a random drop to the observed values in incomplete data. Because the true values of the randomly dropped entries are known, we can train the imputation models explicitly to fill them in. In addition, we adopt self-training with pseudo values to exploit the original missing values. To improve the quality of the pseudo values, we calculate their entropy and filter them against a threshold. To verify the effectiveness of RDIS for time-series imputation, we apply RDIS to various imputation models and achieve competitive results on two real-world datasets.
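
The random drop step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_drop` and `imputation_loss`, the `drop_rate` parameter, the use of mean squared error, and the choice to score only the dropped positions are all assumptions made for illustration.

```python
import random


def random_drop(values, observed_mask, drop_rate, rng):
    """Hide a random subset of the observed entries (hypothetical helper).

    values:        list of floats; entries at unobserved positions are ignored
    observed_mask: list of bools, True where a value was actually observed
    Returns (input_mask, drop_mask). input_mask marks what the model may see;
    drop_mask marks entries hidden on purpose, whose true values are known.
    """
    observed_idx = [i for i, m in enumerate(observed_mask) if m]
    n_drop = int(round(drop_rate * len(observed_idx)))
    dropped = set(rng.sample(observed_idx, n_drop))
    input_mask = [m and i not in dropped for i, m in enumerate(observed_mask)]
    drop_mask = [i in dropped for i in range(len(values))]
    return input_mask, drop_mask


def imputation_loss(original, output, drop_mask):
    """Mean squared error on the dropped entries only (assumed loss form)."""
    errs = [(o - y) ** 2 for o, y, d in zip(original, output, drop_mask) if d]
    return sum(errs) / len(errs) if errs else 0.0


rng = random.Random(0)
series = [1.0, 2.0, 0.0, 4.0, 5.0, 0.0, 7.0, 8.0]  # 0.0 is a placeholder
observed = [True, True, False, True, True, False, True, True]
input_mask, drop_mask = random_drop(series, observed, 0.5, rng)

# Stand-in for a model output: every value is off by exactly 1.
fake_output = [v + 1.0 for v in series]
loss = imputation_loss(series, fake_output, drop_mask)
```

Because the dropped entries come only from observed positions, the loss always has a ground truth to compare against, which is what makes the training signal explicit. Calling `random_drop` several times with different seeds would give several differently masked copies of the same series.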

Paper Structure

This paper contains 19 sections, 7 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: (a) The training procedure of the proposed RDI. The white, blue, and green areas represent missing, observed, and generated values, respectively. We generate random drop data, represented by a red slashed box, by dropping some observed values. The original data can be augmented into $N$ random drop data. Then, we feed each random drop data to its corresponding model and calculate $L_{impute}$ using the original data and the output. (b) The training procedure of RDIS. The yellow areas represent pseudo values whose confidence is higher than the threshold. We feed the original data with pseudo values and calculate $L_{self}$ using the pseudo values and the output.
  • Figure 2: Overall procedure of determining reliable pseudo values. The blue line indicates the observed values, while the red line indicates generated values. To choose reliable pseudo values, the entropy of each missing value is first calculated from the outputs of $N$ pre-trained models. Then, the values with lower entropy than the threshold are selected as pseudo values.
  • Figure 3: The MSE comparison of GRU and Bi-GRU grafted with None, RDI/E, RDI, and RDIS, respectively, on the air quality dataset and the gas sensor dataset. (a) Air quality (GRU), (b) air quality (Bi-GRU), (c) gas sensor (GRU), (d) gas sensor (Bi-GRU).
  • Figure 4: Pseudo value and update epoch analyses. (a) Pseudo value's accuracy per standard deviation over different missing rates, (b) the ratio of MSE reduction (%) of RDIS based on RDI for different update epochs.
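
The pseudo value selection outlined in the Figure 2 caption can be sketched as follows. The captions do not give the entropy formula, so this sketch assumes the $N$ model outputs at each missing position are treated as Gaussian and uses the differential entropy $\frac{1}{2}\ln(2\pi e\,\sigma^2)$. Taking the ensemble mean as the pseudo value is also an assumption, as are the function name `select_pseudo_values` and the small variance floor `eps`.

```python
import math
import statistics


def select_pseudo_values(predictions, missing_mask, threshold, eps=1e-8):
    """Keep low-entropy ensemble imputations as pseudo values (illustrative).

    predictions:  list of N lists, one per pre-trained model, each of length T
    missing_mask: list of bools, True where the original value is missing
    Returns {position: pseudo_value} for the missing positions whose
    assumed-Gaussian entropy across the N models is below the threshold.
    """
    pseudo = {}
    for t, is_missing in enumerate(missing_mask):
        if not is_missing:
            continue
        preds = [p[t] for p in predictions]
        mean = statistics.fmean(preds)
        var = statistics.pvariance(preds, mu=mean)
        # Differential entropy of a Gaussian with this variance (assumption).
        entropy = 0.5 * math.log(2 * math.pi * math.e * (var + eps))
        if entropy < threshold:
            pseudo[t] = mean
    return pseudo


# Three hypothetical models; positions 1 and 2 are originally missing.
predictions = [
    [1.0, 2.0, 3.0],
    [1.0, 2.1, 5.0],
    [1.0, 1.9, 1.0],
]
missing_mask = [False, True, True]
pseudo = select_pseudo_values(predictions, missing_mask, threshold=0.0)
```

In this toy input the models agree closely at position 1 and disagree at position 2, so only position 1 passes the filter. Under the Gaussian assumption, thresholding the entropy is equivalent to thresholding the standard deviation of the ensemble outputs.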