
AR-Sieve Bootstrap for the Random Forest and a simulation-based comparison with rangerts time series prediction

Cabrel Teguemne Fokam, Carsten Jentsch, Michel Lang, Markus Pauly

TL;DR

It turns out that ARSB introduces more variation among the trees in the forest, and RF with ARSB achieves greater accuracy than RF with other bootstrap strategies. However, these improvements come at some cost in efficiency.

Abstract

The Random Forest (RF) algorithm can be applied to a broad spectrum of problems, including time series prediction. However, neither the classical IID (independent and identically distributed) bootstrap nor block bootstrapping strategies (as implemented in rangerts) fully account for the nature of the Data Generating Process (DGP) when resampling the observations. We propose combining RF with a residual bootstrapping technique in which the IID bootstrap is replaced by the AR-Sieve Bootstrap (ARSB), which assumes the DGP to be an autoregressive process. To assess the new model's predictive performance, we conduct a simulation study using synthetic data generated from different types of DGPs. It turns out that ARSB introduces more variation among the trees in the forest. Moreover, RF with ARSB achieves greater accuracy than RF with other bootstrap strategies. However, these improvements come at some cost in efficiency.
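
The abstract describes ARSB only at a high level. Below is a minimal sketch of one common form of the AR-sieve bootstrap in plain Python, under several assumptions that are not taken from the paper: the AR coefficients are estimated by Yule-Walker (via the Levinson-Durbin recursion), the order p is selected by AIC up to an assumed cap of 10·log10(T), residuals are centered and redrawn IID, and a burn-in of 100 steps is discarded. All function names (`fit_ar_sieve`, `ar_sieve_bootstrap`, `lag_embed`) are hypothetical, and turning each replicate into a lagged training set for one tree is likewise an assumption about how the replicates feed the forest.

```python
import math
import random


def autocovariances(x, max_lag):
    """Biased sample autocovariances gamma(0), ..., gamma(max_lag)."""
    n = len(x)
    m = sum(x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(gamma, max_order):
    """Yule-Walker AR fits for orders 0..max_order via the Levinson-Durbin recursion."""
    coefs = [[]]         # coefs[p] = [phi_1, ..., phi_p] of the AR(p) fit
    sigma2 = [gamma[0]]  # sigma2[p] = innovation variance of the AR(p) fit
    for k in range(1, max_order + 1):
        prev = coefs[-1]
        acc = gamma[k] - sum(prev[j] * gamma[k - 1 - j] for j in range(k - 1))
        refl = acc / sigma2[-1]  # partial autocorrelation at lag k
        coefs.append([prev[j] - refl * prev[k - 2 - j] for j in range(k - 1)] + [refl])
        sigma2.append(sigma2[-1] * (1.0 - refl ** 2))
    return coefs, sigma2


def fit_ar_sieve(x, max_order):
    """Select p by AIC; return (mean, phi, centered residuals)."""
    n = len(x)
    gamma = autocovariances(x, max_order)
    if gamma[0] <= 0:
        raise ValueError("series must not be constant")
    coefs, sigma2 = levinson_durbin(gamma, max_order)
    aic = [n * math.log(max(s2, 1e-300)) + 2 * p for p, s2 in enumerate(sigma2)]
    p = min(range(len(aic)), key=aic.__getitem__)
    phi = coefs[p]
    mean = sum(x) / n
    xc = [v - mean for v in x]
    # e_t = x_t - sum_j phi_j x_{t-j}, for t = p+1..T (on the centered series)
    resid = [xc[t] - sum(phi[j] * xc[t - 1 - j] for j in range(p)) for t in range(p, n)]
    r_mean = sum(resid) / len(resid)
    return mean, phi, [r - r_mean for r in resid]


def ar_sieve_bootstrap(x, rng, max_order=None, burn_in=100):
    """One AR-sieve replicate of x: same length, newly generated values."""
    n = len(x)
    if max_order is None:
        # Assumed heuristic cap on the sieve order (not from the paper).
        max_order = min(n - 1, int(10 * math.log10(n)))
    mean, phi, resid = fit_ar_sieve(x, max_order)
    p = len(phi)
    sim = [0.0] * p  # zero start values, washed out by the burn-in
    for _ in range(burn_in + n):
        # x*_t = sum_j phi_j x*_{t-j} + e*_t, with e*_t drawn IID from the residuals
        e = rng.choice(resid)
        sim.append(sum(phi[j] * sim[-1 - j] for j in range(p)) + e)
    return [mean + v for v in sim[-n:]]


def lag_embed(series, n_lags):
    """Hypothetical helper: rows (y_{t-1}, ..., y_{t-n_lags}) with target y_t."""
    X = [series[t - n_lags:t][::-1] for t in range(n_lags, len(series))]
    return X, series[n_lags:]


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated AR(1) with phi_1 = 0.2, the setting of Figure 5.
    x = [0.0]
    for _ in range(299):
        x.append(0.2 * x[-1] + rng.gauss(0.0, 1.0))
    # One replicate per tree; each would be lag-embedded and fed to a regression tree.
    replicates = [ar_sieve_bootstrap(x, rng) for _ in range(3)]
    X, y = lag_embed(replicates[0], n_lags=5)
    print(len(replicates), len(X), len(y))  # prints: 3 295 295
```

Yule-Walker is used in this sketch because, with biased sample autocovariances, it always yields a stationary AR fit, so the simulation recursion cannot diverge. Unlike an IID or block bootstrap, each replicate consists of newly generated values rather than indices into the original series.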

Paper Structure (24 sections, 3 equations, 52 figures, 3 tables)

Figures (52)

  • Figure 1: Moving Block Bootstrap for a time series of length $T = 9$ with block length $\ell = 2$. Here, there are $B = 8$ blocks, from which $k = 4$ blocks are drawn with replacement.
  • Figure 2: Left: ARSB generates a new dataset from the original one; the new dataset usually shares no observations with the original. Right: the new dataset is created with the IID bootstrap, and the first observation has not been sampled. Only the indices [5,3,2,4,2] need to be saved, since the bootstrapped samples also appear in the original dataset.
  • Figure 3: Box plots of the models' median MSEs across the simulations for the six classes of DGP, for $h = 1$ (left) and $h = 5$ (right).
  • Figure 4: Box plots of the models' average runtimes across the simulations for the six classes of DGP.
  • Figure 5: AR(1): $\phi_1 = 0.2$
  • ...and 47 more figures
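
Figures 1 and 2 (right) describe the index-based resampling schemes that ARSB is compared against. A minimal sketch of both follows in plain Python, reproducing the Figure 1 setting ($T = 9$, $\ell = 2$, $B = T - \ell + 1 = 8$ overlapping blocks, $k = 4$ blocks drawn with replacement). The function names are hypothetical, indices are 0-based (the indices in Figure 2 appear to be 1-based), and this is not the rangerts implementation.

```python
import random


def iid_bootstrap_indices(T, rng):
    """Classical IID bootstrap: T indices drawn with replacement."""
    return [rng.randrange(T) for _ in range(T)]


def moving_block_indices(T, block_len, k, rng):
    """Moving Block Bootstrap: draw k of the B = T - block_len + 1 overlapping
    blocks with replacement and concatenate their indices."""
    B = T - block_len + 1
    starts = [rng.randrange(B) for _ in range(k)]
    return [s + j for s in starts for j in range(block_len)]


if __name__ == "__main__":
    rng = random.Random(7)
    series = [10, 11, 13, 12, 15, 14, 16, 18, 17]  # toy series, T = 9
    idx = moving_block_indices(len(series), block_len=2, k=4, rng=rng)
    # Only the indices need storing; every resampled value occurs in the original series.
    print(idx, [series[i] for i in idx])
    print(iid_bootstrap_indices(len(series), rng))
```

With $k = 4$ blocks of length 2, the block-bootstrapped series has $k\ell = 8$ observations, one fewer than $T = 9$; both schemes only reshuffle existing observations, whereas ARSB generates new values.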