ReFine: Boosting Time Series Prediction of Extreme Events by Reweighting and Fine-tuning

Jimeng Shi, Azam Shirali, Giri Narasimhan

TL;DR

Meta-learning-based reweighting outperforms existing heuristic reweighting methods, and fine-tuning on extreme samples further improves model performance. Both strategies are model-agnostic and can be applied to any type of neural network for time series forecasting.

Abstract

Extreme events are of great importance because they often represent high-impact occurrences. In climate and weather, for instance, extreme events include major storms, floods, and extreme heat or cold waves. However, they typically lie in the tail of the data distribution, so predicting them accurately is challenging because of their rarity and irregularity. Prior studies have also referred to this as the out-of-distribution (OOD) problem, which occurs when the distribution of the test data differs substantially from that of the training data. In this work, we propose two strategies, reweighting and fine-tuning, to tackle this challenge. Reweighting forces machine learning models to focus on extreme events through a weighted loss function that assigns greater penalties to prediction errors on extreme samples than to those on the remainder of the data. Unlike previous intuitive reweighting methods based on simple heuristics of the data distribution, we employ meta-learning to dynamically optimize these penalty weights. To further boost performance on extreme samples, we start from the reweighted models and fine-tune them using only the rare extreme samples. Through extensive experiments on multiple data sets, we empirically validate that our meta-learning-based reweighting outperforms existing heuristic methods and that the fine-tuning strategy further improves model performance. More importantly, both strategies are model-agnostic and can be implemented on any type of neural network for time series forecasting. The open-source code is available at https://github.com/JimengShi/ReFine.
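
The weighted loss described in the abstract can be illustrated with a minimal sketch. It uses a fixed, hand-picked penalty for extreme samples, which corresponds to the heuristic style of reweighting rather than the meta-learned weights proposed in the paper. The function name `weighted_mse`, the parameter `extreme_weight`, and the toy values are assumptions made for illustration only.

```python
def weighted_mse(y_true, y_pred, is_extreme, extreme_weight=5.0):
    """Mean squared error with a larger penalty on extreme samples.

    `extreme_weight` is a hypothetical fixed constant; the paper instead
    optimizes the per-sample weights with meta-learning.
    """
    if not (len(y_true) == len(y_pred) == len(is_extreme)):
        raise ValueError("inputs must have the same length")
    # Weight 1.0 for normal samples, `extreme_weight` for extreme ones.
    weights = [extreme_weight if flag else 1.0 for flag in is_extreme]
    weighted_sq_err = sum(
        w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred)
    )
    # Normalize by the weight sum so the loss scale stays comparable.
    return weighted_sq_err / sum(weights)


# Toy batch: only the extreme sample (value 10.0) is mispredicted.
y_true = [1.0, 2.0, 10.0]
y_pred = [1.0, 2.0, 8.0]
is_extreme = [False, False, True]

plain_loss = weighted_mse(y_true, y_pred, is_extreme, extreme_weight=1.0)
reweighted_loss = weighted_mse(y_true, y_pred, is_extreme, extreme_weight=5.0)
```

With the larger weight, the error on the extreme sample dominates the loss, so a gradient-based optimizer would be pushed harder to reduce it.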

Paper Structure

This paper contains 24 sections, 1 theorem, 19 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Suppose the evaluation loss function is Lipschitz-smooth with constant $L$, and the training loss function $\ell_i$ of training data $x_i$ has $\sigma$-bounded gradients. Let the learning rate $\phi$ satisfy $\phi \leq \frac{2n}{L \sigma^2}$, where $n$ is the training batch size. Following our algorithm, ... where $\mathcal{L}(\theta)$ is the total evaluation loss. The equality $\mathcal{L}(\theta_{t+1}) = \ldots$
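
The learning-rate condition $\phi \leq \frac{2n}{L \sigma^2}$ in Lemma 1 can be evaluated directly once values for $n$, $L$, and $\sigma$ are chosen. The sketch below is only an arithmetic illustration: the helper name `max_learning_rate` and the numeric values are hypothetical, and in practice $L$ and $\sigma$ are usually unknown and would have to be estimated.

```python
def max_learning_rate(batch_size, lipschitz_const, grad_bound):
    """Upper bound 2n / (L * sigma^2) on the learning rate phi in Lemma 1."""
    if batch_size <= 0 or lipschitz_const <= 0 or grad_bound <= 0:
        raise ValueError("all arguments must be positive")
    return 2.0 * batch_size / (lipschitz_const * grad_bound ** 2)


# Hypothetical values: n = 32, L = 10, sigma = 4.
bound = max_learning_rate(batch_size=32, lipschitz_const=10.0, grad_bound=4.0)

# A candidate learning rate satisfies the condition if it is at most the bound.
candidate_phi = 0.1
satisfies_condition = candidate_phi <= bound
```

For these values the bound is $2 \cdot 32 / (10 \cdot 4^2) = 0.4$, so a learning rate of 0.1 satisfies the condition.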

Figures (8)

  • Figure 1: Out-of-Distribution (OOD) problem showing the disparity between the training and test sets. The gray dashed line represents the threshold (95$^{th}$ percentile) to separate normal and extreme samples.
  • Figure 2: Illustration of data distribution. $\Phi$ and $\Phi'$ are the probability distribution functions of the training and evaluation set. The dashed line refers to the threshold to split extreme and normal samples. The oval sizes represent the set sizes.
  • Figure 3: Training process of the unweighted framework, the reweighting approach, and the fine-tuning method. The ovals represent the sample spaces; the small circles represent individual inputs, while their sizes denote their weights; $\textbf{y}_i$ and $\hat{\textbf{y}}_i$ are the ground truth and prediction values, respectively. Trainable models are marked with the "fire" symbol in the upper left corner; individual layers are marked as trainable or frozen during fine-tuning.
  • Figure 4: A schematic of the meta-learning-based reweighting method.
  • Figure 5: Visualization of truth and prediction. "Unweighted" is the baseline model without reweighting and fine-tuning, while the last three are with reweighting and fine-tuning.
  • ...and 3 more figures
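
The 95th-percentile threshold mentioned in the caption of Figure 1 can be sketched as follows. This is a minimal illustration, assuming that the threshold is computed on the training values, that percentiles use linear interpolation, and that samples strictly above the threshold count as extreme; the captions do not specify these details, and the helper names `percentile` and `split_by_threshold` are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of `values` for q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def split_by_threshold(train_values, values, q=95.0):
    """Split `values` into normal and extreme samples.

    The threshold is the q-th percentile of `train_values` (an assumption).
    """
    threshold = percentile(train_values, q)
    normal = [v for v in values if v <= threshold]
    extreme = [v for v in values if v > threshold]
    return threshold, normal, extreme


# Toy series: the integers 1..100.
train = list(range(1, 101))
threshold, normal, extreme = split_by_threshold(train, train)
```

On this toy series roughly 5% of the samples fall above the threshold, which mirrors the imbalance between the normal and extreme sets shown by the oval sizes in Figure 2.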

Theorems & Definitions (6)

  • Definition 3.1: Time Series Prediction
  • Definition 3.2: Extreme Events
  • Definition 3.3: Long-tailed Distributions
  • Definition 4.1: $\sigma$-bounded gradients [garrigos2023handbook]
  • Lemma 1
  • proof