Feature-aligned N-BEATS with Sinkhorn divergence

Joonhun Lee, Myeongho Jeon, Myungjoo Kang, Kyunghyun Park

TL;DR

The paper tackles domain shift in time series forecasting by extending N-BEATS with stack-wise feature alignment guided by the Sinkhorn divergence. It defines stack-wise marginal feature measures via pushforwards and a normalization to mitigate scale effects, enabling invariant representations across $K$ source domains. A representation-learning bound ties the stack-wise alignment loss based on the debiased Sinkhorn divergence $\widehat{\mathcal{W}}_{\epsilon,\widetilde{\mathcal{Z}}}$ to generalization across domains. Experiments on real-world data demonstrate improved generalization under out-domain, cross-domain, and in-domain shifts while preserving interpretability.

Abstract

We propose Feature-aligned N-BEATS as a domain-generalized time series forecasting model. It is a nontrivial extension of N-BEATS and its doubly residual stacking principle (Oreshkin et al. [45]) into a representation learning framework. In particular, it revolves around the marginal feature probability measures induced in each stack by the intricate composition of the residual and feature-extracting operators of N-BEATS, and aligns them stack-wise via an approximation of an optimal transport distance referred to as the Sinkhorn divergence. The training loss consists of an empirical risk minimization over multiple source domains, i.e., a forecasting loss, and an alignment loss calculated with the Sinkhorn divergence. This allows the model to learn invariant features stack-wise across multiple source data sequences while retaining N-BEATS's interpretable design and forecasting power. Comprehensive experimental evaluations with ablation studies are provided, and the results demonstrate the proposed model's forecasting and generalization capabilities.
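
To make the alignment term described in the abstract more concrete, here is a minimal NumPy sketch of a debiased Sinkhorn divergence between two batches of features, using the standard form $S_\epsilon(\mu,\nu) = \mathrm{OT}_\epsilon(\mu,\nu) - \tfrac{1}{2}\mathrm{OT}_\epsilon(\mu,\mu) - \tfrac{1}{2}\mathrm{OT}_\epsilon(\nu,\nu)$. It is an illustration under stated assumptions, not the authors' implementation: the squared Euclidean cost, uniform sample weights, the fixed iteration count, the function names, and the choice to sum over all pairs of source domains in `alignment_loss` are all assumptions introduced here. The feature normalization mentioned in the TL;DR is not included.

```python
from itertools import combinations

import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot(x, y, eps=0.1, n_iters=200):
    """Entropic OT value between uniform empirical measures on x and y.

    Uses log-domain Sinkhorn iterations on the dual potentials (f, g).
    Assumed cost: 0.5 * squared Euclidean distance.
    """
    n, m = x.shape[0], y.shape[0]
    cost = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    # Dual objective evaluated at the current potentials.
    return np.mean(f) + np.mean(g)


def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased Sinkhorn divergence: subtracts the two self-transport terms."""
    return (
        entropic_ot(x, y, eps, n_iters)
        - 0.5 * entropic_ot(x, x, eps, n_iters)
        - 0.5 * entropic_ot(y, y, eps, n_iters)
    )


def alignment_loss(stack_features, eps=0.1):
    """Sum of pairwise divergences per stack (pairwise sum is an assumption).

    stack_features[s][k] is a (batch, dim) array of features for source
    domain k at stack s.
    """
    total = 0.0
    for per_domain in stack_features:
        for feat_k, feat_l in combinations(per_domain, 2):
            total += sinkhorn_divergence(feat_k, feat_l, eps)
    return total
```

In a training loop, a term like this would be added to the forecasting loss with some weight, e.g. `total = forecasting_loss + reg_weight * alignment_loss(features)`, where `reg_weight` is a hypothetical hyperparameter name. A differentiable version would use an autodiff framework rather than plain NumPy.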

Paper Structure

This paper contains 19 sections, 3 theorems, 35 equations, 8 figures, 11 tables, and 1 algorithm.

Key Result

Proposition 2.1

Let $\Delta_K$ be the $(K-1)$-dimensional simplex, so that each $\pi\in\Delta_K$ is a vector of convex weights. Set $\Lambda:=\{\sum_{k=1}^K\pi_k \mathbb{P}^k_{{\cal X}} \mid \pi\in \Delta_K\}$ and let $\mathbb{P}^*:=\sum_{k=1}^K\pi_k^*\mathbb{P}_{{\cal X}}^k\in \mathop{\mathrm{arg\,min}}\limits_{\mathbb{P}'\ldots}$ ... with $\lambda_{(\mathbb{P}_{{\cal X}}^T,\mathbb{P}^*_{{\cal X}})} := \min\{\mathbb{E}_{x\sim\ldots}\ldots$

Figures (8)

  • Figure 1: Illustration of Feature-aligned N-BEATS.
  • Figure 2: Visualization of invariant feature learning. In the aligned scenario (w), the interconnection between green and red instances becomes visible, particularly at $\lambda=3$. In contrast, the non-aligned scenario (w/o) shows a pronounced dispersion, especially of the blue instances within the first two stacks at $\lambda=3$, resulting in higher inter-domain entropy.
  • Figure 3: Illustration of Feature-aligned N-BEATS (a detailed version of Figure 1).
  • Figure 4: Visualization of frequency distribution. (a) FRED, and (b) NCEI.
  • Figure 5: Training and validation loss plots. (a) Total loss, (b) forecasting loss, and (c) alignment loss. From top to bottom, each row illustrates the losses of N-BEATS-G, N-BEATS-I, and N-HiTS, respectively. Losses are reported every 10 iterations.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Proposition 2.1
  • Definition 3.1
  • Remark 3.2
  • Remark 3.3
  • Lemma 3.4
  • Definition 3.5
  • Theorem 3.6
  • Proof of Lemma 3.4
  • Proof of Theorem 3.6
  • Remark C.1
  • ...and 1 more