CATS: Mitigating Correlation Shift for Multivariate Time Series Classification

Xiao Lin, Zhichen Zeng, Tianxin Wei, Zhining Liu, Yuzhong Chen, Hanghang Tong

TL;DR

CATS is a scalable and parameter-efficient Correlation Adapter for MTS, designed as a plug-and-play technique compatible with various Transformer variants. It employs temporal convolution to capture local temporal patterns and a graph attention module to model the changing multivariate correlation.

Abstract

Unsupervised Domain Adaptation (UDA) leverages labeled source data to train models for unlabeled target data. Given the prevalence of multivariate time series (MTS) data across various domains, the UDA task for MTS classification has emerged as a critical challenge. However, for MTS data, correlations between variables often vary across domains, and most existing UDA works for MTS classification have overlooked this essential characteristic. To bridge this gap, we introduce a novel domain shift, correlation shift, which measures domain differences in multivariate correlation. To mitigate correlation shift, we propose a scalable and parameter-efficient Correlation Adapter for MTS (CATS). Designed as a plug-and-play technique compatible with various Transformer variants, CATS employs temporal convolution to capture local temporal patterns and a graph attention module to model the changing multivariate correlation. The adapter reweights the target correlations to align them with the source correlations with theoretically guaranteed precision. A correlation alignment loss is further proposed to mitigate correlation shift, bypassing the alignment challenge posed by the non-i.i.d. nature of MTS data. Extensive experiments on four real-world datasets demonstrate that (1) compared with vanilla Transformer-based models, CATS improves average accuracy by over $10\%$ while adding only around $1\%$ more parameters, and (2) all Transformer variants equipped with CATS either reach or surpass state-of-the-art baselines.
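
To make the idea of correlation shift more concrete, the following is a minimal sketch, not the paper's formal Definition 1. It assumes Pearson correlation between variables and uses the Frobenius norm of the difference between two correlation matrices as an illustrative gap measure. The helper names `correlation_matrix` and `correlation_gap` are hypothetical, and the toy data are drawn i.i.d. over time purely for simplicity.

```python
import numpy as np

def correlation_matrix(x):
    """Pearson correlation between the D variables of a series x of shape (D, T)."""
    return np.corrcoef(x)

def correlation_gap(x_source, x_target):
    """Illustrative gap: Frobenius norm of the difference of correlation matrices."""
    diff = correlation_matrix(x_source) - correlation_matrix(x_target)
    return float(np.linalg.norm(diff, ord="fro"))

rng = np.random.default_rng(0)
T = 5000  # number of time steps in each toy series

# Source domain: the two variables are positively correlated.
cov_s = np.array([[1.0, 0.8], [0.8, 1.0]])
# Target domain: same marginals, but the correlation flips sign.
cov_t = np.array([[1.0, -0.8], [-0.8, 1.0]])

x_s = rng.multivariate_normal(np.zeros(2), cov_s, size=T).T   # shape (D, T)
x_t = rng.multivariate_normal(np.zeros(2), cov_t, size=T).T   # shape (D, T)
x_s2 = rng.multivariate_normal(np.zeros(2), cov_s, size=T).T  # second source draw

gap_shifted = correlation_gap(x_s, x_t)   # domains with different correlations
gap_same = correlation_gap(x_s, x_s2)     # two draws from the same domain

print(f"gap across shifted domains: {gap_shifted:.3f}")
print(f"gap within the same domain: {gap_same:.3f}")
```

In this toy setup each variable has the same marginal distribution in both domains, so the difference between the domains shows up only in the correlation structure.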

Paper Structure

This paper contains 29 sections, 7 theorems, 32 equations, 5 figures, 3 tables.

Key Result

Proposition 1

Suppose source data $\mathbf{X}_s\in\mathbb{R}^{D\times T}$ and target data $\mathbf{X}_t\in\mathbb{R}^{D\times T}$ follow $\mathcal{N}_s(\bm{\mu}_s,\bm{\Sigma}_s)$ and $\mathcal{N}_t(\bm{\mu}_t,\bm{\Sigma}_t)$, respectively. There exists a reweighting matrix $\mathbf{A}\in\mathbb{R}^{D\times D}$ an where $\mathbf{Y} = \mathbf{A} \mathbf{X}_t + \mathbf{b}$ and $\mathbf{b} = \mathbf{0}$ for most MT
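
The statement above is cut off in this listing, so the following is only a sketch of one standard way a linear reweighting $\mathbf{Y} = \mathbf{A}\mathbf{X}_t$ (with $\mathbf{b} = \mathbf{0}$) can match Gaussian covariances. It is not necessarily the construction used in the paper. It relies on the fact that a linear map of a Gaussian with covariance $\bm{\Sigma}_t$ has covariance $\mathbf{A}\bm{\Sigma}_t\mathbf{A}^\top$, and assumes both covariances are symmetric positive definite. The choice $\mathbf{A} = \bm{\Sigma}_s^{1/2}\bm{\Sigma}_t^{-1/2}$ and the helper names are assumptions made for illustration.

```python
import numpy as np

def sym_matrix_power(sigma, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    # V diag(eigvals**power) V^T
    return (eigvecs * eigvals**power) @ eigvecs.T

def reweighting_matrix(sigma_s, sigma_t):
    """Assumed choice A = Sigma_s^{1/2} Sigma_t^{-1/2} (whitening, then coloring)."""
    return sym_matrix_power(sigma_s, 0.5) @ sym_matrix_power(sigma_t, -0.5)

# Toy covariances for D = 2 variables (illustrative values only).
sigma_s = np.array([[2.0, 0.6], [0.6, 1.0]])
sigma_t = np.array([[1.0, -0.3], [-0.3, 0.5]])

A = reweighting_matrix(sigma_s, sigma_t)

# Covariance of Y = A X_t when X_t has covariance sigma_t.
sigma_y = A @ sigma_t @ A.T

print(np.allclose(sigma_y, sigma_s))
```

With $\mathbf{b} = \mathbf{0}$ the mean of $\mathbf{Y}$ is $\mathbf{A}\bm{\mu}_t$, so this sketch aligns the covariance exactly and aligns the means only when the data are (approximately) zero-mean.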

Figures (5)

  • Figure 1: Rates of target domains with correlation shifts per source domain. The x-axis represents the source domain index, while the y-axis indicates the rate of correlation shifts among the remaining 29 domains. The red line marks the average rate of 78%.
  • Figure 2: The main framework of CATS. CATS is integrated after each attention block of any Transformer variant, with only CATS trained and the backbone frozen. The training objective involves three loss functions: (1) classification loss on the labeled source domain, (2) forecasting loss on the unlabeled target domain, and (3) layer-wise correlation alignment loss to align these two domains.
  • Figure 3: Accuracy comparison on the HAR dataset between the typical adapter and the TDC-based adapter. With the backbone (TimesNet) pretrained on domain 1, both adapters are trained on domain 10.
  • Figure 4: Parameter count and FLOP curves of CATS on Transformer with varying number of variables $D$ and time length $T$. The three red curves represent the parameter counts (or FLOPs) of the full Transformer model, while the three blue curves represent CATS alone. In panel (c), the three red curves overlap because their differences are small relative to the large overall values.
  • Figure 5: Step-by-step accuracy improvement on the HAR dataset from vanilla Transformer to Transformer enhanced by CATS.

Theorems & Definitions (12)

  • Definition 1: Correlation shift
  • Proposition 1: Gaussian Probability Alignment
  • Proposition 2: Correlation Alignment
  • Theorem 1: Attention Approximation
  • Proposition 2: Correlation Alignment
  • proof
  • Proposition 2: Gaussian Probability Alignment
  • proof
  • Theorem 1: Attention Approximation
  • proof
  • ...and 2 more