Table of Contents
Fetching ...

LogoRA: Local-Global Representation Alignment for Robust Time Series Classification

Huanyu Zhang, Yi-Fan Zhang, Zhang Zhang, Qingsong Wen, Liang Wang

TL;DR

This work addresses unsupervised domain adaptation for time-series classification by learning both global and local representations and aligning them across domains. It introduces LogoRA, a two-branch encoder (global patch-based Transformer and multi-scale local CNN) with a Local-Global Fusion Module and a suite of alignment losses, including DTW-based invariant learning, triplet-margin, domain-adversarial, and per-class prototype center losses. Empirical results on HHAR, WISDM, HAR, and Sleep-EDF show LogoRA consistently outperforming strong baselines, with notable gains such as +12.52% on HHAR and +10.21% on WISDM and an overall average improvement around +6.40%. The approach demonstrates that jointly modeling global context and multi-scale local patterns enhances domain-invariant feature learning for time-series UDA and offers promising avenues for future multimodal extensions.

Abstract

Unsupervised domain adaptation (UDA) of time series aims to teach models to identify consistent patterns across various temporal scenarios, disregarding domain-specific differences, which can maintain their predictive accuracy and effectively adapt to new domains. However, existing UDA methods struggle to adequately extract and align both global and local features in time series data. To address this issue, we propose the Local-Global Representation Alignment framework (LogoRA), which employs a two-branch encoder, comprising a multi-scale convolutional branch and a patching transformer branch. The encoder enables the extraction of both local and global representations from time series. A fusion module is then introduced to integrate these representations, enhancing domain-invariant feature alignment from multi-scale perspectives. To achieve effective alignment, LogoRA employs strategies like invariant feature learning on the source domain, utilizing triplet loss for fine alignment and dynamic time warping-based feature alignment. Additionally, it reduces source-target domain gaps through adversarial training and per-class prototype alignment. Our evaluations on four time-series datasets demonstrate that LogoRA outperforms strong baselines by up to $12.52\%$, showcasing its superiority in time series UDA tasks.

LogoRA: Local-Global Representation Alignment for Robust Time Series Classification

TL;DR

This work addresses unsupervised domain adaptation for time-series classification by learning both global and local representations and aligning them across domains. It introduces LogoRA, a two-branch encoder (global patch-based Transformer and multi-scale local CNN) with a Local-Global Fusion Module and a suite of alignment losses, including DTW-based invariant learning, triplet-margin, domain-adversarial, and per-class prototype center losses. Empirical results on HHAR, WISDM, HAR, and Sleep-EDF show LogoRA consistently outperforming strong baselines, with notable gains such as +12.52% on HHAR and +10.21% on WISDM and an overall average improvement around +6.40%. The approach demonstrates that jointly modeling global context and multi-scale local patterns enhances domain-invariant feature learning for time-series UDA and offers promising avenues for future multimodal extensions.

Abstract

Unsupervised domain adaptation (UDA) of time series aims to teach models to identify consistent patterns across various temporal scenarios, disregarding domain-specific differences, which can maintain their predictive accuracy and effectively adapt to new domains. However, existing UDA methods struggle to adequately extract and align both global and local features in time series data. To address this issue, we propose the Local-Global Representation Alignment framework (LogoRA), which employs a two-branch encoder, comprising a multi-scale convolutional branch and a patching transformer branch. The encoder enables the extraction of both local and global representations from time series. A fusion module is then introduced to integrate these representations, enhancing domain-invariant feature alignment from multi-scale perspectives. To achieve effective alignment, LogoRA employs strategies like invariant feature learning on the source domain, utilizing triplet loss for fine alignment and dynamic time warping-based feature alignment. Additionally, it reduces source-target domain gaps through adversarial training and per-class prototype alignment. Our evaluations on four time-series datasets demonstrate that LogoRA outperforms strong baselines by up to , showcasing its superiority in time series UDA tasks.
Paper Structure (19 sections, 9 equations, 10 figures, 6 tables, 1 algorithm)

This paper contains 19 sections, 9 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Model performance comparison on various tasks. RI denotes the relative improvement compared to SOTA.
  • Figure 2: A motivation example, which contains accelerometer data pieces of walking upstairs (upper) and walking downstairs (lower) from HAR dataset.
  • Figure 3: Model Architecture and Training Pipeline of LogoRA. The time series data is processed through a feature extractor, comprising a Global Encoder and a Multi-Scale Local Encoder, to extract local and global representations. Next, these representations are fed into the Fusion Module to obtain the fused representations. We further use different representations for invariant feature learning ($\mathcal{L}_{dtw}$ and $\mathcal{L}_{margin}$) and alignment across the source and target domain ($\mathcal{L}_{domain}$ and $\mathcal{L}_{center}$).
  • Figure 4: (a) Global Encoder: We use a Transformer encoder with a patching operation to obtain fine-grained global representations. (b) Multi-Scale Local Encoder: We use ConvNet with different kernel sizes to acquire multi-scale local representations.
  • Figure 5: Local-Global Fusion Module. We use cross-attention to fuse global and multi-scale local representations. Next, we concatenate all the cross-attentions and sum them to get the final output.
  • ...and 5 more figures