Disentangled Mode-Specific Representations for Tensor Time Series via Contrastive Learning

Kohei Obata, Taichi Murayama, Zheng Chen, Yasuko Matsubara, Yasushi Sakurai

TL;DR

This paper proposes MoST, a novel representation learning method designed specifically for tensor time series (TTS). MoST uses a tensor slicing approach to reduce the complexity of the TTS structure and learns representations that can be disentangled into individual non-temporal modes.

Abstract

Multi-mode tensor time series (TTS) can be found in many domains, such as search engines and environmental monitoring systems. Learning representations of a TTS benefits various applications, but it is also challenging because the complexities inherent in the tensor hinder the realization of rich representations. In this paper, we propose MoST, a novel representation learning method designed specifically for TTS. Specifically, MoST uses a tensor slicing approach to reduce the complexity of the TTS structure and learns representations that can be disentangled into individual non-temporal modes. Each representation captures mode-specific features, which are the relationships between variables within the same mode, and mode-invariant features, which are common to the representations of different modes. We employ a contrastive learning framework to learn the parameters. The loss function comprises two parts, intended to learn representations in a mode-specific way and in a mode-invariant way, effectively exploiting the disentangled representations as augmentations. Extensive experiments on real-world datasets show that MoST consistently outperforms state-of-the-art methods in terms of classification and forecasting accuracy. Code is available at https://github.com/KoheiObata/MoST.
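
To make the slicing step more concrete, here is a minimal NumPy sketch for a three-mode TTS like the one in Figure 1 (time, location, query), stored as an array of shape (T, D1, D2). It is not taken from the MoST repository. The function name `slice_along_mode` is hypothetical. The sketch also assumes that a mode-k slice keeps every variable of mode k and fixes one index of the other non-temporal mode, so that each slice is a (T, D_k) multivariate series an encoder could process independently. The slice feature encoder and the aggregator are omitted.

```python
import numpy as np


def slice_along_mode(x: np.ndarray, mode: int) -> np.ndarray:
    """Slice a (T, D1, D2) tensor time series along one non-temporal mode.

    Assumption (hypothetical, not from the paper's code): a mode-k slice keeps
    all variables of mode k and fixes one index of the other non-temporal mode,
    so every slice is a (T, D_k) multivariate time series.

    mode=1 -> array of shape (D2, T, D1): D2 slices of shape (T, D1)
    mode=2 -> array of shape (D1, T, D2): D1 slices of shape (T, D2)
    """
    if x.ndim != 3:
        raise ValueError("expected a (T, D1, D2) tensor")
    if mode == 1:
        # Bring the fixed mode-2 index to the front: element [j, t, i] = x[t, i, j]
        return np.transpose(x, (2, 0, 1))
    if mode == 2:
        # Bring the fixed mode-1 index to the front: element [i, t, j] = x[t, i, j]
        return np.transpose(x, (1, 0, 2))
    raise ValueError("mode must be 1 or 2 (axis 0 is time)")
```

For example, with T = 100 time steps, 5 locations, and 8 queries, mode-1 slicing yields 8 arrays of shape (100, 5) and mode-2 slicing yields 5 arrays of shape (100, 8). `np.transpose` returns a view, so slicing itself copies no data.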

Paper Structure (25 sections, 6 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Illustrations of a tensor time series and three slices along different modes. (a) A tensor time series with three modes: location, query, and time. (b) Location and query dependencies of a temporal slice. (c), (d) Location and query slices have their own intra-mode dependencies, but temporal dependencies are common.
  • Figure 2: (a) MoST slices the TTS along each non-temporal mode and independently feeds each slice into the slice feature encoder to learn a representation of that slice. The aggregator is then applied to summarize mode information. (b) The model parameters are learned via a contrastive loss composed of a mode loss and an instance loss. The mode loss uses the representations from different sliced tensors, while the instance loss uses the representations generated by random cropping as contrastive augmentations.
  • Figure 3: t-SNE visualizations of the learned representations. The colors represent the three distinct intra-mode dependencies for (top) mode-1 and (bottom) mode-2. For MoST, representations of mode-1 $V^{(d_1)}$ and mode-2 $V^{(d_2)}$ are shown.
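
Figure 2(b) describes a loss with two contrastive terms. The following NumPy sketch shows one plausible reading of it. It rests on several assumptions, none of which come from the paper:

- Both terms take the standard InfoNCE form with cosine similarity and a temperature.
- The two terms are summed with equal weight.
- The function names `info_nce` and `mode_and_instance_loss` are hypothetical.

Under this reading, the mode loss treats the mode-1 and mode-2 representations of the same instance as a positive pair, and the instance loss does the same for the representations of two random crops. Producing those representations requires the encoder, which is out of scope here, so the function takes precomputed (N, H) matrices.

```python
import numpy as np


def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss: row i of z_a and row i of z_b form a positive pair;
    the other rows of z_b act as negatives for row i of z_a."""
    # L2-normalize rows so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; average their negative log-probability
    return float(-np.mean(np.diag(log_probs)))


def mode_and_instance_loss(
    v_mode1: np.ndarray,
    v_mode2: np.ndarray,
    z_crop_a: np.ndarray,
    z_crop_b: np.ndarray,
    temperature: float = 0.1,
) -> float:
    """Hypothetical two-part objective with assumed equal weighting.

    v_mode1, v_mode2: (N, H) representations of the same N instances
        obtained from mode-1 and mode-2 slices (mode loss).
    z_crop_a, z_crop_b: (N, H) representations of two random crops
        of the same N instances (instance loss).
    """
    mode_loss = info_nce(v_mode1, v_mode2, temperature)
    instance_loss = info_nce(z_crop_a, z_crop_b, temperature)
    return mode_loss + instance_loss
```

Each term is non-negative. It approaches zero only when every positive pair is much more similar than the negatives in its row. A real training loop would express this in an autodiff framework so that gradients reach the encoder. NumPy is used here only to show the arithmetic.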

Theorems & Definitions (1)

  • Definition 1: Tensor Time Series