
Blending Low and High-Level Semantics of Time Series for Better Masked Time Series Generation

Johan Vik Mathisen, Erlend Lokna, Daesoo Lee, Erlend Aune

TL;DR

NC-VQVAE integrates non-contrastive self-supervised learning (SSL) into a VQVAE-based time series generation (TSG) framework so that its discrete latent tokens encode both low-level shapes and high-level dynamics. Stage 1 adds an SSL-driven branch to the tokenization model, and Stage 2 trains the prior model on the resulting embeddings; together, these changes improve both reconstruction and downstream generation quality. Across a subset of the UCR time series collection, NC-VQVAE improves latent representations, achieves a higher Inception Score (IS) and a lower Fréchet Inception Distance (FID) on many datasets, and produces more structured, class-discriminative latent spaces. The approach leads to better mode coverage and more faithful, diverse synthetic samples, showing practical promise for high-fidelity TSG with SSL-guided tokenization.
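The TL;DR reports results as IS (higher is better) and FID (lower is better). As a reminder of what these metrics compute, here is a minimal NumPy sketch of both. It assumes a pretrained time series classifier (not shown) supplies softmax probabilities for IS and feature vectors for FID; the function names `inception_score` and `frechet_distance` are illustrative and are not taken from the paper's code.

```python
import numpy as np


def inception_score(probs: np.ndarray) -> float:
    """IS = exp(mean_x KL(p(y|x) || p(y))).

    probs: (n_samples, n_classes) softmax outputs of a pretrained
    classifier on generated samples (the classifier is assumed, not shown).
    """
    eps = 1e-12  # guards log(0)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID between Gaussians fitted to real and generated features of shape (n, d)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))
    # Tr(sqrtm(cov_r @ cov_f)) is the sum of square roots of the eigenvalues of
    # cov_r @ cov_f; these are real and non-negative for PSD covariances, so
    # tiny negative or imaginary parts are numerical noise and are discarded.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)
```

Both functions should receive outputs from the same classifier for real and generated data; IS rewards confident and diverse class predictions, while FID compares the mean and covariance of the two feature distributions.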

Abstract

State-of-the-art approaches in time series generation (TSG), such as TimeVQVAE, use vector quantization-based tokenization to effectively model complex distributions of time series. These approaches first learn to transform time series into a sequence of discrete latent vectors and then learn a prior model over that sequence. The discrete latent vectors, however, capture only low-level semantics (e.g., shapes). We hypothesize that higher-fidelity time series can be generated by training a prior model on more informative discrete latent vectors that contain both low- and high-level semantics (e.g., characteristic dynamics). In this paper, we introduce a novel framework, termed NC-VQVAE, that integrates self-supervised learning into such TSG methods to derive a discrete latent space capturing both low- and high-level semantics. Our experimental results demonstrate that NC-VQVAE considerably improves the quality of synthetic samples.
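The tokenization step described in the abstract (turning a time series into a sequence of discrete latent vectors) rests on nearest-neighbour vector quantization. The following is a minimal sketch of that lookup, assuming encoder outputs of shape `(length, dim)` are already computed; the function name `quantize` and the toy codebook are hypothetical, and the training losses (codebook and commitment terms, straight-through gradients) are deliberately omitted.

```python
import numpy as np


def quantize(z: np.ndarray, codebook: np.ndarray):
    """Map each continuous latent vector to its nearest codebook entry.

    z: (length, dim) encoder outputs for one time series (assumed shape).
    codebook: (num_codes, dim) learned code vectors.
    Returns the discrete token indices and the quantized vectors.
    """
    # Squared Euclidean distance between every latent vector and every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)  # one discrete token per latent position
    z_q = codebook[tokens]         # quantized latents passed to the decoder
    return tokens, z_q
```

The resulting `tokens` sequence is what a prior model learns to model; NC-VQVAE's premise is that making these tokens carry high-level semantics as well as shapes gives the prior a more informative sequence to learn from.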

Paper Structure (31 sections, 5 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: Overview of proposed model: NC-VQVAE.
  • Figure 2: Mean validation reconstruction loss of the models with SSL compared to naive VQVAE.
  • Figure 3: t-SNE plot of discrete latent representations from Barlow Twins, VIbCReg, and Naive VQVAE across three datasets: UWaveGestureLibraryAll, TwoPatterns, and FordA. The different colors represent different classes.
  • Figure 4: Class-conditional distributions for selected classes of Mallat, in addition to unconditional samples. Barlow Twins and VIbCReg are both trained with Gaussian augmentation.
  • Figure 6: Class-conditional distributions for selected classes of ShapesAll. Barlow Twins and VIbCReg are both trained with Gaussian augmentation.
  • ...and 9 more figures
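The figure captions name Barlow Twins and VIbCReg as the non-contrastive SSL methods and Gaussian augmentation as the view-generating transform. The following is a minimal sketch of that combination using the Barlow Twins objective (an invariance term on the diagonal of the cross-correlation matrix plus a redundancy-reduction term on the off-diagonal). It is an illustration under stated assumptions, not the paper's implementation: the noise scale `sigma` and weight `lam` are illustrative values, the helper names are hypothetical, VIbCReg is not shown, and random vectors stand in for encoder embeddings (in the full pipeline, the noise would be applied to the input time series and both views passed through the encoder).

```python
import numpy as np


def gaussian_augment(x: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Return a view of batch x with additive zero-mean Gaussian noise."""
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray, lam: float = 5e-3) -> float:
    """Barlow Twins loss for two batches of embeddings of shape (n, d)."""
    n = z_a.shape[0]
    eps = 1e-8
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n  # (d, d) cross-correlation matrix between the views
    diag = np.diag(c)
    invariance = ((1.0 - diag) ** 2).sum()           # push c_ii toward 1
    redundancy = (c ** 2).sum() - (diag ** 2).sum()  # push c_ij (i != j) toward 0
    return float(invariance + lam * redundancy)


# Usage: two noisy views of the same batch should score lower than unrelated inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 8))  # stand-in for encoder embeddings (no encoder shown)
view_a = gaussian_augment(x, sigma=0.1, rng=rng)
view_b = gaussian_augment(x, sigma=0.1, rng=rng)
print(barlow_twins_loss(view_a, view_b) < barlow_twins_loss(view_a, rng.normal(size=x.shape)))
```

Because the objective needs no negative pairs, it can be added as an extra branch next to the reconstruction objective without mining contrastive samples, which fits the "non-contrastive" framing in the TL;DR.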