Blending Low and High-Level Semantics of Time Series for Better Masked Time Series Generation
Johan Vik Mathisen, Erlend Lokna, Daesoo Lee, Erlend Aune
TL;DR
NC-VQVAE integrates non-contrastive self-supervised learning (SSL) into a VQVAE-based time series generation (TSG) framework so that the discrete latent tokens encode both low-level shapes and high-level dynamics. Stage 1 adds an SSL-driven branch to tokenization, and Stage 2 uses the learned embeddings for the prior model, improving both reconstruction and downstream generation quality. On a subset of the UCR time series collection, NC-VQVAE improves the latent representations, yields a higher Inception Score (IS) and a lower Fréchet Inception Distance (FID) on many datasets, and produces more structured, class-discriminative latent spaces. The approach gives better mode coverage and more faithful, diverse synthetic samples, which suggests practical promise for high-fidelity TSG with SSL-guided tokenization.
Abstract
State-of-the-art approaches in time series generation (TSG), such as TimeVQVAE, utilize vector quantization-based tokenization to effectively model complex distributions of time series. These approaches first learn to transform time series into a sequence of discrete latent vectors, and then a prior model is learned to model the sequence. The discrete latent vectors, however, only capture low-level semantics (e.g., shapes). We hypothesize that higher-fidelity time series can be generated by training a prior model on more informative discrete latent vectors that contain both low and high-level semantics (e.g., characteristic dynamics). In this paper, we introduce a novel framework, termed NC-VQVAE, to integrate self-supervised learning into those TSG methods to derive a discrete latent space where low and high-level semantics are captured. Our experimental results demonstrate that NC-VQVAE results in a considerable improvement in the quality of synthetic samples.
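
To make the tokenization step concrete, the following is a minimal sketch of the generic vector quantization lookup that VQVAE-style models rely on: each continuous latent vector is replaced by its nearest codebook entry, and the index of that entry serves as the discrete token. This is not the authors' implementation. The function name `quantize`, the toy codebook, and the latent values are hypothetical, and the encoder, the SSL branch, and the prior model are all omitted.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (n, d) to its nearest codebook entry (k, d).

    Returns the token indices (n,) and the quantized vectors (n, d).
    Generic nearest-neighbour VQ lookup; illustrative only.
    """
    # Squared Euclidean distance between every latent and every code: shape (n, k)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)   # index of the closest code per latent
    return tokens, codebook[tokens]

# Hypothetical toy codebook with 3 codes of dimension 2
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
# Hypothetical encoder outputs for a sequence of 4 latent positions
z = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.6, 0.6]])

tokens, z_q = quantize(z, codebook)
print(tokens)  # token sequence on which a prior model would be trained
```

In a two-stage setup of this kind, the token sequence is what the prior model learns to generate, so whatever information the codebook captures bounds what the prior can model.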
