LaSTR: Language-Driven Time-Series Segment Retrieval

Kota Dohi, Harsh Purohit, Tomoya Nishida, Takashi Endo, Yusuke Ohtsubo, Koichiro Yawata, Koki Takeshita, Tatsuya Sasaki, Yohei Kawaguchi

TL;DR

Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.

Abstract

Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal is to retrieve relevant local segments from large time-series repositories. We build large-scale segment–caption training data by applying TV2-based segmentation to LOTSA windows and generating segment descriptions with GPT-5.2, and then train a Conformer-based contrastive retriever in a shared text–time-series embedding space. On a held-out test split, we evaluate single-positive retrieval together with caption-side consistency (SBERT and VLM-as-a-judge) under multiple candidate pool sizes. Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.
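
To make the single-positive evaluation setting concrete, here is a minimal sketch of how such a protocol could be scored. It is an illustration under stated assumptions, not the authors' implementation: the abstract only mentions "ranking quality", so the choice of Recall@K and mean reciprocal rank (MRR), the cosine similarity, the uniform sampling of distractors, and the names `cosine`, `positive_rank`, and `evaluate` are all hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def positive_rank(query_vec, positive_vec, distractor_vecs):
    """1-based rank of the single positive within a pool of positive + distractors.

    Ties are counted against the positive (a pessimistic convention, assumed here).
    """
    pos_score = cosine(query_vec, positive_vec)
    not_worse = sum(1 for d in distractor_vecs if cosine(query_vec, d) >= pos_score)
    return 1 + not_worse


def evaluate(queries, segments, pool_size, ks=(1, 5, 10), seed=0):
    """Score retrieval where queries[i] has exactly one positive, segments[i].

    For each query, pool_size - 1 distractors are sampled from the other
    segments, the positive is ranked inside that pool, and Recall@K and MRR
    are averaged over all queries.
    """
    n = len(queries)
    if n != len(segments):
        raise ValueError("queries and segments must be paired one-to-one")
    if not 1 <= pool_size <= n:
        raise ValueError("pool_size must be between 1 and the number of segments")

    rng = random.Random(seed)  # fixed seed so the sampled pools are reproducible
    ranks = []
    for i, query in enumerate(queries):
        others = [j for j in range(n) if j != i]
        distractors = [segments[j] for j in rng.sample(others, pool_size - 1)]
        ranks.append(positive_rank(query, segments[i], distractors))

    metrics = {f"recall@{k}": sum(r <= k for r in ranks) / n for k in ks}
    metrics["mrr"] = sum(1.0 / r for r in ranks) / n
    return metrics


if __name__ == "__main__":
    # Toy embeddings: four orthogonal "segments" and perfectly aligned "queries".
    segs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(evaluate(segs, segs, pool_size=4, ks=(1, 2)))
```

Running `evaluate` at several values of `pool_size` mirrors the idea of reporting results under multiple candidate pool sizes: larger pools contain more distractors, so the same retriever will generally rank its positive lower.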

Paper Structure

This paper contains 16 sections, 19 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Example VLM input and corresponding outputs for segment captioning. The four captions are generated in the same order as the segment indices shown in the plot.
  • Figure 2: Distribution of per-query mean VLM scores over top-10 retrieved segments (test split, pool size 10,000, 5-point scale).
  • Figure 3: Qualitative retrieval examples on the test split. For each query, the figure shows the query caption and the corresponding rank-1 retrieved time-series window at pool size 10,000. Segment boundaries are overlaid, and the retrieved segment is highlighted to indicate the matched region.