LaSTR: Language-Driven Time-Series Segment Retrieval

Kota Dohi; Harsh Purohit; Tomoya Nishida; Takashi Endo; Yusuke Ohtsubo; Koichiro Yawata; Koki Takeshita; Tatsuya Sasaki; Yohei Kawaguchi

TL;DR

Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.

Abstract

Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal is to retrieve relevant local segments from large time-series repositories. We build large-scale segment–caption training data by applying TV2-based segmentation to LOTSA windows and generating segment descriptions with GPT-5.2, and then train a Conformer-based contrastive retriever in a shared text–time-series embedding space. On a held-out test split, we evaluate single-positive retrieval together with caption-side consistency (SBERT and VLM-as-a-judge) under multiple candidate pool sizes. Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.
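To make the single-positive retrieval setting concrete, the following is a minimal sketch of how a query embedding could be ranked against a candidate pool of segment embeddings and scored. It assumes cosine similarity in the shared embedding space and uses Recall@k and mean reciprocal rank as example ranking metrics; the abstract does not state which similarity function or metrics the paper uses. The function names (`l2_normalize`, `rank_of_positive`, `recall_at_k`, `mean_reciprocal_rank`) are hypothetical, and the embeddings are random stand-ins rather than outputs of the paper's text or Conformer encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_of_positive(text_emb, segment_emb):
    """Return the 1-based rank of each query's single positive segment.

    Assumption for this sketch: query i's positive is segment i, and the
    remaining rows of `segment_emb` act as distractors in the candidate pool.
    """
    sims = l2_normalize(text_emb) @ l2_normalize(segment_emb).T  # (Q, N)
    idx = np.arange(text_emb.shape[0])
    pos = sims[idx, idx][:, None]  # similarity to the positive, shape (Q, 1)
    # Rank = 1 + number of candidates scored strictly higher than the positive.
    return 1 + (sims > pos).sum(axis=1)


def recall_at_k(ranks, k):
    """Fraction of queries whose positive appears in the top k."""
    return float(np.mean(ranks <= k))


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    return float(np.mean(1.0 / ranks))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(1000, 64))  # stand-in segment embeddings
    # Stand-in query embeddings: noisy copies of the first 50 segments.
    queries = pool[:50] + 0.5 * rng.normal(size=(50, 64))
    ranks = rank_of_positive(queries, pool)
    print("Recall@10:", recall_at_k(ranks, 10))
    print("MRR:", mean_reciprocal_rank(ranks))
```

Growing the number of rows in `pool` while keeping the queries fixed corresponds to evaluating under a larger candidate pool, which generally makes the task harder because each positive competes with more distractors.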

Paper Structure (16 sections, 19 equations, 3 figures, 2 tables)

Introduction
Relation to prior work
Problem Statement
Method
Large-Scale Segment–Caption Pair Generation
Data and preprocessing.
Segment generation.
Caption generation with VLM.
Segment-Level Contrastive Learning with Conformer
Retrieval at Test Time
Experiments
Dataset
Experimental conditions
Evaluation
Results
...and 1 more sections

Figures (3)

Figure 1: Example VLM input and corresponding outputs for segment captioning. The four captions are generated in the same order as the segment indices shown in the plot.
Figure 2: Distribution of per-query mean VLM scores over top-10 retrieved segments (test split, pool size $=10000$, 5-point scale).
Figure 3: Qualitative retrieval examples on the test split. For each query, the figure shows the query caption and the corresponding rank-1 retrieved time-series window at pool size 10,000. Segment boundaries are overlaid, and the retrieved segment is highlighted to indicate the matched region.
