Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment

Shunyu Wu, Dan Li, Wenjie Feng, Haozheng Ye, Jian Lou, See-Kiong Ng

TL;DR

This paper proposes TSRating, a unified framework for rating the quality of time series (TS) data crawled from diverse domains. It leverages the extensive knowledge that LLMs acquire during pretraining to comprehend and discern quality differences across diverse TS data.

Abstract

High-quality time series (TS) data are essential for ensuring TS model performance, which makes research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by extending data quality rating techniques such as influence functions and Shapley values to account for temporal characteristics. However, they neglect the fact that real-world TS data can span vastly different domains and exhibit distinct properties, which hampers the accurate and efficient rating of diverse TS data. In this paper, we propose TSRating, a novel and unified framework for rating the quality of time series data crawled from diverse domains. TSRating leverages the extensive knowledge that LLMs acquire during pretraining to comprehend and discern quality differences in diverse TS data. We verify this by devising a series of prompts to elicit quality comparisons from LLMs for pairs of TS samples. We then fit a dedicated rating model, termed TSRater, to convert the LLMs' judgments into efficient quality predictions, so that future TS samples can be rated through TSRater's inference alone. To ensure cross-domain adaptability, we develop a meta-learning scheme to train TSRater on quality comparisons collected from nine distinct domains. To improve training efficiency, we employ signSGD for inner-loop updates, thus circumventing the demanding computation of hypergradients. Extensive experimental results on eleven benchmark datasets across three time series tasks, each using both conventional TS models and TS foundation models, demonstrate that TSRating outperforms baselines in terms of estimation accuracy, efficiency, and domain adaptability.
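The step from pairwise judgments to a scalar rater, and the sign-based update, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the Bradley-Terry-style logistic loss, the linear rater, the two hand-made features, and the synthetic "judge" that stands in for LLM comparisons by preferring the less noisy sample. It also shows only a single-task sign update, not the full meta-learning scheme.

```python
import math
import random

rng = random.Random(0)

def make_series(noise, length=64):
    """Synthetic sine wave with Gaussian noise (illustrative data only)."""
    return [math.sin(2 * math.pi * t / 16) + rng.gauss(0, noise) for t in range(length)]

def features(series):
    """Hypothetical features: mean absolute first difference and value range."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return [sum(diffs) / len(diffs), max(series) - min(series)]

def score(w, x):
    """Linear rater: a higher score means higher predicted quality."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_grad(w, xa, xb, a_preferred):
    """Gradient of the Bradley-Terry negative log-likelihood for one pair (assumed loss)."""
    p_a = 1.0 / (1.0 + math.exp(-(score(w, xa) - score(w, xb))))
    err = p_a - (1.0 if a_preferred else 0.0)
    return [err * (fa - fb) for fa, fb in zip(xa, xb)]

def sign(v):
    return (v > 0) - (v < 0)

def fit_signsgd(pairs, dim, lr=0.01, epochs=200):
    """Full-batch signSGD: each step uses only the sign of the summed gradient."""
    w = [0.0] * dim
    for _ in range(epochs):
        g = [0.0] * dim
        for xa, xb, a_pref in pairs:
            for i, gi in enumerate(pairwise_grad(w, xa, xb, a_pref)):
                g[i] += gi
        w = [wi - lr * sign(gi) for wi, gi in zip(w, g)]
    return w

# Build synthetic samples with known noise levels.
noise_levels = [rng.uniform(0.0, 1.0) for _ in range(40)]
samples = [(n, features(make_series(n))) for n in noise_levels]

# Stand-in for LLM judgments: the less noisy sample of each pair is preferred.
pairs = []
for _ in range(200):
    (na, xa), (nb, xb) = rng.sample(samples, 2)
    pairs.append((xa, xb, na < nb))

w = fit_signsgd(pairs, dim=2)
clean_score = score(w, features(make_series(0.0)))
noisy_score = score(w, features(make_series(0.8)))
print("clean rated above noisy:", clean_score > noisy_score)
```

Once fitted, the rater scores a new sample with a single forward pass and needs no further pairwise queries, which is the efficiency argument the abstract makes for TSRater.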

Paper Structure

This paper contains 38 sections, 13 equations, 9 figures, 31 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of the proposed TSRating framework for diverse time series quality assessment.
  • Figure 2: Data pruning comparison. Time blocks with the highest ratings are iteratively removed, and the resulting performance degradation is measured. In the left and middle subfigures, higher RMSE and MAPE indicate a more accurate rating method; in the right subfigure, lower accuracy does.
  • Figure 3: Mean Squared Error (MSE) on test sets after fine-tuning different time-series foundation models using varying portions of training data across three datasets (Left: Time-MoE, Middle: Time-LLM, Right: MOMENT). For each dataset, the model is fine-tuned using either the top 50% highest-quality data, the bottom 50%, or the full dataset. Models fine-tuned on higher-quality subsets consistently achieve lower MSEs, demonstrating the effectiveness of quality-based data selection.
  • Figure 4: Visualization of the top and bottom ranking blocks in the Electricity dataset for each criterion (Trend, Frequency, Amplitude, and Pattern) with TSRating.
  • Figure 5: Visualization of the top and bottom ranking blocks in the Electricity dataset for the combination of four criteria with TSRating.
  • ...and 4 more figures
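
The two evaluation protocols described in the captions of Figures 2 and 3 (iteratively removing the highest-rated blocks, and selecting the top or bottom 50% by rating) can be sketched as simple index operations. The function names and toy ratings below are hypothetical and are not taken from the paper; model training and metric computation are omitted.

```python
def split_by_rating(blocks, ratings, fraction=0.5):
    """Return (top, bottom) subsets holding the given fraction of blocks by rating."""
    order = sorted(range(len(blocks)), key=lambda i: ratings[i], reverse=True)
    k = int(len(blocks) * fraction)
    top = [blocks[i] for i in order[:k]]
    bottom = [blocks[i] for i in order[-k:]] if k else []
    return top, bottom

def pruning_schedule(ratings, step):
    """Yield the indices that remain after removing the `step` highest-rated blocks each round."""
    order = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    for removed in range(0, len(order) + 1, step):
        yield sorted(order[removed:])

# Toy example with four blocks and made-up ratings.
blocks = ["a", "b", "c", "d"]
ratings = [0.1, 0.9, 0.5, 0.3]
top_half, bottom_half = split_by_rating(blocks, ratings)
remaining_per_round = list(pruning_schedule(ratings, step=2))
print(top_half, bottom_half, remaining_per_round)
```

In a pruning experiment, a model would be retrained on the remaining indices after each round; if the ratings are accurate, removing the highest-rated blocks first should degrade performance fastest.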