Evaluating Time Series Models with Knowledge Discovery

Li Zhang

TL;DR

The paper addresses the gap where metric-based evaluation of time series models fails to guarantee real-world generalization due to latent factors and evolving environmental conditions, proposing a knowledge-discovery-based evaluation framework that leverages domain knowledge and evidence-seeking explanations. It introduces the distinction between explanation-seeking and evidence-seeking approaches and uses illustrative examples, such as the gravity-like relation $v = g t$ and the GunPoint task, to show how domain-informed reasoning can reveal generalization weaknesses that pure metrics miss. The authors argue that such knowledge-centric evaluation can better drive the development of time-series foundational models and improve cross-domain applicability, while highlighting challenges in assessing the quality and scalability of explanations. If adopted, this framework could reduce development costs, promote data and knowledge sharing with domain experts, and align model evaluation more closely with real-world scientific understanding.

Abstract

Time series data is one of the most ubiquitous data modalities, found across diverse critical domains such as healthcare, seismology, manufacturing, and energy. In recent years, the data mining community has shown increasing interest in developing deep learning models for time series in pursuit of better performance. Model performance is typically evaluated with metrics such as RMSE, accuracy, and F1-score. Yet time series data is often hard to interpret and is collected under unknown environmental factors, sensor configurations, latent physical mechanisms, and non-stationary, evolving behavior. As a result, a model that scores better under standard metric-based evaluation may not perform better in real-world tasks. In this blue-sky paper, we explore the challenges inherent in the metric-based evaluation framework for time series data mining and propose a potential blue-sky idea: developing a knowledge-discovery-based evaluation framework that uses domain-expert knowledge to evaluate a model. We demonstrate that an evidence-seeking explanation can potentially have stronger persuasive power than metric-based evaluation and yield better generalization ability for time series data mining tasks.
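The gap between metric scores and real-world behavior can be illustrated with a small synthetic experiment built around the relation $v = g t$ mentioned in the TL;DR. The sketch below is not taken from the paper: the data, the noise level, the two models, and all names (`G`, `k_hat`, `slope_error_b`, and so on) are assumptions chosen for illustration. It compares a flexible polynomial selected purely on fit against a model constrained to the known physical form, then adds a simple evidence-style check that asks whether the learned slope agrees with the known constant.

```python
import numpy as np

G = 9.81  # assumed ground-truth constant for the synthetic data (m/s^2)


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rng = np.random.default_rng(0)

# Training regime (assumed): short drops only, with noisy velocity readings.
t_train = np.linspace(0.0, 1.0, 30)
v_train = G * t_train + rng.normal(0.0, 0.5, t_train.size)

# Shifted regime (assumed): longer drops that the models never saw.
t_shift = np.linspace(2.0, 3.0, 30)
v_shift = G * t_shift

# Model A: a flexible degree-5 polynomial, judged only by how well it fits.
poly_coef = np.polyfit(t_train, v_train, deg=5)

# Model B: constrained to the form v = k * t (least squares through the origin).
k_hat = float(t_train @ v_train / (t_train @ t_train))

# Metric-based view: RMSE in the training regime and in the shifted regime.
train_rmse_a = rmse(v_train, np.polyval(poly_coef, t_train))
train_rmse_b = rmse(v_train, k_hat * t_train)
shift_rmse_a = rmse(v_shift, np.polyval(poly_coef, t_shift))
shift_rmse_b = rmse(v_shift, k_hat * t_shift)

# Knowledge-based view: does the learned slope agree with the known constant?
slope_error_b = abs(k_hat - G)

print(f"train RMSE  A={train_rmse_a:.3f}  B={train_rmse_b:.3f}")
print(f"shift RMSE  A={shift_rmse_a:.3f}  B={shift_rmse_b:.3f}")
print(f"model B slope={k_hat:.3f}, |slope - g|={slope_error_b:.3f}")
```

Because the degree-5 polynomial contains the straight line through the origin as a special case, Model A can never have a higher training RMSE than Model B, so a purely metric-based comparison on the training regime will favor it or tie. How the two models compare in the shifted regime depends on the noise draw, which is why the script prints those values instead of asserting them. The slope check is the part that stands in for domain knowledge: it can be evaluated without any data from the shifted regime.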

Paper Structure

This paper contains 5 sections, 1 figure.

Figures (1)

  • Figure 1: Model 1 has higher accuracy but depends on the operator's height. Model 2 has lower accuracy but is coherent with the true mechanism.