Knowledge-based model validation using a custom metric

Nicola Henkelmann, Stephan Rhode, Johannes von Keler

TL;DR

The paper tackles the problem that standard validation metrics often fail to answer whether a model is 'accurate enough' for a specific use case. It introduces a knowledge-based approach that collects expert face-validation ratings and learns a custom time-series similarity metric by regressing the expert ratings $R$ on a set of candidate features $f_i(\mathbf{x}, \mathbf{y})$ via $R = \sum_i w_i f_i(\mathbf{x}, \mathbf{y}) + \epsilon$, with feature selection and with uncertainty quantified through prediction intervals. The method is demonstrated on artificial data and real datasets, including literature-derived radial velocities and a rack-position steering scenario, showing that the custom metric can align with expert judgments where traditional metrics fall short, while also exposing limits when labeled data are scarce. The work emphasizes that metric design should be grounded in domain knowledge and accompanied by uncertainty quantification to provide reliable validation guidance for cyber-physical systems. In practical terms, the proposed approach offers a principled path to derive use-case-specific validation criteria and supports more trustworthy model validation in automotive and related engineering domains, guided by expert consensus and transparent uncertainty bounds. The key contributions include formalizing expert-guided metric design, embedding it in a regression framework with robust options (e.g., LASSO), and demonstrating the necessity of prediction intervals for reliable validation decisions.

Abstract

Vehicle models have a long history of research and as of today are able to model the involved physics in a reasonable manner. However, each new vehicle has its own characteristics or parameters. The identification of these is the main task of an engineer. Validating whether the correct parameter set has been chosen is a tedious task and can often only be performed by experts. Metrics commonly used in the literature are able to compare different results under certain aspects. However, they fail to answer the question: Are the models accurate enough? In this article, we propose the usage of a custom metric trained on the knowledge of experts to tackle this problem. Our approach involves three main steps. First, we formally collect subject matter experts' opinions on the question: Having seen the measurement and simulation time series in comparison, is the model quality sufficient? From this step, we obtain a data set that quantifies the sufficiency of a simulation result based on a comparison to corresponding experimental data. Second, we compute common model metrics on the measurement and simulation time series and use these model metrics as features for a regression model. Third, we fit a regression model to the experts' opinions. This regression model, i.e., our custom metric, can then predict the sufficiency of a new simulation result and gives a confidence for this prediction.
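
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature set (RMSE, maximum absolute error, correlation), the synthetic "expert ratings", and all function names are hypothetical placeholders. Plain ordinary least squares stands in for the regression model, and the prediction interval uses a normal quantile as an approximation of the Student-t quantile.

```python
import numpy as np
from statistics import NormalDist


def features(x, y):
    """Hypothetical candidate features f_i(x, y) for measurement x and simulation y."""
    err = y - x
    rmse = np.sqrt(np.mean(err ** 2))
    max_abs = np.max(np.abs(err))
    corr = np.corrcoef(x, y)[0, 1]
    return np.array([1.0, rmse, max_abs, corr])  # leading 1.0 acts as intercept


def fit_custom_metric(F, r):
    """Least-squares fit of ratings r on the feature matrix F (rows = rated pairs)."""
    w, *_ = np.linalg.lstsq(F, r, rcond=None)
    resid = r - F @ w
    dof = F.shape[0] - F.shape[1]
    sigma2 = resid @ resid / dof          # residual variance estimate
    FtF_inv = np.linalg.pinv(F.T @ F)     # pseudo-inverse guards against collinearity
    return w, sigma2, FtF_inv


def predict_with_interval(f0, w, sigma2, FtF_inv, level=0.95):
    """Predicted rating and an approximate prediction interval for a new pair."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    r_hat = f0 @ w
    half = z * np.sqrt(sigma2 * (1.0 + f0 @ FtF_inv @ f0))
    return r_hat, r_hat - half, r_hat + half


# Step 1 (placeholder): synthetic ratings instead of real expert opinions.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
x = np.sin(t)  # stand-in "measurement"

F, r = [], []
for _ in range(40):
    y = rng.uniform(0.7, 1.3) * np.sin(t + rng.uniform(-0.5, 0.5)) + rng.uniform(-0.3, 0.3)
    f = features(x, y)                                   # Step 2: metrics as features
    F.append(f)
    r.append(1.0 - 1.5 * f[1] + rng.normal(0.0, 0.05))   # invented rating rule
F, r = np.array(F), np.array(r)

# Step 3: fit the regression, then rate an unseen simulation result.
w, sigma2, FtF_inv = fit_custom_metric(F, r)
f_new = features(x, 1.1 * np.sin(t + 0.2))
r_hat, lo, hi = predict_with_interval(f_new, w, sigma2, FtF_inv)
print(f"predicted rating {r_hat:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

A wide interval signals that the new simulation result lies far from the rated examples or that the ratings are noisy, which is the situation in which a point prediction alone would be misleading.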

Paper Structure

This paper contains 13 sections, 27 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Comparison of sine signals with same RMS-error value
  • Figure 2: Colorbar for expert rating.
  • Figure 3: Interactive rating document. The graph shows time-series data of simulation and measurement next to the rating bar.
  • Figure 4: Steps to design a custom metric.
  • Figure 5: Experimental and simulation data
  • ...and 11 more figures
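
The point behind the Figure 1 caption, that visibly different deviations can share one RMS-error value, can be checked numerically. The sketch below does not reproduce the signals from the figure; the offset value and the choice of a constant offset, an amplitude error, and a phase shift are assumptions made for illustration.

```python
import numpy as np


def rmse(x, y):
    """Root-mean-square error between two equally sampled signals."""
    return np.sqrt(np.mean((y - x) ** 2))


# One full period, sampled uniformly without repeating the end point.
t = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
ref = np.sin(t)

c = 0.2  # target RMS error (illustrative value)

# Constant offset: the error is c everywhere, so RMSE = c.
y_offset = ref + c

# Amplitude error a: error is a*sin(t), so RMSE = a / sqrt(2).
y_amp = (1.0 + c * np.sqrt(2.0)) * ref

# Phase shift phi: RMSE = sqrt(2) * sin(phi / 2), solved for phi.
phi = 2.0 * np.arcsin(c / np.sqrt(2.0))
y_phase = np.sin(t + phi)

for name, y in [("offset", y_offset), ("amplitude", y_amp), ("phase", y_phase)]:
    print(f"{name:9s} RMSE = {rmse(ref, y):.6f}")
```

All three signals deviate from the reference in a different way, yet a single RMS-error number cannot tell them apart, which is why the choice of features matters for a use-case-specific metric.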