Variable Selection for Comparing High-dimensional Time-Series Data

Kensuke Mitsuzawa; Margherita Grossi; Stefano Bortoli; Motonobu Kanagawa

TL;DR

We address the problem of identifying when and where two high-dimensional time series differ by jointly selecting time subintervals $[t_{b-1}+1,t_b]$ and the active variables $d$ within each subinterval. The proposed Time-Slicing Variable Selection framework is a meta-algorithm: it splits the total interval into $B$ subintervals, performs two-sample variable selection on the training portion of each subinterval to yield a selected set $\hat{S}_b$, and runs a permutation test on held-out data to produce a p-value $p_b$. This localizes differences interpretably from a single pair of series, without requiring multiple realizations. The framework is agnostic to the choice of two-sample variable-selection method; the paper demonstrates both MMD-based (e.g., ARD kernel with $L_1$ regularization) and marginal-distribution approaches. Experiments cover synthetic data and two real-data applications: validating a DNN emulator of a particle-based fluid simulator, and comparing outputs of a microscopic traffic simulator. The results illustrate the practical trade-offs in choosing the number of subintervals and show that the approach can provide actionable, region-specific diagnostics for simulator validation and model comparison.

Abstract

Given a pair of multivariate time-series data of the same length and dimensions, an approach is proposed to select variables and time intervals where the two series are significantly different. In applications where one time series is an output from a computationally expensive simulator, the approach may be used for validating the simulator against real data, for comparing the outputs of two simulators, and for validating a machine learning-based emulator against the simulator. With the proposed approach, the entire time interval is split into multiple subintervals, and on each subinterval, the two sample sets are compared to select variables that distinguish their distributions and a two-sample test is performed. The validity and limitations of the proposed approach are investigated in synthetic data experiments. Its usefulness is demonstrated in an application with a particle-based fluid simulator, where a deep neural network model is compared against the simulator, and in an application with a microscopic traffic simulator, where the effects of changing the simulator's parameters on traffic flows are analysed.
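The splitting, selection, and testing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`select_variables`, `permutation_pvalue`, `time_slicing_selection`), the equal-length blocks, the random train/test split, the `threshold` value, and the toy data are all hypothetical. For brevity, a standardized mean gap per variable stands in for the paper's MMD-based or marginal-distribution selectors, a squared mean difference stands in for the test statistic, and time points within a subinterval are treated as exchangeable samples.

```python
import numpy as np

def select_variables(x_tr, y_tr, threshold=1.0):
    """Stand-in marginal selector (assumption): keep variables whose
    standardized mean gap on the training portion exceeds `threshold`."""
    pooled_std = np.sqrt(0.5 * (x_tr.var(axis=0) + y_tr.var(axis=0))) + 1e-12
    score = np.abs(x_tr.mean(axis=0) - y_tr.mean(axis=0)) / pooled_std
    return np.flatnonzero(score > threshold)

def mean_gap(a, b):
    """Stand-in test statistic (assumption): squared norm of the mean difference."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

def permutation_pvalue(x_te, y_te, n_perm, rng):
    """Permutation p-value on held-out data, using the +1 correction."""
    observed = mean_gap(x_te, y_te)
    pooled = np.vstack([x_te, y_te])
    n = len(x_te)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        count += mean_gap(pooled[idx[:n]], pooled[idx[n:]]) >= observed
    return (1 + count) / (1 + n_perm)

def time_slicing_selection(X, Y, n_blocks, train_frac=0.5, n_perm=200, seed=0):
    """For each of `n_blocks` subintervals, return
    (first index, last index, selected variables, p-value)."""
    rng = np.random.default_rng(seed)
    results = []
    for block in np.array_split(np.arange(X.shape[0]), n_blocks):
        shuffled = rng.permutation(block)
        n_train = int(train_frac * len(block))
        tr, te = shuffled[:n_train], shuffled[n_train:]
        selected = select_variables(X[tr], Y[tr])
        if selected.size == 0:
            p_value = 1.0  # nothing selected: no evidence of a difference
        else:
            p_value = permutation_pvalue(
                X[te][:, selected], Y[te][:, selected], n_perm, rng
            )
        results.append((int(block[0]), int(block[-1]), selected.tolist(), p_value))
    return results

# Toy data: two series of length 400 with 10 variables; variable 3 of Y
# is shifted during the second half of the time interval only.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(400, 10))
Y = data_rng.normal(size=(400, 10))
Y[200:, 3] += 3.0

results = time_slicing_selection(X, Y, n_blocks=4)
for start, end, selected, p_value in results:
    print(f"t in [{start}, {end}]: selected={selected}, p={p_value:.3f}")
```

On this toy example, the blocks covering the second half of the interval should report variable 3 with a small p-value, while the earlier blocks should select nothing. Changing `n_blocks` shows the trade-off mentioned in the summary: more subintervals give finer localization in time but leave fewer samples per block for selection and testing.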
