Counterfactual Explanation for Multivariate Time Series Forecasting with Exogenous Variables
Keita Kinjo
TL;DR
This paper addresses interpretability in forecasting with exogenous drivers and proposes CET-X, a method that generates counterfactual explanations by perturbing recent exogenous inputs so that multi-step predictions move toward a target trajectory. The method is formulated as a convex-like optimization problem that combines a prediction-error term with a proximity penalty. It supports intervention on either all exogenous variables or a selected subset, and it admits analytical solutions in the linear case. The paper also introduces ways to quantify variable importance across an entire time series and to evaluate counterfactual quality with metrics such as X-loss, Z-loss, and temporal smoothness. Experiments on simulated data (linear and nonlinear) and on real-world Google Trends data show accurate CE extraction and identification of influential exogenous variables, and illustrate practical use in policy and marketing decision-making.
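To make the linear case concrete, the following is a minimal sketch, not the paper's implementation. It assumes a linear forecaster of the form y = A x + b, where x stacks the recent exogenous inputs, and a ridge-style objective ||y_target - (A x + b)||^2 + lam * ||x - x0||^2 as one plausible reading of "prediction error plus proximity penalty". The function name `linear_counterfactual`, the weight `lam`, and the `free_idx` argument for partial intervention are all hypothetical names introduced for illustration.

```python
import numpy as np

def linear_counterfactual(A, b, x0, y_target, lam=1.0, free_idx=None):
    """Closed-form counterfactual for an ASSUMED linear forecaster y = A x + b.

    Minimizes ||y_target - (A x + b)||^2 + lam * ||x - x0||^2 over the entries
    of x listed in free_idx (all entries when free_idx is None). This objective
    is an illustrative assumption, not the paper's exact formulation.
    """
    A = np.asarray(A, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    idx = np.arange(x0.size) if free_idx is None else np.asarray(free_idx)

    # Columns of A that correspond to the inputs we are allowed to change.
    A_s = A[:, idx]

    # Gap between the target trajectory and the current forecast.
    residual = np.asarray(y_target, dtype=float) - (A @ x0 + b)

    # Normal equations of the ridge-style problem: (A_s^T A_s + lam I) d = A_s^T r.
    delta = np.linalg.solve(A_s.T @ A_s + lam * np.eye(idx.size), A_s.T @ residual)

    x_cf = x0.copy()
    x_cf[idx] += delta  # inputs outside idx stay at their observed values
    return x_cf
```

Under these assumptions, a larger `lam` keeps the counterfactual closer to the observed inputs at the cost of a larger gap to the target trajectory, while passing `free_idx` restricts the change to the selected exogenous inputs.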
Abstract
Currently, machine learning is widely used across various domains, including time series data analysis. However, some machine learning models function as black boxes, making interpretability a critical concern. One approach to address this issue is counterfactual explanation (CE), which aims to provide insights into model predictions. This study focuses on the relatively underexplored problem of generating counterfactual explanations for time series forecasting. We propose a method for extracting CEs in time series forecasting using exogenous variables, which are frequently encountered in fields such as business and marketing. In addition, we present methods for analyzing the influence of each variable over an entire time series, generating CEs by altering only specific variables, and evaluating the quality of the resulting CEs. We validate the proposed method through theoretical analysis and empirical experiments, showcasing its accuracy and practical applicability. These contributions are expected to support real-world decision-making based on time series data analysis.
