Modeling group heterogeneity in spatio-temporal data via physics-informed semiparametric regression
Marco F. De Sanctis, Eleonora Arnone, Francesca Ieva, Laura M. Sangalli
TL;DR
This work addresses spatio-temporal data with grouping structures by introducing a physics-informed semiparametric mixed effects model that combines fixed covariates $X\beta$, a shared nonparametric field $f$ regularized through a space–time PDE operator $\boldsymbol{L}$, and group-specific random effects with covariance $\Sigma_b$. Estimation proceeds via a two-step FPIRLS algorithm, with an EM step to update the random-effects covariance, and discretization via finite elements in space and cubic B-splines in time. The approach is validated through simulations showing improved recovery of the nonparametric field and competitive estimation of fixed and random effects, and is demonstrated on Lombardy $\mathrm{NO}_2$ data where sensor heterogeneity and missing observations are handled by the model. The work provides asymptotic guarantees, a scalable estimation framework, and practical insights for incorporating physical dynamics and grouping structure into spatio-temporal analyses, with extensions to anisotropy and irregular domains.
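As a reading aid, the three components named above can be written as a single observation model. This is a sketch of one plausible form, not the authors' exact formulation. The group index $i$, observation index $j$, locations $\boldsymbol{p}_{ij}$, times $t_{ij}$, random-effects design vector $\boldsymbol{z}_{ij}$, noise variance $\sigma^2$, smoothing parameter $\lambda$, and domains $\Omega$ and $T$ are notation assumed here for illustration.

$$
y_{ij} = \boldsymbol{x}_{ij}^\top \boldsymbol{\beta} + f(\boldsymbol{p}_{ij}, t_{ij}) + \boldsymbol{z}_{ij}^\top \boldsymbol{b}_i + \varepsilon_{ij},
\qquad \boldsymbol{b}_i \sim \mathcal{N}(\boldsymbol{0}, \Sigma_b),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Under the same assumptions, a PDE-regularized fit of $\boldsymbol{\beta}$ and $f$ for given random effects would minimize a penalized least-squares criterion of the form

$$
\sum_{i,j} \left( y_{ij} - \boldsymbol{x}_{ij}^\top \boldsymbol{\beta} - f(\boldsymbol{p}_{ij}, t_{ij}) - \boldsymbol{z}_{ij}^\top \boldsymbol{b}_i \right)^2
+ \lambda \int_T \int_\Omega \left( \boldsymbol{L} f \right)^2 \, d\boldsymbol{p} \, dt ,
$$

where the penalty discourages fields $f$ that depart from the dynamics encoded by $\boldsymbol{L}$.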
Abstract
In this work we propose a novel approach for modeling spatio-temporal data characterized by group structures. In particular, we extend classical mixed effect regression models by introducing a space-time nonparametric component, regularized through a partial differential equation, to embed the physical dynamics of the underlying process, while random effects capture latent variability associated with the group structure present in the data. We propose a two-step procedure to estimate the fixed and random components of the model, relying on a functional version of the Iterative Reweighted Least Squares algorithm. We investigate the asymptotic properties of both fixed and random components, and we assess the performance of the proposed model through a simulation study, comparing it with state-of-the-art alternatives from the literature. The proposed methodology is finally applied to the study of hourly nitrogen dioxide concentration data in Lombardy (Italy), using random effects to account for measurement heterogeneity across monitoring stations equipped with different sensor technologies.
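To make the interplay between fixed effects and random effects concrete, the following is a minimal sketch of an EM iteration for a plain linear random-intercept model. It is a deliberately simplified stand-in, not the authors' algorithm: it omits the nonparametric field, the PDE penalty, and the finite element and B-spline discretization, and it uses a scalar random-intercept variance in place of a general covariance matrix. The function name `em_random_intercept` and all variable names and simulated values are hypothetical.

```python
import numpy as np


def em_random_intercept(y, X, groups, n_iter=200):
    """EM for y = X @ beta + b[group] + noise (illustrative, hypothetical helper).

    Assumes b_i ~ N(0, s2_b) and noise ~ N(0, s2_e), with no spatial field.
    """
    labels, idx = np.unique(groups, return_inverse=True)
    n_groups = labels.size
    counts = np.bincount(idx, minlength=n_groups)

    # Start from ordinary least squares and unit variances.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    s2_b, s2_e = 1.0, 1.0

    for _ in range(n_iter):
        # E-step: Gaussian posterior of each random intercept.
        resid = y - X @ beta
        post_var = 1.0 / (counts / s2_e + 1.0 / s2_b)
        group_sums = np.bincount(idx, weights=resid, minlength=n_groups)
        post_mean = post_var * group_sums / s2_e

        # M-step: refit fixed effects with predicted intercepts removed.
        beta = np.linalg.lstsq(X, y - post_mean[idx], rcond=None)[0]

        # M-step: variances from expected squared effects and residuals.
        s2_b = np.mean(post_mean**2 + post_var)
        err = y - X @ beta - post_mean[idx]
        s2_e = np.mean(err**2 + post_var[idx])

    return beta, s2_b, s2_e, post_mean


# Simulated example: 50 groups (e.g. stations), 40 observations each.
rng = np.random.default_rng(0)
n_groups, n_per = 50, 40
groups = np.repeat(np.arange(n_groups), n_per)
X = np.column_stack([np.ones(groups.size), rng.normal(size=groups.size)])
b_true = rng.normal(scale=1.0, size=n_groups)
noise = rng.normal(scale=0.5, size=groups.size)
y = X @ np.array([2.0, -1.0]) + b_true[groups] + noise

beta_hat, s2_b_hat, s2_e_hat, b_hat = em_random_intercept(y, X, groups)
```

In this toy setting the E-step predicts each group's offset from its residuals, and the M-step re-estimates the fixed effects and both variances given those predictions. In the model described in the abstract, the fixed-effects refit would be replaced by a penalized fit that also estimates the space-time field.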
