Robust Functional Regression with Discretely Sampled Predictors

Ioannis Kalogridis, Stanislav Nagy

TL;DR

This work develops a robust functional regression framework for discretely sampled predictors by leveraging penalized thin-plate splines in Sobolev spaces. The estimators solve a convex M-estimation problem with a novel $J_m^2$ penalty and an adaptable loss $\rho$, enabling robustness against outliers and model misspecification while accommodating multi-dimensional domains and random smoothing parameters. The paper derives asymptotic rates that reveal a fundamental trade-off between sample size and discretization granularity, and provides practical IRLS-based implementations with robust scale-based smoothing parameter selection. Finite-sample studies and a real-data application on ozone concentration demonstrate the method’s resilience to anomalies and incomplete data, as well as its usefulness for detecting influential observations. Overall, the approach offers a flexible, scalable, and robust toolkit for functional regression in discretely observed, high-dimensional settings.
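The iteratively reweighted least squares (IRLS) idea mentioned in the summary can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the design matrix `Z`, the quadratic penalty matrix `P`, the Huber loss with tuning constant `k = 1.345`, the fixed MAD-based residual scale, and the names `huber_weights` and `irls_penalized_huber`. The sketch does not build thin-plate splines or the paper's penalty, and it is not the authors' implementation.

```python
import numpy as np


def huber_weights(u, k=1.345):
    """IRLS weights psi(u)/u for the Huber loss: 1 if |u| <= k, else k/|u|."""
    a = np.abs(u)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))


def irls_penalized_huber(Z, y, P, lam, k=1.345, max_iter=200, tol=1e-10):
    """Iterate weighted penalized least-squares solves for a Huber-type fit.

    Z, P, lam and k are hypothetical inputs chosen for illustration.
    """
    n = Z.shape[0]
    # Start from the penalized least-squares solution.
    theta = np.linalg.solve(Z.T @ Z + n * lam * P, Z.T @ y)
    # Robust residual scale (normalized MAD), held fixed during the iterations.
    r0 = y - Z @ theta
    scale = max(1.4826 * np.median(np.abs(r0 - np.median(r0))), 1e-12)
    for _ in range(max_iter):
        w = huber_weights((y - Z @ theta) / scale, k)
        ZW = Z * w[:, None]  # rows of Z scaled by their weights
        theta_new = np.linalg.solve(ZW.T @ Z + n * lam * P, ZW.T @ y)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta


# Toy data (assumed, not from the paper): intercept plus two covariates,
# with 10% of the responses shifted upwards to mimic outliers.
rng = np.random.default_rng(0)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
theta_true = np.array([1.0, -2.0, 0.5])
y = Z @ theta_true + 0.1 * rng.normal(size=n)
y[:20] += 20.0

P = np.eye(3)
lam = 1e-6
theta_ls = np.linalg.solve(Z.T @ Z + n * lam * P, Z.T @ y)
theta_huber = irls_penalized_huber(Z, y, P, lam)
print("LS error:   ", np.linalg.norm(theta_ls - theta_true))
print("Huber error:", np.linalg.norm(theta_huber - theta_true))
```

Each pass solves a weighted ridge-type system in which observations with large standardized residuals receive weights below one, so their influence on the fit is bounded. Holding the scale fixed is a simplification; a robust scale could also be re-estimated or chosen jointly with the smoothing parameter.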

Abstract

The functional linear model is an important extension of the classical regression model allowing for scalar responses to be modeled as functions of stochastic processes. Yet, despite the usefulness and popularity of the functional linear model in recent years, most treatments, theoretical and practical alike, suffer either from (i) lack of resistance towards the many types of anomalies one may encounter with functional data or (ii) biases resulting from the use of discretely sampled functional data instead of completely observed data. To address these deficiencies, this paper introduces and studies the first class of robust functional regression estimators for partially observed functional data. The proposed broad class of estimators is based on thin-plate splines with a novel computationally efficient quadratic penalty, is easily implementable and enjoys good theoretical properties under weak assumptions. We show that, in the incomplete data setting, both the sample size and discretization error of the processes determine the asymptotic rate of convergence of functional regression estimators and the latter cannot be ignored. These theoretical properties remain valid even with multi-dimensional random fields acting as predictors and random smoothing parameters. The effectiveness of the proposed class of estimators in practice is demonstrated by means of a simulation study and a real-data example.
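A toy calculation may help show why the discretization error cannot be ignored. The sketch below assumes a hypothetical curve $X(t) = t$ and coefficient function $\beta(t) = t^2$ on $[0, 1]$, observed on an equispaced grid of $p$ points, and compares a Riemann-sum approximation of $\int_0^1 X(t)\beta(t)\,dt$ with its exact value $1/4$. The functions, the grid and the helper name `riemann_inner_product` are illustrative assumptions, not the paper's design.

```python
import numpy as np


def riemann_inner_product(x, beta, p):
    """Approximate the integral of x(t) * beta(t) over [0, 1] by a
    Riemann sum on the equispaced grid t_j = j / p, j = 1, ..., p."""
    t = np.arange(1, p + 1) / p
    return np.mean(x(t) * beta(t))


# Hypothetical smooth curve and coefficient function with a known integral:
# the integral of t * t**2 over [0, 1] equals 1/4.
x = lambda t: t
beta = lambda t: t ** 2
exact = 0.25

grid_sizes = [10, 50, 250]
errors = [abs(riemann_inner_product(x, beta, p) - exact) for p in grid_sizes]
for p, e in zip(grid_sizes, errors):
    print(f"p = {p:4d}  discretization error = {e:.2e}")
```

In this toy case the error equals $(2p+1)/(4p^2)$, so it shrinks roughly like $1/(2p)$ as the grid is refined. For a fixed grid it does not vanish no matter how many curves are observed, which is the intuition behind a rate that depends on both the sample size and the discretization.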

Paper Structure (22 sections, 11 theorems, 180 equations, 4 figures, 2 tables)

Key Result

Proposition 1

Suppose that $\rho$ is a convex loss function, $2m>d$ and A1 and A2 hold. Then there exists a solution to the estimation problem (eq:Est) in $\mathbb{R} \times \mathcal{H}^{m}(\mathbb{R}^d)$, denoted by $(\widehat{\alpha}_n, \widehat{\beta}_n)$. Moreover, if the set $\{\mathbf{t}_j\}_{j=1}^p$ contains a $\mathcal{P}_{m}$-

Figures (4)

  • Figure 1: $100$ curves generated with standard Gaussian and $t_2$-distributed $\{W_{ij}\}_{i, j=1}^{100,50}$ on the left and right panels, respectively.
  • Figure 2: 1000 least-squares (left) and Huber (right) estimates for Model 1 (top panels) and Model 2 (bottom panels) under $\mathrm{NSR}=0.1$ and $p=100$. Two line styles distinguish the true coefficient function $\beta_0$ from the first $5$ estimated functions.
  • Figure 3: Contours of the least-squares and Huber thin-plate spline coefficient function estimates on the convex hull of the data on the left and right panels, respectively. Lighter colors correspond to larger values of the estimates. Three distinct symbols mark, respectively, the positions of the monitoring stations, the position of our reference station and the outlying observations detected by the Huber estimates.
  • Figure 4: Contours of the least-squares and Huber thin-plate spline coefficient function estimates on the convex hull of the data on the left and right panels, respectively, after the outliers have been removed. Lighter colors correspond to larger values of the estimates. Two distinct symbols mark, respectively, the positions of the monitoring stations and the position of our reference station.

Theorems & Definitions (14)

  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Theorem 1
  • Corollary 2
  • Theorem 2
  • Theorem 3
  • Lemma 1: Lemma 8.5 in van de Geer (2000)
  • Lemma 2
  • proof
  • ...and 4 more