Renewable estimation in linear expectile regression models with streaming data sets

Wei Cao, Shanshan Wang, Xiaoxue Hu

TL;DR

The paper proposes a novel online renewable method based on expectile regression that updates estimates using only current observations and historical summary statistics. This reduces storage requirements and, for streaming data with heteroscedastic variances or inhomogeneous covariate effects, yields better computational efficiency than existing online renewable methods.

Abstract

Streaming data often exhibit heterogeneity due to heteroscedastic variances or inhomogeneous covariate effects. Online renewable quantile and expectile regression methods provide valuable tools for detecting such heteroscedasticity by combining current data with summary statistics from historical data. However, quantile regression can be computationally demanding because of the non-smooth check function. To address this, we propose a novel online renewable method based on expectile regression, which efficiently updates estimates using both current observations and historical summaries, thereby reducing storage requirements. By exploiting the smoothness of the expectile loss function, our approach achieves superior computational efficiency compared with existing online renewable methods for streaming data with heteroscedastic variances or inhomogeneous covariate effects. We establish the consistency and asymptotic normality of the proposed estimator under mild regularity conditions, demonstrating that it achieves the same statistical efficiency as oracle estimators based on full individual-level data. Numerical experiments and real-data applications demonstrate that our method performs comparably to the oracle estimator while maintaining high computational efficiency and minimal storage costs.
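To make the idea of "current observations plus historical summaries" concrete, here is a minimal sketch in Python. It is not the paper's Algorithm 1: it assumes the asymmetric squared loss $\rho_\tau(u) = |\tau - \mathbb{1}(u < 0)|\,u^2$ and a generic first-order renewable update in which the only quantities carried between batches are the previous estimate and an accumulated weighted Gram matrix. The names `expectile_weights`, `renew_expectile`, `beta_prev`, and `J_prev` are hypothetical and chosen for illustration only.

```python
import numpy as np


def expectile_weights(residuals, tau):
    # Asymmetric weights |tau - 1(r < 0)| from the expectile loss.
    return np.where(residuals < 0, 1.0 - tau, tau)


def renew_expectile(X_b, y_b, beta_prev, J_prev, tau, max_iter=50, tol=1e-10):
    """One renewable update from the current batch and historical summaries.

    Assumed scheme (illustrative, not taken from the paper): solve
        J_prev (beta_prev - beta) + X_b' W_b(beta) (y_b - X_b beta) = 0
    by iteratively reweighted least squares, where J_prev is the
    accumulated weighted Gram matrix from earlier batches.
    """
    beta = beta_prev.copy()
    for _ in range(max_iter):
        w = expectile_weights(y_b - X_b @ beta, tau)
        XtW = X_b.T * w  # shape (p, n): each column scaled by its weight
        beta_new = np.linalg.solve(J_prev + XtW @ X_b,
                                   J_prev @ beta_prev + XtW @ y_b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Refresh the summary at the new estimate; raw batch data can be discarded.
    w = expectile_weights(y_b - X_b @ beta, tau)
    J_new = J_prev + (X_b.T * w) @ X_b
    return beta, J_new


# Usage on simulated batches (synthetic data, for illustration only).
rng = np.random.default_rng(0)
p = 3
beta_hat, J = np.zeros(p), np.zeros((p, p))
for _ in range(4):
    X_b = np.column_stack([np.ones(500), rng.normal(size=(500, p - 1))])
    y_b = X_b @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
    beta_hat, J = renew_expectile(X_b, y_b, beta_hat, J, tau=0.25)
```

Because the loss is piecewise quadratic, each inner step is a weighted least-squares solve, which is where the smoothness advantage over the quantile check function shows up. Storage per update is one $p$-vector and one $p \times p$ matrix, independent of the number of past observations.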

Paper Structure (23 sections, 4 theorems, 50 equations, 12 figures, 7 tables, 1 algorithm)

Key Result

Theorem 1

(Consistency of $\tilde{\boldsymbol{\beta}}_b$). Assume that conditions (C1)-(C5) hold. Then the renewable estimator $\tilde{\boldsymbol{\beta}}_b$ is consistent for the true parameter, that is:

Figures (12)

  • Figure 1: Illustration of the proposed online renewable ReER.
  • Figure 2: MSE values for fixed $N_k$ with varying $n_k$ at the $25\%$ expectile level.
  • Figure 3: MSE values for fixed $n_k$ with varying $K$ at the $25\%$ expectile level.
  • Figure 4: Computational time of the renewable method and the oracle method for fixed $n_k$ with varying $K$ at the $25\%$ expectile level.
  • Figure 5: Estimated coefficients at different expectile levels for the air quality data.
  • ...and 7 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Lemma 1