Boosting prediction with data missing not at random

Yuan Bian; Grace Y. Yi; Wenqing He

TL;DR

The paper tackles boosting when responses are missing not at random (MNAR) by developing two loss-adjustment strategies within a semiparametric framework: inverse propensity weighting (IPW) and a Buckley-James (BJ) type adjustment. It constructs consistent estimators for the missing-data components and implements a functional gradient descent boosting algorithm whose convergence and consistency are proven, with key results expressed as $R(f^{(m+1)}) - R(f^*) \le \left(1 - \dfrac{1}{C^* c^*}\right)^m \left(R(f^{(0)}) - R(f^*)\right)$ and $\lim_{n\to\infty} \|\hat{f}_n^{AL} - f^*\|_{\infty}=0$. In simulations and an analysis of the KLIPS data, the methods show competitive finite-sample performance under both missing at random (MAR) and MNAR settings, and they remain robust in sensitivity checks that compare the IPW, BJ, and naive approaches. The work provides a practical approach to predictive modeling when the response is MNAR and highlights identifiability as central to validity.
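
To make the IPW adjustment concrete, here is a short derivation in notation that is assumed for illustration and may differ from the paper's. Let $R$ indicate that the response $Y$ is observed, let $X$ denote the covariates, $L$ a loss function, and $\pi(X,Y)=P(R=1\mid X,Y)$ the propensity, which under MNAR may depend on $Y$ itself. If $\pi$ is correctly specified and bounded away from zero, the weighted loss on observed cases has the same expectation as the full-data loss:

$$
E\left[\frac{R}{\pi(X,Y)}\,L\bigl(Y,f(X)\bigr)\right]
= E\left[\frac{E(R\mid X,Y)}{\pi(X,Y)}\,L\bigl(Y,f(X)\bigr)\right]
= E\left[L\bigl(Y,f(X)\bigr)\right].
$$

Under these assumptions, minimizing the weighted empirical risk over the observed cases targets the same minimizer $f^*$ as the full-data risk.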

Abstract

Boosting has emerged as a useful machine learning technique over the past three decades, attracting increased attention. Most advancements in this area, however, have primarily focused on numerical implementation procedures, often lacking rigorous theoretical justifications. Moreover, these approaches are generally designed for datasets with fully observed data, and their validity can be compromised by the presence of missing observations. In this paper, we employ semiparametric estimation approaches to develop boosting prediction methods for data with missing responses. We explore two strategies for adjusting the loss functions to account for missingness effects. The proposed methods are implemented using a functional gradient descent algorithm, and their theoretical properties, including algorithm convergence and estimator consistency, are rigorously established. Numerical studies demonstrate that the proposed methods perform well in finite sample settings.
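
The idea of running functional gradient descent on an adjusted loss can be sketched in a few lines of Python. The following is a minimal illustration, not the authors' implementation: it assumes a squared-error loss, a single covariate, regression stumps as base learners, and a known propensity $\pi(y)$ (the paper instead estimates the missing-data components). All function names, the simulated data, and the tuning values are hypothetical.

```python
import numpy as np


def fit_stump(x, r, w):
    """Weighted least-squares regression stump on a 1-D feature."""
    order = np.argsort(x)
    xs, rs, ws = x[order], r[order], w[order]
    cw, cwr = np.cumsum(ws), np.cumsum(ws * rs)
    # Candidate splits put the first i+1 sorted points in the left leaf.
    w_left, wr_left = cw[:-1], cwr[:-1]
    w_right, wr_right = cw[-1] - w_left, cwr[-1] - wr_left
    # Maximising this quantity is equivalent to minimising weighted SSE.
    gain = wr_left**2 / w_left + wr_right**2 / w_right
    gain = np.where(xs[1:] > xs[:-1], gain, -np.inf)  # skip tied values
    i = int(np.argmax(gain))
    # Split at an observed value; leaf values are weighted means.
    return xs[i], wr_left[i] / w_left[i], wr_right[i] / w_right[i]


def boost(x, y, w, n_steps=100, lr=0.1):
    """Functional gradient descent on the weighted squared-error loss."""
    f0 = np.sum(w * y) / np.sum(w)
    pred = np.full_like(y, f0)
    stumps = []
    losses = [np.sum(w * (y - pred) ** 2) / np.sum(w)]
    for _ in range(n_steps):
        resid = y - pred  # negative gradient of 0.5 * (y - f)^2 in f
        thr, left, right = fit_stump(x, resid, w)
        pred = pred + lr * np.where(x <= thr, left, right)
        stumps.append((thr, left, right))
        losses.append(np.sum(w * (y - pred) ** 2) / np.sum(w))
    return {"f0": f0, "stumps": stumps, "lr": lr, "losses": losses}


def predict(model, x):
    out = np.full(x.shape, model["f0"], dtype=float)
    for thr, left, right in model["stumps"]:
        out += model["lr"] * np.where(x <= thr, left, right)
    return out


# Simulated MNAR data (assumption): the chance of observing y depends on y.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.5, n)
pi = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * y)))  # true propensity, assumed known
obs = rng.uniform(size=n) < pi

# IPW: weight each observed case by 1 / pi; naive: ignore the missingness.
ipw_model = boost(x[obs], y[obs], 1.0 / pi[obs])
naive_model = boost(x[obs], y[obs], np.ones(int(obs.sum())))

grid = np.linspace(-2.0, 2.0, 200)
truth = np.sin(2.0 * grid)
print("IPW   mean abs. error:", np.mean(np.abs(predict(ipw_model, grid) - truth)))
print("naive mean abs. error:", np.mean(np.abs(predict(naive_model, grid) - truth)))
```

The only difference between the two fits is the weight vector, which mirrors the idea of adjusting the loss rather than the boosting procedure itself. In practice the propensity is unknown and, under MNAR, identifying it requires additional assumptions.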
