Robust Joint Modeling for Data with Continuous and Binary Responses

Yu Wang, Ran Jin, Lulu Kang

Abstract

In many supervised learning applications, the response consists of both continuous and binary outcomes. Studies have shown that jointly modeling such mixed-type responses can substantially improve predictive performance compared to separate analyses. However, outliers pose a challenge to existing likelihood-based modeling approaches. In this paper, we propose a new robust joint modeling framework for data with both continuous and binary responses. It is based on the density power divergence (DPD) loss function with $\ell_1$ regularization. The proposed framework leads to a sparse estimator that simultaneously predicts continuous and binary responses in high-dimensional input settings while down-weighting influential outliers and mislabeled samples. We also develop an efficient proximal gradient algorithm with a Barzilai-Borwein spectral step size and a robust information criterion (RIC) for data-driven selection of the penalty parameters. Extensive simulation studies under a variety of contamination schemes demonstrate that the proposed method achieves lower prediction error and more accurate parameter estimation than several competing approaches. A real case study on wafer lapping in semiconductor manufacturing further illustrates the practical gains in predictive accuracy, robustness, and interpretability of the proposed framework.
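
To make the ingredients named in the abstract concrete, the following is a minimal sketch of an $\ell_1$-penalized DPD fit solved by proximal gradient with a Barzilai-Borwein (BB) step size. It is a deliberate simplification and not the authors' algorithm: it covers only a continuous response under a Gaussian linear model with $\sigma^2$ fixed and known, and it leaves out the binary response and the joint structure entirely. The function names (`dpd_loss_and_grad`, `soft_threshold`, `prox_grad_bb`), the tuning values, the simulated data, and the backtracking safeguard are all assumptions made for illustration.

```python
import numpy as np

def dpd_loss_and_grad(beta, X, y, alpha, sigma2):
    """Beta-dependent part of the DPD loss for y ~ N(X @ beta, sigma2), and its gradient.

    The term that depends only on sigma2 is dropped because sigma2 is held fixed here.
    """
    n = X.shape[0]
    r = y - X @ beta
    c = (2.0 * np.pi * sigma2) ** (-alpha / 2.0)
    w = np.exp(-alpha * r**2 / (2.0 * sigma2))  # close to 0 for large residuals
    loss = -(1.0 + 1.0 / alpha) * c * w.mean()
    grad = -(1.0 + alpha) * c / (n * sigma2) * (X.T @ (w * r))
    return loss, grad

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad_bb(X, y, lam, alpha=0.3, sigma2=1.0, max_iter=500, tol=1e-8):
    """Proximal gradient with a safeguarded BB step (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    loss, grad = dpd_loss_and_grad(beta, X, y, alpha, sigma2)
    step = 1.0
    for _ in range(max_iter):
        obj = loss + lam * np.abs(beta).sum()
        # Backtracking safeguard (an assumption of this sketch): halve the
        # step until the penalized objective decreases sufficiently.
        while True:
            beta_new = soft_threshold(beta - step * grad, step * lam)
            loss_new, grad_new = dpd_loss_and_grad(beta_new, X, y, alpha, sigma2)
            obj_new = loss_new + lam * np.abs(beta_new).sum()
            diff = beta_new - beta
            if obj_new <= obj - 1e-4 / (2.0 * step) * (diff @ diff) or step < 1e-10:
                break
            step *= 0.5
        s, g = diff, grad_new - grad
        beta, loss, grad = beta_new, loss_new, grad_new
        if np.linalg.norm(s) <= tol * max(1.0, np.linalg.norm(beta)):
            break
        # BB spectral step: s's / s'g, clipped; fall back to 1.0 if s'g is not positive.
        sg = s @ g
        step = float(np.clip((s @ s) / sg, 1e-4, 1e4)) if sg > 1e-12 else 1.0
    return beta

# Simulated example (illustrative values only): sparse signal, 10% shifted outliers.
rng = np.random.default_rng(0)
n, p = 400, 10
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -1.0, 0.8]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
y[:40] += 10.0  # contaminate the first 40 responses
beta_hat = prox_grad_bb(X, y, lam=0.05)
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_true))
```

The weights `w` show where the robustness comes from in this simplified setting: an observation with a large residual contributes almost nothing to the gradient, whereas under squared-error loss its influence grows linearly with the residual. Because the DPD loss is nonconvex, the sketch accepts a BB step only when it lowers the penalized objective; the paper's own safeguards and stopping rules may differ.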

Paper Structure (22 sections, 1 theorem, 70 equations, 8 figures, 3 tables, 2 algorithms)

Key Result

Theorem 1

Let $\hat{\bm \theta}_n=(\hat{\bm \beta}_n, \hat{\bm \omega}_n, \hat{\bm \eta}_n)$ be the minimizer of $Q_{\alpha}(\bm \theta, \sigma^2)$ for a given $\sigma^2$. Under Assumptions A.1-A.5 (stated in the paper's assumptions section), with probability equal to 1 and as $n\to \infty$, there exists $(\hat{\bm \beta}_n, \hat

Figures (8)

  • Figure 1: Predicted TTV vs. observations using three methods for the wafer lapping data.
  • Figure 2: Boxplots of RMSPE from $B=100$ simulations with varying contamination type.
  • Figure 3: Boxplots of MEs from $B=100$ simulations with varying contamination type.
  • Figure 4: Boxplots of $\ell_2$ error of the coefficient estimates from $B=100$ simulations with varying contamination type.
  • Figure 5: Boxplots of RMSPE from $B=100$ simulations with varying sparsity and contamination levels.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Theorem 1