Bayesian Active Learning for Censored Regression
Frederik Boe Hüttel, Christoffer Riis, Filipe Rodrigues, Francisco Câmara Pereira
TL;DR
This work addresses active learning for censored regression, where labels may be clipped by censorship and BALD cannot be directly applied. It derives the entropy and mutual information for censored distributions and introduces the $\mathcal{C}$-BALD objective, a censoring-aware information gain for acquisitions. To make $\mathcal{C}$-BALD practical, the authors propose an explicit modelling scheme that learns the censoring indicator $\ell_i$ and the censoring threshold $z_i$ alongside the target distributions using a single Bayesian neural network with stochastic parameters. Empirically, $\mathcal{C}$-BALD consistently outperforms standard BALD and other baselines across synthetic and real-world censored datasets, underscoring its value for informative data collection in survival analysis, healthcare, and related domains.
Abstract
Bayesian active learning is based on information-theoretic approaches that focus on maximising the information that new observations provide about the model parameters. This is commonly done by maximising the Bayesian Active Learning by Disagreement (BALD) acquisition function. However, we highlight that it is challenging to estimate BALD when the new data points are subject to censorship, where only clipped values of the targets are observed. To address this, we derive the entropy and the mutual information for censored distributions and, from these, the BALD objective for active learning in censored regression ($\mathcal{C}$-BALD). We propose a novel modelling approach to estimate the $\mathcal{C}$-BALD objective and use it for active learning in the censored setting. Across a wide range of datasets and models, we demonstrate that $\mathcal{C}$-BALD outperforms other Bayesian active learning methods in censored regression.
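To make the idea of a censoring-aware information gain concrete, the following is a minimal sketch and not the authors' estimator. It assumes right-censoring at a known threshold $z$ (so the observed label is $\min(y^*, z)$), a Gaussian predictive distribution for each of $K$ posterior parameter samples, and entropy taken with respect to Lebesgue measure below $z$ plus a point mass at $z$. The function names (`censored_gaussian_entropy`, `censored_bald`) and the grid-based integration are illustrative choices, not taken from the paper. The score is the usual BALD form, the entropy of the averaged predictive minus the average entropy of the per-sample predictives, evaluated on the censored distributions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(u):
    # Standard normal CDF.
    return 0.5 * (1.0 + _erf(np.asarray(u, dtype=float) / math.sqrt(2.0)))

def norm_pdf(u):
    # Standard normal density.
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u ** 2) / math.sqrt(2.0 * math.pi)

def xlogx(p):
    # p * log(p) with the convention 0 * log(0) = 0.
    p = np.asarray(p, dtype=float)
    return np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)

def censored_gaussian_entropy(mu, sigma, z):
    # Entropy of y = min(y*, z) with y* ~ N(mu, sigma^2) (assumed setup):
    # a continuous part on y < z plus an atom of mass 1 - Phi(a) at y = z.
    a = (z - mu) / sigma
    Phi, phi = norm_cdf(a), norm_pdf(a)
    continuous = Phi * np.log(sigma * math.sqrt(2.0 * math.pi)) + 0.5 * (Phi - a * phi)
    atom = -xlogx(1.0 - Phi)
    return continuous + atom

def censored_bald(mus, sigmas, z, grid_size=4001):
    # mus, sigmas: predictive means and standard deviations, one per posterior sample.
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    # Expected entropy of the per-sample censored predictives.
    conditional = censored_gaussian_entropy(mus, sigmas, z).mean()

    # Marginal predictive: mixture density below z and averaged atom at z.
    lo = (mus - 8.0 * sigmas).min()
    hi = min(z, (mus + 8.0 * sigmas).max())
    continuous = 0.0
    if hi > lo:
        ys = np.linspace(lo, hi, grid_size)
        dens = (norm_pdf((ys[:, None] - mus) / sigmas) / sigmas).mean(axis=1)
        f = -xlogx(dens)
        # Trapezoidal rule for the continuous part of the entropy.
        continuous = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(ys))
    p_cens = (1.0 - norm_cdf((z - mus) / sigmas)).mean()
    marginal = continuous - xlogx(p_cens)

    return float(marginal - conditional)
```

Under these assumptions, the score is close to zero when all posterior samples agree, and it shrinks as the threshold moves below the predictive means, since a candidate that would almost surely be censored reveals little about the parameters. In an acquisition loop one would evaluate such a score for every candidate input and query the highest-scoring one.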
