Bayesian Active Learning for Censored Regression

Frederik Boe Hüttel, Christoffer Riis, Filipe Rodrigues, Francisco Câmara Pereira

TL;DR

This work addresses active learning for censored regression, where labels may be clipped by censorship and BALD cannot be directly applied. It derives the entropy and mutual information for censored distributions and introduces the $\mathcal{C}$-BALD objective, a censoring-aware information gain for acquisitions. To make $\mathcal{C}$-BALD practical, the authors propose an explicit modelling scheme that learns the censoring indicator $\ell_i$ and the censoring threshold $z_i$ alongside the target distributions using a single Bayesian neural network with stochastic parameters. Empirically, $\mathcal{C}$-BALD consistently outperforms standard BALD and other baselines across synthetic and real-world censored datasets, underscoring its value for informative data collection in survival analysis, healthcare, and related domains.

Abstract

Bayesian active learning is based on information-theoretic approaches that focus on maximising the information that new observations provide about the model parameters. This is commonly done by maximising the Bayesian Active Learning by Disagreement (BALD) acquisition function. However, we highlight that it is challenging to estimate BALD when the new data points are subject to censorship, where only clipped values of the targets are observed. To address this, we derive the entropy and the mutual information for censored distributions and derive the BALD objective for active learning in censored regression ($\mathcal{C}$-BALD). We propose a novel modelling approach to estimate the $\mathcal{C}$-BALD objective and use it for active learning in the censored setting. Across a wide range of datasets and models, we demonstrate that $\mathcal{C}$-BALD outperforms other Bayesian active learning methods in censored regression.
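As a rough illustration of the quantity BALD maximises, the sketch below estimates the mutual information between a binary outcome and the model parameters from Monte Carlo posterior samples, using the usual decomposition into predictive entropy minus expected entropy. It is not the $\mathcal{C}$-BALD objective derived in the paper: the function names `binary_entropy` and `bald_binary` are hypothetical, and the binary outcome is assumed here only because it keeps the entropies in closed form (it could stand in for a censoring indicator, for instance).

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a deterministic outcome carries no uncertainty
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bald_binary(probs: Sequence[float]) -> float:
    """Monte Carlo BALD score for one candidate input.

    probs[k] is the probability predicted by the k-th posterior sample of
    the model parameters (an assumed input format, not the paper's API).
    """
    mean_p = sum(probs) / len(probs)
    # Entropy of the averaged prediction: total predictive uncertainty.
    predictive_entropy = binary_entropy(mean_p)
    # Average entropy of each sample's prediction: irreducible noise.
    expected_entropy = sum(binary_entropy(p) for p in probs) / len(probs)
    # Their difference is the part attributable to parameter disagreement.
    return predictive_entropy - expected_entropy


# Posterior samples that agree: labelling this point teaches the model little.
agree = bald_binary([0.5, 0.5, 0.5, 0.5])
# Posterior samples that disagree: labelling this point is informative.
disagree = bald_binary([0.05, 0.95, 0.05, 0.95])
```

In an active learning loop, a score of this kind would be computed for every candidate in the unlabelled pool and the highest-scoring point queried next.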

Paper Structure

This paper contains 31 sections, 24 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Illustration of a 1-dimensional censored dataset, in which the dashed black line represents the underlying function that generated the data. The (blue) circles denote non-censored observations, while the (green) crosses represent observations that have been censored. The grey background indicates areas where the observations are censored.
  • Figure 2: Overview of the modelling approach. We propose to model the distributions $p_\theta(\ell_i| \mathbf{x}_i)$, $p_\theta(y_i | \mathbf{x}_i)$, and $p_\theta(y^*_i| \mathbf{x}_i)$. Shaded circles indicate observed values and unshaded circles indicate latent values.
  • Figure 3: Overview of the fit of the proposed modelling approach on the 1-D synthetic dataset. Grey areas indicate where $y^* \leq z$, and the black line is the true function we are trying to approximate. Red: Estimated distribution of the true function. Blue: Estimated distribution of the observed values. Orange: Estimated probability of being censored (scaled between 0 and 2 for illustration purposes).
  • Figure 4: (Left) The mutual information calculations for the label $y$ and the censoring status. Grey areas indicate regions with complete censoring. Most information comes from the cross-over point between the censored and non-censored values. (Right) The right-censored NLL of the models under different acquisition functions on the synthetic dataset (mean $\pm$ standard error). $\mathcal{C}$-BALD achieves the best overall fit on the test set.
  • Figure 5: Results of the real-world experiments on two of the seven datasets, METABRIC and CREDIT RISK. The figure shows the NLL (mean $\pm$ standard error) across repetitions of the experiment.
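
Figures 4 and 5 report a right-censored NLL. One common form of such a metric is sketched below. It assumes a Gaussian predictive distribution, with an uncensored point contributing its log density and a censored point contributing the log probability that the latent target lies at or above the recorded value. Both the Gaussian assumption and the helper names are illustrative choices here and may differ from the exact metric used in the paper.

```python
import math
from typing import Sequence


def gaussian_log_pdf(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def gaussian_log_sf(y: float, mu: float, sigma: float) -> float:
    """Log survival function, log P(Y* >= y), for Y* ~ N(mu, sigma^2).

    Uses erfc for accuracy in the upper tail; for extremely large
    (y - mu) / sigma the survival probability underflows to zero.
    """
    return math.log(0.5 * math.erfc((y - mu) / (sigma * math.sqrt(2.0))))


def right_censored_nll(
    y: Sequence[float],
    censored: Sequence[bool],
    mu: Sequence[float],
    sigma: Sequence[float],
) -> float:
    """Mean negative log-likelihood under right censoring (assumed Gaussian).

    y[i] is the observed value; when censored[i] is True it is only a
    lower bound on the latent target.
    """
    total = 0.0
    for y_i, c_i, mu_i, s_i in zip(y, censored, mu, sigma):
        if c_i:
            total -= gaussian_log_sf(y_i, mu_i, s_i)
        else:
            total -= gaussian_log_pdf(y_i, mu_i, s_i)
    return total / len(y)


# One uncensored and one censored observation, both at the predictive mean.
example_nll = right_censored_nll(
    y=[0.0, 0.0], censored=[False, True], mu=[0.0, 0.0], sigma=[1.0, 1.0]
)
```

A censored point is penalised only when the model places little mass above the recorded value, which is why a predictive distribution fitted to the clipped observations can score poorly on this metric.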