M-DEW: Extending Dynamic Ensemble Weighting to Handle Missing Values

Adam Catto, Nan Jia, Ansaf Salleb-Aouissi, Anita Raja

TL;DR

M-DEW extends dynamic ensemble weighting to handle missing data by forming and optimizing two-stage imputation-prediction pipelines. It trains a pool of eight pipelines (four imputers × two classifiers) and, at inference, assigns per-sample weights based on local competence in a neighborhood of the training data, yielding per-sample calibrated predictions with lower perplexity. The approach achieves statistically significant reductions in model perplexity in 17 of 18 experiments and improves average precision in 13 of 18 experiments, outperforming uniform model averaging while maintaining low computational overhead. This method enables better uncertainty quantification and calibration in downstream tasks involving missing values, with potential for AutoML integration and future joint optimization of imputation and prediction models.

Abstract

Missing value imputation is a crucial preprocessing step for many machine learning problems. However, it is often treated as a separate subtask from downstream applications such as classification, regression, or clustering, and thus is not optimized together with them. We hypothesize that treating the imputation model and downstream task model together and optimizing over full pipelines will yield better results than treating them separately. Our work describes a novel AutoML technique for making downstream predictions with missing data that automatically handles preprocessing, model weighting, and selection at inference time, with minimal compute overhead. Specifically, we develop M-DEW, a Dynamic missingness-aware Ensemble Weighting (DEW) approach that constructs a set of two-stage imputation-prediction pipelines, trains each component separately, and dynamically calculates a set of pipeline weights for each sample at inference time. We thus extend previous work on dynamic ensemble weighting to handle missing data at the level of full imputation-prediction pipelines, improving performance and calibration on downstream machine learning tasks over standard model averaging techniques. M-DEW is shown to outperform the state of the art: it produces statistically significant reductions in model perplexity in 17 out of 18 experiments, while improving average precision in 13 out of 18 experiments.

Paper Structure (20 sections, 4 equations, 7 figures, 5 tables, 2 algorithms)

Figures (7)

  • Figure 1: M-DEW Workflow Diagram. Phase 1: Imputation and prediction models are fitted to a "stage-1" training set. Phase 2: Each imputation-prediction pipeline runs inference on a "stage-2" training set, and its error on each sample is stored. Phase 3: Inference on new samples weights each pipeline's prediction according to its relative competence in the neighborhood of the input sample, i.e. the softmax over the pipelines' mean inverse errors.
  • Figure 2: Violin plot of AUROC performance rankings for the 8 standard imputer-estimator pipelines, the baseline UMA (uniform model averaging) pipeline, and the M-DEW pipeline. M-DEW ranks first on all metrics, with a narrower range. The wider the shape, the more samples fall at that value.
  • Figure 3: Frequency of rankings, showing that M-DEW has the most first-place rankings among all models
  • Figure 4: Relative Error Histograms: EEG Eye State
  • Figure 5: Relative Error Histograms: Myocardial Infarction
  • ...and 2 more figures
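
The Phase 3 weighting described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`mdew_weights`, `mdew_predict`), the Euclidean neighbor search over a single shared feature representation, the neighborhood size `k`, and the small `eps` added before inverting errors are all assumptions made for illustration. Only the rule itself, a softmax over each pipeline's mean inverse error in the neighborhood of the input sample, is taken from the caption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mdew_weights(x, stage2_X, stage2_errors, k=3, eps=1e-8):
    """Return one weight per pipeline for the query sample x.

    stage2_X:      stage-2 samples as lists of (already imputed) features.
    stage2_errors: stage2_errors[i][j] is the stored error of pipeline j
                   on stage-2 sample i (Phase 2 output).
    k, eps:        hypothetical hyperparameters, not taken from the paper.
    """
    # Neighborhood: the k stage-2 samples closest to x (Euclidean, assumed).
    order = sorted(range(len(stage2_X)),
                   key=lambda i: math.dist(x, stage2_X[i]))
    neighbors = order[:k]

    # Competence of pipeline j: mean inverse error over the neighborhood.
    n_pipelines = len(stage2_errors[0])
    competence = [
        sum(1.0 / (stage2_errors[i][j] + eps) for i in neighbors) / len(neighbors)
        for j in range(n_pipelines)
    ]

    # Relative competence: softmax across pipelines.
    return softmax(competence)


def mdew_predict(weights, pipeline_probs):
    """Weighted average of the pipelines' predicted probabilities."""
    return sum(w * p for w, p in zip(weights, pipeline_probs))


# Toy data: pipeline 0 is more accurate near the origin, pipeline 1 near (5, 5).
stage2_X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
stage2_errors = [[0.2, 0.5], [0.25, 0.5], [0.5, 0.2], [0.5, 0.25]]

w_near_origin = mdew_weights([0.05, 0.0], stage2_X, stage2_errors, k=2)
w_near_five = mdew_weights([5.05, 5.0], stage2_X, stage2_errors, k=2)
prediction = mdew_predict(w_near_origin, [0.9, 0.3])
```

In this toy setup the weight shifts toward whichever pipeline had lower stored errors near the query point, while uniform model averaging would assign 0.5 to each pipeline regardless of location. Because inverse errors grow quickly as errors approach zero, the softmax can become close to one-hot. Whether and how the paper tempers this is not stated in the text above.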