HAPEns: Hardware-Aware Post-Hoc Ensembling for Tabular Data

Jannis Maier, Lennart Purucker

TL;DR

HAPEns is a post-hoc ensembling method that explicitly balances accuracy against hardware efficiency. Inspired by multi-objective and quality diversity optimization, it constructs a diverse set of ensembles along the Pareto front of predictive performance and resource usage.

Abstract

Ensembling is commonly used in machine learning on tabular data to boost predictive performance and robustness, but larger ensembles often lead to increased hardware demand. We introduce HAPEns, a post-hoc ensembling method that explicitly balances accuracy against hardware efficiency. Inspired by multi-objective and quality diversity optimization, HAPEns constructs a diverse set of ensembles along the Pareto front of predictive performance and resource usage. Existing hardware-aware post-hoc ensembling baselines are not available, highlighting the novelty of our approach. Experiments on 83 tabular classification datasets show that HAPEns significantly outperforms baselines, finding superior trade-offs for ensemble performance and deployment cost. Ablation studies also reveal that memory usage is a particularly effective objective metric. Further, we show that even a greedy ensembling algorithm can be significantly improved in this task with a static multi-objective weighting scheme.
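
To make the notion of a Pareto front over predictive performance and resource usage concrete, here is a minimal sketch. It is not the HAPEns algorithm. It only shows how non-dominated candidates can be filtered from a pool, assuming each candidate ensemble is summarized by a hypothetical pair of validation error and memory footprint, both to be minimized. The function name and the example values are illustrative assumptions.

```python
from typing import List, Tuple

# Each candidate ensemble is summarized as (validation_error, memory_mb).
# Both fields are hypothetical summaries, and both are minimized.
Candidate = Tuple[float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by increasing memory."""
    front: List[Candidate] = []
    best_error = float("inf")
    # Sort by memory, then by error, so one sweep is enough: a candidate is
    # non-dominated only if its error is strictly lower than that of every
    # candidate that uses no more memory.
    for error, memory in sorted(candidates, key=lambda c: (c[1], c[0])):
        if error < best_error:
            front.append((error, memory))
            best_error = error
    return front


# Illustrative values only, not results from the paper.
candidates = [
    (0.10, 500.0),
    (0.12, 200.0),
    (0.15, 50.0),
    (0.13, 300.0),
    (0.20, 60.0),
]
front = pareto_front(candidates)
```

In this example, the candidates at 300 MB and 60 MB are dropped because a cheaper candidate already achieves a lower error. The remaining three represent different trade-offs between accuracy and deployment cost.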

Paper Structure (22 sections, 1 equation, 15 figures, 1 table, 2 algorithms)

Figures (15)

  • Figure 1: Illustration of three ensemble selection strategies: a standard method ignoring hardware constraints, a naive hardware-aware variant that sacrifices accuracy, and an advanced hardware-aware method that balances accuracy and efficiency. Box size reflects model resource usage; the red dashed line indicates the hardware resource constraint.
  • Figure 2: Overview of the main research areas. HW-NAS (red) is shown as a parallel area, while the others (orange) directly influence HAPEns. This work focuses solely on tabular data (blue).
  • Figure 3: Illustration of the HAPEns search process. Ensembles are sampled from bins over memory footprint and average loss correlation, then evolved via crossover and mutation to explore the behavior space.
  • Figure 4: Scatter plot of datasets over their number of features (y), number of samples (x), and the number of classes (color).
  • Figure 5: Comparison of TabRepo's model types and their corresponding inference times across tasks. KNeighbours and linear regression are, as expected, at the lower end of the spectrum, while transformers incur higher cost due to their complexity.
  • ...and 10 more figures