What is to be gained by ensemble models in analysis of spectroscopic data?
Katarina Domijan
TL;DR
This study asks whether ensemble methods can meaningfully improve prediction from mid-infrared (MIR) spectroscopy data in milk analytics, where no single model consistently dominates across tasks. It reports an empirical evaluation on two MIR benchmarks built from milk spectra, one for regression and one for classification, comparing a broad library of candidate models with several stacking meta-learners that combine the candidates' cross-validated, out-of-fold predictions. Linear mixed models are used to assess performance across 50 random splits (regression) and 10 splits (classification). The analysis shows that stacking ensembles, particularly those with non-negativity constraints on the meta-learner coefficients, consistently outperform the best individual models, with RMSE reductions and ACC gains (e.g., ACC rising from $0.78$ to $0.81$ in classification). The findings support using diverse ensemble strategies for spectroscopic calibration tasks, highlight the role of model diversity and careful cross-validation in avoiding bias, and note that linear models such as PLS, LASSO, and Elastic Net remain strong competitors. Overall, ensemble stacking offers a principled path to robust predictions in chemometrics and MIR spectroscopy applications.
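The stacking scheme summarized above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the candidate library is replaced by three ridge regressions with different penalties (a stand-in for the much broader library the study compares), and the non-negative meta-learner is fitted by a simple projected gradient descent. All function names, the penalty values in `lams`, and the fold count are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def ridge_predict(model, X):
    coef, intercept = model
    return X @ coef + intercept

def out_of_fold_predictions(X, y, lams, n_folds=5, seed=0):
    """Column k holds candidate k's predictions, each made on a held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    Z = np.zeros((len(y), len(lams)))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        for k, lam in enumerate(lams):
            model = ridge_fit(X[train_mask], y[train_mask], lam)
            Z[test_idx, k] = ridge_predict(model, X[test_idx])
    return Z

def nonneg_least_squares(Z, y, n_iter=5000):
    """Projected gradient descent for min ||Z w - y||^2 subject to w >= 0."""
    step = 1.0 / np.linalg.norm(Z, 2) ** 2  # 1 / largest eigenvalue of Z^T Z
    w = np.full(Z.shape[1], 1.0 / Z.shape[1])
    for _ in range(n_iter):
        w = np.maximum(0.0, w - step * Z.T @ (Z @ w - y))
    return w

# Synthetic data standing in for spectra (assumption: not the study's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 30))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=120)
X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]

lams = [0.1, 10.0, 1000.0]  # hypothetical candidate library
Z_train = out_of_fold_predictions(X_train, y_train, lams)
weights = nonneg_least_squares(Z_train, y_train)

# Refit each candidate on the full training set, then combine on the test set.
Z_test = np.column_stack(
    [ridge_predict(ridge_fit(X_train, y_train, lam), X_test) for lam in lams]
)
stacked = Z_test @ weights

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))

print("meta-learner weights:", np.round(weights, 3))
print("candidate RMSE:", [round(rmse(Z_test[:, k]), 3) for k in range(len(lams))])
print("stacked RMSE:", round(rmse(stacked), 3))
```

The meta-learner sees only predictions made on data the candidates were not trained on, which is what keeps its weights from simply rewarding the most overfitted candidate. The non-negativity constraint keeps the combination interpretable as a weighting of candidates; this sketch omits a meta-learner intercept for brevity.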
Abstract
An empirical study was carried out to compare different implementations of ensemble models aimed at improving prediction in spectroscopic data. A wide range of candidate models was fitted to benchmark datasets from regression and classification settings. A statistical analysis using a linear mixed model was carried out on the prediction performance criteria obtained from model fits over random splits of the data. The results showed that the ensemble classifiers were able to consistently outperform candidate models in our application.
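One plausible form for such a linear mixed model, written here as an illustrative assumption rather than the specification used in the study, treats the model as a fixed effect and the random split as a random effect:

$$y_{ms} = \mu + \alpha_m + b_s + \varepsilon_{ms}, \qquad b_s \sim N(0, \sigma_b^2), \qquad \varepsilon_{ms} \sim N(0, \sigma^2)$$

Here $y_{ms}$ is the performance criterion (for example RMSE or ACC) of model $m$ on split $s$, $\mu$ is the overall mean, $\alpha_m$ is the fixed effect of model $m$, $b_s$ is a random intercept shared by all models evaluated on split $s$, and $\varepsilon_{ms}$ is residual error. All symbols are introduced for illustration. Under this form, comparisons between models rest on differences in $\alpha_m$, while $b_s$ absorbs the fact that some splits are harder than others for every model.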
