When Less is More: A Story of Failing Bayesian Optimization Due to Additional Expert Knowledge
Dorina Weichert, Gunar Ernis, Marvin Worthmann, Peter Ryzko, Lukas Seifert
TL;DR
The paper investigates the optimization of recycled-plastic compound formulations with Bayesian Optimization (BO) under multiple constraints and shows that adding expert-derived features can inadvertently increase dimensionality and hinder the optimization. Through a sequence of experiments (vanilla constrained BO, constraint relaxation, problem reformulation, and finally BO in a simplified space), the authors demonstrate that a reduced four-parameter input space, coupled with a data-driven oracle based on historical experiments, can achieve performance comparable to or better than expert designs while minimizing the distance to the target properties. Key contributions include a detailed failure analysis of expert-informed BO, a practical simple BO approach, and guidelines for when to incorporate domain knowledge in industrial BO settings. The findings highlight the importance of problem formulation, feature selection, and adaptive constraint handling for real-world, resource-constrained materials design, with implications for accelerating experimental design in recycled-plastic development.
Abstract
The compounding of plastics with recycled material remains a practical challenge, as the properties of the processed material are not as easy to control as with completely new raw materials. For a data scientist, it makes sense to plan the necessary experiments in the development of new compounds using Bayesian Optimization, an optimization approach based on a surrogate model that is known for its data efficiency and is therefore well suited for data obtained from costly experiments. Furthermore, if historical data and expert knowledge are available, their inclusion in the surrogate model is expected to accelerate the convergence of the optimization. In this article, we describe a use case in which the addition of data and knowledge impaired the optimization. We also describe the unsuccessful methods that were used to remedy the problem before we found the reasons for the poor performance and achieved a satisfactory result. We conclude with a lesson learned: additional knowledge and data are only beneficial if they do not complicate the underlying optimization goal.
