COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier
Gaoxiang Luo, Aryan Deshwal
TL;DR
COM-BOM reframes exemplar selection in in-context learning as a multi-objective combinatorial optimization problem, jointly optimizing accuracy $f_{acc}(\boldsymbol{z})$ and calibration $f_{ECE}(\boldsymbol{z})$ (via $-f_{ECE}(\boldsymbol{z})$ for maximization). It introduces a sample-efficient Combinatorial Bayesian Optimization algorithm using Gaussian Process surrogates with an exponentiated Hamming kernel and a hypervolume-based acquisition (NEHVI) to approximate the Pareto front with few LLM evaluations. The method is validated on MMLU-Pro tasks using Qwen3-8B and LLaMA-3.3-70B, showing that COM-BOM discovers better accuracy–calibration trade-offs than baselines, with offline search reducing inference-time costs. This work advances reliable, calibration-aware ICL by delivering Pareto-optimal exemplar sets that support safer, more trustworthy deployment of LLMs in high-stakes settings.
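To make the surrogate's similarity measure concrete, here is a minimal sketch of an exponentiated Hamming kernel over binary exemplar-selection vectors $\boldsymbol{z}$. The exact parameterization used by COM-BOM is not given in this summary, so the form $k(\boldsymbol{z}, \boldsymbol{z}') = \exp(-d_H(\boldsymbol{z}, \boldsymbol{z}') / \ell)$ with a normalized Hamming distance $d_H$ and a single lengthscale $\ell$ is an assumption, and the function names are illustrative only.

```python
import math
from typing import Sequence


def hamming_distance(z1: Sequence[int], z2: Sequence[int]) -> int:
    """Count the positions at which two selection vectors differ."""
    if len(z1) != len(z2):
        raise ValueError("selection vectors must have the same length")
    return sum(a != b for a, b in zip(z1, z2))


def exp_hamming_kernel(z1: Sequence[int], z2: Sequence[int],
                       lengthscale: float = 1.0) -> float:
    """Assumed kernel form: exp(-normalized Hamming distance / lengthscale).

    This is one common way to exponentiate a Hamming distance; it is not
    confirmed to be the exact kernel used by COM-BOM.
    """
    if len(z1) == 0:
        raise ValueError("selection vectors must be non-empty")
    d = hamming_distance(z1, z2) / len(z1)  # fraction of differing positions
    return math.exp(-d / lengthscale)


# z[i] = 1 means candidate exemplar i is included in the prompt.
z_a = [1, 0, 1, 0]
z_b = [1, 1, 1, 0]
similarity = exp_hamming_kernel(z_a, z_b)  # differs in 1 of 4 positions
```

Under this form, identical exemplar sets have similarity 1, and similarity decays as the two sets disagree on more candidates, which is what lets a Gaussian Process share information between nearby exemplar subsets.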
Abstract
Selecting an optimal set of exemplars is critical for good performance of in-context learning. However, prior exemplar search methods narrowly optimize for predictive accuracy, critically neglecting model calibration, a key determinant of trustworthiness and safe deployment. In this paper, we formulate exemplar selection as a multi-objective optimization problem, explicitly targeting both the maximization of predictive accuracy and the minimization of expected calibration error. We solve this problem with a sample-efficient Combinatorial Bayesian Optimization algorithm (COM-BOM) to find the Pareto front that optimally trades off the two objectives of accuracy and calibration. We evaluate COM-BOM on multiple tasks from the unsaturated MMLU-Pro benchmark and find that COM-BOM beats or matches the baselines at jointly optimizing the two objectives, while requiring a minimal number of LLM API calls.
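The two objectives and the notion of a Pareto front can be illustrated with a short sketch. It assumes the standard equal-width binned estimator of expected calibration error and treats each exemplar set as a point (accuracy, negative ECE) to be maximized; the function names, the bin count, and the toy numbers are illustrative assumptions, not values or code from the paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Return indices of non-dominated points when both coordinates are maximized."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Toy predictions (made up for illustration): confidence and correctness.
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])

# Toy (accuracy, -ECE) pairs for four hypothetical exemplar sets.
candidates = [(0.70, -0.10), (0.65, -0.05), (0.60, -0.12), (0.70, -0.15)]
front = pareto_front(candidates)
```

In this toy example the first two candidates form the front: one is more accurate, the other better calibrated, and neither is at least as good as the other on both objectives. The remaining two are each matched or beaten on both objectives by the first candidate.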
