Uncertainty-driven Embedding Convolution
Sungjun Lim, Kangjun Noh, Youngjun Choi, Heeyoung Lee, Kyungwoo Song
TL;DR
Uncertainty-driven Embedding Convolution (UEC) addresses the fact that no single embedding model dominates across tasks by building a principled, uncertainty-aware ensemble. It converts deterministic embeddings into Gaussian representations post hoc via a Laplace approximation, combines them with query-specific coefficients that down-weight uncertain models, and scores similarity with an uncertainty-aware function that serves as an efficient surrogate for distributional distances. On multilingual benchmarks (MIRACL and MMTEB), UEC consistently improves retrieval, classification, and semantic similarity performance while providing well-calibrated uncertainty estimates at near-linear computational cost. The resulting ensembles are robust and adaptable enough for real-time, cross-domain NLP applications, and the work points to future extensions of uncertainty modeling to aleatoric, multimodal, and fairness-aware settings.
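As a rough illustration of the ensemble step, the sketch below assumes that each of K models already yields, for a given query, a diagonal Gaussian embedding (a mean and a per-dimension variance) in a shared d-dimensional space. It does not reproduce how UEC obtains these Gaussians or its surrogate-loss coefficients. As a stand-in, the hypothetical function `combine_gaussian_embeddings` uses inverse-variance weights, so a model with higher average variance for that query contributes less. It then combines the models as a weighted sum of independent Gaussians, which is again Gaussian (its density is the convolution of the scaled component densities).

```python
import numpy as np


def combine_gaussian_embeddings(means, variances, eps=1e-12):
    """Hypothetical uncertainty-weighted ensemble of diagonal-Gaussian embeddings.

    means, variances: array-likes of shape (K, d), one row per model, assumed
    (not from the source) to live in a shared d-dimensional space.
    Returns (weights, combined_mean, combined_variance).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Scalar uncertainty per model for this query: mean variance over dimensions.
    model_uncertainty = variances.mean(axis=1)
    # Illustrative inverse-variance weights (NOT UEC's derived coefficients):
    # more uncertain models receive smaller weights; weights sum to 1.
    raw = 1.0 / (model_uncertainty + eps)
    weights = raw / raw.sum()
    # A weighted sum of independent Gaussians is Gaussian with
    # mean = sum_k w_k * mu_k and variance = sum_k w_k^2 * sigma_k^2.
    combined_mean = weights @ means
    combined_variance = (weights ** 2) @ variances
    return weights, combined_mean, combined_variance


if __name__ == "__main__":
    # Two hypothetical models; the second is four times as uncertain.
    w, mu, var = combine_gaussian_embeddings(
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.1, 0.1], [0.4, 0.4]],
    )
    print(w, mu, var)
```

In this example the first model (average variance 0.1) receives a weight of about 0.8 and the second (average variance 0.4) about 0.2. Because the variances are computed per query, the weights would change from query to query, which is the behavior the query-specific coefficients are meant to capture.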
Abstract
Text embeddings are essential components in modern NLP pipelines. Although numerous embedding models have been proposed, no single model consistently dominates across domains and tasks. This variability motivates the use of ensemble techniques to combine complementary strengths. However, most existing ensemble methods operate on deterministic embeddings and fail to account for model-specific uncertainty, limiting their robustness and reliability in downstream applications. To address these limitations, we propose Uncertainty-driven Embedding Convolution (UEC). UEC first transforms deterministic embeddings into probabilistic ones in a post-hoc manner. It then computes adaptive ensemble coefficients based on embedding uncertainty, derived from a principled surrogate-loss formulation. Additionally, UEC employs an uncertainty-aware similarity function that directly incorporates uncertainty into the similarity scoring, providing a theoretically grounded and efficient surrogate to distributional distances. Extensive experiments on diverse benchmarks demonstrate that UEC consistently improves both performance and robustness by leveraging principled uncertainty modeling.
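The abstract presents UEC's similarity function as an efficient surrogate for distributional distances but does not state its form here. For reference only, the sketch below computes one standard distributional distance with a closed form for diagonal Gaussians, the 2-Wasserstein distance, W_2^2 = ||mu_1 - mu_2||^2 + ||sigma_1 - sigma_2||^2, where sigma denotes the per-dimension standard deviations. The function name `w2_diag_gaussians` is hypothetical, and this is not UEC's similarity function.

```python
import numpy as np


def w2_diag_gaussians(mu1, var1, mu2, var2):
    """2-Wasserstein distance between N(mu1, diag(var1)) and N(mu2, diag(var2))."""
    mu1, var1, mu2, var2 = (np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2))
    # Mean term: squared Euclidean distance between the means.
    mean_term = np.sum((mu1 - mu2) ** 2)
    # Covariance term: for diagonal (commuting) covariances it reduces to the
    # squared distance between per-dimension standard deviations.
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(np.sqrt(mean_term + cov_term))


if __name__ == "__main__":
    # Equal variances: the distance equals the Euclidean distance of the means.
    print(w2_diag_gaussians([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # 5.0
```

When both embeddings carry identical variances, the covariance term vanishes and the distance reduces to the Euclidean distance between the means; differing uncertainty adds a penalty, which shows one way variance information can enter a similarity score. Negating the distance, or passing it through any decreasing function, turns it into a similarity, and the computation is linear in the embedding dimension.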
