Bi-level Meta-Policy Control for Dynamic Uncertainty Calibration in Evidential Deep Learning
Zhen Yang, Yansong Ma, Lei Chen
TL;DR
This work addresses the limitation of static uncertainty calibration in Evidential Deep Learning (EDL) under shifting data distributions. It introduces a Meta-Policy Controller (MPC), a bi-level meta-learning framework in which a state-aware policy dynamically tunes the KL regularization coefficient $\lambda_t$ and the class-wise Dirichlet priors $\alpha_{0,t}$ to optimize uncertainty modeling. The inner loop trains the backbone with a dynamically configured $\mathcal{L}_{\text{EDL}}$, while the outer loop updates the policy via policy gradients using multi-objective rewards (ACC, ECE, MUE); a learnable Dirichlet prior improves adaptability to class distributions. The approach improves uncertainty calibration, prediction accuracy, and robustness to distribution shifts, including OOD and long-tailed regimes, and shows promise for real-world, high-stakes applications.
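The TL;DR refers to a dynamically configured $\mathcal{L}_{\text{EDL}}$ without giving its form. The following is a minimal sketch under stated assumptions, not the authors' implementation. It starts from the mean-squared-error EDL loss of Sensoy et al. (2018) and assumes two changes: the uniform KL target $\mathrm{Dir}(\mathbf{1})$ is replaced by the class-wise prior $\mathrm{Dir}(\alpha_{0,t})$, and the Dirichlet parameters are $\alpha = e + \alpha_{0,t}$. With $\alpha_{0,t} = \mathbf{1}$, it reduces to the standard loss. The helper names `digamma`, `kl_dirichlet`, and `edl_loss` are hypothetical. Pure Python is used only to keep the example self-contained.

```python
import math


def digamma(x):
    """Digamma psi(x) for x > 0: shift x upward with psi(x) = psi(x + 1) - 1/x,
    then apply the asymptotic expansion."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def kl_dirichlet(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    a0, b0 = sum(alpha), sum(beta)
    kl = math.lgamma(a0) - math.lgamma(b0)
    kl += sum(math.lgamma(b) - math.lgamma(a) for a, b in zip(alpha, beta))
    psi_a0 = digamma(a0)
    kl += sum((a - b) * (digamma(a) - psi_a0) for a, b in zip(alpha, beta))
    return kl


def edl_loss(evidence, y, lam_t, alpha0_t):
    """Per-sample EDL loss with a dynamic KL weight lam_t and class-wise prior alpha0_t.

    evidence: non-negative per-class evidence e_k (e.g. softplus of the logits)
    y: one-hot label; with alpha0_t = [1, ..., 1] this is the MSE-based EDL loss.
    """
    alpha = [e + a0 for e, a0 in zip(evidence, alpha0_t)]  # assumed: alpha = e + alpha_0
    S = sum(alpha)
    p = [a / S for a in alpha]  # expected class probabilities under Dir(alpha)
    # Expected squared error: squared bias plus Dirichlet variance per class
    fit = sum((yk - pk) ** 2 + pk * (1 - pk) / (S + 1) for yk, pk in zip(y, p))
    # Reset the true class to its prior so only misleading evidence is penalized
    alpha_tilde = [yk * a0 + (1 - yk) * a for yk, a, a0 in zip(y, alpha, alpha0_t)]
    return fit + lam_t * kl_dirichlet(alpha_tilde, alpha0_t)


if __name__ == "__main__":
    prior = [1.0, 1.0, 1.0]
    # Evidence supports the wrong class 1, so a larger lam_t raises the loss
    for lam in (0.0, 0.5, 1.0):
        print(lam, edl_loss([0.0, 5.0, 0.0], [1, 0, 0], lam, prior))
```

The coefficient $\lambda_t$ only scales the penalty on misleading evidence, while $\alpha_{0,t}$ shifts both the fit term and the KL target. These are the two knobs the policy controls in this sketch.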
Abstract
Traditional Evidential Deep Learning (EDL) methods rely on static hyperparameters for uncertainty calibration. This limits their adaptability to changing data distributions and leads to poor calibration and generalization in high-risk decision-making tasks. To address this limitation, we propose the Meta-Policy Controller (MPC), a dynamic meta-learning framework that adjusts the KL divergence coefficient and the Dirichlet prior strengths for optimal uncertainty modeling. Specifically, MPC employs a bi-level optimization approach. In the inner loop, model parameters are updated through a dynamically configured loss function that adapts to the current training state. In the outer loop, a policy network optimizes the KL divergence coefficient and the class-specific Dirichlet prior strengths based on multi-objective rewards that balance prediction accuracy and uncertainty quality. Unlike previous methods with fixed priors, our learnable Dirichlet prior adapts flexibly to class distributions and training dynamics. Extensive experiments across various tasks show that MPC significantly improves the reliability of model predictions, with gains in uncertainty calibration, prediction accuracy, and performance retention after confidence-based sample rejection.
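The outer loop described above can be sketched as a policy-gradient update, under several assumptions that are not taken from the paper:

- The policy is linear-Gaussian over $\log\lambda_t$ and $\log\alpha_{0,t}$; the log keeps both quantities positive.
- The reward is the unweighted scalarization ACC − ECE − MUE.
- ECE uses the common equal-width binning.
- The estimator is REINFORCE with an exponential-moving-average baseline, one standard policy-gradient method.
- A synthetic function stands in for "train the backbone, then validate," and its optimum is placed at $\lambda_t = 0.5$ purely for illustration.

MUE is not defined in this section, so it is passed through as a placeholder. All class and function names are hypothetical, and no reported numbers are reproduced here.

```python
import math
import random


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |bin accuracy - bin mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    ece = 0.0
    for members in bins:
        if members:
            acc = sum(ok for _, ok in members) / len(members)
            conf = sum(c for c, _ in members) / len(members)
            ece += len(members) / len(confidences) * abs(acc - conf)
    return ece


def multi_objective_reward(acc, ece, mue, w_acc=1.0, w_ece=1.0, w_mue=1.0):
    """Hypothetical scalarization: reward accuracy, penalize calibration/uncertainty error."""
    return w_acc * acc - w_ece * ece - w_mue * mue


class GaussianMetaPolicy:
    """Hypothetical linear-Gaussian policy: state -> [log lambda_t, log alpha_{0,t,1..K}]."""

    def __init__(self, state_dim, num_classes, sigma=0.1, seed=0):
        self.out_dim = 1 + num_classes
        self.W = [[0.0] * state_dim for _ in range(self.out_dim)]
        self.b = [0.0] * self.out_dim  # zero init: lambda_t = 1, alpha_0 = 1 at start
        self.sigma = sigma
        self.rng = random.Random(seed)

    def mean(self, state):
        return [sum(w * s for w, s in zip(row, state)) + bj for row, bj in zip(self.W, self.b)]

    def act(self, state):
        """Sample a log-space action u; exp() keeps lambda_t and alpha_{0,t} positive."""
        u = [m + self.sigma * self.rng.gauss(0.0, 1.0) for m in self.mean(state)]
        return u, math.exp(u[0]), [math.exp(v) for v in u[1:]]

    def reinforce_update(self, state, u, advantage, lr=0.01):
        """Gradient ascent on advantage * log N(u; mean(state), sigma^2)."""
        mu = self.mean(state)
        for j in range(self.out_dim):
            g = (u[j] - mu[j]) / self.sigma ** 2  # d/d mu_j of the Gaussian log-density
            self.b[j] += lr * advantage * g
            for i, s in enumerate(state):
                self.W[j][i] += lr * advantage * g * s


def synthetic_inner_loop(lam_t, alpha0_t):
    """HYPOTHETICAL stand-in for 'train the backbone with L_EDL, then validate'.
    alpha0_t is ignored; accuracy is fixed at 0.8 and confidence drifts from it
    as lam_t moves away from 0.5."""
    d = math.log(lam_t) - math.log(0.5)
    confidences = [max(0.0, 0.8 - d * d)] * 100
    correct = [i < 80 for i in range(100)]
    acc = sum(correct) / len(correct)
    mue = 0.0  # placeholder: MUE is not defined in this section
    return acc, expected_calibration_error(confidences, correct), mue


def run_outer_loop(steps=500, seed=0):
    policy = GaussianMetaPolicy(state_dim=1, num_classes=3, seed=seed)
    state = [1.0]  # hypothetical constant state feature; real states would track training
    baseline = None
    for _ in range(steps):
        u, lam_t, alpha0_t = policy.act(state)
        reward = multi_objective_reward(*synthetic_inner_loop(lam_t, alpha0_t))
        advantage = 0.0 if baseline is None else reward - baseline
        policy.reinforce_update(state, u, advantage)
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    return policy


if __name__ == "__main__":
    policy = run_outer_loop()
    print("learned lambda_t:", math.exp(policy.mean([1.0])[0]))
```

Sampling in log space is one simple way to keep $\lambda_t$ and every $\alpha_{0,t}$ positive without clipping. In a real system, the state vector would carry training statistics, and the synthetic inner loop would be replaced by actual backbone updates followed by validation.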
