
Bi-level Meta-Policy Control for Dynamic Uncertainty Calibration in Evidential Deep Learning

Zhen Yang, Yansong Ma, Lei Chen

TL;DR

This work tackles the challenge of static uncertainty calibration in Evidential Deep Learning (EDL) when data distributions shift. It introduces a Meta-Policy Controller (MPC), a bi-level meta-learning framework where a state-aware policy dynamically tunes the KL regularization coefficient $\lambda_t$ and class-wise Dirichlet priors $\alpha_{0,t}$ to optimize uncertainty modelling. The inner loop trains the backbone with a dynamically configured $\mathcal{L}_{\text{EDL}}$, while the outer loop uses multi-objective rewards (ACC, ECE, MUE) to update the policy via policy gradients, with a learnable Dirichlet prior improving adaptability to class distributions. The approach yields improved uncertainty calibration, prediction accuracy, and robustness to distribution shifts, including OOD and long-tailed regimes, and shows promise for real-world, high-stakes applications.

Abstract

Traditional Evidential Deep Learning (EDL) methods rely on static hyperparameters for uncertainty calibration, which limits their adaptability to dynamic data distributions and results in poor calibration and generalization in high-risk decision-making tasks. To address this limitation, we propose the Meta-Policy Controller (MPC), a dynamic meta-learning framework that adjusts the KL divergence coefficient and Dirichlet prior strengths for optimal uncertainty modeling. Specifically, MPC employs a bi-level optimization approach: in the inner loop, model parameters are updated through a dynamically configured loss function that adapts to the current training state; in the outer loop, a policy network optimizes the KL divergence coefficient and class-specific Dirichlet prior strengths based on multi-objective rewards that balance prediction accuracy and uncertainty quality. Unlike previous methods with fixed priors, our learnable Dirichlet prior enables flexible adaptation to class distributions and training dynamics. Extensive experimental results show that MPC significantly enhances the reliability and calibration of model predictions across various tasks, improving uncertainty calibration, prediction accuracy, and performance retention after confidence-based sample rejection.
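To make the "dynamically configured loss" of the inner loop more concrete, the following is a minimal sketch of a per-sample EDL loss whose KL weight and class-wise Dirichlet prior are passed in as arguments, as a policy might supply them at step $t$. It is an illustration under stated assumptions, not the authors' implementation: the sum-of-squares data term is the common EDL formulation and is assumed here, the way the prior replaces the true-class concentration before the KL term is our own generalization of the usual all-ones prior, and the names `edl_loss`, `dirichlet_kl`, and `digamma` are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def dirichlet_kl(a, b):
    """KL( Dir(a) || Dir(b) ) for two lists of positive concentrations."""
    a0, b0 = sum(a), sum(b)
    kl = math.lgamma(a0) - sum(math.lgamma(x) for x in a)
    kl -= math.lgamma(b0) - sum(math.lgamma(x) for x in b)
    kl += sum((x - y) * (digamma(x) - digamma(a0)) for x, y in zip(a, b))
    return kl

def edl_loss(evidence, label, kl_coeff, prior):
    """Per-sample EDL loss with a configurable KL weight and class-wise prior.

    Assumption: alpha = evidence + prior, and the regulariser compares the
    "misleading" part of alpha against Dir(prior).
    """
    alpha = [e + p for e, p in zip(evidence, prior)]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    onehot = [1.0 if k == label else 0.0 for k in range(len(alpha))]
    # Expected squared error under the Dirichlet: bias term plus variance term.
    data_term = sum((y - p) ** 2 + p * (1.0 - p) / (strength + 1.0)
                    for y, p in zip(onehot, probs))
    # Reset the true class to its prior value so only wrong-class evidence is penalised.
    alpha_tilde = [p if y == 1.0 else a for a, p, y in zip(alpha, prior, onehot)]
    return data_term + kl_coeff * dirichlet_kl(alpha_tilde, prior)
```

In this sketch, raising `kl_coeff` only increases the loss when evidence has been assigned to a wrong class; evidence on the true class leaves the KL term at zero, which is the behavior the regulariser is meant to have.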

Paper Structure

This paper contains 23 sections, 16 equations, 3 figures, 11 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of our bi-level Meta-Policy Controller (MPC) framework. The inner loop updates the evidential model using a loss constructed with the dynamically selected KL coefficient $\lambda_t$ and Dirichlet prior $\alpha_{0,t}$ from the policy. The outer loop updates the policy using reward feedback calculated from the Accuracy, ECE, and MUE metrics. $\phi$ and $\theta$ denote the learnable parameters in the framework.
  • Figure 2: Performance curves (mean and standard deviation) under different KL coefficients on MNIST, CIFAR-10, and SVHN datasets. Dynamic trends show dataset-specific preferences for KL settings, motivating adaptive strategies.
  • Figure 3: Correlation between hyperparameter dynamics (KL coefficient and Dirichlet prior strength) and performance metrics during training on SVHN. Each dot represents one epoch.
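
The reward feedback mentioned in the Figure 1 caption can be sketched as a weighted combination of accuracy and calibration error. The example below is a hedged illustration only: it covers ACC and ECE and leaves out MUE because its definition is not given in this summary, the weights `w_acc` and `w_ece` are hypothetical, and the equal-width binning is the standard ECE recipe rather than a confirmed detail of the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE with equal-width confidence bins over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece

def reward(confidences, correct, w_acc=1.0, w_ece=1.0):
    """Hypothetical scalar reward: reward accuracy, penalise miscalibration."""
    accuracy = sum(correct) / len(correct)
    return w_acc * accuracy - w_ece * expected_calibration_error(confidences, correct)
```

A policy trained on such a signal is pushed toward hyperparameter settings that keep accuracy high without inflating confidence; how the actual method weights its three metrics is not stated here.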