Decoupled Entropy Minimization
Jing Ma, Hanlin Li, Xiang Xiang
TL;DR
This paper addresses limitations of Entropy Minimization (EM) by reformulating the conditional entropy $H(\mathbf{z})$ as a decoupled sum of a Cluster Aggregation Driving Factor (CADF) and a Gradient Mitigation Calibrator (GMC), expressed as $H(\mathbf{z}) = T_\tau(\mathbf{z}) + Q_\alpha(\mathbf{z})$. It identifies two problems that arise from the coupled formulation of EM, reward collapse and easy-class bias, and analyzes them through the CADF/GMC lens. To overcome these issues, it introduces Adaptive Decoupled Entropy Minimization (AdaDEM), which normalizes the CADF reward with $\delta = \| - \partial T(\mathbf{z})/\partial \mathbf{z} \|_1$ and replaces GMC with a Marginal Entropy Calibrator (MEC), a hyperparameter-free, dynamically estimated prior that mitigates easy-class bias. Across SSL, TTA, UDA, and RL tasks, AdaDEM achieves better performance and robustness than DEM*, an upper-bound variant of classical EM, demonstrating EM’s potential when decoupled and adaptively regularized.
Abstract
Entropy Minimization (EM) helps reduce class overlap, bridge domain gaps, and restrict uncertainty in various machine learning tasks, yet its potential is limited. To study the internal mechanism of EM, we reformulate and decouple classical EM into two parts with opposite effects: the cluster aggregation driving factor (CADF) rewards dominant classes and promotes a peaked output distribution, while the gradient mitigation calibrator (GMC) penalizes high-confidence classes based on predicted probabilities. Furthermore, we reveal two limitations of classical EM caused by its coupled formulation: 1) reward collapse impedes the contribution of high-certainty samples to the learning process, and 2) easy-class bias induces misalignment between the output distribution and the label distribution. To address these issues, we propose Adaptive Decoupled Entropy Minimization (AdaDEM), which normalizes the reward from CADF and replaces GMC with a marginal entropy calibrator (MEC). AdaDEM outperforms DEM*, an upper-bound variant of classical EM, and achieves superior performance across various imperfectly supervised learning tasks in noisy and dynamic environments.
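To make the idea of decoupling concrete, the sketch below checks numerically that the entropy of a softmax output splits into two additive terms. It relies on the standard identity $\log p_i = z_i - \log \sum_j e^{z_j}$, which gives $H(\mathbf{z}) = -\sum_i p_i z_i + \log \sum_j e^{z_j}$. Mapping the first term to CADF and the second to GMC is an assumption made here for illustration. The abstract does not state the exact definitions, and the temperature $\tau$ and weight $\alpha$ are omitted. The function names `reward_term` and `calibrator_term` are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(z):
    """Numerically stable log(sum(exp(z)))."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))


def entropy(z):
    """Shannon entropy of softmax(z), computed directly from probabilities."""
    return -sum(p * math.log(p) for p in softmax(z) if p > 0.0)


def reward_term(z):
    """Assumed CADF-like term: negative expected logit under softmax(z)."""
    return -sum(p * v for p, v in zip(softmax(z), z))


def calibrator_term(z):
    """Assumed GMC-like term: log-sum-exp of the logits.

    Its gradient with respect to z is softmax(z), so minimizing it lowers
    each logit in proportion to its predicted probability.
    """
    return logsumexp(z)


logits = [2.0, 0.5, -1.0]
direct = entropy(logits)
decoupled = reward_term(logits) + calibrator_term(logits)

# The two computations agree up to floating-point error.
assert math.isclose(direct, decoupled, rel_tol=1e-9)
print(direct, decoupled)
```

Under this assumed split, minimizing the first term raises the logits of classes that already dominate, while minimizing the second pushes logits down in proportion to their probabilities, which is consistent with the opposite effects the abstract attributes to CADF and GMC.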
