Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

Zilin Kang, Chonghua Liao, Tingqiang Xu, Huazhe Xu

TL;DR

Entropy Regularizing Activation (ERA) is a universal, activation-based mechanism that enforces a target entropy on model outputs without altering the primary objective. By transforming final outputs through task-specific activations, ERA achieves provable entropy guarantees across continuous control, discrete classification, and large language model reinforcement learning, while incurring a computational overhead of less than 7%. Empirically, ERA yields substantial gains in continuous control (SAC, PPO, TD-MPC2, FastSAC), image classification (ImageNet, CIFAR-10), and LLM reasoning benchmarks (AIME, AMC), and it improves out-of-distribution generalization. The approach highlights output activations as a powerful, non-invasive tool for entropy control, offering a scalable path to more robust and generalizable learning systems.

Abstract

We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models. Our approach demonstrates broad effectiveness across different domains: 1) for large language models (LLMs), boosting the AIME 2025 score for Qwen2.5-Math-7B by 37.4%; 2) for continuous control reinforcement learning agents, improving performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; 3) for image classification, enhancing ImageNet top-1 accuracy by 0.69% for ResNet-50. These gains are achieved with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
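To make the idea of an output transform with a guaranteed entropy floor concrete, here is a minimal sketch for the discrete case. It is not the ERA activation from the paper, whose definition is not reproduced in this summary. It is a simple stand-in that mixes the softmax output with the uniform distribution, and the function names (`softmax`, `entropy`, `min_entropy_output`) and the parameter `h_min` are hypothetical. Because Shannon entropy is concave, mixing with weight `lam = h_min / log(n)` on the uniform distribution gives an output whose entropy is at least `h_min`.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p):
    # Shannon entropy in nats; terms with p == 0 contribute 0.
    safe = np.where(p > 0, p, 1.0)
    return -(p * np.log(safe)).sum(axis=-1)


def min_entropy_output(logits, h_min):
    """Illustrative stand-in (NOT the paper's ERA activation).

    Mixes softmax(logits) with the uniform distribution so that the
    result has entropy >= h_min. By concavity of entropy,
    H((1 - lam) * p + lam * u) >= lam * log(n) = h_min.
    """
    n = logits.shape[-1]
    h_max = np.log(n)  # entropy of the uniform distribution over n classes
    if not 0.0 <= h_min <= h_max:
        raise ValueError("h_min must lie in [0, log(n)]")
    lam = h_min / h_max
    p = softmax(logits)
    return (1.0 - lam) * p + lam / n


if __name__ == "__main__":
    # A near-deterministic prediction: raw softmax entropy is close to 0.
    logits = np.array([50.0, 0.0, 0.0, 0.0])
    q = min_entropy_output(logits, h_min=0.5)
    print("raw entropy:        ", entropy(softmax(logits)))
    print("constrained entropy:", entropy(q))
```

The transform acts only on the model's output and leaves the training loss untouched, which is the property the abstract emphasizes. The uniform mixture is a blunt way to get there and is used here only because its guarantee is easy to verify.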

Paper Structure

This paper contains 55 sections, 4 theorems, 46 equations, 26 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Given a target entropy $\mathcal{H}_0$ and a residual entropy $\hat{\delta} \geq \delta$, the policy defined by Eq. (eq:continuous_era) has entropy $\mathcal{H}(\pi) \geq \mathcal{H}_0$, and $\sigma'$ is bounded within $[\sigma_{\min}, \sigma_{\max}]$.
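
As background for why constraining the standard deviation controls entropy, recall the closed-form entropy of a diagonal Gaussian. This is a standard identity, not the paper's ERA equation, which is not reproduced in this summary. The action dimension $D$ and the assumption of a plain (unsquashed) diagonal Gaussian are ours, and the role of $\delta$ is not covered here.

$$
\mathcal{H}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\big) = \sum_{i=1}^{D} \log \sigma_i + \frac{D}{2}\log(2\pi e),
\qquad\text{so}\qquad
\mathcal{H} \geq \mathcal{H}_0 \iff \sum_{i=1}^{D} \log \sigma_i \geq \mathcal{H}_0 - \frac{D}{2}\log(2\pi e).
$$

Under that assumption, an entropy floor is equivalent to a floor on the summed log standard deviations, which is a quantity an output activation can constrain directly.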

Figures (26)

  • Figure 1: ERA Boosts Large Language Models, Continuous Control and Image Classification. (a) Large Language Models: ERA consistently enhances the performance of Qwen2.5-Math-7B on the AIME'24, AIME'25 and AMC datasets. (b) Continuous Control: ERA significantly improves multiple popular RL algorithms, including SAC, PPO, TD-MPC2 and OBAC. (c) Image Classification: ERA consistently boosts the performance of ResNet-50 on the ImageNet and CIFAR-10 datasets.
  • Figure 2: Main Results of ERA in Continuous Control. Aggregate normalized performance on HumanoidBench (6 tasks, with SAC), DMC (Humanoid & Dog) (6 tasks, with TD-MPC2), HumanoidBench (8 tasks, with FastSAC) and MuJoCo Gym (4 tasks, with PPO). ERA consistently accelerates learning and achieves superior asymptotic performance.
  • Figure 3: Sensitivity of ERA to the Minimum Entropy. (a) 1M Steps Performance on DMC Tasks. Comparison between SAC-ERA and the baseline SAC on Humanoid and Dog environments under various minimum entropy constraints. Our method achieves superior performance across all settings. (b) Accuracy on ImageNet and CIFAR-10. ResNet-ERA maintains stable Top-1 and Top-5 accuracy across a range of minimum entropy values, indicating its robustness to the choice of this hyperparameter.
  • Figure 4: Entropy comparison and pass@$k$ results for GRPO with ERA (ours) versus GRPO alone. The entropy curves demonstrate that ERA mitigates entropy collapse and establishes a clear lower bound. The pass@$k$ results further indicate that ERA enhances exploration and strengthens the model’s reasoning ability.
  • Figure 5: Results on three OOD benchmarks (Qwen2.5-Math-7B).
  • ...and 21 more figures

Theorems & Definitions (8)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Lemma 1
  • Definition 1
  • Proposition 3
  • proof