Confidence-gated training for efficient early-exit neural networks
Saad Mokssit, Ouassim Karrakchou, Alejandro Mousist, Mounir Ghogho
TL;DR
This paper tackles gradient interference in early-exit neural networks by introducing Confidence-Gated Training (CGT), which replaces fixed exit weights with sample-dependent weights to better align training with inference-time decisions. It presents two instantiations, HardCGT (binary gating) and SoftCGT (residual gating), enabling per-sample gradient flow that prioritizes shallow exits for easy inputs while preserving deeper exits for harder ones. The joint CGT objective, $\mathcal{L}_{CGT} = \frac{1}{N} \sum_{i=1}^{N} \sum_{e=1}^{E} \lambda_e^{(i)} \, \ell(\hat{\boldsymbol{p}}_e^{(i)}, y_i)$, underpins the method, with explicit gating rules for both HardCGT and SoftCGT. Empirical results on Indian Pines and Fashion-MNIST show improved accuracy and reduced average inference cost compared to baselines, demonstrating CGT’s practical value for resource-constrained deployments.
Abstract
Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. However, joint training often leads to gradient interference, with deeper classifiers dominating optimization. We propose Confidence-Gated Training (CGT), a paradigm that conditionally propagates gradients from deeper exits only when preceding exits fail. This encourages shallow classifiers to act as primary decision points while reserving deeper layers for harder inputs. By aligning training with the inference-time policy, CGT mitigates overthinking, improves early-exit accuracy, and preserves efficiency. Experiments on the Indian Pines and Fashion-MNIST benchmarks show that CGT lowers average inference cost while improving overall accuracy, offering a practical solution for deploying deep models in resource-constrained environments.
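To make the per-sample weighting concrete, the following is a minimal sketch of the CGT objective in plain Python. The summary above does not spell out the gating rules, so both are assumptions made for illustration: `hard_gates` sets the weight of an exit to 1 only while every earlier exit has a maximum softmax probability below a hypothetical `threshold`, and `soft_gates` scales the weight of an exit by the product of the residual confidences (1 - c) of the earlier exits. The function names, the use of maximum softmax probability as the confidence score, and the threshold value are likewise illustrative and may differ from the paper's definitions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [v / s for v in exps]

def cross_entropy(probs, label):
    # Per-sample loss l(p_hat, y); clamp to avoid log(0).
    return -math.log(max(probs[label], 1e-12))

def hard_gates(confidences, threshold):
    # ASSUMED binary rule: an exit is trained on this sample only if
    # no earlier exit was already confident. The first exit is always on.
    gates, still_open = [], 1.0
    for c in confidences:
        gates.append(still_open)
        if c >= threshold:
            still_open = 0.0
    return gates

def soft_gates(confidences):
    # ASSUMED residual rule: weight of exit e is the product of
    # (1 - confidence) over all earlier exits.
    gates, residual = [], 1.0
    for c in confidences:
        gates.append(residual)
        residual *= 1.0 - c
    return gates

def cgt_loss(batch_logits, labels, gate_fn):
    # batch_logits[i][e] holds the logits of sample i at exit e.
    # Computes (1/N) * sum_i sum_e lambda_e^(i) * l(p_hat_e^(i), y_i).
    total = 0.0
    for exits, y in zip(batch_logits, labels):
        probs = [softmax(z) for z in exits]
        conf = [max(p) for p in probs]
        lam = gate_fn(conf)
        total += sum(w * cross_entropy(p, y) for w, p in zip(lam, probs))
    return total / len(labels)

# Usage: one easy sample with two exits; the first exit is already confident.
batch = [[[10.0, 0.0], [0.0, 0.0]]]
loss_hard = cgt_loss(batch, [0], lambda c: hard_gates(c, 0.9))
loss_soft = cgt_loss(batch, [0], soft_gates)
```

In this toy case the second exit contributes nothing under the hard rule and almost nothing under the soft rule, which mirrors the stated goal of letting shallow exits handle easy inputs. In an autodiff framework the gate values would presumably be treated as constants, so that gradients flow only through the per-exit losses; that detail is also an assumption here.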
