Confidence-gated training for efficient early-exit neural networks

Saad Mokssit, Ouassim Karrakchou, Alejandro Mousist, Mounir Ghogho

TL;DR

This paper tackles gradient interference in early-exit neural networks by introducing Confidence-Gated Training (CGT), which replaces fixed exit weights with sample-dependent weights to better align training with inference-time decisions. It presents two instantiations, HardCGT (binary gating) and SoftCGT (residual gating), enabling per-sample gradient flow that prioritizes shallow exits for easy inputs while preserving deeper exits for harder ones. The joint CGT objective, $\mathcal{L}_{CGT} = \frac{1}{N} \sum_{i=1}^{N} \sum_{e=1}^{E} \lambda_e^{(i)} \, \ell(\hat{\boldsymbol{p}}_e^{(i)}, y_i)$, underpins the method, with explicit gating rules for both HardCGT and SoftCGT. Empirical results on Indian Pines and Fashion-MNIST show improved accuracy and reduced average inference cost compared to baselines, demonstrating CGT’s practical value for resource-constrained deployments.

Abstract

Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. However, joint training often leads to gradient interference, with deeper classifiers dominating optimization. We propose Confidence-Gated Training (CGT), a paradigm that conditionally propagates gradients from deeper exits only when preceding exits fail. This encourages shallow classifiers to act as primary decision points while reserving deeper layers for harder inputs. By aligning training with the inference-time policy, CGT mitigates overthinking, improves early-exit accuracy, and preserves efficiency. Experiments on the Indian Pines and Fashion-MNIST benchmarks show that CGT lowers average inference cost while improving overall accuracy, offering a practical solution for deploying deep models in resource-constrained environments.
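
As an illustration of the joint objective with binary gating, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a hypothetical gating rule in which an exit counts as "successful" when its top softmax probability reaches a threshold and its prediction is correct; the paper's exact HardCGT and SoftCGT rules are not reproduced here. The names `hard_gates`, `cgt_loss`, and the default `threshold` are illustrative choices.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]

def cross_entropy(probs, label):
    # Per-sample loss l(p, y) = -log p[y], clipped to avoid log(0).
    return -math.log(max(probs[label], 1e-12))

def hard_gates(exit_probs, label, threshold):
    # ASSUMPTION: exit e is weighted 1 only while every earlier exit has
    # failed, where "failed" means not (confident and correct).
    gates = []
    earlier_failed = True
    for probs in exit_probs:
        gates.append(1.0 if earlier_failed else 0.0)
        confident = max(probs) >= threshold
        correct = probs.index(max(probs)) == label
        if confident and correct:
            earlier_failed = False
    return gates

def cgt_loss(batch_logits, labels, threshold=0.9):
    # batch_logits[i][e] holds the logit vector of exit e for sample i.
    # Computes (1/N) * sum_i sum_e lambda_e^(i) * l(p_e^(i), y_i).
    total = 0.0
    for per_exit_logits, y in zip(batch_logits, labels):
        exit_probs = [softmax(logits) for logits in per_exit_logits]
        gates = hard_gates(exit_probs, y, threshold)
        total += sum(g * cross_entropy(p, y) for g, p in zip(gates, exit_probs))
    return total / len(labels)
```

In an automatic-differentiation framework, the gates would presumably be treated as constants (for example via a stop-gradient), so that only the per-exit losses contribute gradients; that detail is also an assumption of this sketch.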

Paper Structure

This paper contains 8 sections, 7 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Early-exit DNN. The input flows through successive backbone blocks. At each block, a head outputs a prediction and a confidence score. If the confidence exceeds the threshold, inference terminates; otherwise, the features continue to the next block.
  • Figure 2: Per-exit training loss under HardCGT.
  • Figure 3: Per-exit training loss under SoftCGT.
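
The inference procedure described in the Figure 1 caption can be sketched as a short loop. This is an illustrative sketch under stated assumptions: confidence is taken to be the top class probability returned by each head, the last exit always returns a prediction as a fallback, and `early_exit_predict`, `blocks`, and `heads` are hypothetical names rather than anything defined by the paper.

```python
def early_exit_predict(x, blocks, heads, threshold):
    # blocks: list of callables mapping features -> features (backbone stages).
    # heads:  list of callables mapping features -> class probabilities.
    # Returns (predicted_class, exit_index).
    if not blocks or len(blocks) != len(heads):
        raise ValueError("need one head per backbone block")
    features = x
    last = len(blocks) - 1
    for e, (block, head) in enumerate(zip(blocks, heads)):
        features = block(features)
        probs = head(features)
        confidence = max(probs)  # ASSUMPTION: confidence = top probability
        # Stop as soon as the confidence exceeds the threshold; the final
        # exit answers unconditionally so every input gets a prediction.
        if confidence > threshold or e == last:
            return probs.index(confidence), e
```

With toy heads that return fixed probabilities, a low threshold ends inference at the first exit, while a higher threshold forwards the features to the next block.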