COSEE: Consistency-Oriented Signal-Based Early Exiting via Calibrated Sample Weighting Mechanism

Jianing He, Qi Zhang, Hongyun Zhang, Xuanjing Huang, Usman Naseem, Duoqian Miao

TL;DR

COSEE addresses the training-testing mismatch in signal-based early exiting for pre-trained language models and enables flexible speed-up without sacrificing accuracy. It introduces a calibrated sample weighting mechanism (SWM) to bias each classifier toward samples likely to exit there, and an online signal calibration (OSC) objective to sharpen exiting signals, using a normalized energy score to stabilize thresholding. By simulating multiple thresholds during training and minimizing the mean cross-entropy across exits, COSEE aligns training with diverse inference-time accelerations and maintains consistency between training and testing. Experiments on GLUE with a BERT-base backbone demonstrate superior accuracy-throughput trade-offs, faster convergence, and negligible additional storage, with good generalization across exiting signals and backbones. This work offers a practical, scalable approach to deploying efficient, consistency-aware multi-exit PLMs in resource-constrained environments.

Abstract

Early exiting is an effective paradigm for improving the inference efficiency of pre-trained language models (PLMs) by dynamically adjusting the number of executed layers for each sample. However, in most existing works, easy and hard samples are treated equally by each classifier during training, which neglects the test-time early exiting behavior, leading to inconsistency between training and testing. Although some methods have tackled this issue under a fixed speed-up ratio, the challenge of flexibly adjusting the speed-up ratio while maintaining consistency between training and testing is still under-explored. To bridge the gap, we propose a novel Consistency-Oriented Signal-based Early Exiting (COSEE) framework, which leverages a calibrated sample weighting mechanism to enable each classifier to emphasize the samples that are more likely to exit at that classifier under various acceleration scenarios. Extensive experiments on the GLUE benchmark demonstrate the effectiveness of our COSEE across multiple exiting signals and backbones, yielding a better trade-off between performance and efficiency.
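
The test-time behavior the abstract refers to can be illustrated with a minimal sketch of entropy-based early exiting: a sample leaves at the first classifier whose prediction entropy falls below a threshold. This is a generic illustration, not the authors' implementation. The function names and the three probability vectors are hypothetical, and the last classifier is assumed to answer when no earlier one is confident enough.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(layer_probs, threshold):
    """Return (exit_layer, predicted_class) for one sample.

    layer_probs: per-classifier probability vectors, ordered shallow to deep.
    The sample exits at the first classifier whose entropy is below
    `threshold`; the last classifier always answers if none qualifies.
    """
    last = len(layer_probs)
    for layer, probs in enumerate(layer_probs, start=1):
        if entropy(probs) < threshold or layer == last:
            predicted = max(range(len(probs)), key=probs.__getitem__)
            return layer, predicted


# Hypothetical outputs of three classifiers for one binary sample.
sample = [[0.55, 0.45], [0.80, 0.20], [0.97, 0.03]]

print(early_exit(sample, threshold=0.4))  # exits at the third classifier
print(early_exit(sample, threshold=0.6))  # a looser threshold exits one layer earlier
```

Raising the threshold makes shallow exits more frequent, which is how the speed-up ratio is adjusted at inference time in this kind of scheme.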

Paper Structure

This paper contains 30 sections, 10 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Exiting layer distribution on the QNLI development set with entropy-based exiting signal (Threshold = 0.4). Neg and Pos denote negative and positive samples, respectively. Samples near the classification boundary (hard samples) tend to exit at deep classifiers, while samples far from the classification boundary (easy samples) typically exit at shallow classifiers.
  • Figure 2: Comparison between the conventional signal-based early exiting framework and our COSEE. The conventional framework simply minimizes the (weighted) sum of cross-entropy losses from all classifiers, where each classifier treats all samples equally during training. Instead, our COSEE enables each classifier to emphasize samples that are more likely to exit at that classifier, ensuring consistency between training and testing. We also incorporate an online signal calibration objective $\mathrm{Loss}_{\mathrm{OSC}}$ for each internal classifier to encourage highly discriminative exiting signals for more reliable exiting decisions and loss weights.
  • Figure 3: Energy distribution across layers 2, 6, and 10 for the SST-2 task. Normalization aligns the energy distribution across layers, facilitating threshold selection.
  • Figure 4: Impact of SWM and OSC on the trade-off between performance and efficiency for COSEE with energy on four GLUE development sets.
  • Figure 5: DIS heatmap of different models at different layers on the development sets of SST-2 and QNLI. Both SWM and OSC can encourage more discriminative exiting signals, further strengthening the reliability of exiting decisions.
  • ...and 6 more figures
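
The idea in the Figure 2 caption, that each classifier should emphasize the samples likely to exit at it, can be illustrated with a small sketch. It simulates several entropy thresholds, records where a sample would exit under each, and uses the resulting fractions as per-classifier loss weights. This excerpt does not give the paper's actual weighting formula, so the scheme below, the threshold values, and all function names are assumptions made purely for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def exit_layer(layer_probs, threshold):
    """1-based index of the first classifier with entropy below threshold.

    Falls back to the last classifier if none qualifies.
    """
    for layer, probs in enumerate(layer_probs, start=1):
        if entropy(probs) < threshold:
            return layer
    return len(layer_probs)


def sample_weights(layer_probs, thresholds):
    """Fraction of simulated thresholds under which the sample exits at each classifier."""
    counts = [0] * len(layer_probs)
    for t in thresholds:
        counts[exit_layer(layer_probs, t) - 1] += 1
    return [c / len(thresholds) for c in counts]


def weighted_loss(layer_probs, label, weights):
    """Weighted sum of per-classifier cross-entropy losses for one sample."""
    return sum(w * -math.log(probs[label]) for w, probs in zip(weights, layer_probs))


# Hypothetical outputs of three classifiers for one binary sample with label 0.
sample = [[0.55, 0.45], [0.80, 0.20], [0.97, 0.03]]
weights = sample_weights(sample, thresholds=[0.2, 0.4, 0.6, 0.69])

print(weights)  # [0.25, 0.25, 0.5]
print(weighted_loss(sample, label=0, weights=weights))
```

Under these assumptions a hard sample, which exits late for most thresholds, contributes mostly to the deep classifiers' losses, whereas the conventional setup described in the caption would weight it equally at every classifier.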