COSEE: Consistency-Oriented Signal-Based Early Exiting via Calibrated Sample Weighting Mechanism
Jianing He, Qi Zhang, Hongyun Zhang, Xuanjing Huang, Usman Naseem, Duoqian Miao
TL;DR
COSEE addresses the training-testing mismatch in signal-based early exiting for pre-trained language models (PLMs) and enables flexibly adjustable speed-up ratios with a favorable accuracy-efficiency trade-off. It introduces two components. A calibrated sample weighting mechanism (SWM) biases each classifier toward the samples likely to exit at it. An online signal calibration (OSC) objective sharpens the exiting signals, and a normalized energy score is used to stabilize thresholding. During training, COSEE simulates multiple exit thresholds and minimizes the mean sample-weighted cross-entropy across exits, which aligns training with the range of accelerations used at inference and keeps training and testing consistent. Experiments on GLUE with a BERT-base backbone show better accuracy-throughput trade-offs, faster convergence, and negligible additional storage, and further results indicate good generalization across exiting signals and backbones. This work offers a practical, scalable approach to deploying efficient, consistency-aware multi-exit PLMs in resource-constrained environments.
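At inference, signal-based early exiting runs the layers in order and stops at the first internal classifier whose exiting signal passes a threshold. Changing that threshold is what changes the speed-up ratio. The sketch below illustrates this with an energy-based signal. The normalization used here, (logsumexp(z) - max(z)) / log C, is an assumption chosen only because it maps the score into [0, 1]; it is not necessarily the normalized energy score defined in COSEE, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def normalized_energy(logits: Sequence[float]) -> float:
    """Hypothetical normalized energy score in [0, 1]; lower means more confident.

    The energy score at temperature 1 is E(x) = -logsumexp(logits). Here
    (logsumexp(logits) - max(logits)) / log(C) is used: it is 0 when one logit
    dominates and 1 when all C logits are equal. Requires C >= 2.
    """
    c = len(logits)
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))  # numerically stable logsumexp
    return (lse - m) / math.log(c)


def early_exit(per_layer_logits: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (1-based exit layer, predicted class) for a single sample.

    Stops at the first classifier whose score is below `threshold`, else at the last one.
    Logits are precomputed here for illustration; in a real model, deeper layers would
    simply not be executed after an exit.
    """
    if not per_layer_logits:
        raise ValueError("need at least one classifier")
    for layer, logits in enumerate(per_layer_logits, start=1):
        if normalized_energy(logits) < threshold or layer == len(per_layer_logits):
            pred = max(range(len(logits)), key=lambda k: logits[k])
            return layer, pred
    raise AssertionError("unreachable")


# Logits of one sample at three internal classifiers (two classes).
layers = [[0.0, 0.0], [0.5, 0.0], [4.0, 0.0]]
print(early_exit(layers, threshold=0.2))  # (3, 0): strict threshold, late exit
print(early_exit(layers, threshold=0.9))  # (2, 0)
print(early_exit(layers, threshold=1.5))  # (1, 0): loose threshold, earliest exit
```

A single trained model thus serves different speed-up ratios just by changing `threshold`, which is why training that accounts for only one operating point can be inconsistent with flexible inference.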
Abstract
Early exiting is an effective paradigm for improving the inference efficiency of pre-trained language models (PLMs) by dynamically adjusting the number of executed layers for each sample. However, in most existing works, each classifier treats easy and hard samples equally during training, which ignores the test-time early exiting behavior and leads to inconsistency between training and testing. Although some methods have tackled this issue under a fixed speed-up ratio, flexibly adjusting the speed-up ratio while maintaining consistency between training and testing remains under-explored. To bridge this gap, we propose a novel Consistency-Oriented Signal-based Early Exiting (COSEE) framework, which leverages a calibrated sample weighting mechanism to let each classifier emphasize the samples that are more likely to exit at that classifier under various acceleration scenarios. Extensive experiments on the GLUE benchmark demonstrate the effectiveness of COSEE across multiple exiting signals and backbones, yielding a better trade-off between performance and efficiency.
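The idea of letting each classifier emphasize the samples likely to exit there under several acceleration settings can be illustrated with a simplified, hypothetical weighting scheme: simulate a set of thresholds, record where each sample would exit under each one, use the exit frequency as that sample's weight at each classifier, and average the weight-normalized cross-entropy over classifiers. This hard-assignment rule, the `floor` parameter, and all function names are assumptions for illustration, not COSEE's calibrated weighting formula. In a real training loop the signals would come from the model's internal classifiers, and the weights would typically be computed without gradients.

```python
from typing import List, Sequence


def exit_layer(signals_for_sample: Sequence[float], threshold: float) -> int:
    """Index of the first classifier whose signal is below `threshold` (last one if none)."""
    for i, s in enumerate(signals_for_sample):
        if s < threshold:
            return i
    return len(signals_for_sample) - 1


def sample_weights(signals: List[List[float]], thresholds: Sequence[float],
                   floor: float = 0.0) -> List[List[float]]:
    """Hypothetical weights[i][n] = floor + fraction of thresholds at which sample n exits at classifier i.

    signals[i][n] is the exiting signal of sample n at classifier i (lower = more confident).
    """
    num_layers, num_samples = len(signals), len(signals[0])
    weights = [[floor] * num_samples for _ in range(num_layers)]
    for n in range(num_samples):
        per_sample = [signals[i][n] for i in range(num_layers)]
        for tau in thresholds:  # each simulated threshold stands for one acceleration setting
            weights[exit_layer(per_sample, tau)][n] += 1.0 / len(thresholds)
    return weights


def weighted_multi_exit_loss(ce: List[List[float]], weights: List[List[float]]) -> float:
    """Mean over classifiers of each classifier's weight-normalized cross-entropy.

    ce[i][n] is the cross-entropy of classifier i on sample n.
    """
    per_layer = []
    for ce_i, w_i in zip(ce, weights):
        total_w = sum(w_i)
        # A classifier at which no sample exits under any simulated threshold contributes 0.
        per_layer.append(sum(w * loss for w, loss in zip(w_i, ce_i)) / total_w if total_w > 0 else 0.0)
    return sum(per_layer) / len(per_layer)


# Two classifiers, two samples, two simulated thresholds.
signals = [[0.1, 0.8],   # classifier 0
           [0.05, 0.3]]  # classifier 1
ce = [[0.2, 1.0],
      [0.1, 0.4]]
w = sample_weights(signals, thresholds=[0.2, 0.9])
print(w)                                # [[1.0, 0.5], [0.0, 0.5]]
print(weighted_multi_exit_loss(ce, w))  # about 0.4333
```

Raising `floor` gives every classifier some weight on every sample, trading closeness to test-time exiting behavior for broader supervision of each exit.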
