BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

TL;DR

BEEM tackles latency-accuracy trade-offs in early exit DNNs by treating each intermediate exit as an expert and aggregating their confidence through a weighted ensemble with neighbor-consistency: exits accumulate a weighted score $S_i$ only when neighboring predictions agree, and an exit occurs when $S_i \geq \alpha$. The method uses two weighting schemes, BEEM-C (cost-based) and BEEM-A (accuracy-based), and determines exit thresholds via optimization that leverages exit error rates to outperform the final classifier, supported by a theoretical bound on BEEM's error rate. Empirical results on GLUE and COCO show consistent speedups of $1.5\times$–$2.1\times$ with accuracy near or above the final layer, especially for easier NLP tasks; BEEM-A often yields the best trade-offs. The work demonstrates that incorporating ensemble-like information from multiple exits and carefully selecting thresholds can substantially improve the practicality of early exit DNNs in real-world NLP and vision-language tasks, with scalable benefits for larger models.

Abstract

Early Exit (EE) techniques have emerged as a means to reduce inference latency in Deep Neural Networks (DNNs). The latency improvement and accuracy of these techniques depend crucially on the criteria used to make exit decisions. We propose a new decision criterion, BEEM, in which exit classifiers are treated as experts and their confidence scores are aggregated. The confidence scores are aggregated only if neighbouring experts are consistent in their predictions as the samples pass through them, thus capturing their ensemble effect. A sample exits when the aggregated confidence value exceeds a threshold. The threshold is set using the error rates of the intermediate exits, aiming to surpass the performance of conventional DNN inference. Experimental results on the COCO dataset for image captioning and the GLUE datasets for various language tasks demonstrate that our method enhances the performance of state-of-the-art EE methods, achieving speed-ups by a factor of 1.5x to 2.1x. When compared to the final layer, its accuracy is comparable on the harder image captioning task and improves on the easier language tasks. The source code for this work is publicly available at https://github.com/Div290/BEEM1/tree/main
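The exit rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function name `beem_exit`, its arguments, and the example weights and threshold are hypothetical. In particular, restarting the score from the current exit's weighted confidence when neighbouring predictions disagree is an assumption about how "aggregated only if neighbouring experts are consistent" is realised.

```python
from typing import Sequence, Tuple


def beem_exit(
    probs_per_exit: Sequence[Sequence[float]],
    weights: Sequence[float],
    alpha: float,
) -> Tuple[int, int]:
    """Return (exit_index, predicted_class) for a single sample.

    probs_per_exit[i] is the class-probability vector produced by exit i.
    weights[i] is the expert weight of exit i (e.g. cost- or accuracy-based).
    """
    if not probs_per_exit:
        raise ValueError("need at least one exit")

    score = 0.0
    prev_pred = None
    for i, probs in enumerate(probs_per_exit):
        # Prediction and confidence of the current exit.
        pred = max(range(len(probs)), key=lambda c: probs[c])
        conf = probs[pred]

        if prev_pred is not None and pred == prev_pred:
            # Neighbouring exits agree: accumulate the weighted confidence.
            score += weights[i] * conf
        else:
            # ASSUMPTION: on disagreement the running score restarts.
            score = weights[i] * conf

        if score >= alpha:
            return i, pred  # early exit
        prev_pred = pred

    # No exit fired: fall back to the final classifier's prediction.
    return len(probs_per_exit) - 1, prev_pred


# Hypothetical three-exit, two-class example.
probs = [[0.6, 0.4], [0.7, 0.3], [0.9, 0.1]]
print(beem_exit(probs, weights=[0.2, 0.4, 0.6], alpha=0.35))
```

In this toy run the first exit alone does not reach the threshold, but the agreeing second exit pushes the accumulated score past it, so the sample leaves before the final layer.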

Paper Structure

This paper contains 20 sections, 1 theorem, 15 equations, 2 figures, 5 tables.

Key Result

Theorem 3.1

Consider an early exit PLM with $L$ layers. Let $p$ denote the error rate of the final classifier and $q_i$ the error probability of the $i$th exit classifier, such that $q_i<\frac{a_i}{a_i+((1/p-1)b_i^{i-1})}$ holds for all exit layers $i=1,2,\ldots, L-1$ where $a_i$ and $b_i$ are constants for a gi

Figures (2)

  • Figure 1: Comparison between (a) DeeBERT, which uses the confidence available at each exit as the metric for deciding early inference (threshold set to 0.9), (b) PABEE, which uses the consistency in prediction as the confidence metric (patience set to 2), and (c) BEEM, which uses the weighted confidence $S_i$ (weights = $[0.1, 0.2, \ldots, 1.2]$) and threshold $\alpha = 0.2$. In BEEM, by appropriately considering information from previous classifiers, a correct prediction is made early, which was not the case with the others.
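For contrast, the two baseline criteria named in the Figure 1 caption can be sketched as follows. This is a simplified illustration rather than the DeeBERT or PABEE code: the function names are hypothetical, the confidence rule follows the caption's description (top-class probability against a threshold of 0.9), and the patience counting convention (the counter resets to zero on a changed prediction and the sample exits once it reaches the patience value) is an assumption.

```python
from typing import Sequence, Tuple


def _top(probs: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted_class, confidence) for one probability vector."""
    pred = max(range(len(probs)), key=lambda c: probs[c])
    return pred, probs[pred]


def confidence_exit(
    probs_per_exit: Sequence[Sequence[float]], threshold: float = 0.9
) -> Tuple[int, int]:
    """Exit at the first classifier whose own confidence reaches the threshold."""
    pred = -1
    for i, probs in enumerate(probs_per_exit):
        pred, conf = _top(probs)
        if conf >= threshold:
            return i, pred
    return len(probs_per_exit) - 1, pred  # fall back to the final exit


def patience_exit(
    probs_per_exit: Sequence[Sequence[float]], patience: int = 2
) -> Tuple[int, int]:
    """Exit once the prediction has stayed unchanged for `patience` steps."""
    streak, prev, pred = 0, None, -1
    for i, probs in enumerate(probs_per_exit):
        pred, _ = _top(probs)
        # ASSUMPTION: the counter resets to 0 whenever the prediction changes.
        streak = streak + 1 if pred == prev else 0
        if streak >= patience:
            return i, pred
        prev = pred
    return len(probs_per_exit) - 1, pred  # fall back to the final exit


# Hypothetical four-exit, two-class example.
probs = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.95, 0.05]]
print(confidence_exit(probs), patience_exit(probs))
```

Neither rule uses how confident the earlier exits were: the first looks only at the current exit, and the second only at whether predictions match. A weighted score such as $S_i$ combines both signals.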

Theorems & Definitions (1)

  • Theorem 3.1