CEEBERT: Cross-Domain Inference in Early Exit BERT

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

TL;DR

This paper tackles high inference latency in large PLMs by enabling early exits that adapt to cross-domain input distributions. It introduces CeeBERT, an online, unsupervised threshold learner that treats exit decisions as a Multi-Armed Bandit problem and uses a UCB-based policy to select exit thresholds on the target data without labels. The approach yields latency reductions of roughly $2\times$ to $3.5\times$ with minimal accuracy loss ($0.1\%$--$3\%$) across five target datasets and two backbones (BERT/ALBERT), while establishing a sub-linear regret bound. This cross-domain adaptability eliminates the need for target-domain labeled data or extensive re-training, boosting practical scalability of early exit PLMs in real-world deployments, especially under domain shift.

Abstract

Pre-trained Language Models (PLMs), like BERT, with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer from high inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. The challenge is to decide at which layer each sample should be inferred and exit so that accuracy and latency are balanced. Moreover, the distribution of the samples to be inferred may differ from that used for training, necessitating cross-domain adaptation. We propose an online learning algorithm named Cross-Domain Inference in Early Exit BERT (CeeBERT) that dynamically determines early exits of samples based on the level of confidence at each exit point. CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly, eliminating the need for labeled data. Experimental results on five distinct datasets with BERT and ALBERT models demonstrate CeeBERT's ability to improve latency by reducing unnecessary computations with minimal drop in performance. By adapting the threshold values, CeeBERT can speed up the BERT/ALBERT models by $2\times$ to $3.5\times$ with minimal drop in accuracy.
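
The threshold-learning idea described above (treat each candidate exit threshold as a bandit arm and choose among them with a UCB rule, using only the confidences observed at the exits) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the reward (exit confidence minus a depth penalty), the `cost` weight, the candidate threshold grid, and the simulated confidence generator `fake_confidences` are all assumptions introduced here, since this summary does not specify them.

```python
import math
import random


def ucb_select(counts, means, t, gamma=2.0):
    """Return the index of the arm with the highest upper confidence bound."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm  # play every arm once before using the UCB score
    scores = [m + math.sqrt(gamma * math.log(t) / c)
              for m, c in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def exit_layer(confidences, threshold):
    """Index of the first exit whose confidence meets the threshold.

    Falls back to the final layer if no exit is confident enough.
    """
    for i, conf in enumerate(confidences):
        if conf >= threshold:
            return i
    return len(confidences) - 1


def run_bandit(stream, thresholds, cost=0.5, gamma=2.0):
    """Learn which threshold to use from unlabeled per-exit confidences."""
    counts = [0] * len(thresholds)
    means = [0.0] * len(thresholds)
    layers_used = 0
    for t, confidences in enumerate(stream, start=1):
        arm = ucb_select(counts, means, t, gamma)
        layer = exit_layer(confidences, thresholds[arm])
        # Hypothetical reward: confidence at the exit minus a depth penalty.
        reward = confidences[layer] - cost * (layer + 1) / len(confidences)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        layers_used += layer + 1
    return counts, means, layers_used


def fake_confidences(rng, num_layers=12):
    """Simulated confidences that tend to grow with depth (assumption)."""
    base = rng.uniform(0.5, 0.7)
    return [min(1.0, base + 0.04 * i + rng.uniform(-0.02, 0.02))
            for i in range(num_layers)]


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [fake_confidences(rng) for _ in range(1000)]
    thresholds = [0.6, 0.7, 0.8, 0.9]
    counts, means, layers_used = run_bandit(stream, thresholds)
    print("pulls per threshold:", dict(zip(thresholds, counts)))
    print("average layers per sample:", layers_used / len(stream))
```

No labels enter the loop: the only feedback is the confidence at the chosen exit and the depth at which the sample left the model, which is what makes this kind of update usable on an unlabeled target domain.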

Paper Structure (22 sections, 1 theorem, 7 equations, 3 figures, 4 tables, 1 algorithm)

Key Result

Theorem A.1

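For any $\gamma>1$, the regret of CeeBERT with $K$ arms in the action set after $n$ rounds admits a bound that is sub-linear in $n$ and is expressed in terms of the gaps $\Delta_\alpha = r(\alpha^{*})-r(\alpha)$, where $\alpha^{*}$ denotes the optimal arm.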
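
To make the gap $\Delta_\alpha$ concrete, the sketch below computes cumulative regret for a sequence of arm choices as the running sum of the gaps of the arms played. It assumes that $r(\alpha)$ is the mean reward of arm $\alpha$ and that these means are known for evaluation purposes; the function names and the example reward values are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def gaps(mean_rewards):
    """Delta_alpha = r(alpha*) - r(alpha) for every arm alpha."""
    best = max(mean_rewards)
    return [best - r for r in mean_rewards]


def cumulative_regret(mean_rewards, chosen_arms):
    """Running sum of the gaps of the arms chosen in each round."""
    delta = gaps(mean_rewards)
    return list(accumulate(delta[arm] for arm in chosen_arms))


# Three arms with assumed mean rewards; arm 1 is optimal.
print(cumulative_regret([0.25, 0.75, 0.5], [0, 1, 2, 1]))
# -> [0.5, 0.5, 0.75, 0.75]
```

Regret is sub-linear when this curve grows more slowly than the number of rounds, that is, when the learner plays sub-optimal arms less and less often and the curve flattens.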

Figures (3)

  • Figure 1: Depiction of the cross-domain inference setup for early exit models. The backbone is trained on the source dataset. i) Left: Inference on the source dataset. ii) Right: Inference on a target dataset from a different domain. The change in confidence values is due to the change in the target domain distribution.
  • Figure 2: Left: Trade-off between accuracy and speedup as the tunable parameters of various methods are changed. Center: Change in the distribution of confidence at the intermediate exits when the dataset distribution changes; the backbone was trained on SST-2 in this case. Right: Cumulative regret of CeeBERT on the Yelp dataset, showing that CeeBERT achieves sub-linear regret.
  • Figure 3: Cumulative regret curves for different datasets.

Theorems & Definitions (1)

  • Theorem A.1