
Alignment Adapter to Improve the Performance of Compressed Deep Learning Models

Rohit Raj Rai, Abhishek Dhaka, Amit Awekar

TL;DR

We propose Alignment Adapter (AlAd), a lightweight, sliding-window-based adapter that aligns the token-level embeddings of a compressed model with those of the original large model, significantly boosting the compressed model's performance with only marginal overhead in size and latency.

Abstract

Compressed Deep Learning (DL) models are essential for deployment in resource-constrained environments, but their performance often lags behind that of their large-scale counterparts. To bridge this gap, we propose Alignment Adapter (AlAd), a lightweight, sliding-window-based adapter that aligns the token-level embeddings of a compressed model with those of the original large model. AlAd preserves local contextual semantics, enables flexible alignment across differing dimensionalities or architectures, and is entirely agnostic to the underlying compression method. AlAd can be deployed in two ways: as a plug-and-play module over a frozen compressed model, or jointly fine-tuned with the compressed model for further performance gains. Through experiments on BERT-family models across three token-level NLP tasks, we demonstrate that AlAd significantly boosts the performance of compressed models with only marginal overhead in size and latency.
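To make the sliding-window idea concrete, here is a minimal pure-Python sketch of one possible transform. It rests on assumptions that are ours, not the paper's stated design: a centred window of odd size n, zero padding at the sequence boundaries, and a single affine layer that maps the concatenated window of compressed embeddings (n * d_C values) to a d_L-dimensional vector. That last choice illustrates how an adapter can align embeddings whose dimensionalities differ. The names `window_features` and `alad_transform`, and the symbols M, d_C, and d_L, are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def window_features(r_c: List[Vector], n: int) -> List[Vector]:
    """For each token i, concatenate the compressed-model embeddings of the
    n tokens centred on i (odd n assumed), zero-padding past either end."""
    if n < 1 or n % 2 == 0:
        raise ValueError("this sketch assumes an odd window size n >= 1")
    half = n // 2
    d_c = len(r_c[0])
    feats: List[Vector] = []
    for i in range(len(r_c)):
        f: Vector = []
        for j in range(i - half, i + half + 1):
            f.extend(r_c[j] if 0 <= j < len(r_c) else [0.0] * d_c)
        feats.append(f)
    return feats


def alad_transform(r_c: List[Vector], W: Matrix, b: Vector, n: int) -> List[Vector]:
    """Hypothetical transform: one affine map from the (n * d_C)-dim window
    feature to a d_L-dim vector, turning R_C (M x d_C) into R'_C (M x d_L)."""
    out: List[Vector] = []
    for f in window_features(r_c, n):
        if any(len(row) != len(f) for row in W):
            raise ValueError("each row of W must have n * d_C entries")
        out.append([sum(w * x for w, x in zip(row, f)) + bk for row, bk in zip(W, b)])
    return out


# Toy usage: M = 3 tokens, d_C = 2, window n = 3, d_L = 4.
r_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [
    [0, 0, 1, 0, 0, 0],  # centre token, dim 0
    [0, 0, 0, 1, 0, 0],  # centre token, dim 1
    [1, 0, 0, 0, 0, 0],  # left neighbour, dim 0
    [0, 0, 0, 0, 1, 0],  # right neighbour, dim 0
]
b = [0.0] * 4
r_c_prime = alad_transform(r_c, W, b, n=3)  # r_c_prime[1] == [0.0, 1.0, 1.0, 1.0]
```

In this sketch the adapter has d_L * (n * d_C + 1) parameters. That count depends only on the window size and the two embedding widths, not on sequence length.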

Paper Structure

This paper contains 6 sections, 1 equation, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Overview of the Alignment Adapter (AlAd) approach. $M_C$ is a compressed model and $M_L$ is a large model; both receive input tokens $W_1$ to $W_M$. AlAd transforms the embeddings of $M_C$ from $R_C$ to $R'_C$ so that they better align with $R_L$, the embeddings of $M_L$. When computing the transform function $\mathbb{F}$, AlAd uses a window of size $n$ to capture the context around each token.
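The caption describes $\mathbb{F}$ as a learned map that pulls $R_C$ toward $R_L$. This summary does not state the training objective, so the following minimal sketch assumes a mean-squared-error alignment loss. It also uses a hypothetical $\mathbb{F}$ made of a single affine layer over a zero-padded, centred window of odd size $n$. The sketch covers only the plug-and-play mode: $R_C$ comes from the frozen $M_C$ and $R_L$ from $M_L$, so full-batch gradient descent updates just the adapter weights. Joint fine-tuning would also backpropagate into $M_C$, which is not modelled here. All function names and toy values are illustrative.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def window_features(r_c: List[Vector], n: int) -> List[Vector]:
    """Concatenate the n embeddings centred on each token (odd n assumed),
    zero-padding beyond the sequence ends."""
    half, d_c = n // 2, len(r_c[0])
    return [
        [x for j in range(i - half, i + half + 1)
         for x in (r_c[j] if 0 <= j < len(r_c) else [0.0] * d_c)]
        for i in range(len(r_c))
    ]


def mse(pred: List[Vector], target: List[Vector]) -> float:
    """Assumed alignment loss: mean squared error between R'_C and R_L."""
    sq = [(p - t) ** 2 for pv, tv in zip(pred, target) for p, t in zip(pv, tv)]
    return sum(sq) / len(sq)


def train_adapter(r_c, r_l, n, lr=0.1, steps=200, seed=0) -> Tuple[Matrix, Vector, List[float]]:
    """Plug-and-play mode: R_C (frozen compressed model) and R_L (large model)
    are fixed inputs; only the adapter weights W and b are updated."""
    rng = random.Random(seed)
    feats = window_features(r_c, n)
    m, d_in, d_l = len(feats), len(feats[0]), len(r_l[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_l)]
    b = [0.0] * d_l
    scale = 2.0 / (m * d_l)  # gradient of the mean of squares w.r.t. each prediction
    history: List[float] = []
    for _ in range(steps):
        pred = [[sum(w * x for w, x in zip(row, f)) + bk for row, bk in zip(W, b)]
                for f in feats]
        history.append(mse(pred, r_l))
        # Full-batch gradient step, one output dimension k at a time.
        for k in range(d_l):
            errs = [pred[i][k] - r_l[i][k] for i in range(m)]
            for j in range(d_in):
                W[k][j] -= lr * scale * sum(e * f[j] for e, f in zip(errs, feats))
            b[k] -= lr * scale * sum(errs)
    return W, b, history


# Toy data: 3 tokens, d_C = 2 (compressed), d_L = 4 (large model), n = 3.
r_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
r_l = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
W, b, history = train_adapter(r_c, r_l, n=3)
```

Because $M_C$ stays frozen here, its embeddings could be precomputed once and reused, and only the small adapter is trained.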