Multi-Attribute Multi-Grained Adaptation of Pre-Trained Language Models for Text Understanding from Bayesian Perspective

You Zhang, Jin Wang, Liang-Chih Yu, Dan Xu, Xuejie Zhang

TL;DR

This work tackles the challenge of non-IID data in text understanding by rethinking PLM adaptation through a Bayesian lens. It introduces M2A, a multi-attribute, multi-grained adaptation framework that ensembles IID and non-IID information using lightweight LoRA-based modules and a joint learning objective that combines multitask optimization with distillation. A KronA-based decomposition further enhances parameter efficiency for fine-grained views, while a Bayesian training scheme leverages both $p(\mathcal{D}_y|\mathcal{D}_x; w)$ and $p(\mathcal{D}_x|w)$ to model data heterogeneity. Empirical results on multi-domain sentiment and personalized sentiment datasets show that M2A consistently outperforms strong baselines, especially as data heterogeneity grows and PLMs scale. The work suggests broad applicability for robust PLM adaptation and points to future directions in richer multi-view data and automated heterogeneity detection.
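
As a rough reading of why both likelihood terms appear (a sketch, not the paper's own derivation), assume $w$ is treated as a random variable with a prior $p(w)$; the prior is introduced here for illustration and is not stated in the summary above. Bayes' rule together with the chain rule then gives:

$$
p(w \mid \mathcal{D}_x, \mathcal{D}_y) \;\propto\; p(\mathcal{D}_y \mid \mathcal{D}_x; w)\, p(\mathcal{D}_x \mid w)\, p(w)
$$

Under this reading, the first factor is the usual discriminative likelihood of labels given inputs, while the second depends on the inputs alone, which is where domain- or attribute-specific structure in the data could enter.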

Abstract

Current neural networks often employ multi-domain-learning or attribute-injecting mechanisms to incorporate non-independent and identically distributed (non-IID) information for text understanding tasks by capturing individual characteristics and the relationships among samples. However, the extent of the impact of non-IID information and how these methods affect pre-trained language models (PLMs) remain unclear. This study revisits, from a Bayesian perspective, the assumption that non-IID information enhances PLMs to achieve performance improvements; this perspective unearths and integrates non-IID and IID features. Furthermore, we propose a multi-attribute multi-grained framework for PLM adaptation (M2A), which combines multi-attribute and multi-grained views to mitigate uncertainty in a lightweight manner. We evaluate M2A on prevalent text-understanding datasets and demonstrate its superior performance, particularly when data are implicitly non-IID and as PLMs scale up.

Paper Structure

This paper contains 26 sections, 19 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: A conception of the proposed method.
  • Figure 2: Overview of the M2A Framework.
  • Figure 3: Dev Acc of R-M2A with different multi-task.