Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation

Parham Abed Azad, Hamid Beigy

TL;DR

This work tackles multi-domain Persian NER by proposing Multi-BERT, a single frozen core model augmented with domain-specific adapters and task-specific headers, enabling per-domain outputs without training separate full models. It explores two adaptation mechanisms, prefix-tuning and LoRA, to inject domain knowledge while preserving the shared core, and introduces a document-based pipeline to handle unknown domains. Across formal, informal, and noisy multi-domain datasets, Multi-BERT with prompt-tuning achieves state-of-the-art or near-state-of-the-art performance, with prompt-tuning outperforming LoRA in low-resource settings. The approach reduces training and storage costs, and a deployable domain-detection step makes it applicable to real-world text streams.

Abstract

The rapid growth in the volume and diversity of text presents formidable challenges in multi-domain settings. These challenges are also visible in Persian named entity recognition (NER). Traditional approaches, which employ either a unified model for multiple domains or an individual model for each domain, have significant limitations. Single models often struggle to capture the nuances of diverse domains, while multiple large models strain resources, making it virtually impractical to train a model for each domain. Therefore, this paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters. We utilize techniques such as prompt tuning and adapters, combined with additional layers, to add parameters that we can train for specific domains. This enables the model to perform comparably to individual models for each domain. Experimental results on different formal and informal datasets show that, by employing these added parameters, the proposed model significantly surpasses existing practical models in performance. Remarkably, the proposed model requires only one instance for training and storage, yet achieves outstanding results across all domains, even surpassing the state-of-the-art in some. Moreover, we analyze each adaptation strategy, delineating its strengths, weaknesses, and optimal hyper-parameters for Persian NER. Finally, we introduce a document-based domain detection pipeline tailored for scenarios with unknown text domains, enhancing the adaptability and practicality of the approach in real-world applications.
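
The idea of one core model with several sets of domain-specific parameters can be illustrated with a toy layer. The sketch below is not the authors' implementation: it assumes the LoRA-style adapter named in the TL;DR, uses the common LoRA formulation (a frozen weight plus a low-rank update scaled by alpha / rank), and the class name `LoRALinear`, its methods, and the domain labels are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer: one frozen shared weight, one low-rank update per domain.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        self.d_in, self.d_out = d_in, d_out
        self.rank, self.alpha = rank, alpha
        self._rng = random.Random(seed)
        # Shared core weight: never updated after construction.
        self.weight = [[self._rng.gauss(0, 0.1) for _ in range(d_in)]
                       for _ in range(d_out)]
        self.adapters = {}  # domain name -> (A, B)

    def add_domain(self, domain):
        # A starts random and B starts at zero, so a fresh adapter
        # leaves the core model's output unchanged.
        a = [[self._rng.gauss(0, 0.1) for _ in range(self.d_in)]
             for _ in range(self.rank)]
        b = [[0.0] * self.rank for _ in range(self.d_out)]
        self.adapters[domain] = (a, b)

    def forward(self, x, domain=None):
        y = matvec(self.weight, x)
        if domain is not None:
            a, b = self.adapters[domain]
            delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
            scale = self.alpha / self.rank
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


layer = LoRALinear(d_in=4, d_out=3)
layer.add_domain("news")
layer.add_domain("social")
x = [1.0, 2.0, 3.0, 4.0]
base = layer.forward(x)
# Simulate training only the "news" adapter; the core weight is untouched.
layer.adapters["news"][1][0][0] = 1.0
news_out = layer.forward(x, domain="news")
social_out = layer.forward(x, domain="social")
```

Only the small `(A, B)` pair is stored per domain, so switching domains means selecting a different pair while the shared weight stays in place.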

Paper Structure

This paper contains 11 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Each set of parameters belongs to exactly one domain but layers are often shared by a couple of domains.
  • Figure 2: The amount of data differs greatly across text domains, which poses major challenges.
  • Figure 3: Each line in the left image shows the scores for a specific alpha across different R values. The right image, however, shows the scores of the prefix model with different numbers of tokens added to the inputs of each layer.
  • Figure 4: The models specialized in news, politics, and economy tag "US" correctly. The text is from the political domain of ParsNER. Translation: The Saudi prince proved to be loyal to the United States.
  • Figure 5: One forward pass determines the domain of a set of inputs, which can then be used for each individual input.
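
The document-level routing described in the Figure 5 caption can be sketched as follows. This is a hedged illustration, not the paper's pipeline: the keyword lists stand in for a learned domain classifier, and `DOMAIN_KEYWORDS`, `detect_domain`, `route`, and the fallback label `"general"` are all hypothetical names introduced here.

```python
from collections import Counter

# Hypothetical keyword lists standing in for a learned domain classifier.
DOMAIN_KEYWORDS = {
    "politics": {"prince", "minister", "election"},
    "sports": {"match", "goal", "team"},
}


def detect_domain(sentences):
    """Score the whole set of sentences once and return a single domain label."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.lower().split():
            for domain, keywords in DOMAIN_KEYWORDS.items():
                if token in keywords:
                    counts[domain] += 1
    if not counts:
        return "general"  # assumed fallback when no evidence is found
    # Sort first so ties are broken alphabetically and deterministically.
    return max(sorted(counts), key=counts.__getitem__)


def route(sentences):
    """Detect the domain once per document, then reuse it for every sentence."""
    domain = detect_domain(sentences)
    return [(domain, sentence) for sentence in sentences]


document = ["The prince met the minister", "He arrived late"]
routed = route(document)
```

The second sentence carries no domain cue on its own, yet it inherits the label chosen for the document as a whole, which is the point of deciding the domain at the set level.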