Self-Regulated Data-Free Knowledge Amalgamation for Text Classification

Prashanth Vijayaraghavan, Hongzhi Wang, Luyao Shi, Tyler Baldwin, David Beymer, Ehsan Degan

TL;DR

This work tackles data-free knowledge amalgamation for NLP by training a compact student from multiple pre-trained teachers without access to their training data. It introduces StrataNet, which combines a steerable data generator for each teacher with a confidence-guided, block-wise amalgamation module implemented via a Selective Transformer, enabling cross-teacher knowledge transfer across non-overlapping label spaces. Empirically, StrataNet achieves superior accuracy over both data-driven and data-free baselines on AG News, 5 Abstracts Group, and OhSumed, with ablations showing the critical roles of the Relative Mahalanobis distance-based OOD scoring and the ST-amalg mechanism. The approach promises practical benefit for deploying multi-teacher knowledge in privacy- or IP-constrained settings, facilitating robust text classification without access to original data.

Abstract

Recently, there has been a growing availability of pre-trained text models on various model repositories. These models greatly reduce the cost of training new models from scratch, as they can be fine-tuned for specific tasks or trained on large datasets. However, these datasets may not be publicly accessible due to privacy, security, or intellectual property issues. In this paper, we aim to develop a lightweight student network that can learn from multiple teacher models without accessing their original training data. Hence, we investigate Data-Free Knowledge Amalgamation (DFKA), a knowledge-transfer task that combines insights from multiple pre-trained teacher models and transfers them effectively to a compact student network. To accomplish this, we propose STRATANET, a modeling framework comprising: (a) a steerable data generator that produces text data tailored to each teacher and (b) an amalgamation module that implements a self-regulative strategy using confidence estimates from the teachers' different layers to selectively integrate their knowledge and train a versatile student. We evaluate our method on three benchmark text classification datasets with varying labels or domains. Empirically, we demonstrate that the student model learned using our STRATANET significantly outperforms several baselines under data-driven and data-free constraints.
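
To make the idea of confidence-guided amalgamation over non-overlapping label spaces concrete, here is a minimal sketch. It is not the StrataNet implementation: it uses the maximum softmax probability (MSP), one of the OOD scores compared in Figure 3, as a stand-in confidence estimate, and it operates on final logits only rather than on the teachers' intermediate layers. The function names `softmax` and `amalgamate_targets` and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def amalgamate_targets(teacher_logits):
    """Build one soft target over the union of disjoint teacher label sets.

    teacher_logits: one list of logits per teacher; teacher i's labels are
    assumed to occupy their own slice of the concatenated label space.
    Returns (weights, target), where weights are the normalized MSP
    confidences (an assumption of this sketch, not the paper's scoring).
    """
    probs = [softmax(logits) for logits in teacher_logits]
    confidences = [max(p) for p in probs]  # MSP per teacher
    total = sum(confidences)
    weights = [c / total for c in confidences]

    # Scale each teacher's distribution by its weight and concatenate.
    target = []
    for w, p in zip(weights, probs):
        target.extend(w * x for x in p)
    return weights, target


# Hypothetical input: teacher A (2 classes) is confident on this text,
# teacher B (3 classes) is close to uniform, so A dominates the target.
weights, target = amalgamate_targets([[4.0, 0.0], [0.1, 0.0, -0.1]])
```

Because each teacher's distribution sums to one and the weights sum to one, `target` is a valid distribution over all five classes that a student could be trained to match. A different confidence score can be substituted by changing how `confidences` is computed.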

Paper Structure (28 sections, 5 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Given a set of pre-trained teacher models (Teacher Models 1 & 2), each with distinct expertise, the goal is to train a student model capable of amalgamating their knowledge, mastering prediction across all specialized classes of the teachers.
  • Figure 2: Illustration of our StrataNet framework.
  • Figure 3: (A) Impact of different OOD scores (RMD, MD, and MSP), (B) Impact of ST-amalg, (C) Effect of multiple heterogeneous teachers on the OhSumed dataset.
  • Figure 4: Effect of modifying $\lambda$.
  • Figure 5: Effect of Steerable Data Generation. Llama-2 with manually designed prompts doesn't outperform our generation module.