An Active Learning Framework for Inclusive Generation by Large Language Models

Sabit Hassan, Anthony Sicilia, Malihe Alikhani

TL;DR

The paper tackles bias in LLM-generated text by proposing a clustering-based active learning framework augmented with knowledge distillation to improve inclusivity for underrepresented groups. It introduces a regulated-attribute informed sampling mechanism that maps interim generator outputs to a 1D latent space via an auxiliary model, replacing traditional entropy with $E_i = \text{Softmax}(R(G(x_i), H))$ and selecting samples from clusters for distillation-guided refinement. The approach is validated on counter-narration and style-transfer through two new 1K-pair datasets, showing improved inclusivity and lexical diversity, as well as transferability to other models. The results indicate practical viability for active learning in generative tasks, offering a scalable path toward more socially responsible LLM generation with limited labeled data.
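The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the auxiliary model $R$ returns a pair of logits for each interim generation $G(x_i)$, that the softmax probability of the regulated class serves as the one-dimensional score $E_i$, and that the highest-scoring instances in each cluster are the ones sent for distillation. The role of $H$ is not spelled out in this summary, so it is left out here. The function names, the cluster labels, and the toy logits are all hypothetical.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_score(aux_logits, attribute_index=1):
    """Hypothetical stand-in for E_i: probability of the regulated class.

    `aux_logits` plays the role of R(G(x_i), H); which index marks the
    regulated attribute is an assumption of this sketch.
    """
    return softmax(aux_logits)[attribute_index]


def select_per_cluster(cluster_ids, aux_logits, per_cluster=1):
    """Pick the top-scoring instance indices from every cluster.

    Assumption: a higher score means a more informative instance. Ties are
    broken by the lower index so the result is deterministic.
    """
    by_cluster = defaultdict(list)
    for idx, (cid, logits) in enumerate(zip(cluster_ids, aux_logits)):
        by_cluster[cid].append((attribute_score(logits), idx))

    selected = []
    for cid in sorted(by_cluster):
        ranked = sorted(by_cluster[cid], key=lambda pair: (-pair[0], pair[1]))
        selected.extend(idx for _, idx in ranked[:per_cluster])
    return selected


# Toy pool: five unlabeled instances already assigned to two clusters.
cluster_ids = [0, 0, 1, 1, 1]
aux_logits = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 3.0], [3.0, 0.0]]
picked = select_per_cluster(cluster_ids, aux_logits)
print(picked)  # [1, 3]
```

Sampling from every cluster, rather than from the pool as a whole, is what keeps sparsely populated regions of the data represented in each round. The per-cluster ranking rule is the part most likely to differ from the paper.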

Abstract

Ensuring that Large Language Models (LLMs) generate text representative of diverse sub-populations is essential, particularly when key concepts related to under-represented groups are scarce in the training data. We address this challenge with a novel clustering-based active learning framework, enhanced with knowledge distillation. The proposed framework transforms the intermediate outputs of the learner model, enabling effective active learning for generative tasks for the first time. Integrating clustering and knowledge distillation yields more representative models without requiring prior knowledge of the underlying data distribution or overbearing human effort. We validate our approach in practice through case studies in counter-narration and style transfer. We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models. Our results also show more consistent performance across various data subgroups and increased lexical diversity, underscoring our model's resilience to skewness in the available data. Further, our results show that the data acquired via our approach improves the performance of secondary models not involved in the learning loop, showcasing the practical utility of the framework.

Paper Structure

This paper contains 37 sections, 3 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: The training loop of our framework uses an auxiliary model to transform the interim output of a learner LLM, and selects informative instances from clustered unlabeled data. A distillation model then generates outputs, verified by humans, to iteratively refine the learner LLM.
  • Figure 2: Error ratio of resulting models, along with original data distribution (dashed line). Our clustering-based active learning approach is robust against data distribution skewness.
  • Figure 3: Offensive counts after style transfer. Random sampling leads to higher error rates overall, especially for target groups like persons of color. Cluster-AL achieves the lowest offensiveness overall and for most individual groups.