An Active Learning Framework for Inclusive Generation by Large Language Models
Sabit Hassan, Anthony Sicilia, Malihe Alikhani
TL;DR
The paper tackles bias in LLM-generated text by proposing a clustering-based active learning framework augmented with knowledge distillation to improve inclusivity for underrepresented groups. It introduces a regulated-attribute informed sampling mechanism that maps interim generator outputs to a 1D latent space via an auxiliary model, replacing traditional entropy with $E_i = \text{Softmax}(R(G(x_i), H))$ and selecting samples from clusters for distillation-guided refinement. The approach is validated on counter-narration and style-transfer through two new 1K-pair datasets, showing improved inclusivity and lexical diversity, as well as transferability to other models. The results indicate practical viability for active learning in generative tasks, offering a scalable path toward more socially responsible LLM generation with limited labeled data.
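The selection step described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the auxiliary scores $R(G(x_i), H)$ are already available as one float per unlabeled sample, that the softmax is taken across the pool, that a plain 1D k-means stands in for the clustering step, and that one sample nearest each cluster centre is sent for annotation. The function names `softmax`, `kmeans_1d`, and `select_batch` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a pool of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic init (an assumption): spread centroids over the sorted values.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels if labels is not None else []


def select_batch(aux_scores: Sequence[float], k: int) -> List[int]:
    """Pick one pool index per cluster of E_i = softmax(aux_scores)_i.

    aux_scores[i] stands in for R(G(x_i), H); how that score is produced
    is outside this sketch.
    """
    e = softmax(aux_scores)
    labels = kmeans_1d(e, k)
    chosen = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if not members:
            continue  # empty cluster contributes no sample
        centre = sum(e[i] for i in members) / len(members)
        # Assumption: take the member closest to the cluster centre.
        chosen.append(min(members, key=lambda i: abs(e[i] - centre)))
    return sorted(chosen)


# Toy pool: three low-scoring and three high-scoring interim generations.
pool_scores = [0.1, 0.2, 0.15, 3.0, 3.1, 2.9]
selected = select_batch(pool_scores, k=2)
```

Sampling from every cluster, rather than only from the highest-scoring region, is what lets the acquired batch cover both ends of the score range; in the toy pool, one index is drawn from each group.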
Abstract
Ensuring that Large Language Models (LLMs) generate text representative of diverse sub-populations is essential, particularly when key concepts related to under-represented groups are scarce in the training data. We address this challenge with a novel clustering-based active learning framework, enhanced with knowledge distillation. The proposed framework transforms the intermediate outputs of the learner model, enabling effective active learning for generative tasks for the first time. Integrating clustering and knowledge distillation yields more representative models without requiring prior knowledge of the underlying data distribution or excessive human effort. We validate our approach in practice through case studies in counter-narration and style transfer. We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models. Our results also show more consistent performance across data subgroups and increased lexical diversity, underscoring our model's resilience to skew in the available data. Further, the data acquired via our approach improves the performance of secondary models not involved in the learning loop, demonstrating the practical utility of the framework.
