Active Learning for Robust and Representative LLM Generation in Safety-Critical Scenarios

Sabit Hassan, Anthony Sicilia, Malihe Alikhani

TL;DR

This work proposes a novel framework that integrates active learning with clustering to guide LLM generation, improving the representativeness and robustness of the generated safety scenarios. The framework produces a more representative set of safety scenarios without requiring prior knowledge of the underlying data distribution.

Abstract

Ensuring robust safety measures across a wide range of scenarios is crucial for user-facing systems. While Large Language Models (LLMs) can generate valuable data for safety measures, they often exhibit distributional biases, focusing on common scenarios and neglecting rare but critical cases. This can undermine the effectiveness of safety protocols developed using such data. To address this, we propose a novel framework that integrates active learning with clustering to guide LLM generation, enhancing its representativeness and robustness in safety scenarios. We demonstrate the effectiveness of our approach by constructing a dataset of 5.4K potential safety violations through an iterative process involving LLM generation and an active learner model's feedback. Our results show that the proposed framework produces a more representative set of safety scenarios without requiring prior knowledge of the underlying data distribution. Additionally, data acquired through our method improves the accuracy and F1 score of both the active learner model and models outside the scope of the active learning process, highlighting its broad applicability.

Paper Structure

This paper contains 36 sections, 1 equation, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Safety systems trained with random LLM-generated data may not be resilient against uncommon scenarios. Clustering-based active learning can guide LLM generation to capture such scenarios.
  • Figure 2: Our proposed framework combines active learning and clustering to guide LLM generation. Unlabeled data is first clustered, and informative instances are chosen from each cluster by consulting the active learner. These instances are then passed to the LLM for generation. The active learner is updated at the end of each iteration.
  • Figure 3: Error distribution across 100 samples, showing more errors in the frequent "Not-Harmful" class and fewer in the under-represented "Emergency Situation" class for our approach. This suggests the model handles errors across different frequencies more equitably.
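
The selection step described in the Figure 2 caption (choose informative instances from each cluster by consulting the active learner) can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster assignments and predicted class probabilities are taken as given, and predictive entropy is used as the informativeness score purely as an assumption, since the acquisition function is not stated here. The names `entropy` and `select_per_cluster` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_per_cluster(cluster_ids, pred_probs, per_cluster=1):
    """Return indices of the most uncertain instances within each cluster.

    cluster_ids: cluster label for each unlabeled instance (assumed given).
    pred_probs:  the active learner's class probabilities per instance.
    Entropy as the uncertainty score is an assumption of this sketch.
    """
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)

    selected = []
    for cid in sorted(by_cluster):
        # Rank the cluster's members from most to least uncertain.
        ranked = sorted(
            by_cluster[cid],
            key=lambda i: entropy(pred_probs[i]),
            reverse=True,
        )
        selected.extend(ranked[:per_cluster])
    return selected


# Toy data: five unlabeled instances in two clusters.
cluster_ids = [0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.5, 0.5], [0.99, 0.01], [0.6, 0.4], [0.8, 0.2]]
picked = select_per_cluster(cluster_ids, pred_probs)
```

Selecting within each cluster, rather than taking the globally most uncertain instances, keeps sparse clusters represented among the instances handed to the LLM. In a full loop, the selected instances would seed LLM generation and the learner would be retrained before the next iteration.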