CBF-LLM: Safe Control for LLM Alignment

Yuya Miyaoka; Masaki Inoue

TL;DR

A control-based framework is proposed for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation.

Abstract

This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies a safety filter, designed based on the CBF, to the output of the baseline LLM, i.e., the token sequence, with the aim of intervening in the generated text. The overall text-generation system is implemented with Llama 3 and a RoBERTa model, and the source code is available at https://github.com/Mya-Mya/CBF-LLM. The experiment demonstrates its control ability and effectiveness in reducing the number of interventions needed for user-specified alignment tasks.
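To make the idea of a safety filter on the token sequence concrete, the following is a minimal sketch and not the authors' implementation. It assumes a commonly used discrete-time CBF condition, h(x_next) >= (1 - alpha) * h(x_now), and replaces both the baseline LLM and the RoBERTa-based scorer with toy stand-ins. The names `h`, `cbf_filter`, `generate`, `toy_propose`, and `BAD_WORDS` are hypothetical.

```python
import random

# Hypothetical toy vocabulary of undesirable tokens (stand-in for a learned scorer).
BAD_WORDS = {"awful", "terrible"}


def h(tokens):
    """Toy constraint function: 1.0 minus the count of undesirable tokens.

    h >= 0 is treated as "safe". In the paper this role is played by a
    RoBERTa model; this counting rule is only an assumption for illustration.
    """
    return 1.0 - sum(1 for t in tokens if t in BAD_WORDS)


def cbf_filter(tokens, candidates, alpha=0.5):
    """Keep candidates satisfying the assumed discrete-time CBF condition
    h(tokens + [c]) >= (1 - alpha) * h(tokens)."""
    h_now = h(tokens)
    return [c for c in candidates if h(tokens + [c]) >= (1.0 - alpha) * h_now]


def toy_propose(tokens):
    """Stand-in for the baseline LLM: always proposes the same candidates."""
    return ["nice", "awful", "day", "terrible"]


def generate(prompt, propose, steps=5, alpha=0.5, seed=0):
    """Sample one allowed token per step; stop early if nothing is allowed."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        allowed = cbf_filter(tokens, propose(tokens), alpha)
        if not allowed:
            break
        tokens.append(rng.choice(allowed))
    return tokens


if __name__ == "__main__":
    text = generate(["what", "a"], toy_propose)
    print(text, "h =", h(text))
```

In this sketch the filter only removes candidates from the proposal list, so the baseline sampler is left untouched whenever all of its candidates already satisfy the condition.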

Paper Structure (10 sections, 1 theorem, 11 equations, 6 figures, 2 tables, 3 algorithms)


Introduction
Preliminary
Control Barrier Function for Safe Control
Text Generation by Large Language Models
CBF-LLM
Experiment
Setting
Result
Conclusion
Nominal Text Generation with Top-k Sampling

Key Result

Theorem 1

The state $x$ of the system with the nominal dynamics stays in the safe set for all time, i.e., $x \in \mathcal{S}$, if $h$ is a control barrier function and the action $u$ satisfies the CBF constraint.
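The statement can be illustrated numerically with a small sketch. It assumes a hypothetical one-dimensional system $\dot{x} = u$, a safe set $\mathcal{S} = \{x : h(x) \ge 0\}$ with $h(x) = x_{\max} - x$, and the standard CBF constraint $\dot{h} \ge -\alpha h$, which here reduces to $u \le \alpha (x_{\max} - x)$. The system, the constants, and the forward-Euler discretization are all assumptions for illustration and are not taken from the paper.

```python
def cbf_safe_action(x, u_nom, x_max=1.0, alpha=2.0):
    """Return the action closest to u_nom that satisfies u <= alpha * h(x)."""
    h = x_max - x
    return min(u_nom, alpha * h)


def simulate(x0=0.0, u_nom=1.0, dt=0.01, steps=500, x_max=1.0, alpha=2.0):
    """Forward-Euler simulation of x' = u under the CBF-filtered action.

    With dt * alpha <= 1, each step gives h_next >= (1 - dt * alpha) * h,
    so h stays nonnegative in the discretized system as well.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_safe_action(x, u_nom, x_max, alpha)
        x = x + dt * u
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate()
    print("max state:", max(traj))  # stays below x_max = 1.0
```

Without the filter, the nominal action `u_nom = 1.0` would drive the state past `x_max` after 100 steps; with it, the state approaches the boundary but does not cross it.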

Figures (6)

Figure 1: Concept of CBF-LLM. Upper: Collision avoidance in a vehicle control system, Lower: Collision avoidance in text-generation by LLMs.
Figure 2: Nominal structure for text generation
Figure 3: Structure of presented text-generation system, named CBF-LLM
Figure 4: L-CF trajectory of each controller
Figure 5: Predicted L-CF trajectories
...and 1 more figure

Theorems & Definitions (6)

Theorem 1
Remark 1
Example 1
Example 2
Remark 2
Remark 3
