
CBF-LLM: Safe Control for LLM Alignment

Yuya Miyaoka, Masaki Inoue

TL;DR

This paper proposes a control-based framework for aligning large language models (LLMs) that leverages a control barrier function (CBF) to ensure user-desirable text generation.

Abstract

This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The framework applies a safety filter, designed based on the CBF, to the output of the baseline LLM, i.e., the token sequence, in order to intervene in the generated text. The overall text-generation system is implemented with Llama 3 and a RoBERTa model, and the source code is available at https://github.com/Mya-Mya/CBF-LLM. The experiment demonstrates the framework's control ability and its effectiveness in reducing the number of interventions needed for user-specified alignment tasks.
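To make the idea of a safety filter on token generation concrete, here is a minimal Python sketch. It is not the authors' implementation: the discrete-time condition `h(next) - h(now) >= -alpha * h(now)`, the function names `cbf_filter` and `toy_h`, and the word-counting constraint are all assumptions chosen for illustration. In the described system the token probabilities would come from the baseline LLM (Llama 3) and the constraint value from a learned model (RoBERTa); here both are replaced by toy stand-ins.

```python
from typing import Callable, Dict


def cbf_filter(
    text: str,
    token_probs: Dict[str, float],
    h: Callable[[str], float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Keep only candidate tokens that satisfy a CBF-style constraint.

    A token is allowed when appending it does not decrease the constraint
    value h by more than alpha * h(text). This condition is an assumed
    discrete-time analogue of a CBF constraint, used for illustration only.
    """
    h_now = h(text)
    allowed = {}
    for token, prob in token_probs.items():
        h_next = h(text + token)
        if h_next - h_now >= -alpha * h_now:
            allowed[token] = prob

    total = sum(allowed.values())
    if total == 0.0:
        # No candidate passes the filter; the caller decides how to recover.
        return {}
    # Renormalize so the surviving tokens form a probability distribution.
    return {token: prob / total for token, prob in allowed.items()}


def toy_h(text: str) -> float:
    """Hypothetical constraint function: drops by 0.6 per occurrence of 'bad'."""
    return 1.0 - 0.6 * text.count("bad")


# Hypothetical next-token distribution from a baseline model.
candidates = {" good": 0.5, " bad": 0.3, " fine": 0.2}
filtered = cbf_filter("The day was", candidates, toy_h, alpha=0.5)
print(sorted(filtered))
```

With `alpha=0.5` the token `" bad"` is rejected because it would lower the toy constraint value from 1.0 to 0.4 in one step, which is more than the allowed decrease of 0.5. A larger `alpha` permits faster decay and therefore intervenes less often.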

Paper Structure

This paper contains 10 sections, 1 theorem, 11 equations, 6 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

The state $x$ of the system given by the paper's nominal dynamics equation stays in the safe set, i.e., $x\in\mathcal{S}$ for all time, if $h$ is a control barrier function and the action $u$ satisfies the paper's CBF constraint.
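The dynamics and the constraint that Theorem 1 refers to are not reproduced in this summary. As a rough guide, the standard continuous-time, control-affine formulation from the CBF literature is sketched below. The symbols $f$, $g$, and $\alpha$ are assumptions introduced here, and the paper's exact equations may differ.

```latex
\begin{align}
  \dot{x} &= f(x) + g(x)\,u, \\
  \mathcal{S} &= \{\, x : h(x) \ge 0 \,\}, \\
  \frac{\partial h}{\partial x}(x)\,\bigl(f(x) + g(x)\,u\bigr) &\ge -\alpha\bigl(h(x)\bigr),
\end{align}
```

where $\alpha$ is an extended class-$\mathcal{K}$ function. Intuitively, the last inequality limits how fast $h$ may decrease as the state approaches the boundary $h(x) = 0$, so a trajectory that starts inside $\mathcal{S}$ cannot leave it.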

Figures (6)

  • Figure 1: Concept of CBF-LLM. Upper: Collision avoidance in a vehicle control system, Lower: Collision avoidance in text-generation by LLMs.
  • Figure 2: Nominal structure for text generation
  • Figure 3: Structure of presented text-generation system, named CBF-LLM
  • Figure 4: L-CF trajectory of each controller
  • Figure 5: Predicted L-CF trajectories
  • ...and 1 more figure

Theorems & Definitions (6)

  • Theorem 1
  • Remark 1
  • Example 1
  • Example 2
  • Remark 2
  • Remark 3