Survival of the Safest: Towards Secure Prompt Optimization through Interleaved Multi-Objective Evolution

Ankita Sinha, Wendi Cui, Kamalika Das, Jiaxin Zhang

TL;DR

This work tackles the risk of optimizing LLM prompts for performance at the expense of safety. It introduces SoS, Survival of the Safest, a secure multi-objective prompt optimization framework that interleaves semantic mutation, feedback mutation, and crossover to efficiently explore high-dimensional prompt spaces without relying on Pareto-front methods. By weighting multiple objectives (e.g., KPI and safety) and using an MD-Judge safeguard, SoS produces a pool of high-performing, safety-conscious prompts demonstrated across diverse NLP tasks and safety benchmarks. The approach supports industrial deployment with configurable objective weighting, while acknowledging computational costs and data-bias limitations, and points to online optimization as a promising avenue for future work.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities; however, the optimization of their prompts has historically prioritized performance metrics at the expense of crucial safety and security considerations. To overcome this shortcoming, we introduce "Survival of the Safest" (SoS), an innovative multi-objective prompt optimization framework that enhances both performance and security in LLMs simultaneously. SoS utilizes an interleaved multi-objective evolution strategy, integrating semantic, feedback, and crossover mutations to effectively traverse the prompt landscape. Unlike computationally demanding Pareto-front methods, SoS provides a scalable solution that expedites optimization in complex, high-dimensional discrete search spaces while keeping computational demands low. Our approach accommodates flexible weighting of objectives and generates a pool of optimized candidates, empowering users to select prompts that optimally meet their specific performance and security needs. Experimental evaluations across diverse benchmark datasets affirm SoS's efficacy in delivering high performance and notably enhancing safety and security compared to single-objective methods. This advancement marks a significant stride towards the deployment of LLM systems that are both high-performing and secure across varied industrial applications.
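
To make the idea of flexible objective weighting over a candidate pool concrete, here is a minimal Python sketch. It assumes a normalized weighted sum as the scalarization, which may differ from the scoring rule the paper actually uses. The `Candidate` class, the function names, and all scores are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    prompt: str
    kpi: float      # task performance score, assumed to lie in [0, 1]
    safety: float   # safety score, assumed to lie in [0, 1]


def weighted_score(c: Candidate, w_kpi: float, w_safety: float) -> float:
    """Hypothetical scalarization: normalized weighted sum of both objectives."""
    total = w_kpi + w_safety
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_kpi * c.kpi + w_safety * c.safety) / total


def rank_pool(pool, w_kpi, w_safety):
    """Order the candidate pool from best to worst under the given weights."""
    return sorted(pool, key=lambda c: weighted_score(c, w_kpi, w_safety), reverse=True)


# Made-up scores, used only to show how the preferred prompt shifts with the weights.
pool = [
    Candidate("prompt A", kpi=0.90, safety=0.60),
    Candidate("prompt B", kpi=0.80, safety=0.95),
    Candidate("prompt C", kpi=0.70, safety=0.99),
]

performance_first = rank_pool(pool, w_kpi=1.0, w_safety=0.0)[0]
balanced = rank_pool(pool, w_kpi=0.5, w_safety=0.5)[0]
safety_first = rank_pool(pool, w_kpi=0.0, w_safety=1.0)[0]
```

Because the pool is kept rather than collapsed to a single winner, the same set of optimized candidates can be re-ranked later under different weights without rerunning the optimization.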

Paper Structure

This paper contains 35 sections, 2 equations, 3 figures, 10 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of SoS: a novel framework for secure multi-objective prompt optimization.
  • Figure 2: Overall depiction of our prompt evolution process. Semantic mutation generates multiple variants of the initial seed prompt to kickstart evolution. Security and KPI mutation are the two feedback mutators; each generates one mutated variant of every prompt, doubling the population. The selection process then rejects all prompts that are not locally optimal, and the rest proceed to the next stage. Crossover mutation is employed to further blend and balance the different objectives before the final pool of optimal candidates is selected.
  • Figure 3: (left) Overview of evolution strategies. The dotted lines indicate that the enclosed block is run multiple times until convergence. (right) Candidate evolution from initialization through feedback and crossover mutation, across iterations on the Disambiguation QA task.
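
The pipeline described in the Figure 2 caption can be sketched as a small Python loop. This is a toy illustration under stated assumptions, not the authors' implementation: the LLM-driven mutators and evaluators are replaced by deterministic string functions, "locally optimal" is approximated by keeping the better of each parent/child pair, and the fitness is an assumed equal-weight sum. Every function name here is hypothetical.

```python
from itertools import combinations


# Toy evaluators standing in for KPI and safety scoring (assumptions, not real judges).
def kpi_score(prompt):
    return 1.0 if "step by step" in prompt else 0.5


def safety_score(prompt):
    return 1.0 if "Refuse harmful" in prompt else 0.3


def fitness(prompt, w_kpi=0.5, w_safety=0.5):
    return w_kpi * kpi_score(prompt) + w_safety * safety_score(prompt)


def semantic_mutation(seed, n=3):
    """Generate several variants of the seed prompt (toy templates, not LLM paraphrases)."""
    templates = ["{}", "Please follow this instruction: {}", "Task: {}"]
    return [t.format(seed) for t in templates[:n]]


def kpi_feedback(prompt):
    """Toy KPI feedback mutator: one mutated variant per prompt."""
    return prompt if "step by step" in prompt else prompt + " Think step by step."


def security_feedback(prompt):
    """Toy security feedback mutator: one mutated variant per prompt."""
    return prompt if "Refuse harmful" in prompt else prompt + " Refuse harmful requests."


def select(parents, children):
    """Assumed selection rule: keep the better of each parent/child pair."""
    return [c if fitness(c) > fitness(p) else p for p, c in zip(parents, children)]


def crossover(a, b):
    """Blend two prompts by appending sentences of b that a does not already contain."""
    split = lambda text: [s.strip() for s in text.split(".") if s.strip()]
    sentences = split(a)
    sentences += [s for s in split(b) if s not in sentences]
    return ". ".join(sentences) + "."


def evolve(seed, rounds=2, pool_size=3):
    population = semantic_mutation(seed)
    mutators = [kpi_feedback, security_feedback]
    for i in range(rounds):
        mutate = mutators[i % 2]                      # interleave the two objectives
        children = [mutate(p) for p in population]    # population doubles here
        population = select(population, children)     # and is pruned back here
    offspring = [crossover(a, b) for a, b in combinations(population, 2)]
    candidates = list(dict.fromkeys(population + offspring))  # de-duplicate, keep order
    return sorted(candidates, key=fitness, reverse=True)[:pool_size]


final_pool = evolve("Answer the question.")
```

With a single round only the KPI mutator runs, so the surviving prompts score well on the toy KPI but poorly on the toy safety measure; the second round is what brings the safety objective in.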

Theorems & Definitions (1)

  • Definition 1