Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

TL;DR

Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines, improving the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness.

Abstract

We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among the individual and aggregated responses. This process is performed in a single chain of thought through seven carefully designed subtasks derived from the Nominal Group Technique (Ven and Delbecq, 1974), a well-established decision-making framework. Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness. It further achieves state-of-the-art truthfulness, outperforming the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios, eliminating the need for manual prompt construction.
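
The pipeline described in the abstract (simulate several experts, aggregate their responses, then select the best among the individual and aggregated responses) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable interface, all function names, and the prompt wording are hypothetical, the seven NGT-derived subtasks are not reproduced, and aggregation is issued here as a separate call rather than inside a single chain of thought.

```python
from typing import Callable, List

# Hypothetical interface (an assumption): a function taking a prompt and returning text.
LLM = Callable[[str], str]


def generate_experts(instruction: str, llm: LLM, n_experts: int = 3) -> List[str]:
    """Ask the model for expert identities suited to the instruction, one per line."""
    prompt = (
        f"List {n_experts} distinct experts best suited to answer the instruction, "
        f"one per line.\nInstruction: {instruction}"
    )
    lines = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return lines[:n_experts]


def expert_response(instruction: str, expert: str, llm: LLM) -> str:
    """Answer the instruction while role-playing a single expert."""
    return llm(f"You are {expert}. Answer the instruction.\nInstruction: {instruction}")


def aggregate(instruction: str, experts: List[str], responses: List[str], llm: LLM) -> str:
    """Merge the expert answers into one combined answer (illustrative prompt only)."""
    listing = "\n".join(f"[{e}] {r}" for e, r in zip(experts, responses))
    return llm(
        "Combine the expert answers below into one answer, keeping points of "
        f"agreement and resolving conflicts.\nInstruction: {instruction}\n{listing}"
    )


def select_best(instruction: str, candidates: List[str], llm: LLM) -> str:
    """Pick the best candidate by index; fall back to the last one (the aggregate)."""
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    reply = llm(
        "Reply with only the index of the best answer.\n"
        f"Instruction: {instruction}\n{numbered}"
    )
    try:
        idx = int(reply.strip())
    except ValueError:
        idx = len(candidates) - 1  # assumption: prefer the aggregated answer
    if not 0 <= idx < len(candidates):
        idx = len(candidates) - 1
    return candidates[idx]


def multi_expert_prompting(instruction: str, llm: LLM, n_experts: int = 3) -> str:
    """Run the simplified pipeline: experts, responses, aggregation, selection."""
    experts = generate_experts(instruction, llm, n_experts)
    responses = [expert_response(instruction, e, llm) for e in experts]
    combined = aggregate(instruction, experts, responses, llm)
    return select_best(instruction, responses + [combined], llm)
```

Any function that maps a prompt string to a completion string can be passed as `llm`, so the sketch can be exercised with a stub before wiring in a real model client. Falling back to the aggregated answer when the selection reply cannot be parsed is a design choice of this sketch, not something the paper specifies.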

Paper Structure

This paper contains 79 sections, 2 equations, 22 figures, 13 tables.

Figures (22)

  • Figure 1: An overview of Multi-expert Prompting with an ExpertQA (Malaviya et al., 2023) example. ExpertPrompting (Xu et al., 2023) provides a one-sided view, concluding "unethical", while Multi-expert Prompting encompasses multiple viewpoints, leading to a comprehensive, multifaceted answer.
  • Figure 2: Overview of Multi-expert Prompting: (1) Experts & responses generation and (2) Aggregating expert responses. Given an input instruction, the first step generates the expert identities that best fulfill the instruction, together with the expert responses, while the second step aggregates the expert responses and selects the best among the individual and combined responses.
  • Figure 3: (C5) Informativeness and (C6) Usefulness comparisons between Multi-expert Prompting and baselines on the ExpertQA dataset (Malaviya et al., 2023).
  • Figure 4: A TruthfulQA (Lin et al., 2022) example where Multi-expert Prompting provides the correct answer, while the majority of experts answer incorrectly according to the ground truth. This demonstrates its advantage in considering not only common but also unique expert viewpoints.
  • Figure 5: Comparison between Multi-expert Prompting, the baseline, and the baseline with constraints.
  • ...and 17 more figures