Characteristic AI Agents via Large Language Models

Xi Wang, Hongliang Dai, Shen Gao, Piji Li

TL;DR

A benchmark for the characteristic AI agents task is created, comprising a dataset, techniques, and evaluation metrics, and a set of automatic metrics is devised for quantitative performance evaluation.

Abstract

The advancement of Large Language Models (LLMs) has significantly improved the performance of chatbot systems. Many researchers have worked on giving chatbots distinct characteristics. While commercial products for developing role-driven chatbots with LLMs exist, academic research in this area remains relatively scarce. Our research investigates the performance of LLMs in constructing Characteristic AI Agents by simulating real-life individuals across different settings. Current investigations have primarily focused on acting out roles with simple profiles. In response to this research gap, we create a benchmark for the characteristic AI agents task, including a dataset, techniques, and evaluation metrics. A dataset called "Character100" is built for this benchmark, comprising the most-visited people on Wikipedia for language models to role-play. With the constructed dataset, we conduct a comprehensive assessment of LLMs across various settings. In addition, we devise a set of automatic metrics for quantitative performance evaluation. The experimental results highlight potential directions for further improving the capabilities of LLMs in constructing characteristic AI agents. The benchmark is available at https://github.com/nuaa-nlp/Character100.

Paper Structure (27 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: An example of the characteristic AI agents task. Chatbots need to mimic the person and answer the query according to the provided information.
  • Figure 2: The output of open-source and closed-source models in the few-shot setting. LLMs marked with "*" have been fine-tuned with QLoRA. Some nonessential content is omitted to save space.