Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki

TL;DR

Experiments reveal that optimal cost-efficiency is achieved by combining human and LLM-generated data across a wide range of budget levels, and that as the budget decreases, a higher proportion of LLM-generated data becomes preferable.

Abstract

Recent studies have demonstrated that few-shot learning allows LLMs to generate training data for supervised models at a low cost. However, the quality of LLM-generated data may not entirely match that of human-labeled data. This raises a crucial question: how should one balance the trade-off between the higher quality but more expensive human data and the lower quality yet substantially cheaper LLM-generated data? In this paper, we synthesized training data for conversational semantic frame analysis using GPT-4 and examined how to allocate budgets optimally to achieve the best performance. Our experiments, conducted across various budget levels, reveal that optimal cost-efficiency is achieved by combining human and LLM-generated data across a wide range of budget levels. Notably, as the budget decreases, a higher proportion of LLM-generated data becomes preferable.
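
The budget trade-off described in the abstract can be made concrete with a small sketch. It is illustrative only: the function name `split_budget`, the per-example costs, and the budget figure are hypothetical assumptions, not values from the paper, and the sketch only counts how many examples each allocation buys. Training and evaluating a model on each mix, which is what the paper's experiments actually measure, is not shown.

```python
def split_budget(budget, human_share, human_cost, llm_cost):
    """Return (n_human, n_llm): how many examples of each kind a budget buys.

    All amounts are integers in the same unit (e.g. cents) to avoid
    floating-point rounding. `human_share` is the fraction of the budget
    spent on human-labeled data. Every value here is a hypothetical input.
    """
    if not 0.0 <= human_share <= 1.0:
        raise ValueError("human_share must be between 0 and 1")
    if human_cost <= 0 or llm_cost <= 0:
        raise ValueError("per-example costs must be positive")
    human_budget = round(budget * human_share)
    llm_budget = budget - human_budget
    # Floor division: partial examples cannot be purchased.
    return human_budget // human_cost, llm_budget // llm_cost


if __name__ == "__main__":
    # Hypothetical prices: a human label costs 10x an LLM-generated one.
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        n_human, n_llm = split_budget(10_000, share, human_cost=100, llm_cost=10)
        print(f"human share {share:.2f}: {n_human} human, {n_llm} LLM examples")
```

Under these assumed prices, shifting budget toward LLM-generated data buys many more examples in total, which is why the quality gap between the two sources, not just the count, decides the best mix at each budget level.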

Paper Structure

This paper contains 24 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: A dialogue piece with semantic frame annotation. Green indicates a trigger, and orange indicates an argument. The argument-trigger relation is illustrated with arrows. This is a simplified demonstration translated from Japanese.
  • Figure 2: The overview of our proposal to create two types of LLM-generated data: Human-Pseudo and Pseudo-Pseudo, and to investigate the cost-efficiency of combining them with human-labeled data under different budgets. The dialogue example is translated from Japanese.
  • Figure 3: GPT-4 serves as a pseudo-dialogue generator, taking preserved and previously generated dialogue sessions as few-shot examples. The orange-blue rainbow color indicates that the few-shot examples contain both human and pseudo-dialogues. The actual prompt design is given in the appendix.
  • Figure 4: We designed a novel multi-step labeling scheme for LLMs to handle SFA in text generation. The full prompt design is given in the appendix.
  • Figure 5: The budget-wise cost-efficiency plot for combining Human-Human and Human-Pseudo data. The black dotted line represents the performance of few-shot GPT-4. Each budget curve features a star marking its optimal point. The shaded region around each curve indicates the standard deviation across five different seeds.
  • ...and 8 more figures