Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task

Hoonick Lee, Mogan Gim, Donghyeon Park, Donghee Choi, Jaewoo Kang

Abstract

The advent of Large Language Models (LLMs) has shown promise in various creative domains, including the culinary arts. However, many LLMs still struggle to deliver the desired level of culinary creativity, especially when tasked with adapting recipes to meet specific cultural requirements. This study focuses on cuisine transfer, the application of elements of one cuisine to another, to assess LLMs' culinary creativity. We employ a diverse set of LLMs to generate and evaluate culturally adapted recipes, comparing their evaluations against LLM and human judgments. We introduce the ASH (authenticity, sensitivity, harmony) benchmark to evaluate LLMs' recipe generation abilities in the cuisine transfer task, assessing their cultural accuracy and creativity in the culinary domain. Our findings reveal crucial insights into both the generative and evaluative capabilities of LLMs in the culinary domain, highlighting strengths and limitations in understanding and applying cultural nuances in recipe creation. The code and dataset used in this project will be openly available at http://github.com/dmis-lab/CulinaryASH.

Paper Structure

This paper contains 19 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Mean and standard deviation of authenticity, sensitivity, and harmony ratings calculated for each generator (y-axis) and evaluator (x-axis) pair across all cuisine transfers. The color scale ranges from 1 (blue) to 5 (red).
  • Figure 2: Sensitivity ratings and the three most frequently used words for each cuisine transfer.
  • Figure 3: Average absolute difference values across evaluators.
  • Figure 4: Average authenticity, sensitivity, and harmony scores calculated for each generator (y-axis) and evaluator (x-axis) pair across all cuisine transfers. The color scale ranges from 0 (blue) to 1 (red).
  • Figure 5: Evaluation prompt given to models for each generated recipe. The prompt includes a scoring system (1-5 scale) and detailed explanations of the three evaluation criteria: authenticity (preservation of the original dish's characteristics), sensitivity (accurate incorporation of the target cuisine), and harmony (overall balance between authenticity and sensitivity). To ensure structured evaluation results, a specific response format was also given. We obtained a total of 129,600 evaluation results.
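
The per-pair statistics described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' code: the record layout (the `generator` and `evaluator` keys), the function name `aggregate_ratings`, and the use of the population standard deviation are all assumptions made for illustration. Only the three criteria and the 1-5 scale come from the captions above.

```python
from collections import defaultdict
from statistics import mean, pstdev

# The three ASH criteria named in the captions.
CRITERIA = ("authenticity", "sensitivity", "harmony")


def aggregate_ratings(records):
    """Group 1-5 ratings by (generator, evaluator) pair.

    `records` is a hypothetical list of dicts with the keys
    "generator", "evaluator", and one integer score per criterion.
    Returns {(generator, evaluator): {criterion: (mean, std)}}.
    """
    grouped = defaultdict(lambda: {c: [] for c in CRITERIA})
    for rec in records:
        key = (rec["generator"], rec["evaluator"])
        for criterion in CRITERIA:
            score = rec[criterion]
            # Reject scores outside the 1-5 scale used in the evaluation prompt.
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion} score out of range: {score}")
            grouped[key][criterion].append(score)

    # Assumption: population standard deviation; the source does not say which variant is used.
    return {
        key: {c: (mean(vals), pstdev(vals)) for c, vals in by_criterion.items()}
        for key, by_criterion in grouped.items()
    }
```

Each value in the returned dictionary would correspond to one cell of a generator-by-evaluator grid, with one grid per criterion.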