Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

Tianhui Zhang, Bei Peng, Danushka Bollegala

TL;DR

A simple method is proposed that diversifies LLM generations while preserving their quality; the generated sentences can be used as training data to improve diversity in existing commonsense generators.

Abstract

Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge, while generating coherent sentences. Although the quality of the generated sentences is crucial, the diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts. Large Language Models (LLMs) have shown proficiency in enhancing the generation quality across various tasks through in-context learning (ICL) using given examples without the need for any fine-tuning. However, the diversity aspect in LLM outputs has not been systematically studied before. To address this, we propose a simple method that diversifies the LLM generations, while preserving their quality. Experimental results on three benchmark GCR datasets show that our method achieves an ideal balance between the quality and diversity. Moreover, the sentences generated by our proposed method can be used as training data to improve diversity in existing commonsense generators.

Paper Structure

This paper contains 31 sections, 2 equations, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: An example of diverse generated sentence sets in the CommonGen dataset. The generation shown at the bottom (in green) is considered by human annotators to be more diverse than those at the top (in red).
  • Figure 2: An example of default and diversified prompts for an instance selected from the CommonGen dataset. The default prompt, shown in Figure 2a, is taken from Li et al. (2023). Few-shot examples are included in each prompt, where [SRC] denotes the set of input concepts and [TGT] the corresponding sentences in CommonGen. For a given set of [INPUT] concepts, the LLM is then required to generate sentences at the slot [OUTPUT]. As shown in Figure 2b, Prop uses the diversified prompt, which operates in two steps. Step 1 generates a set of [N] sentences, [PRV]. We check the diversity among the sentences in [PRV], and if it is low, we use the prompt in Step 2 to generate the final set of sentences.
  • Figure 3: Human vs. GPT3.5 diversity ratings for randomly sampled sets of sentences generated by Prop. Cohen's $\kappa = 0.409$ indicates a moderate level of agreement.
  • Figure 4: Sentences generated by the default prompt and by Prop, compared against those written by humans on CommonGen and ComVE test instances. Prop generates more diverse and high-quality sentences than the default prompt.
  • Figure 5: The templates used by the default and the diversified prompt instructions for the CommonGen/DimonGen (shown on the left, (a)) and ComVE (shown on the right, (b)) tasks. Few-shot examples are included in each prompt where [SRC] denotes the set of input concepts and [TGT] the corresponding sentences in CommonGen. For a given set of [INPUT] concepts, the LLM is then required to generate sentences at the slot [OUTPUT].
  • ...and 1 more figures
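
The two-step procedure described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the diversity measure (one minus the mean pairwise Jaccard overlap of word sets), the `threshold` value, the prompt wording, and the `llm` callable (any function that maps a prompt string to a list of sentences) are all hypothetical stand-ins chosen for this example.

```python
from itertools import combinations
from typing import Callable, List


def jaccard_diversity(sentences: List[str]) -> float:
    """Assumed metric: 1 - mean pairwise Jaccard overlap of lowercase word sets."""
    token_sets = [set(s.lower().split()) for s in sentences]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0  # fewer than two sentences: no set-level diversity to measure
    overlaps = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return 1.0 - sum(overlaps) / len(overlaps)


def two_step_generate(
    concepts: List[str],
    llm: Callable[[str], List[str]],  # hypothetical: prompt -> list of sentences
    n: int = 3,
    threshold: float = 0.5,  # assumed cut-off for "low" diversity
) -> List[str]:
    # Step 1: ask for N sentences covering the input concepts ([PRV]).
    step1_prompt = f"Generate {n} sentences using the concepts: {', '.join(concepts)}."
    prv = llm(step1_prompt)

    # Keep the Step 1 output if it is already diverse enough.
    if jaccard_diversity(prv) >= threshold:
        return prv

    # Step 2: show the previous sentences and ask for a more diverse set.
    step2_prompt = (
        f"{step1_prompt}\n"
        "Previously generated sentences:\n"
        + "\n".join(prv)
        + f"\nGenerate {n} new sentences that differ from each other "
        "and from the previous ones."
    )
    return llm(step2_prompt)
```

Passing the LLM as a plain callable keeps the sketch independent of any particular API client; in practice the diversity check could be swapped for whichever metric the evaluation uses.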