Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting

Halley Young, Yimeng Zeng, Jacob Gardner, Osbert Bastani

TL;DR

The paper addresses the problem of controllably increasing the diversity of LLM-generated text. It defines structural diversity via a user-specified feature map $\phi: \mathcal{X} \to \{0,1\}^d$ and measures diversity as the entropy of $\phi(x)$ over generated samples. It proposes chain-of-specification (CoS) prompting, a two-step (and optionally multi-level) strategy that first generates a specification $s$ and then generates text $x$ satisfying $\phi(x)=s$; because it relies only on prompting, it works with black-box LLMs. Empirically, CoS improves structural diversity in the poetry and code domains more than baselines such as random sampling, PDC, and nucleus sampling, with some exceptions for non-instruction-tuned models, and it reveals diversity that traditional metrics such as $n$-gram or BERT-based measures do not capture. The framework is a flexible, domain-adaptable way to increase qualitative diversity in black-box LLMs, with potential extensions to white-box fine-tuning; its core insight is that diversifying the specification space can yield richer, domain-relevant variation in the generated text. The feature map $\phi$ and the associated feature-space entropy thus become the central tools for tailoring LLM outputs to user-defined structural criteria.
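To make the metric concrete, here is a minimal sketch of measuring structural diversity as the entropy of $\phi(x)$ over a set of samples. It assumes a simple plug-in (empirical frequency) entropy estimate, which may differ from the estimator used in the paper. The names `structural_entropy` and `toy_phi`, and the two toy features, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Tuple

# A binary feature vector in {0,1}^d, stored as a hashable tuple.
FeatureVector = Tuple[int, ...]


def structural_entropy(
    samples: Iterable[str],
    phi: Callable[[str], FeatureVector],
) -> float:
    """Plug-in entropy (in nats) of phi(x) over the given samples.

    Assumption: entropy is estimated from empirical frequencies of the
    feature vectors; this is an illustrative choice, not necessarily
    the paper's estimator.
    """
    counts = Counter(phi(x) for x in samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sample")
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def toy_phi(text: str) -> FeatureVector:
    """Hypothetical feature map with d = 2 toy structural features."""
    non_empty_lines = [ln for ln in text.splitlines() if ln.strip()]
    has_more_than_two_lines = int(len(non_empty_lines) > 2)
    contains_question = int("?" in text)
    return (has_more_than_two_lines, contains_question)
```

Under this sketch, samples that all map to the same feature vector have zero entropy, and entropy grows as the samples spread more evenly over distinct feature vectors.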

Abstract

The capability to generate diverse text is a key challenge facing large language models (LLMs). Thus far, diversity has been studied via metrics such as $n$-gram diversity or diversity of BERT embeddings. However, for these kinds of diversity, the user has little control over the dimensions along which diversity is considered. For example, in the poetry domain, one might desire diversity in terms of rhyme and meter, whereas in the code domain, one might desire diversity in terms of the kinds of expressions used to solve a problem. We propose a diversity metric called structural diversity, where the user provides a mapping from generated text to features capturing the kinds of diversity that they care about. In addition, we propose a novel strategy called chain-of-specification (CoS) prompting for improving diversity by first having the LLM generate a specification encoding one instance of structural features, and then prompting the LLM to generate text that satisfies these features; notably, our strategy works with blackbox LLMs. In our experiments, we show that for structural diversity in the poetry and code domains, CoS significantly improves diversity compared to several baselines.
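The two-step strategy described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: `llm` is a hypothetical callable that maps a prompt string to a completion string (any blackbox model could sit behind it), and the prompt wording and the function name `chain_of_specification` are invented here rather than taken from the paper.

```python
from typing import Callable, List, Tuple


def chain_of_specification(
    llm: Callable[[str], str],
    task: str,
    feature_names: List[str],
) -> Tuple[str, str]:
    """Two-step prompting: first a specification, then text satisfying it.

    Assumption: `llm` is any prompt -> completion function; the prompts
    below are illustrative, not the paper's prompts.
    """
    feature_list = "\n".join(f"- {name}" for name in feature_names)

    # Step 1: ask the model for one concrete assignment of the
    # structural features (the specification s).
    spec_prompt = (
        f"Task: {task}\n"
        "Choose one value (0 or 1) for each structural feature below "
        "and reply with the values only, comma-separated:\n"
        f"{feature_list}"
    )
    spec = llm(spec_prompt).strip()

    # Step 2: ask the model for text whose features match the
    # specification, i.e. text x intended to satisfy phi(x) = s.
    text_prompt = (
        f"Task: {task}\n"
        f"Structural features:\n{feature_list}\n"
        f"Required feature values: {spec}\n"
        "Write an output that satisfies exactly these feature values."
    )
    text = llm(text_prompt)
    return spec, text
```

Because the function only calls `llm` with prompt strings, it needs no access to model weights or logits, which is what makes this style of approach compatible with blackbox models.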

Paper Structure (12 sections, 9 equations, 13 figures)

Figures (13)

  • Figure 1: Chain-of-Specification prompting
  • Figure 2: Results of Diversity Metrics for Poetry, Code, and Coding Problem Domains, respectively. Higher = Better.
  • Figure 3: Results of coverage diversity for Poetry, Code, and Coding Problem Domains, respectively. Higher = Better.
  • Figure 4: Results of weighted surprisal diversity for Poetry, Code, and Coding Problem Domains, respectively. Higher = Better.
  • Figure 5: Results of boosted Jaccard diversity for Poetry, Code, and Coding Problem Domains, respectively. Higher = Better.
  • ...and 8 more figures