Cognitive Modeling with Scaffolded LLMs: A Case Study of Referential Expression Generation

Polina Tsvilodub, Michael Franke, Fausto Carcassi

TL;DR

This work models referential expression generation within a cognitively plausible framework by combining Dale & Reiter’s Incremental Algorithm with LLM components. The Iterative Model (IM) is a hybrid neuro-symbolic pipeline whose modules (UtterancesProposer and SemanticEvaluator) propose utterances and iteratively refine them to produce contrastive utterances; it is evaluated on the A3DS dataset. Results show that the IM achieves higher contrastivity than an ablated single-pass model and a one-shot LLM-only baseline in multi-distractor contexts, that its iterative depth adapts to task difficulty, and that it remains cognitively plausible without fine-tuning. The study demonstrates that explanatory cognitive modeling can use LLMs in a modular, non-fine-tuned manner, which opens the way to more open-ended language-generation research across domains.

Abstract

To what extent can LLMs be used as part of a cognitive model of language generation? In this paper, we approach this question by exploring a neuro-symbolic implementation of an algorithmic cognitive model of referential expression generation by Dale & Reiter (1995). The symbolic task analysis implements the generation as an iterative procedure that scaffolds symbolic and gpt-3.5-turbo-based modules. We compare this implementation to an ablated model and a one-shot LLM-only baseline on the A3DS dataset (Tsvilodub & Franke, 2023). We find that our hybrid approach is cognitively plausible and performs well in complex contexts, while allowing for more open-ended modeling of language generation in a larger domain.
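
As a rough illustration of the iterative procedure described above, the following is a minimal sketch and not the authors' implementation. Every name in it (propose_utterances, is_true_of, contrastivity, iterative_model) is hypothetical. The LLM-based modules are replaced by deterministic stubs so the code runs without an API key, objects are modeled as sets of feature words, and the contrastivity score (fraction of distractors an utterance rules out) is an assumed placeholder, since this summary does not define the measure.

```python
from typing import FrozenSet, List, Sequence, Tuple

State = FrozenSet[str]        # assumption: an object is a set of feature words
Utterance = Tuple[str, ...]   # assumption: an utterance is a tuple of feature words


def propose_utterances(target: State, prefix: Utterance, width: int) -> List[Utterance]:
    """Stub for the utterance proposer: extend `prefix` with up to `width`
    unused target features. It only sees the target, never the distractors."""
    unused = sorted(target - set(prefix))
    return [prefix + (feature,) for feature in unused[:width]]


def is_true_of(utterance: Utterance, state: State) -> bool:
    """Stub for the semantic evaluator: an utterance holds of a state
    if every word in it is a feature of that state."""
    return all(word in state for word in utterance)


def contrastivity(utterance: Utterance, target: State, distractors: Sequence[State]) -> float:
    """Hypothetical score: 0 if the utterance is false of the target,
    otherwise the fraction of distractors it rules out."""
    if not is_true_of(utterance, target):
        return 0.0
    if not distractors:
        return 1.0
    ruled_out = sum(not is_true_of(utterance, d) for d in distractors)
    return ruled_out / len(distractors)


def iterative_model(target: State, distractors: Sequence[State],
                    width: int = 2, max_depth: int = 3) -> Tuple[Utterance, float, int]:
    """Greedy propose-and-evaluate loop: stop as soon as an utterance is
    fully contrastive, otherwise extend the best candidate one level deeper."""
    best: Utterance = ()
    best_score = 0.0
    prefix: Utterance = ()
    depth = 0
    for depth in range(1, max_depth + 1):
        candidates = propose_utterances(target, prefix, width)
        if not candidates:
            break
        scored = [(contrastivity(u, target, distractors), u) for u in candidates]
        score, utterance = max(scored, key=lambda pair: pair[0])
        if score > best_score:
            best, best_score = utterance, score
        if best_score == 1.0:
            break
        prefix = utterance  # refine the best candidate in the next iteration
    return best, best_score, depth


# Toy context (invented for illustration, not taken from A3DS).
target = frozenset({"red", "large", "cone"})
d1 = frozenset({"red", "small", "cone"})
d2 = frozenset({"blue", "large", "cone"})

print(iterative_model(target, [d1]))
print(iterative_model(target, [d1, d2]))
```

In this toy context the loop stops after one iteration with a single distractor and needs a second iteration with two distractors, which mirrors the idea that iterative depth adapts to task difficulty. The width and max_depth parameters loosely correspond to the tree width and tree depth mentioned in the figure captions.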

Paper Structure (19 sections, 1 equation, 3 figures, 2 algorithms)

This paper contains 19 sections, 1 equation, 3 figures, 2 algorithms.

Figures (3)

  • Figure 1: The figure shows two iterations of the model, ending with the production of a contrastive utterance. T, D1, and D2 denote the target and the two distractor states respectively. $C$ indicates the contrastivity values. Only the target is passed to the utterance proposer. Boxes with a brown border indicate LLM-based modules; boxes with a green border indicate symbolic modules. The labels of the modules indicate section numbers in the appendix containing the full details.
  • Figure 2: Distribution over contrastivity values (y-axis) by number of distractors (x-axis) and number of utterances proposed (color). Error bars show bootstrapped 95%-CIs.
  • Figure 3: Development of task success over increasing tree depth in the IM: distribution over contrastivity values (y-axis) over increasing tree depth (extended utterance proposal and evaluation iterations; x-axis), by number of distractors (facets) and tree width (number of proposed utterances; color). Dots indicate means, thick bars indicate quartiles, thinner lines indicate minimal values.
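
The caption of Figure 2 mentions bootstrapped 95%-CIs. One common way to obtain such an interval is the percentile bootstrap, sketched below. This is an assumption about the general technique, not a description of the authors' analysis code; the function name bootstrap_ci and the scores are made up for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean: resample with replacement,
    compute the mean of each resample, and read off the tail percentiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2
    lower_idx = round(tail * n_resamples)
    upper_idx = round((1 - tail) * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Invented contrastivity scores, not data from the paper.
scores = [0.0, 0.5, 1.0, 1.0, 0.5, 1.0, 0.0, 1.0]
low, high = bootstrap_ci(scores)
print(f"mean={mean(scores):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```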