RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation

Ioannis Panagiotopoulos, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou

TL;DR

The paper tackles in-context riddle solving by evaluating how prompting strategies affect reasoning in large language models and proposes RISCORE, a prompting method that augments few-shot exemplars with context-reconstructed riddles. It introduces an automated pipeline to generate contextually reconstructed Question–Answer pairs and distractors, producing exemplar sets that emphasize reasoning patterns over surface semantics. Across BrainTeaser (lateral thinking) and RiddleSense (vertical thinking), RISCORE-based prompts yield robust accuracy gains over traditional exemplar-selection baselines across multiple models and shot configurations, with manual reconstructions offering an upper bound. The findings suggest that context-aware, reasoning-focused exemplars can unlock deeper analytical capabilities in in-context reasoning, while also highlighting limitations such as reliance on initial similarity cues and dataset coverage, motivating future cross-dataset and multilingual extensions.

Abstract

Riddle-solving requires advanced reasoning skills, pushing LLMs to engage in abstract thinking and creative problem-solving, often revealing limitations in their cognitive abilities. In this paper, we examine the riddle-solving capabilities of LLMs using a multiple-choice format, exploring how different prompting techniques impact performance on riddles that demand diverse reasoning skills. To enhance results, we introduce RISCORE (RIddle Solving with COntext REconstruction), a novel, fully automated prompting method that generates and utilizes contextually reconstructed sentence-based puzzles in conjunction with the original examples to create few-shot exemplars. Our experiments demonstrate that RISCORE significantly improves the performance of language models in both vertical and lateral thinking tasks, surpassing traditional exemplar selection strategies across a variety of few-shot settings.

Paper Structure

This paper contains 54 sections, 3 figures, and 10 tables.

Figures (3)

  • Figure 1: Standard FS prompting vs RISCORE prompting: by encouraging reasoning-based selection of exemplars (riddle in green) in place of semantic similarity selection (riddle in red), the model unlocks the reasoning pattern, guided towards the correct answer.
  • Figure 2: An overview of RISCORE, where the reconstructed instances, along with their original counterparts, are incorporated as exemplars in the few-shot setting to enhance the model's riddle solving ability.
  • Figure 3: An overview of the automated method for generating a context-reconstructed riddle.
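
The idea in Figure 2, pairing each original exemplar with its context-reconstructed counterpart inside a few-shot prompt, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Riddle` class, the `format_riddle` and `build_prompt` helpers, the prompt layout, and the two sample riddles are hypothetical and are not taken from the paper, BrainTeaser, or RiddleSense. The paper's automated reconstruction pipeline (Figure 3) is not reproduced here; the reconstructed riddle is simply supplied by hand.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LETTERS = "ABCDEFGH"


@dataclass
class Riddle:
    """A multiple-choice riddle (hypothetical container, not the paper's format)."""
    question: str
    choices: Sequence[str]
    answer_index: int


def format_riddle(riddle: Riddle, with_answer: bool = True) -> str:
    """Render one riddle as text; leave the answer slot empty for the target."""
    lines: List[str] = [f"Question: {riddle.question}"]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(riddle.choices)]
    if with_answer:
        lines.append(f"Answer: {LETTERS[riddle.answer_index]}")
    else:
        lines.append("Answer:")
    return "\n".join(lines)


def build_prompt(pairs: Sequence[Tuple[Riddle, Riddle]], target: Riddle) -> str:
    """Place each original exemplar next to its reconstruction, then the target."""
    blocks: List[str] = []
    for original, reconstructed in pairs:
        blocks.append(format_riddle(original))
        blocks.append(format_riddle(reconstructed))
    blocks.append(format_riddle(target, with_answer=False))
    return "\n\n".join(blocks)


# Illustrative riddles only; the second shares the reasoning pattern of the
# first (a body-part word used in a non-literal sense) in a different context.
original = Riddle("What has hands but cannot clap?",
                  ["A clock", "A monkey", "A drummer", "None of the above"], 0)
reconstructed = Riddle("What has teeth but cannot bite?",
                       ["A shark", "A comb", "A puppy", "None of the above"], 1)
target = Riddle("What has a neck but no head?",
                ["A giraffe", "A bottle", "A swan", "None of the above"], 1)

prompt = build_prompt([(original, reconstructed)], target)
```

With one exemplar pair this yields a two-shot prompt followed by the unanswered target question; how many pairs to include, and how the originals are selected, are design choices this sketch leaves open.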