RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation
Ioannis Panagiotopoulos, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou
TL;DR
The paper tackles in-context riddle solving by evaluating how prompting strategies affect reasoning in large language models and proposes RISCORE, a prompting method that augments few-shot exemplars with context-reconstructed riddles. It introduces an automated pipeline to generate contextually reconstructed Question–Answer pairs and distractors, producing exemplar sets that emphasize reasoning patterns over surface semantics. On BrainTeaser (lateral thinking) and RiddleSense (vertical thinking), RISCORE-based prompts yield robust accuracy gains over traditional exemplar-selection baselines for multiple models and shot configurations, with manual reconstructions offering an upper bound. The findings suggest that context-aware, reasoning-focused exemplars can unlock deeper analytical capabilities in in-context reasoning, while also highlighting limitations such as reliance on initial similarity cues and dataset coverage, motivating future cross-dataset and multilingual extensions.
Abstract
Riddle-solving requires advanced reasoning skills, pushing LLMs to engage in abstract thinking and creative problem-solving, often revealing limitations in their cognitive abilities. In this paper, we examine the riddle-solving capabilities of LLMs using a multiple-choice format, exploring how different prompting techniques impact performance on riddles that demand diverse reasoning skills. To enhance results, we introduce RISCORE (RIddle Solving with COntext REconstruction), a novel, fully automated prompting method that generates contextually reconstructed sentence-based puzzles and uses them alongside the original examples to create few-shot exemplars. Our experiments demonstrate that RISCORE significantly improves the performance of language models in both vertical and lateral thinking tasks, surpassing traditional exemplar selection strategies across a variety of few-shot settings.
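
To make the exemplar construction concrete, the following is a minimal sketch of how a few-shot prompt might interleave each original riddle with a context-reconstructed counterpart before posing the target question. It is not the authors' implementation: the `Riddle` class, the `format_riddle` and `build_prompt` helpers, the prompt layout, and the three sample riddles are all hypothetical and invented for illustration, and the reconstructed riddle is written by hand here rather than produced by the automated pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple

LETTERS = "ABCDEFGH"


@dataclass
class Riddle:
    """A multiple-choice riddle (hypothetical container, not from the paper)."""
    question: str
    choices: List[str]
    answer_index: int


def format_riddle(riddle: Riddle, include_answer: bool = True) -> str:
    """Render one riddle as text, optionally leaving the answer blank."""
    lines = [f"Question: {riddle.question}"]
    for letter, choice in zip(LETTERS, riddle.choices):
        lines.append(f"{letter}. {choice}")
    if include_answer:
        lines.append(f"Answer: {LETTERS[riddle.answer_index]}")
    else:
        lines.append("Answer:")
    return "\n".join(lines)


def build_prompt(exemplar_pairs: List[Tuple[Riddle, Riddle]], target: Riddle) -> str:
    """Place each original exemplar next to its reconstructed counterpart,
    then append the unanswered target riddle."""
    blocks = []
    for original, reconstructed in exemplar_pairs:
        blocks.append(format_riddle(original))
        blocks.append(format_riddle(reconstructed))
    blocks.append(format_riddle(target, include_answer=False))
    return "\n\n".join(blocks)


# Illustrative riddles only; none of these come from BrainTeaser or RiddleSense.
original = Riddle(
    "What has keys but cannot open a lock?",
    ["A piano", "A door", "A map", "A jail"],
    0,
)
# Hand-written stand-in for a reconstruction: similar reasoning, different context.
reconstructed = Riddle(
    "What has teeth but cannot bite?",
    ["A comb", "A dog", "A shark", "A baby"],
    0,
)
target = Riddle(
    "What has hands but cannot clap?",
    ["A clock", "A child", "A crowd", "A drummer"],
    0,
)

prompt = build_prompt([(original, reconstructed)], target)
print(prompt)
```

Keeping each original and its reconstruction adjacent is one plausible design choice: the shared reasoning pattern appears twice in different surface contexts, which is the intuition the abstract describes. Other orderings are equally possible under this sketch.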
