iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers
Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma
TL;DR
This paper tackles SemEval-2024 Task 9, BRAINTEASER, which evaluates lateral thinking through two multiple-choice subtasks: Sentence Puzzle and Word Puzzle. It proposes a hybrid prompting framework that combines zero-shot and few-shot prompts, contextualized example selection via cosine similarity, and a self-generated reasoning mechanism, evaluated on the Gemini 1.0 Pro model and GPT-4. The approach yields significant gains over baseline models but does not reach the performance of human annotators, underscoring both progress and remaining gaps in computational lateral thinking. The findings highlight practical strategies for improving reasoning in large language models on unconventional questions and suggest directions for future enhancement.
Abstract
This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro model, in both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM's reasoning capabilities to improve performance. Our approach outperformed the baseline models by a considerable margin, demonstrating the efficacy of the proposed strategies, though it still fell short of the performance of human annotators.
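As a rough illustration of dynamic few-shot prompting, the sketch below picks the solved examples most similar to a new puzzle by cosine similarity and places them ahead of the question in the prompt. It is a minimal sketch under stated assumptions, not the system described in this paper: the bag-of-words `embed` function is a toy stand-in for whatever sentence embedding is actually used, the function names and prompt layout are hypothetical, and the example puzzles are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_examples(question, pool, k=2):
    """Return the k pool entries whose questions are most similar to `question`."""
    q_vec = embed(question)
    ranked = sorted(
        pool,
        key=lambda ex: cosine(q_vec, embed(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, choices, examples):
    """Assemble a few-shot prompt (hypothetical layout): solved examples, then the new question."""
    parts = [
        f"Question: {ex['question']}\nAnswer: {ex['answer']}\n" for ex in examples
    ]
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(choices))
    parts.append(f"Question: {question}\n{options}\nAnswer:")
    return "\n".join(parts)


# Invented example pool, for illustration only.
POOL = [
    {"question": "What has keys but cannot open locks?", "answer": "A piano"},
    {"question": "A man shaves several times a day yet still has a beard. How?",
     "answer": "He is a barber"},
    {"question": "What has hands but cannot clap?", "answer": "A clock"},
]

if __name__ == "__main__":
    new_question = "What has keys but no doors?"
    shots = select_examples(new_question, POOL, k=2)
    print(build_prompt(new_question, ["A piano", "A house", "A car"], shots))
```

A static few-shot variant would simply pass the same fixed `examples` list for every question; the dynamic variant differs only in calling `select_examples` per question.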
