iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers

Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma

TL;DR

This paper tackles SemEval-2024 Task 9 (BRAINTEASER), which evaluates lateral thinking through two multiple-choice question (MCQ) subtasks: Sentence Puzzle and Word Puzzle. It proposes a hybrid prompting framework that combines zero-shot and few-shot prompts, selects contextually relevant few-shot examples by cosine similarity, and adds a self-generated reasoning mechanism. The framework is evaluated with Gemini 1.0 Pro and GPT-4. The approach yields significant gains over the baseline models but does not reach human-annotator performance, underscoring both the progress made and the remaining gaps in computational lateral thinking. The findings point to practical strategies for improving large language model reasoning on unconventional questions and suggest directions for future work.
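The TL;DR mentions choosing few-shot examples by cosine similarity. A minimal sketch of that selection step is shown below. It assumes a bag-of-words term-count vector as a stand-in for whatever embedding the authors used, because the summary does not name one. The helpers `embed` and `select_examples` and the example pool are hypothetical; the puzzles are illustrative and not drawn from the task data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: lowercase bag-of-words term counts.

    The summary does not say which encoder was used; a real system would
    more likely use a neural sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def select_examples(query: str, pool: list[dict], k: int = 3) -> list[dict]:
    """Return the k pool examples whose questions are most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        pool,
        key=lambda ex: cosine_similarity(query_vec, embed(ex["question"])),
        reverse=True,  # most similar first; sorted() keeps pool order on ties
    )
    return ranked[:k]


if __name__ == "__main__":
    # Illustrative pool (hypothetical; not taken from the BRAINTEASER data).
    pool = [
        {"question": "A man shaves every day, yet keeps his beard long. How?",
         "answer": "He is a barber."},
        {"question": "What has keys but can't open locks?", "answer": "A piano."},
        {"question": "Which word becomes shorter when you add two letters to it?",
         "answer": "Short."},
    ]
    query = "Why does the barber shave every day but keep a long beard?"
    print(select_examples(query, pool, k=1)[0]["answer"])  # prints: He is a barber.
```

Selecting examples per question in this way is what makes the few-shot prompt "dynamic": each test item is paired with the demonstrations that look most like it, rather than with one fixed set.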

Abstract

This paper describes our approach to SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task consists of multiple-choice question answering designed to evaluate models' lateral-thinking capabilities. It comprises Sentence Puzzle and Word Puzzle subtasks, both of which require models to defy default common-sense associations and exhibit unconventional thinking. We propose a strategy to improve the performance of pre-trained language models, notably Gemini 1.0 Pro, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that leverages the LLM's own reasoning capabilities to improve performance. Our approach outperforms the baseline models by a considerable margin, highlighting the efficacy of the proposed strategies, although it still falls short of human annotators.
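The abstract combines static and dynamic few-shot prompting with model-generated reasoning, but the summary gives no prompt template. The sketch below shows one possible way to assemble such prompts, under stated assumptions: the instruction wording, the `A.`/`B.` option layout, and the `reasoner` callback are all hypothetical, not the authors' actual prompts. In a real pipeline `reasoner` would wrap an LLM call (for example, asking the model to explain why an example's answer is correct); here it is any Python callable. Passing an empty example list yields a zero-shot prompt.

```python
from typing import Callable, Optional

# Hypothetical instruction text; the paper's actual wording is not given here.
INSTRUCTION = (
    "Solve the brain teaser by choosing exactly one option. "
    "Think laterally: the default common-sense reading may be a trap."
)


def format_question(question: str, choices: list[str]) -> str:
    """Render a question and its options as 'A. ...', 'B. ...', and so on."""
    lines = [f"Question: {question}"]
    lines += [f"{chr(ord('A') + i)}. {choice}" for i, choice in enumerate(choices)]
    return "\n".join(lines)


def build_prompt(
    question: str,
    choices: list[str],
    examples: list[dict],
    reasoner: Optional[Callable[[dict], str]] = None,
) -> str:
    """Assemble a zero-shot (no examples) or few-shot prompt.

    `examples` may be a fixed list (static few-shot) or chosen per question
    (dynamic few-shot). If `reasoner` is given, it is called on each example
    to produce a rationale, standing in for a model-generated explanation.
    """
    blocks = [INSTRUCTION]
    for ex in examples:
        block = format_question(ex["question"], ex["choices"])
        if reasoner is not None:
            block += f"\nReasoning: {reasoner(ex)}"
        block += f"\nAnswer: {ex['answer']}"
        blocks.append(block)
    # The target question ends with an open "Answer:" for the model to complete.
    blocks.append(format_question(question, choices) + "\nAnswer:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Illustrative example (hypothetical; not from the task data).
    demo = {
        "question": "A man shaves every day, yet keeps his beard long. How?",
        "choices": ["He is a barber.", "He has no hair.", "He shaves his legs."],
        "answer": "A",
    }

    # Stub in place of an LLM-generated rationale.
    def stub_reasoner(ex: dict) -> str:
        return "He shaves other people's faces, not his own."

    print(build_prompt("What has keys but can't open locks?",
                       ["A piano.", "A door.", "A map."],
                       [demo], reasoner=stub_reasoner))
```

Injecting the reasoner as a callable keeps prompt assembly independent of any particular model API, so the same function covers zero-shot, static few-shot, and dynamic few-shot runs, with or without rationales.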

Paper Structure (27 sections, 2 figures, 3 tables)

This paper contains 27 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Few-shot prompting performance on the Sentence Puzzle subtask
  • Figure 2: Few-shot prompting performance on the Word Puzzle subtask