Abdelhak at SemEval-2024 Task 9: Decoding Brainteasers, The Efficacy of Dedicated Models Versus ChatGPT
Abdelhak Kelious, Mounir Okirim
TL;DR
The paper evaluates lateral thinking in AI through the BRAINTEASER task. It proposes a transformer-based, task-specific model that excels at sentence puzzles, and compares it with ChatGPT under different temperature settings. The method constructs question–choice pairs, takes the CLS token representation of each pair, and applies a dense layer with softmax to select the answer, using a DeBERTa-v3-base backbone. The dedicated model reaches 0.98 accuracy on sentence puzzles (Rank 1) and 0.61 on word puzzles, while ChatGPT attains 0.59 and 0.27, respectively, which points to a gap in creative reasoning for general-purpose models. The work underlines the value of specialized approaches for creative reasoning and shows how temperature settings influence lateral thinking in large language models, with word puzzles remaining the main area for improvement.
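The selection step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the example puzzle, the three-dimensional "CLS" vectors, the weights, and all function names are hypothetical stand-ins, and the sketch assumes the dense layer maps each pair's CLS vector to a single logit before a softmax over the choices. In the actual system, the vectors would come from the DeBERTa-v3-base encoder.

```python
import math

def build_pairs(question, choices):
    """Pair the question with every candidate answer (one encoder input per choice)."""
    return [(question, choice) for choice in choices]

def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_choices(cls_vectors, weights, bias):
    """Assumed dense layer: one logit per CLS vector, then softmax across choices."""
    logits = [sum(w * x for w, x in zip(weights, vec)) + bias for vec in cls_vectors]
    return softmax(logits)

def pick_answer(choices, probs):
    """Return the choice with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return choices[best]

# Hypothetical puzzle, not taken from the BRAINTEASER data.
question = "What has keys but cannot open locks?"
choices = ["A piano", "A locksmith", "A door", "None of the above"]
pairs = build_pairs(question, choices)

# Toy stand-ins for the encoder's CLS representations (one per pair).
cls_vectors = [
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 0.3],
    [-0.5, 0.0, 0.4],
    [0.0, 0.0, 0.0],
]
weights = [1.0, 0.5, -0.5]  # hypothetical dense-layer weights
bias = 0.0

probs = score_choices(cls_vectors, weights, bias)
answer = pick_answer(choices, probs)
print(answer)
```

Scoring each question–choice pair separately and normalizing across the choices is a common way to frame multiple-choice selection; whether the paper's head is arranged exactly this way is an assumption of the sketch.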
Abstract
This study introduces a dedicated model aimed at solving the BRAINTEASER task (SemEval-2024 Task 9), a novel challenge designed to assess models' lateral thinking capabilities through sentence and word puzzles. Our model demonstrates remarkable efficacy, securing Rank 1 in sentence puzzle solving during the test phase with an overall score of 0.98. Additionally, we explore the comparative performance of ChatGPT, specifically analyzing how variations in temperature settings affect its ability to engage in lateral thinking and problem-solving. Our findings indicate a notable performance disparity between the dedicated model and ChatGPT, underscoring the potential of specialized approaches in enhancing creative reasoning in AI.
