AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning

Mina Ghashami; Soumya Smruti Mishra

TL;DR

This work tackles the SemEval-2024 BRAINTEASER task, which probes lateral thinking in NLP through Sentence Puzzle and Word Puzzle challenges. The authors implement a multi-model pipeline centered on AutoModelForMultipleChoice with DeBERTaV3, and enrich training with diverse data sources including RiddleSense and GPT-4-generated humor, achieving top performance on Sentence Puzzle (92.5% accuracy) and strong, though comparatively lower, results on Word Puzzle (80.2% accuracy). They show that a focused multiple-choice architecture outperforms sequence-classification baselines and that data augmentation with humor and riddle-like content boosts lateral-thinking capabilities. The results underscore the importance of diverse, creatively structured training data and robust evaluation against adversarial puzzle variants for advancing commonsense-defying reasoning in NLP, with substantial implications for AI systems requiring flexible and divergent problem solving.

Abstract

The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. The challenge comprises Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking. In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy, leveraging cutting-edge pre-trained models in a multiple-choice architecture and diversifying the training data with the Sentence Puzzle and Word Puzzle datasets. To gain further improvement, we fine-tuned the model on a synthetic humor and jokes dataset and on the RiddleSense dataset, which helped augment the model's lateral thinking abilities. Empirical results show that our approach achieves 92.5% accuracy on the Sentence Puzzle subtask and 80.2% accuracy on the Word Puzzle subtask.
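The multiple-choice framing mentioned above can be illustrated with a minimal sketch. It shows only the general idea: the question is paired with each candidate answer, each pair receives a score, and the highest-scoring candidate is selected. The function names (`build_choice_pairs`, `predict_choice`, `accuracy`, `toy_score`) are hypothetical, and `toy_score` is a word-overlap stand-in used for illustration, not the DeBERTaV3 model or the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def build_choice_pairs(question: str, choices: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair the question with every candidate answer.

    A multiple-choice head scores each (question, choice) pair,
    and the scores are then compared across the candidates.
    """
    return [(question, choice) for choice in choices]


def predict_choice(
    question: str,
    choices: Sequence[str],
    score_pair: Callable[[str, str], float],
) -> int:
    """Return the index of the highest-scoring choice (first index on ties)."""
    scores = [score_pair(q, c) for q, c in build_choice_pairs(question, choices)]
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples whose predicted index equals the gold index."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def toy_score(question: str, choice: str) -> float:
    """Hypothetical stand-in scorer: count choice words that occur in the question.

    A fine-tuned model would produce this score instead; word overlap is
    used here only so the sketch runs without any external dependency.
    """
    question_words = set(question.lower().split())
    return float(sum(word in question_words for word in choice.lower().split()))


if __name__ == "__main__":
    # Made-up example for illustration only; not taken from the task data.
    question = "which animal in the story chased the red ball"
    choices = ["a blue bird", "the dog chased the red ball", "nobody"]
    predicted = predict_choice(question, choices, toy_score)
    print("predicted choice:", choices[predicted])
    print("accuracy:", accuracy([predicted], [1]))
```

In a real system the scorer would be replaced by a fine-tuned model's per-choice logit, while the pairing and argmax steps stay the same. Accuracy is then the fraction of puzzles for which the selected index matches the gold answer.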
