SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval

Isidora Chara Tourni, Sayontan Ghosh, Brenda Miao, Constantijn van der Poel

TL;DR

This paper explores Question Answering (QA) and Named Entity Recognition (NER) in five diverse languages. It highlights the need for task-specific approaches in multilingual NLP and suggests that current models may develop different linguistic competencies for different tasks.

Abstract

This paper explores the problems of Question Answering (QA) and Named Entity Recognition (NER) in five diverse languages. We tested five Large Language Models with various prompting methods, including zero-shot, chain-of-thought reasoning, and translation techniques. Our results show that while some models consistently outperform others, their effectiveness varies significantly across tasks and languages. Advanced prompting techniques generally improved QA performance but gave mixed results for NER, and language difficulty patterns differed between tasks. Our findings highlight the need for task-specific approaches in multilingual NLP and suggest that current models may develop different linguistic competencies for different tasks.

Paper Structure

This paper contains 12 sections, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Method Flowchart
  • Figure 2: Comparison of QA performance (Accuracy %) between the three best performing models, gpt-4o, gpt-4-turbo, and claude-3.5-sonnet, across all languages and methods. + T indicates the + Translation experiment.
  • Figure 3: Comparison of NER performance (F1 score) between the three best performing models, gpt-4o, gpt-4-turbo, and claude-3.5-sonnet, across all languages and methods. + T indicates the + Translation experiment.