
Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text

Bitan Majumder, Anirban Sen

TL;DR

This study compares four large language models with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text. It suggests that, in low-resource and data-scarce settings, domain-adaptive fine-tuning of smaller transformer-based models may significantly improve sarcasm detection over general-purpose LLM inference.

Abstract

Sarcasm detection in multilingual and code-mixed environments remains a challenging task for natural language processing models because of structural variation, informal expressions, and limited linguistic resources. This study compares four large language models (Llama 3.1, Mistral, Gemma 3, and Phi-4) with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text. The smaller, sequentially fine-tuned DistilBERT model achieved the highest overall accuracy, 84%, and outperformed all four LLMs in both zero-shot and few-shot setups, even though only a small amount of LLM-generated code-mixed data was used for fine-tuning. These findings suggest that, in low-resource and data-scarce settings, domain-adaptive fine-tuning of smaller transformer-based models may significantly improve sarcasm detection over general-purpose LLM inference.
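The zero-shot and few-shot setups differ only in what the LLM sees before the input sentence. In zero-shot, it receives an instruction alone. In few-shot, a handful of labelled examples precede the query. The sketch below shows that difference, along with one way to map a free-text reply back to a binary label. It is not the paper's prompt: the actual prompt appears in its Figure 3, and the instruction wording, label strings, helper names (`build_prompt`, `parse_label`), and Hinglish example sentences here are all hypothetical.

```python
# Hypothetical sketch: the paper's real classification prompt (its Figure 3)
# is not reproduced here; instruction text, labels, and examples are assumptions.

INSTRUCTION = (
    "Classify the following code-mixed Hinglish sentence as "
    "'sarcastic' or 'not sarcastic'. Answer with one label only."
)


def build_prompt(text, examples=()):
    """Zero-shot prompt when `examples` is empty; few-shot prompt otherwise.

    `examples` is a sequence of (sentence, label) pairs placed before the query.
    """
    parts = [INSTRUCTION]
    for sentence, label in examples:
        parts.append(f"Sentence: {sentence}\nLabel: {label}")
    parts.append(f"Sentence: {text}\nLabel:")
    return "\n\n".join(parts)


def parse_label(response):
    """Map a free-text LLM reply to 1 (sarcastic), 0 (not sarcastic), or None."""
    answer = response.strip().lower()
    # Test the negative label first, because "not sarcastic" contains "sarcastic".
    if answer.startswith("not sarcastic"):
        return 0
    if answer.startswith("sarcastic"):
        return 1
    return None  # unparseable reply; how such cases are scored is left open


# Hypothetical demonstration examples for the few-shot setting.
FEW_SHOT = [
    ("Wah, 2 ghante traffic mein phase rehna toh mera favourite hobby hai", "sarcastic"),
    ("Aaj ka match sach mein bahut accha tha", "not sarcastic"),
]

zero_shot_prompt = build_prompt("Kya baat hai, phir se light chali gayi")
few_shot_prompt = build_prompt("Kya baat hai, phir se light chali gayi", FEW_SHOT)
```

Both prompts end with the same query. The few-shot version simply carries two demonstrations ahead of it, which is the only variable changed between the two setups in this sketch.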

Paper Structure

This paper contains 23 sections, 6 figures, 13 tables.

Figures (6)

  • Figure 1: DistilBERT training and fine-tuning pipeline
  • Figure 2: Prompt used in Gemini 2.5 Pro for synthetic code-mixed data generation
  • Figure 3: Prompt used for LLM-based sarcasm classification
  • Figure 4: AUPRC curve after fine-tuning on code-mixed sarcasm data (red) and English sarcasm data (blue)
  • Figure 5: Model performance across different training sizes before fine-tuning on code-mixed sarcasm data
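Figure 4 reports AUPRC, the area under the precision-recall curve. This metric is often more informative than accuracy when the sarcastic class is a minority. The summary does not say how the area was computed, so the sketch below assumes the common step-wise "average precision" definition, AP = Σₖ (Rₖ − Rₖ₋₁) · Pₖ over decreasing score thresholds, which is the definition used by scikit-learn's `average_precision_score`. The function name is hypothetical, and the inputs are binary labels (1 = sarcastic) and model scores.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    y_true: iterable of 0/1 labels (1 = sarcastic)
    scores: model confidence for the positive class, same length as y_true
    Items with tied scores are treated as a single threshold.
    """
    y_true = list(y_true)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank predictions from most to least confident.
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every prediction tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # Each recall increment is weighted by the precision reached at that point.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy example (not data from the paper): two sarcastic and two non-sarcastic items.
ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

For the toy input, the two recall steps occur at precision 1 and 2/3, giving AP = 0.5 · 1 + 0.5 · 2/3 = 5/6. A trapezoidal estimate of the same curve can differ slightly, which is why the assumed definition is stated explicitly.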