LOLgorithm: Integrating Semantic, Syntactic and Contextual Elements for Humor Classification

Tanisha Khurana, Kaushik Pillalamarri, Vikram Pande, Munindar Singh

TL;DR

This work tackles textual humor recognition by fusing syntactic, semantic, and contextual cues within a ColBERT-based framework. It introduces a hybrid architecture that combines 33 handcrafted features with 104-dimensional contextual embeddings, processed through parallel hidden layers and dense layers to predict humor with a sigmoid output. SHAP analyses and decision trees reveal that semantic features and context often dominate, while specific syntactic cues like VP_count remain informative, and their combination yields better generalization on unseen data. The study underscores the practical benefits for conversational agents and analytics, while acknowledging limitations such as lack of audio data and potential overfitting, and points to future work exploring larger models and multi-dataset evaluation.
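The two-branch layout described above can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the input sizes (33 and 104) come from the summary, while the hidden width, the ReLU activations, the single merge layer, and the random placeholder weights are assumptions that the text does not specify.

```python
import numpy as np

N_HANDCRAFTED = 33   # handcrafted feature count, from the summary
N_CONTEXTUAL = 104   # contextual embedding size, from the summary
HIDDEN = 16          # assumption: hidden width is not given in the text


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(rng):
    """Random placeholder weights; a trained model would learn these."""
    def dense(n_in, n_out):
        return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

    return {
        "feat": dense(N_HANDCRAFTED, HIDDEN),   # branch for handcrafted features
        "ctx": dense(N_CONTEXTUAL, HIDDEN),     # branch for contextual embeddings
        "merge": dense(2 * HIDDEN, HIDDEN),     # dense layer over the joined branches
        "out": dense(HIDDEN, 1),                # single sigmoid unit
    }


def forward(params, handcrafted, contextual):
    # Parallel hidden layers: each input type gets its own branch.
    h_feat = relu(handcrafted @ params["feat"][0] + params["feat"][1])
    h_ctx = relu(contextual @ params["ctx"][0] + params["ctx"][1])
    # Concatenate the branch outputs and pass them through a dense layer.
    merged = np.concatenate([h_feat, h_ctx], axis=1)
    h = relu(merged @ params["merge"][0] + params["merge"][1])
    # Sigmoid output: one humor probability per input text.
    logits = h @ params["out"][0] + params["out"][1]
    return sigmoid(logits).ravel()


rng = np.random.default_rng(0)
params = init_params(rng)
batch = 4
probs = forward(
    params,
    rng.normal(size=(batch, N_HANDCRAFTED)),
    rng.normal(size=(batch, N_CONTEXTUAL)),
)
print(probs.shape)
```

Keeping the branches separate until the merge layer lets each input type be transformed on its own before the two are combined, which is one plausible reading of "parallel hidden layers" in the summary.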

Abstract

This paper explores humor detection through a linguistic lens, prioritizing syntactic, semantic, and contextual features over computational methods in Natural Language Processing. We categorize features into syntactic, semantic, and contextual dimensions, including lexicons, structural statistics, Word2Vec, WordNet, and phonetic style. Our proposed model, ColBERT, utilizes BERT embeddings and parallel hidden layers to capture sentence congruity. By combining syntactic, semantic, and contextual features, we train ColBERT for humor detection. Feature engineering examines essential syntactic and semantic features alongside BERT embeddings. SHAP interpretations and decision trees identify influential features, revealing that a holistic approach improves humor detection accuracy on unseen data. Integrating linguistic cues from different dimensions enhances the model's ability to understand humor complexity beyond traditional computational methods.
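To make the SHAP-based interpretation concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a tiny hypothetical scorer. It is not the authors' analysis: the toy model, its coefficients, the all-zero baseline, and the `sentiment` and `context` feature names are assumptions for illustration; only `VP_count` is a feature named in the summary.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature subset (small n only)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance value; the rest take the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(features):
    # Hypothetical scorer: an additive syntactic term plus a
    # semantic-by-context interaction term.
    vp_count, sentiment, context = features
    return 0.5 * vp_count + 2.0 * sentiment * context


x = [3.0, 1.0, 0.5]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # assumed reference input
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output on the instance and on the baseline, and the interaction term is split equally between the two features that produce it. Exact enumeration grows exponentially with the number of features, which is why sampling or model-specific approximations are used in practice.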

Paper Structure (27 sections, 9 figures, 2 tables)

Figures (9)

  • Figure 1: SHAP on NRCLex
  • Figure 2: DT on NRCLex
  • Figure 3: SHAP on Syntactic Features
  • Figure 4: DT on Syntactic Features
  • Figure 5: SHAP on Semantic Features
  • ...and 4 more figures