Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Detection

Sarah Masud; Ashutosh Bajpai; Tanmoy Chakraborty

TL;DR

FiADD addresses the challenge of detecting implicit hate by marrying adaptive density discrimination with inferential infusion to align surface statements with their implied meanings in a focused latent space. The method adds a novel inferential ADD loss and a focal weighting to guide local neighborhood discrimination, combined with a cross-entropy objective. Across three implicit-hate datasets and three SemEval tasks, FiADD improves macro-F1 in both two-way and three-way classifications and shows robust improvements with general-purpose PLMs, while revealing richer latent-space structure that validates the inferred-context alignment. The approach offers a practical, extendable framework for implicit-text detection and related tasks, with broader implications for semantically nuanced classification where surface form diverges from meaning.

Abstract

Although pretrained large language models (PLMs) have achieved state-of-the-art results on many natural language processing (NLP) tasks, they lack an understanding of subtle expressions of implicit hate speech. Various attempts have been made to enhance the detection of implicit hate by augmenting external context or enforcing label separation via distance-based metrics. Combining these two approaches, we introduce FiADD, a novel Focused Inferential Adaptive Density Discrimination framework. FiADD enhances the PLM finetuning pipeline by bringing the surface form/meaning of implicit hate speech closer to its implied form while increasing the inter-cluster distance among various labels. We test FiADD on three implicit hate datasets and observe significant improvement in the two-way and three-way hate classification tasks. We further test the generalizability of FiADD on three other tasks in which surface and implied forms differ (detecting sarcasm, irony, and stance) and observe similar performance improvements. Finally, we analyze the generated latent space to understand its evolution under FiADD, which corroborates the advantage of employing FiADD for implicit hate speech detection.
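The combination described in the abstract (a classification loss, extra weight on hard samples, and a term that pulls a post's surface form toward its implied form while keeping it away from other labels) can be illustrated with a toy sketch. This is not the authors' formulation, which this summary does not give. The focal term below is the standard focal loss, swapped in as a stand-in for the paper's focal weighting; the function names, the squared-distance pull and hinge push terms, and the parameters `beta` and `margin` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_cross_entropy(logits, label, gamma=2.0):
    """Standard focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to plain cross-entropy. Larger gamma
    down-weights samples the classifier already gets right.
    """
    p_t = softmax(logits)[label]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def toy_objective(logits, label, surface_vec, implied_vec, other_centroid,
                  gamma=2.0, beta=0.5, margin=1.0):
    """Hypothetical combined objective (illustration only).

    pull: distance between the surface embedding and the embedding of
          its implied meaning (smaller is better).
    push: hinge penalty that is active only while the surface embedding
          sits within `margin` of another label's centroid.
    """
    pull = squared_distance(surface_vec, implied_vec)
    push = max(0.0, margin - squared_distance(surface_vec, other_centroid))
    return focal_cross_entropy(logits, label, gamma) + beta * (pull + push)


if __name__ == "__main__":
    easy = focal_cross_entropy([2.0, 0.0], label=0)  # confident and correct
    hard = focal_cross_entropy([0.1, 0.0], label=0)  # near the decision boundary
    print(f"easy sample: {easy:.4f}, hard sample: {hard:.4f}")
```

In this sketch the near-boundary sample receives the larger loss, which is the behavior the focal weighting is meant to encourage. The distance terms act on fixed toy vectors; in an actual finetuning setup they would act on encoder representations.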

Paper Structure (11 sections, 5 equations, 7 figures, 8 tables)

Introduction
Related Work
Intuition and Background
Background on Adaptive Density Discrimination
Proposed Method
Experimental Setup
Results and Ablations
Does FiADD really improve implicit hate detection?
Latent Space Analysis
Conclusion
Limitations and Future Work

Figures (7)

Figure 1: The three objectives of FiADD as applied to implicit hate detection are (a) adaptive density discrimination, (b) higher penalty on boundary samples, and (c) bringing the surface and semantic form of the implicit hate closer.
Figure 2: The L1 inter-cluster distances between neutral (N) and explicit hate (E), as well as non-hate and implicit hate (I) samples based on ALD and ACLD.
Figure 3: The architecture of FiADD. Input X is a set of texts, implied annotations (only for implicit class), and class labels. PLM: pretrained language model (frozen). ${R'}_{nhate}$, ${R'}_{exp}$ and ${R'}_{imp}$ are the representatives for seed and imposter clusters of non-hate, explicit, and implicit, respectively. ${R'}_{inf}$ represents inferential meaning for corresponding ${R'}_{imp}$. ACE is alpha cross-entropy, and $ADD^{Inf+foc}$ is the adaptive density discriminator with inferential + focal objective.
Figure 4: The variation in performance with changing values of (a) number of clusters (k) and (b) focal parameter ($\gamma$). We employ BERT on AbuseEval with $ADD^{foc}$ in the two-way classification.
Figure 5: Error analysis with (a) correctly and (b) incorrectly classified samples in three-way classification on LatentHatred. Here, scores A and B are the relative positions of the implicit sample w.r.t. the non-hate and explicit space finetuned with ACE and $ADD^{inf+foc}$, respectively.
...and 2 more figures
