Overview of the 2023 ICON Shared Task on Gendered Abuse Detection in Indic Languages
Aatman Vaidya, Arnav Arora, Aditya Joshi, Tarunima Prabhakar
TL;DR
The paper presents the ICON 2023 Shared Task on Gendered Abuse Detection in Indic Languages, focusing on detecting gendered abuse in Hindi, Tamil, and Indian English. It introduces a novel annotated Twitter dataset of roughly 6.5k–7.9k posts per language, labeled with three categories capturing directed abuse and explicit language, developed via participatory annotation and linked to the Uli project. Three subtasks are defined, including transfer learning from external datasets and a multi-task objective, with evaluation conducted on Kaggle using F-1 metrics. Despite nine registrations, only two teams submitted systems, with CNLP-NITS-PP achieving the top scores, and the dataset is released openly to foster ongoing research and improve online safety in Indic-language communities.
Abstract
This paper reports the findings of the ICON 2023 Shared Task on Gendered Abuse Detection in Indic Languages. The shared task addresses the detection of gendered abuse in online text. It was conducted as part of ICON 2023 and is based on a novel dataset in Hindi, Tamil, and the Indian dialect of English. Participants were given three subtasks, with a training set of approximately 6500 posts sourced from Twitter and a test set of approximately 1200 posts. The shared task received a total of 9 registrations. The best F-1 scores are 0.616 for subtask 1, 0.572 for subtask 2, and 0.616 and 0.582 for subtask 3. The paper contains examples of hateful content owing to its topic.
