Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions

Smita Khapre, Melkamu Abay Mersha, Hassan Shakil, Jonali Baruah, Jugal Kalita

TL;DR

The paper addresses toxicity in online platforms and AI systems by proposing a holistic taxonomy and conducting a systematic review of detection, detoxification, and mitigation approaches. It employs a PRISMA-guided methodology to synthesize over 200 studies (2021–2025) across text, image, audio, video, and multimodal content, including generative AI contexts. Key contributions include a comprehensive toxicity taxonomy, a critical evaluation of toxicity datasets, and frameworks for detection, detoxification, and evaluation, with attention to biases and psychosocial impacts. The work emphasizes proactive, explainable, and governance-aligned strategies, highlighting multilingual and multimodal expansion and robust evaluation to improve safety and inclusivity in AI-enabled digital ecosystems.

Abstract

The evolution of digital communication systems and the design of online platforms have inadvertently facilitated the subconscious propagation of toxic behavior, giving rise to reactive responses to that behavior. Toxicity in online content and Artificial Intelligence systems has become a serious challenge to individual and collective well-being around the world, and it is more detrimental to society than we realize. Toxicity, expressed in language, image, and video, can be interpreted in various ways depending on the context of usage. A comprehensive taxonomy is therefore crucial for proactively detecting and mitigating toxicity in online content, Artificial Intelligence systems, and Large Language Models. A comprehensive understanding of toxicity is likely to facilitate the design of practical solutions for toxicity detection and mitigation. Classifications in the published literature have addressed only a limited number of aspects of this complex issue and follow a pattern of reactive strategies in response to toxicity. This survey attempts to build a comprehensive taxonomy of toxicity from various perspectives. It presents a holistic approach to explaining toxicity by understanding the context and environment that society faces in the Artificial Intelligence era. The survey summarizes toxicity-related datasets and research on toxicity detection and mitigation for Large Language Models, social media platforms, and other online platforms, detailing their attributes in the textual mode, with a focus on the English language. Finally, we identify research gaps in toxicity mitigation with respect to datasets, mitigation strategies, Large Language Models, adaptability, explainability, and evaluation.

Paper Structure

This paper contains 48 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: LLMs depicted as kids trained on toxicity-ignorant datasets, unable to distinguish toxic language from normal language. Kids who grow up in toxic environments are unaware that those environments are toxic and, as adults, start displaying the same toxic behavior.
  • Figure 2: Key Stakeholders in the ecosystem of online platforms and AI systems. Each stakeholder has a role in the generation, detection, moderation, and mitigation of toxic content.
  • Figure 3: A taxonomy of toxic content based on the origin of digital communication in online platforms and AI systems via text, image, audio, video, and/or multi-modal channels. Each mode of communication hosts toxicity, divided mainly into information-based and interaction-based systems. Red signifies the impact of toxicity, green marks toxicity in interaction-based platforms, blue represents other online platforms, and a blue-green gradient marks toxicity in both social media and other online platforms. The pink box encompasses the toxicities that require detection and mitigation effort.
  • Figure 4: Implicit Toxicity: The message seems benign or normal, but based on the context, the reader or viewer, or the environment, it intends to harm a particular target. Such messages have implicit toxicity.
  • Figure 5: Explicit Toxicity: An explicitly toxic message intends to harm a particular target and is easily recognized as toxic. Its implied meaning matches its literal content. Some explicitly toxic messages have no target and are prevalent in the form of profanity, sarcasm, and other offensive language.
  • ...and 5 more figures