Gen-AI for User Safety: A Survey

Akshar Prabhu Desai, Tejasvi Ravi, Mohammad Luqman, Mohit Sharma, Nithya Kota, Pranjul Yadav

TL;DR

This survey tackles the problem of detecting and mitigating user-safety violations in a landscape where traditional ML classifiers struggle with language context and nuance. It surveys Gen-AI approaches across digital and physical domains, across data modalities (text, images, video, audio, code), and in adversarial settings, highlighting methods such as retrieval-augmented generation, multi-modal large language models, and prompting strategies. The work outlines concrete applications in phishing, misinformation, content moderation, mental health support, deepfake detection, and safety-critical video/audio tasks, while also detailing attackers' use of Gen-AI for scale, personalization, and second-order harms, along with defenses like jailbreaking countermeasures. Overall, the paper provides a comprehensive roadmap for leveraging Gen-AI to enhance user safety and underscores the importance of multi-modality, foundation ensembles, and proactive second-order harm prevention for real-world impact.

Abstract

Machine learning and data mining techniques (i.e., supervised and unsupervised techniques) are used across domains to detect user-safety violations. Examples include classifiers that detect whether an email is spam or whether a web page is requesting bank login information. However, existing ML/DM classifiers are limited in their ability to understand the context and nuances of natural language. Gen-AI techniques help overcome these challenges, aided by their inherent ability to translate between languages and to be fine-tuned across tasks and domains. In this manuscript, we provide a comprehensive overview of work that applies Gen-AI techniques to user safety. In particular, we first describe the domains (e.g., phishing, malware, content moderation, counterfeit, physical safety) across which Gen-AI techniques have been applied. Next, we describe how Gen-AI techniques can be used with various data modalities (text, images, videos, audio, and executable binaries) to detect user-safety violations. Further, we provide an overview of how Gen-AI techniques can be used in an adversarial setting. We believe that this work represents the first summarization of Gen-AI techniques for user safety.

Paper Structure

This paper contains 31 sections.