LLM Safety for Children

Prasanjit Rath, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat

TL;DR

This work addresses the safety of Large Language Models when interacting with minors by developing diverse Child User Models informed by child psychology and applying automated red-teaming to test six LLMs across a broad child-harm taxonomy. It introduces a 14-category harm framework (including child-specific harms), generates 560 child/adult personas, and evaluates model safety using Defect and Refusal rates, plus a defined safety-cost metric. Key findings show pervasive safety gaps for child users, limited correlation between model size and safety, and a strong influence of personality and number of conversational turns on harm elicitation. The study highlights the need for child-focused safety alignment that goes beyond general LLM safety, and provides a scalable methodology for future multilingual and longitudinal safety evaluations.

Abstract

This paper analyzes the safety of Large Language Models (LLMs) in interactions with children below the age of 18. Despite the transformative applications of LLMs in various aspects of children's lives, such as education and therapy, there remains a significant gap in understanding and mitigating potential content harms specific to this demographic. The study acknowledges the diverse nature of children, which is often overlooked by standard safety evaluations, and proposes a comprehensive approach to evaluating LLM safety specifically for children. We list potential risks that children may encounter when using LLM-powered applications. Additionally, we develop Child User Models that reflect the varied personalities and interests of children, informed by literature in child care and psychology. These user models aim to bridge the existing gap in child safety literature across various fields. We utilize Child User Models to evaluate the safety of six state-of-the-art LLMs. Our observations reveal significant safety gaps in LLMs, particularly in categories harmful to children but not adults.

Paper Structure

This paper contains 26 sections, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Sample Child User Model generation for: <Harm: Regulated Services (Gambling), Personality: Fatigued & Hypochondriac, Interests: Media>
  • Figure 2: Comparing defect and refusal rates of various models
  • Figure 3: Comparing GPT-4o and Llama-13B response
  • Figure 4: Evaluation Prompt
  • Figure 5: Persona Creation Prompt
  • ...and 1 more figure