Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models

Kalyan Nakka; Jimmy Dani; Nitesh Saxena

TL;DR

This paper investigates trust and ethics in on-device small language models (SLMs) by comparing them to on-server baselines using the DecodingTrust framework and the Do-Not-Answer ethics dataset. It reveals that on-device deployments exhibit higher stereotype bias, unfairness, and privacy leakage, and show markedly weaker ethical safeguards, often producing harmful or actionable outputs without jailbreaking. The study analyzes three open-source SLMs (Gemma-2B, Phi-2, RedPajama-3B) across multiple metrics and finds statistically significant degradations in trustworthiness on-device, along with exploitable behavior via vanilla prompts. These findings highlight critical risks for edge AI and emphasize the need for robust defenses and responsible deployment strategies to protect user privacy and safety on personal devices.

Abstract

In this paper, we present the first study to investigate the trust and ethical implications of on-device artificial intelligence (AI), focusing on small language models (SLMs) suited to personal devices such as smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they might also introduce significant risks and vulnerabilities compared to their on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of state-of-the-art on-device SLMs, contrasted with their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show on-device SLMs to be significantly less trustworthy, specifically demonstrating more stereotypical, unfair, and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study using a dataset of unethical questions that depict harmful scenarios. Our results illustrate the lack of ethical safeguards in on-device SLMs and emphasize their capability to generate harmful content. Further, the broken safeguards and exploitable nature of on-device SLMs are demonstrated using potentially unethical vanilla prompts, to which the on-device SLMs give valid responses without any filters and without the need for any jailbreaking or prompt engineering. These responses can be abused in various harmful and unethical scenarios, such as societal harm, illegal activities, hate, self-harm, exploitable phishing content, and many others, all of which indicate the severe vulnerability and exploitability of these on-device SLMs.

Paper Structure (21 sections, 1 equation, 25 figures, 2 tables)


Introduction
Background and Preliminaries
Responsible AI
Deployment Strategies
Studied Models
Trust Assessment
Ethics Assessment
Trust Assessment Study
Methodology
Stereotype Perspective Results
Fairness Perspective Results
Privacy Perspective Results
Statistical Analysis
Ethics Assessment Study
Methodology
...and 6 more sections

Figures (25)

Figure 1: Our Study in a Nutshell
Figure 2: Benign scenario's model agreeability $A_i$ heatmaps of Gemma-2B (higher values of $A_i$ indicate that the SLM is more biased).
Figure 3: Benign scenario's model agreeability $A_i$ heatmaps of Phi-2 (higher values of $A_i$ indicate that the SLM is more biased).
Figure 4: Benign scenario's model agreeability $A_i$ heatmaps of RedPajama-3B (higher values of $A_i$ indicate that the SLM is more biased).
Figure 5: Untargeted scenario's model agreeability $A_i$ heatmaps of Gemma-2B (higher values of $A_i$ indicate that the SLM is more biased).
...and 20 more figures
