o3-mini vs DeepSeek-R1: Which One is Safer?

Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura

TL;DR

The paper evaluates the safety alignment of two leading LLMs, DeepSeek-R1 and o3-mini, using the automated safety tester ASTRAL to generate 1,260 balanced unsafe prompts. Results show that o3-mini is substantially safer (1.19% unsafe responses) than DeepSeek-R1 (11.98% unsafe responses), and that many of the unsafe prompts sent to o3-mini were blocked by API safeguards before reaching the model. DeepSeek-R1’s unsafe outputs tend to be more severe and easier to confirm. Its risk concentrates in categories such as financial crime, violence, and terrorism, and certain writing styles amplify that risk. The work highlights the importance of robust guardrails and continuous safety evaluation for real-world deployment. It also provides a replication package and outlines plans for broader testing under evolving regulatory contexts.

Abstract

The emergence of DeepSeek-R1 constitutes a turning point for the AI industry in general and for LLMs in particular. It has demonstrated outstanding performance in several tasks, including creative thinking, code generation, maths and automated program repair, at an apparently lower execution cost. However, LLMs must also satisfy an important qualitative property: alignment with safety and human values. A clear competitor of DeepSeek-R1 is its American counterpart, OpenAI's o3-mini model, which is expected to set high standards in terms of performance, safety and cost. In this technical report, we systematically assess the safety level of both DeepSeek-R1 (70b version) and OpenAI's o3-mini (beta version). To this end, we use our recently released automated safety testing tool, ASTRAL. With this tool, we automatically and systematically generated 1,260 test inputs and executed them on both models. After a semi-automated assessment of the outcomes provided by both LLMs, the results indicate that DeepSeek-R1 produces significantly more unsafe responses (12%) than OpenAI's o3-mini (1.2%).
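
To make the reported rates concrete, the following minimal sketch back-computes the number of unsafe responses implied by the percentages in the TL;DR (1.19% and 11.98% of 1,260 test inputs). The counts are derived here by rounding and are not quoted from the text above; the variable and function names are illustrative only.

```python
# Total number of test inputs executed on each model (from the abstract).
TOTAL_PROMPTS = 1260

# Unsafe-response rates as reported in the TL;DR, in percent.
reported_unsafe_pct = {"o3-mini": 1.19, "DeepSeek-R1": 11.98}


def implied_count(pct: float, total: int) -> int:
    """Nearest whole number of responses matching a reported percentage."""
    return round(pct / 100 * total)


# Assumption: the reported percentages are rounded from whole-number counts.
counts = {
    model: implied_count(pct, TOTAL_PROMPTS)
    for model, pct in reported_unsafe_pct.items()
}

# Ratio of unsafe responses between the two models.
ratio = counts["DeepSeek-R1"] / counts["o3-mini"]

for model, n in counts.items():
    print(f"{model}: about {n} unsafe responses out of {TOTAL_PROMPTS}")
print(f"DeepSeek-R1 / o3-mini ratio: about {ratio:.1f}x")
```

Under this rounding assumption, the percentages correspond to roughly 15 and 151 unsafe responses, so DeepSeek-R1 produced about ten times as many unsafe responses as o3-mini on the same test inputs.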

Paper Structure

This paper contains 15 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Number of manually confirmed unsafe LLM outputs per writing style, persuasion technique and safety category