On the Dual-Use Dilemma in Physical Reasoning and Force

William Xie, Enora Rice, Nikolaus Correll

TL;DR

This work tackles the dual-use dilemma of enabling physical reasoning and force in vision-language models (VLMs) used for robot control. It uses two case studies to evaluate Asimovian safeguarding prompts on wrench planning and grasp-force control across multiple models and tasks, revealing a persistent trade-off between safety and capability. Key findings show safeguarding can significantly reduce both harmful and helpful outputs (e.g., harmful elicitation from $53\%$ to $19\%$, helpful from $50\%$ to $38\%$ in wrench scenarios; grasp-harm from $67\%$ to $1.7\%$, helpful from $91\%$ to $45\%$), with substantial model-dependent variations and a mean harm-detection rate around $71\%$. The authors argue for human-centered evaluation and development to balance safety with practical capability in robot learning, and advocate approaches that mitigate dual-use without stifling progress in contact-rich manipulation, with implications for real-world applications such as elderly-care robotics.

Abstract

Humans learn how and when to apply forces in the world via a complex physiological and psychological learning process. Attempting to replicate this in vision-language models (VLMs) presents two challenges: VLMs can produce harmful behavior, which is particularly dangerous for VLM-controlled robots which interact with the world, but imposing behavioral safeguards can limit their functional and ethical extents. We conduct two case studies on safeguarding VLMs which generate forceful robotic motion, finding that safeguards reduce both harmful and helpful behavior involving contact-rich manipulation of human body parts. Then, we discuss the key implication of this result--that value alignment may impede desirable robot capabilities--for model evaluation and robot learning.

Paper Structure

This paper contains 13 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Varying contextual semantics in the same scene can yield harm and help, often with a thin line separating them. We evaluate how VLMs under different prompt schemes which elicit physical reasoning for robot control navigate this line between harm and help for forceful, contact-rich tasks with potential for bodily danger.
  • Figure 2: Additional safeguarding reduces harmful wrenches on average by 34% (absolute, 53% to 19%). It eliminates harmful behavior from Claude 3.7 Sonnet (20% to 0%) and reduces it by 57% for GPT 4.1 Mini (84% to 27%). Gemini 2.0 Flash is the least responsive to safeguarding, decreasing by 23% (55% to 32%). Safeguarding generally becomes less effective as prompt complexity increases.
  • Figure 3: Safeguarding has an adverse effect on helpful behavior elicitation, reducing it by 11% (absolute, 50% to 39%). OpenAI GPT 4.1 Mini is least affected, decreasing by 3% (75% to 72%). Claude 3.7 Sonnet is reduced by 13% (31% to 18%) and Gemini 2.0 Flash by 19% (44% to 25%). Helpful behavior increases with spatial and physical reasoning, while harm detection by safeguarding decreases.
  • Figure 4: We evaluate additional prompting schemes for physical reasoning about grasp forces dg on four helpful tasks (w, n and W, N corresponding to low and high force magnitude tasks, respectively) and two harmful tasks (W, N). Safeguards (dashed bars) completely suppress harm (right), but greatly reduce helpful behavior (left).
  • Figure 5: Wrench magnitudes for OpenAI and Gemini models are relatively consistent, whereas those for Claude 3.7 Sonnet fluctuate considerably. This is due to a lower number of unblocked responses, which increases variance, and to an observed behavior of attempting to break the robot's own wrist rather than the human wrist, which results in even higher wrenches.
  • ...and 1 more figure
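
The reductions quoted in the captions for Figures 2 and 3 are absolute differences (percentage points), not relative changes. The short sketch below recomputes them from the before/after rates given in those captions and also shows the corresponding relative reduction for comparison. The function and variable names are illustrative assumptions, not part of the paper, and the relative figures are derived here rather than reported by the authors.

```python
# Before/after elicitation rates (in percent) copied from the
# Figure 2 and Figure 3 captions. The dictionary layout and the
# helper names are hypothetical and used only for illustration.
HARMFUL = {
    "mean": (53, 19),
    "Claude 3.7 Sonnet": (20, 0),
    "GPT 4.1 Mini": (84, 27),
    "Gemini 2.0 Flash": (55, 32),
}
HELPFUL = {
    "mean": (50, 39),
    "GPT 4.1 Mini": (75, 72),
    "Claude 3.7 Sonnet": (31, 18),
    "Gemini 2.0 Flash": (44, 25),
}


def absolute_drop(before: int, after: int) -> int:
    """Reduction in percentage points, as quoted in the captions."""
    return before - after


def relative_drop(before: int, after: int) -> float:
    """Reduction as a fraction of the unsafeguarded rate."""
    return (before - after) / before


def summarize(rates: dict) -> dict:
    """Map each entry to (absolute drop, relative drop)."""
    return {
        name: (absolute_drop(b, a), relative_drop(b, a))
        for name, (b, a) in rates.items()
    }


if __name__ == "__main__":
    for label, rates in (("harmful", HARMFUL), ("helpful", HELPFUL)):
        for name, (abs_pp, rel) in summarize(rates).items():
            print(f"{label:8s} {name:18s} -{abs_pp} pp ({rel:.0%} relative)")
```

Reading the two measures side by side makes the trade-off easier to compare across models: a model with a low unsafeguarded rate can show a small absolute drop that is nonetheless a large relative one.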