Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs

Ming Wen; Kun Yang; Xin Chen; Jingyu Zhang; Dingding Han; Shiwen Cui; Yuedong Xu

Abstract

Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as jailbreaking but also to inadvertently generating harmful content for benign users. While internal safety alignment via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is a primary mitigation strategy, current methods often face a safety-utility trade-off: they either refuse benign queries out of excessive caution or overlook latent risks in cross-modal interactions. To resolve this, we introduce Pragma-VL, an end-to-end alignment algorithm that enables MLLMs to pragmatically arbitrate between safety and helpfulness. First, we enhance visual risk perception with a novel cold-start SFT stage. This is achieved by applying risk-aware clustering to the visual encoder and using an interleaved dataset of risk descriptions and high-quality data. Second, we introduce a theoretically guaranteed reward model that leverages synergistic learning. We train it with a novel data augmentation method that assigns dynamic weights based on the queries, enabling contextual arbitration between safety and helpfulness. Extensive experiments show that Pragma-VL effectively balances safety and helpfulness, outperforming baselines by 5% to 20% on most multimodal safety benchmarks while preserving its general capabilities in areas such as mathematics and knowledge reasoning.
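The abstract states that dynamic weights are assigned per query, but it does not spell out how those weights enter the reward. As a minimal sketch, assuming two scalar scores and a convex combination controlled by a query-dependent weight, contextual arbitration could look like the following. The names `RewardScores`, `arbitrate`, and `safety_weight`, and the example scores, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardScores:
    """Per-response scores from two reward dimensions (hypothetical names)."""
    safety: float       # higher means safer
    helpfulness: float  # higher means more helpful


def arbitrate(scores: RewardScores, safety_weight: float) -> float:
    """Combine both scores using a query-dependent weight in [0, 1].

    Assumption: `safety_weight` comes from some upstream judgment of how
    risky the query is. The convex combination is an illustrative choice,
    not the paper's stated formula.
    """
    if not 0.0 <= safety_weight <= 1.0:
        raise ValueError("safety_weight must lie in [0, 1]")
    return (safety_weight * scores.safety
            + (1.0 - safety_weight) * scores.helpfulness)


# Two candidate responses to the same query: a refusal and a detailed answer.
refusal = RewardScores(safety=1.0, helpfulness=0.0)
detailed = RewardScores(safety=0.25, helpfulness=1.0)

# Benign context (low safety weight): the detailed answer scores higher.
benign_prefers_detailed = arbitrate(detailed, 0.25) > arbitrate(refusal, 0.25)

# Risky context (high safety weight): the refusal scores higher.
risky_prefers_refusal = arbitrate(refusal, 0.75) > arbitrate(detailed, 0.75)
```

With a fixed weight, one of the two responses would win regardless of context; letting the weight depend on the query is what allows the preferred response to flip between benign and risky settings.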

Paper Structure (24 sections, 3 theorems, 17 equations, 15 figures, 8 tables, 2 algorithms)

Introduction
Related Works
Methods: Pragma-VL
Contextual Data Augmentation
MLLM Cold Start: Establishing the Risk-Aware Foundation
Policy Alignment via Prompt-Regulated Rewards
Why Parallel Rewards?
Reward Modeling and RL Alignment
Experiment
Experimental Settings
Evaluation on Safety
Evaluation on General Ability
Ablation Studies
Conclusion
The Use of Large Language Models (LLMs)
...and 9 more sections

Key Result

Theorem 1

If the reward function $r(y; \theta)$ is differentiable, the expected errors for the three frameworks, as specified in Definition 1 (Error Metrics), follow the strict orderings for both MSE and Preference Error: where the subscripts correspond to the estimators $\hat{\theta}_{par}$, $\hat{\theta}_{seq}$, and $\hat{\theta}_{single}$.

Figures (15)

Figure 1: The dual failure modes of static safety policies in MLLMs. Our work aims to train a pragmatic model that dynamically arbitrates the safety-helpfulness trade-off based on the context.
Figure 2: (a) Overview of Pragma-VL, which trains the MLLM to perform context-aware dynamic arbitration, achieving a flexible balance between safety and helpfulness. (b) An illustration of our Contextual Data Augmentation Pipeline.
Figure 3: Pragma-VL Algorithm Pipeline. (a) MLLM Cold-Start. (b) Prompt-Regulated Reward.
Figure 4: Ablation study of the Pragma-VL framework. Results consistently demonstrate that the full Pragma-VL framework outperforms its individual components, highlighting the synergistic effect of combining risk-aware pre-alignment with subsequent policy alignment.
Figure 5: (a) The distribution of items across all categories. (b) Score distributions for helpfulness, safety, and weighted metrics (top), with the corresponding word length distribution for each score bin (bottom).
...and 10 more figures

Theorems & Definitions (5)

Definition 1: Error Metrics
Theorem 1: Error Ordering of Reward Model Architectures
Proof
Lemma 1: Upper Bound of Pair-wise Preference Error [zhang2025bradleyterrymultiobjectiverewardmodeling]
Lemma 2: Approximation of MSE from Parameter Covariance [zhang2025bradleyterrymultiobjectiverewardmodeling]
