CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation
Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Qiang Duan, Xin Du, Shuhan Yang
TL;DR
This work tackles jailbreak risks in embodied intelligence (EI) by introducing Concept Enhanced Engineering (CEE), a dynamic inference-time defense that constructs a multilingual safety subspace and uses rotation based on spherical linear interpolation (SLERP) to steer hidden states toward safe outputs without retraining. CEE extracts universal safety patterns across languages, maps per-input activations into a subspace via ridge regression, and applies norm-preserving rotations; together, these steps achieve high defense rates while preserving generation quality and keeping inference overhead low. Empirical results on multiple EI benchmarks show that CEE outperforms existing ITS defenses and ES modules, with strong cross-lingual generalization and robust performance across diverse jailbreak strategies. The approach offers a practical, scalable safety mechanism for real-world EI systems, though further work is needed to validate it under black-box access and in physical robotic deployments.
Abstract
Large language models (LLMs) are widely used for task understanding and action planning in embodied intelligence (EI) systems, but their adoption substantially increases these systems' vulnerability to jailbreak attacks. While recent work explores inference-time defenses, existing methods rely on static interventions on intermediate representations, which often degrade generation quality and impair adherence to task instructions, reducing system usability in EI settings. We propose a dynamic defense framework. For each EI inference request, we dynamically construct a task-specific safety-semantic subspace, project the request's hidden state onto the most relevant direction, and apply a SLERP rotation for adaptive safety control. At comparable defense success rates, our method preserves generation quality, improves usability, reduces tuning cost, and strengthens robustness in EI scenarios.
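The project-then-rotate step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the concept matrix `concepts` (rows standing in for safety concept directions), the ridge penalty `lam`, and the rotation fraction `t` are all hypothetical, and how the paper builds the subspace, picks layers, or sets the rotation strength is not specified in this section.

```python
import numpy as np

def ridge_project(h, concepts, lam=1e-2):
    """Ridge-regress hidden state h (d,) onto concept directions (k, d).

    Solves min_w ||h - concepts.T @ w||^2 + lam * ||w||^2 and returns the
    reconstruction of h inside the span of the concept directions.
    `lam` is an assumed hyperparameter, not a value from the paper.
    """
    gram = concepts @ concepts.T + lam * np.eye(concepts.shape[0])
    weights = np.linalg.solve(gram, concepts @ h)
    return concepts.T @ weights

def slerp_rotate(h, target, t):
    """Rotate h toward the direction of target by fraction t in [0, 1].

    Uses spherical linear interpolation on the unit directions and then
    rescales, so the output has the same norm as h.
    """
    norm_h = np.linalg.norm(h)
    u = h / norm_h
    v = target / np.linalg.norm(target)
    omega = np.arccos(np.clip(u @ v, -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < 1e-6:
        # Directions are (anti)parallel: SLERP is ill-conditioned, keep h.
        return h.copy()
    direction = (np.sin((1.0 - t) * omega) * u + np.sin(t * omega) * v) / sin_omega
    return norm_h * direction

# Toy usage with random data standing in for real activations.
rng = np.random.default_rng(0)
concepts = rng.normal(size=(4, 16))   # hypothetical safety concept directions
h = rng.normal(size=16)               # hypothetical hidden state
target = ridge_project(h, concepts)
steered = slerp_rotate(h, target, t=0.5)
```

Setting `t=0` leaves the hidden state unchanged and `t=1` aligns it fully with the projected direction, which gives one continuous knob for trading steering strength against fidelity to the original activation.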
