
Multimodal Adversarial Quality Policy for Safe Grasping

Kunlin Xie, Chenghao Li, Haolan Zhang, Nak Young Chong

TL;DR

The Multimodal Adversarial Quality Policy (MAQP) is proposed to realize multimodal safe grasping, and the Gradient-Level Modality Balancing Strategy (GLMBS) is designed to resolve the optimization imbalance between RGB and depth patches during patch shape adaptation.

Abstract

Vision-guided robot grasping based on Deep Neural Networks (DNNs) generalizes well but poses safety risks in Human-Robot Interaction (HRI). Recent works have addressed this by designing benign adversarial attacks and patches for the RGB modality, yet their depth-independent characteristics limit their effectiveness on the RGBD modality. In this work, we propose the Multimodal Adversarial Quality Policy (MAQP) to realize multimodal safe grasping. Our framework introduces two key components. First, the Heterogeneous Dual-Patch Optimization Scheme (HDPOS) mitigates the distribution discrepancy between the RGB and depth modalities in patch generation: it adopts modality-specific initialization strategies, drawing depth patches from a Gaussian distribution and RGB patches from a uniform distribution, while jointly optimizing both modalities under a unified objective function. Second, the Gradient-Level Modality Balancing Strategy (GLMBS) resolves the optimization imbalance between RGB and depth patches during patch shape adaptation by reweighting gradient contributions based on per-channel sensitivity analysis and applying distance-adaptive perturbation bounds. We conduct extensive experiments on benchmark datasets and a collaborative robot (cobot), demonstrating the effectiveness of MAQP.
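The abstract describes HDPOS and GLMBS only in words. As an illustration, the following Python sketch shows one possible reading of both ideas using only the standard library. Everything in it is an assumption rather than the authors' implementation: the function names (`init_patches`, `balance_gradients`, `depth_bound`, `update`), treating depth patches as zero-mean offsets in metres, defining per-channel sensitivity as the mean absolute gradient, and the linear distance schedule in `depth_bound` are all hypothetical choices.

```python
import random


def init_patches(num_pixels, depth_std=0.005, seed=0):
    """Hypothetical HDPOS-style initialisation (an assumption, not the authors' code).

    RGB channels start from U(0, 1); the depth channel starts from N(0, depth_std^2),
    read here as small offsets (assumed metres) added to the scene depth.
    """
    rng = random.Random(seed)
    rgb = [[rng.uniform(0.0, 1.0) for _ in range(num_pixels)] for _ in range(3)]
    depth = [rng.gauss(0.0, depth_std) for _ in range(num_pixels)]
    return rgb, depth


def balance_gradients(channel_grads, eps=1e-12):
    """Hypothetical GLMBS-style reweighting.

    Sensitivity of a channel = mean |gradient| over its pixels. Each channel is
    rescaled so its mean |gradient| equals the average sensitivity of all channels,
    so a weak depth gradient is no longer drowned out by the RGB gradients.
    """
    sens = [sum(abs(g) for g in grads) / len(grads) for grads in channel_grads]
    target = sum(sens) / len(sens)
    return [[g * target / (s + eps) for g in grads] for grads, s in zip(channel_grads, sens)]


def depth_bound(distance_m, base=0.002):
    """Hypothetical distance-adaptive bound: the allowed depth offset grows linearly
    with camera-to-patch distance (the actual schedule is not given in the abstract)."""
    return base * distance_m


def update(rgb, depth, grads, lr, distance_m):
    """One joint gradient-descent step on R, G, B and depth under a single loss.

    grads = [dL/dR, dL/dG, dL/dB, dL/dDepth], flat per-pixel lists supplied by the
    caller (e.g. by back-propagating a grasp-quality loss, not shown here).
    """
    r, g, b, d = balance_gradients(grads)
    eps_d = depth_bound(distance_m)
    # RGB values stay valid intensities in [0, 1].
    new_rgb = [
        [min(1.0, max(0.0, p - lr * q)) for p, q in zip(channel, grad)]
        for channel, grad in zip(rgb, (r, g, b))
    ]
    # Depth offsets are clipped to the distance-adaptive bound [-eps_d, eps_d].
    new_depth = [min(eps_d, max(-eps_d, p - lr * q)) for p, q in zip(depth, d)]
    return new_rgb, new_depth


if __name__ == "__main__":
    rgb, depth = init_patches(num_pixels=16)
    # Toy gradients: the depth channel's gradient is ~500x weaker than the RGB ones.
    grads = [[0.5] * 16, [0.4] * 16, [0.6] * 16, [0.001] * 16]
    rgb, depth = update(rgb, depth, grads, lr=0.01, distance_m=0.8)
    print(max(abs(v) for v in depth) <= depth_bound(0.8))  # True
```

Because this reweighting equalizes the mean gradient magnitude per channel, the toy depth gradient is scaled up by roughly 375x while the RGB gradients are scaled slightly down. A practical version would operate on tensors and obtain the gradients from the grasp network itself.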

Paper Structure (20 sections, 19 equations, 4 figures, 6 tables, 1 algorithm)

Figures (4)

  • Figure 1: An example of a safety risk in HRI: the robot mistakenly identifies the human hand or adjacent objects as graspable targets, potentially injuring the human worker.
  • Figure 2: Example of a generated RGBD patch pair. Left: RGB patch; right: depth patch.
  • Figure 3: Robot grasping platform, consisting of an Intel RealSense D435 depth camera, a UFactory 850 robot, a UFactory xArm gripper, and the experimental objects.
  • Figure 4: Cases of the DRD process during grasping. Each row shows a single case; from left to right: the robot's initial approach to the target object, the first deviation caused by interference, the robot's re-approach to the target after the human hand moves away, the second deviation caused by subsequent interference, and the final grasp after the human hand moves away again. A blue border highlights the target object the robot is trying to grasp in each subfigure.