Robust Differentiable Collision Detection for General Objects

Jiayi Chen, Wei Zhao, Liangwang Ruan, Baoquan Chen, He Wang

TL;DR

This work tackles the lack of differentiability in witness-point-based collision detection, which hinders gradient-based optimization in contact-rich robotics tasks. It introduces a robust differentiable framework built on distance-based softmax smoothing, adaptive sampling, and equivalent gradient transport, so that gradients can flow through witness points for both convex and concave objects. Empirical results on DexGraspNet and Objaverse show substantial accuracy improvements (median error below $0.1\text{ mm}$, with mm-level gains over baselines) and demonstrate practical utility in dexterous grasp refinement. The approach is scalable, GPU-friendly, and released as open source, offering a flexible tool for integrating differentiable collision reasoning into planning and control pipelines.

Abstract

Collision detection is a core component of robotics applications such as simulation, control, and planning. Traditional algorithms like GJK+EPA compute witness points (i.e., the closest or deepest-penetration pairs between two objects) but are inherently non-differentiable, preventing gradient flow and limiting gradient-based optimization in contact-rich tasks such as grasping and manipulation. Recent work introduced efficient first-order randomized smoothing to make witness points differentiable; however, their direction-based formulation is restricted to convex objects and lacks robustness for complex geometries. In this work, we propose a robust and efficient differentiable collision detection framework that supports both convex and concave objects across diverse scales and configurations. Our method introduces distance-based first-order randomized smoothing, adaptive sampling, and equivalent gradient transport for robust and informative gradient computation. Experiments on complex meshes from DexGraspNet and Objaverse show significant improvements over existing baselines. Finally, we demonstrate a direct application of our method for dexterous grasp synthesis to refine the grasp quality. The code is available at https://github.com/JYChen18/DiffCollision.
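To make the idea of distance-based smoothing more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that candidate points have already been sampled on both surfaces and that the objects are separated; the function name `soft_witness_points` and the `temperature` parameter are hypothetical. Each candidate pair is weighted by a softmax over its negative distance, so the smoothed witness points vary continuously with the samples and approach the closest pair as the temperature shrinks.

```python
import numpy as np

def soft_witness_points(pts1, pts2, temperature=0.01):
    """Softmax-smoothed closest pair between two sampled point sets.

    pts1: (N, 3) samples on object 1; pts2: (M, 3) samples on object 2.
    `temperature` is an assumed smoothing parameter (smaller = sharper).
    """
    # Pairwise distances between every sample on object 1 and object 2.
    diff = pts1[:, None, :] - pts2[None, :, :]      # (N, M, 3)
    dist = np.linalg.norm(diff, axis=-1)            # (N, M)

    # Softmax over all pairs: closer pairs receive larger weights.
    logits = -dist / temperature
    logits = logits - logits.max()                  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Marginal weights give a smooth witness point on each object.
    x1 = (weights.sum(axis=1)[:, None] * pts1).sum(axis=0)
    x2 = (weights.sum(axis=0)[:, None] * pts2).sum(axis=0)
    return x1, x2, weights

# Toy example: two points per object, all on the x-axis.
pts1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pts2 = np.array([[1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
x1, x2, weights = soft_witness_points(pts1, pts2)
```

With a small temperature the weights concentrate on the closest pair, here `(1, 0, 0)` and `(1.5, 0, 0)`. A larger temperature blends in neighboring samples, trading accuracy for smoother gradients.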

Paper Structure

This paper contains 29 sections, 1 theorem, 25 equations, 11 figures, 4 tables.

Key Result

Lemma 1

For any $T_1,T_2\in \mathrm{SE}(3)$, $\xi_1\in\mathfrak{se}(3)$, and $\lambda\in\mathbb{R}$, updating $T_1$ with $-\lambda\xi_1$ is equivalent, in terms of the relative pose, to updating $T_2$ with $-\lambda\tilde{\xi}_2$, where $\tilde{\xi}_2$ is the equivalent gradient illustrated in Figure 4. The formal expressions for $\tilde{\xi}_2$ and the resulting relative pose are given in the paper.
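As a numerical illustration of the lemma, the sketch below assumes right-multiplicative (body-frame) updates $T \leftarrow T\exp(-\lambda\hat{\xi})$ and the relative pose $T_1^{-1}T_2$. Both conventions are assumptions and may differ from the paper's. Under them, the equivalent twist works out to $\hat{\tilde{\xi}}_2 = -T_{\text{rel}}^{-1}\,\hat{\xi}_1\,T_{\text{rel}}$ with $T_{\text{rel}} = T_1^{-1}T_2$. The helper names (`hat`, `vee`, `expm`, `equivalent_twist`) are hypothetical.

```python
import numpy as np

def hat(xi):
    """Map a twist (v, w) in R^6 to its 4x4 matrix form in se(3)."""
    v, w = xi[:3], xi[3:]
    X = np.zeros((4, 4))
    X[:3, :3] = [[0.0, -w[2], w[1]],
                 [w[2], 0.0, -w[0]],
                 [-w[1], w[0], 0.0]]
    X[:3, 3] = v
    return X

def vee(X):
    """Inverse of hat: recover the 6-vector (v, w) from a 4x4 se(3) matrix."""
    return np.array([X[0, 3], X[1, 3], X[2, 3], X[2, 1], X[0, 2], X[1, 0]])

def expm(A, terms=30):
    """Truncated Taylor series of the matrix exponential (adequate for small twists)."""
    out, term = np.eye(4), np.eye(4)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def equivalent_twist(T1, T2, xi1):
    """Twist for T2 that reproduces the relative pose obtained by updating T1.

    Assumes body-frame updates and relative pose inv(T1) @ T2 (see lead-in).
    """
    T_rel = np.linalg.inv(T1) @ T2
    return -vee(np.linalg.inv(T_rel) @ hat(xi1) @ T_rel)

# Random poses and a random update direction for object 1.
rng = np.random.default_rng(0)
T1 = expm(hat(rng.normal(size=6)))
T2 = expm(hat(rng.normal(size=6)))
xi1 = rng.normal(size=6)
lam = 0.1

xi2 = equivalent_twist(T1, T2, xi1)

# Relative pose after updating T1 only, versus after updating T2 only.
rel_update_T1 = np.linalg.inv(T1 @ expm(hat(-lam * xi1))) @ T2
rel_update_T2 = np.linalg.inv(T1) @ (T2 @ expm(hat(-lam * xi2)))
```

The two relative poses agree to numerical precision, so a gradient step computed for one object can be applied to the other without changing the resulting configuration of the pair.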

Figures (11)

  • Figure 1: Task illustration. To verify our derivative of witness points (in red) with respect to the object poses, we optimize the 6D pose of one object (in orange) so that the specified target points (in green) coincide with the witness points.
  • Figure 2: Motivation. When optimizing the pose $T_2$ of object $O_2$, treating the witness points $\mathbf{x}_1, \mathbf{x}_2$ as fixed on objects $O_1$ and $O_2$, without considering their derivatives, fails in at least two cases: (1) enforcing a specified point $\mathbf{t}_2$ to be in contact (i.e., to become a witness point); (2) enforcing object $O_2$ to produce a target contact force $\mathbf{f}_{\text{target}}$ on a concave object $O_1$, where ignoring $\partial \mathbf{x}_1 / \partial T_2$ implicitly assumes that $\mathbf{x}_1$ is fixed. Under that assumption, one would move $\mathbf{x}_2$ rightward to generate a leftward contact force, which is incorrect. By contrast, our method accounts for $\partial \mathbf{x}_1 / \partial T_2$, recognizing that when $\mathbf{x}_2$ moves rightward, $\mathbf{x}_1$ shifts rightward even farther. Thus, the correct update is to move $O_2$ leftward.
  • Figure 3: (1) Task formulation. The witness points $\mathbf{x}_1,\mathbf{x}_2$ calculated by collision detection are expected to match specified target points $\mathbf{t}_1,\mathbf{t}_2$ via the losses shown by dotted lines. (2) Smoothing witness points. Adaptive sampling yields better surface samples (triangles) than fixed sampling, improving the approximation of $\mathbf{x}_1,\mathbf{x}_2$.
  • Figure 4: Equivalent gradient transport (EG). Updating $T_1$ with gradient $\xi_1$ produces the same relative pose as updating $T_2$ with our proposed equivalent gradient $\tilde{\xi}_2$. The object and witness points before the update are shown as dotted lines.
  • Figure 5: Qualitative comparison on convex objects. Different baselines exhibit distinct failure patterns: Analytical often gets stuck due to zero derivatives at vertices (cols. 2, 5); RS-1-Dir struggles to disambiguate vertices lying near a plane (col. 6); RS-0 and Finite Difference (FD) often fail to resolve initial penetrations. In contrast, Ours performs well in both scenarios.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof