3DRot: Rediscovering the Missing Primitive for RGB-Based 3D Augmentation

Shitian Yang; Deyu Li; Xiaoke Jiang; Lei Zhang

TL;DR

3DRot addresses the thin toolbox of RGB-based 3D augmentations with a depth-free, geometry-faithful rotation about the camera's optical center. It derives a closed-form projective mapping, $H_A = K_A R_{AB} K_B^{-1}$, that warps the RGB image while consistently updating the camera intrinsics and 3D annotations, so 2D–3D relationships are preserved without any scene depth. The method yields consistent improvements in monocular 3D detection, monocular depth estimation, and LiDAR+RGB 3D detection on SUN RGB-D, NYU Depth v2, KITTI, and cross-domain splits, which supports its generality and practicality. As a simple plug-and-play primitive, 3DRot increases data diversity without scene reconstruction and may improve robustness to viewpoint changes in real-world 3D perception systems. The work highlights a practical path toward richer RGB-based 3D augmentation and cross-modal consistency in multi-sensor setups.
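A short derivation may help show why the mapping $H_A = K_A R_{AB} K_B^{-1}$ needs no depth. It is a sketch under assumed conventions that the summary above does not spell out: views $A$ and $B$ share the same optical center, $R_{AB}$ takes coordinates in the camera-$B$ frame to the camera-$A$ frame, and $X_B$, $x_A$, $x_B$, $z_A$, $z_B$ are notation introduced here for a 3D point, its homogeneous pixel coordinates, and its depths.

$$
\begin{aligned}
z_B\, x_B &= K_B X_B, \qquad X_A = R_{AB} X_B, \\
z_A\, x_A &= K_A X_A = K_A R_{AB} X_B = z_B\, K_A R_{AB} K_B^{-1} x_B = z_B\, H_A\, x_B, \\
x_A &\propto H_A\, x_B .
\end{aligned}
$$

The depth $z_B$ enters only as an overall scale on a homogeneous vector, so it disappears once $x_A$ is normalized by its third component. Under these assumptions the same $H_A$ applies to every pixel regardless of how far away the scene point is. A translation of the camera would add a depth-dependent term and break this cancellation.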

Abstract

RGB-based 3D tasks (e.g., 3D detection, depth estimation, and 3D keypoint estimation) still suffer from scarce, expensive annotations and a thin augmentation toolbox, since many image transforms, including rotations and warps, disrupt geometric consistency. While horizontal flipping and color jitter are standard, rigorous 3D rotation augmentation has remained absent from RGB-based pipelines, largely due to the misconception that it requires scene depth or scene reconstruction. In this paper, we introduce 3DRot, a plug-and-play augmentation that rotates and mirrors images about the camera's optical center while synchronously updating RGB images, camera intrinsics, object poses, and 3D annotations to preserve projective geometry, achieving geometry-consistent rotations and reflections without relying on any scene depth. We first validate 3DRot on a classical RGB-based 3D task, monocular 3D detection. On SUN RGB-D, inserting 3DRot into a frozen DINO-X + Cube R-CNN pipeline raises $IoU_{3D}$ from 43.21 to 44.51, cuts rotation error (ROT) from 22.91$^\circ$ to 20.93$^\circ$, and boosts $mAP_{0.5}$ from 35.70 to 38.11; smaller but consistent gains appear on a cross-domain IN10 split. Beyond monocular detection, adding 3DRot on top of the standard BTS augmentation schedule reduces abs-rel on NYU Depth v2 from 0.1783 to 0.1685 (and raises $\delta < 1.25$ accuracy from 0.7472 to 0.7548), and reduces cross-dataset error on SUN RGB-D. On KITTI, applying the same camera-centric rotations in MVX-Net (LiDAR+RGB) raises moderate 3D AP from about 63.85 to 65.16 while remaining compatible with standard 3D augmentations.
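The synchronized update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`rotation_y`, `project`, `homography`, `warp_pixels`), the intrinsics values, the 10-degree yaw, and the choice to keep the same intrinsics for both views are all assumptions made for illustration. The sketch checks that warping pixels with $K_A R_{AB} K_B^{-1}$ agrees with rotating the 3D points and re-projecting them, and that the original pixels do not change when every depth is scaled.

```python
import numpy as np

def rotation_y(theta):
    """Rotation about the camera's y-axis (yaw) by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(K, X):
    """Project N x 3 camera-frame points to N x 2 pixel coordinates."""
    uvw = X @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def homography(K_A, R_AB, K_B):
    """Closed-form pixel mapping from view B to view A (pure rotation)."""
    return K_A @ R_AB @ np.linalg.inv(K_B)

def warp_pixels(H, uv):
    """Apply a 3 x 3 homography to N x 2 pixel coordinates."""
    uv1 = np.hstack([uv, np.ones((len(uv), 1))])
    out = uv1 @ H.T
    return out[:, :2] / out[:, 2:3]

# Hypothetical intrinsics and rotation, chosen only for this example.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_AB = rotation_y(np.deg2rad(10.0))

# Random 3D points in front of camera B (stand-ins for 3D annotations).
rng = np.random.default_rng(0)
X_B = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 6.0], size=(5, 3))

uv_B = project(K, X_B)               # original pixels
X_A = X_B @ R_AB.T                   # rotated 3D annotations
uv_A_from_3d = project(K, X_A)       # re-projection of rotated points
uv_A_from_H = warp_pixels(homography(K, R_AB, K), uv_B)  # depth-free warp

max_err = np.abs(uv_A_from_3d - uv_A_from_H).max()
# Scaling every depth by 3 leaves the original pixels unchanged.
depth_invariant = np.allclose(project(K, 3.0 * X_B), uv_B)
```

In a real pipeline the same matrix would be passed to an image-warping routine (for example OpenCV's `warpPerspective`) to resample the full RGB image, while the 3D boxes and poses are rotated by `R_AB`. How the output intrinsics are chosen and how image borders are handled are design decisions this sketch does not cover.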
