PanDepth: Joint Panoptic Segmentation and Depth Completion
Juan Lagos, Esa Rahtu
TL;DR
PanDepth targets holistic 3D scene understanding for autonomous driving by jointly performing panoptic segmentation and depth completion from RGB images and sparse depth maps. The authors propose an end-to-end architecture consisting of a two-way FPN backbone (EfficientNet-B5), three task-specific branches (semantic, instance, depth), a joint refinement branch, and a panoptic fusion module, all trained with a combined loss. On Virtual KITTI 2, PanDepth produces dense depth maps and panoptic segmentation with a modest parameter count and competitive accuracy across tasks, outperforming some baselines on semantic segmentation and depth metrics. The work also provides generated panoptic annotations for Virtual KITTI 2 and shows that joint learning is practically viable for integrated 3D scene understanding in driving scenarios.
Abstract
Understanding 3D environments semantically is pivotal in autonomous driving applications where multiple computer vision tasks are involved. Multi-task models provide different types of outputs for a given scene, yielding a more holistic representation while keeping the computational cost low. We propose a multi-task model for panoptic segmentation and depth completion using RGB images and sparse depth maps. Our model predicts fully dense depth maps and performs semantic segmentation, instance segmentation, and panoptic segmentation for every input frame. We run extensive experiments on the Virtual KITTI 2 dataset and demonstrate that our model solves multiple tasks without a significant increase in computational cost while maintaining high accuracy. Code is available at https://github.com/juanb09111/PanDepth.git
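
As a rough illustration of what training several task branches with a combined loss can look like, the sketch below sums per-task losses with scalar weights and computes a depth error only on pixels that have ground truth. It is not the authors' implementation: the function names, the weighted-sum form, the weight values, the use of RMSE for the depth term, and the convention that a ground-truth depth of 0 marks a missing pixel are all assumptions made for this example.

```python
from typing import Dict, Sequence


def masked_rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """RMSE over pixels with ground truth (assumption: target > 0 means valid)."""
    sq_errors = [(p - t) ** 2 for p, t in zip(pred, target) if t > 0]
    if not sq_errors:
        # No supervised pixels in this sample, so it contributes no depth loss.
        return 0.0
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


def combined_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; a task without a weight defaults to 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in task_losses.items())


# Toy values: three "pixels", the second one has no ground-truth depth.
depth_loss = masked_rmse(pred=[10.0, 20.0, 8.0], target=[9.0, 0.0, 1.0])

# Hypothetical per-task losses and weights, chosen only for illustration.
total = combined_loss(
    {"semantic": 0.5, "instance": 1.0, "depth": depth_loss},
    {"depth": 0.1},
)
print(f"depth loss: {depth_loss:.3f}, total loss: {total:.3f}")
```

In a setup like this, the weights control how strongly each branch influences the shared backbone, so a depth term measured in metres is usually scaled down to keep it from dominating the segmentation terms.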
