OccTrack360: 4D Panoptic Occupancy Tracking from Surround-View Fisheye Cameras

Yongzhi Lin, Kai Luo, Yuanfan Zheng, Hao Shi, Mengfei Duan, Yang Liu, Kailun Yang

TL;DR

This paper presents OccTrack360, a new benchmark for 4D panoptic occupancy tracking from surround-view fisheye cameras, together with FoSOcc, a fisheye-oriented baseline that improves occupancy tracking quality, with notable gains on geometrically regular categories.

Abstract

Understanding dynamic 3D environments in a spatially continuous and temporally consistent manner is fundamental for robotics and autonomous driving. While recent advances in occupancy prediction provide a unified representation of scene geometry and semantics, progress in 4D panoptic occupancy tracking remains limited by the lack of benchmarks that support surround-view fisheye sensing, long temporal sequences, and instance-level voxel tracking. To address this gap, we present OccTrack360, a new benchmark for 4D panoptic occupancy tracking from surround-view fisheye cameras. OccTrack360 provides substantially longer and more diverse sequences (174 to 2234 frames) than prior benchmarks, together with principled voxel visibility annotations, including an all-direction occlusion mask and an MEI-based fisheye field-of-view mask. To establish a strong fisheye-oriented baseline, we further propose Focus on Sphere Occ (FoSOcc), a framework that addresses two core challenges in fisheye occupancy tracking: distorted spherical projection and inaccurate voxel-space localization. FoSOcc includes a Center Focusing Module (CFM) to enhance instance-aware spatial localization through supervised focus guidance, and a Spherical Lift Module (SLM) that extends perspective lifting to fisheye imaging under the Unified Projection Model. Extensive experiments on Occ3D-Waymo and OccTrack360 show that our method improves occupancy tracking quality, with notable gains on geometrically regular categories, and establishes a strong baseline for future research on surround-view fisheye 4D occupancy tracking. The benchmark and source code will be made publicly available at https://github.com/YouthZest-Lin/OccTrack360.
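
The abstract refers to the Unified Projection Model and to a fisheye field-of-view mask without giving formulas. As a rough illustration only, the sketch below shows the textbook form of the unified (Mei-style) projection without lens distortion, plus a simple per-point visibility check. It is not the authors' implementation: the function names `ucm_project` and `fov_mask`, the parameter names (`xi`, `fx`, `fy`, `cx`, `cy`), and all numeric values are assumptions chosen for the example.

```python
import math

def ucm_project(point, xi, fx, fy, cx, cy):
    """Project a camera-frame 3D point to pixels with the unified projection
    model (distortion terms omitted). Returns None if not projectable.
    Hypothetical helper for illustration; not taken from the paper's code."""
    x, y, z = point
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None
    # Step 1: normalize the point onto the unit sphere.
    xs, ys, zs = x / norm, y / norm, z / norm
    # Step 2: reject directions outside the model's valid range.
    limit = -xi if xi <= 1.0 else -1.0 / xi
    if zs <= limit:
        return None
    # Step 3: perspective division from a center shifted by xi along z.
    denom = zs + xi
    return (fx * xs / denom + cx, fy * ys / denom + cy)

def fov_mask(points, xi, fx, fy, cx, cy, width, height):
    """Mark each point (e.g. a voxel center in the camera frame) as visible
    when it projects inside the image bounds. Assumed, simplified criterion."""
    mask = []
    for p in points:
        uv = ucm_project(p, xi, fx, fy, cx, cy)
        mask.append(uv is not None
                    and 0.0 <= uv[0] < width
                    and 0.0 <= uv[1] < height)
    return mask

if __name__ == "__main__":
    # Made-up intrinsics, purely for demonstration.
    cam = dict(xi=0.8, fx=300.0, fy=300.0, cx=640.0, cy=480.0)
    centers = [(0.0, 0.0, 1.0), (1.0, 0.0, -0.1), (0.0, 0.0, -1.0)]
    print(fov_mask(centers, width=1400, height=960, **cam))
```

With `xi = 0` the sketch reduces to an ordinary pinhole projection, while a positive `xi` lets points slightly behind the image plane (negative `z`) still land inside the image, which is the property that makes the model suitable for wide fisheye lenses.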

Paper Structure

This paper contains 16 sections, 11 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Illustration of 4D panoptic occupancy tracking. Objects highlighted within the blue rectangles are the same objects. Our FoSOcc leverages fisheye images as input to perform comprehensive 4D panoptic occupancy tracking.
  • Figure 2: Dataset comparison for occupancy and fisheye perception.
  • Figure 3: Distribution of $18$ semantic classes.
  • Figure 5: Benchmark pipeline. Our benchmark integrates multiple inputs to derive the FoV mask, occlusion mask, and 4D centered occupancy labels. These masks are subsequently combined to filter the visible regions, yielding the final representation used for training and inference. "Gen" denotes generation.
  • Figure 6: Illustration of $z^+$ error. Both frames shown are from sequence Seq00. The left and right subplots depict the $75^{\text{th}}$ and $100^{\text{th}}$ frames of SSCBench-KITTI360 [li2024sscbench], respectively, corresponding to the $167^{\text{th}}$ and $192^{\text{th}}$ frames of the KITTI360 dataset [Liao2021ARXIV]. The value $z^w$ represents the height translation extracted from the extrinsic parameters provided by KITTI360 [Liao2021ARXIV]. At the $75^{\text{th}}$ frame, the ground-truth voxel labels indicate the onset of an uphill segment. By the $100^{\text{th}}$ frame, the ego-vehicle has already ascended the slope, yet the corresponding $z^w$ value decreases.
  • ...and 5 more figures