Lidar Panoptic Segmentation in an Open World

Anirudh S Chakravarthy, Meghana Reddy Ganesina, Peiyun Hu, Laura Leal-Taixe, Shu Kong, Deva Ramanan, Aljosa Osep

TL;DR

This work extends lidar panoptic segmentation to open-world scenarios by introducing LiPSOW, which broadens the vocabulary with a catch-all unknown class and evaluates generalization from SemanticKITTI to KITTI360. It proposes OWL, a two-stage method that combines a semantic backbone trained for $K{+}1$ classes with a class-agnostic, hierarchical segmentation tree and a learned objectness scorer to produce a globally optimal, unique point-to-instance assignment via a min-cut. OWL achieves strong performance on known classes and substantially better unknown object segmentation than baselines, particularly when rare classes are grouped into the unknown category, indicating improved cross-domain generalization. The work also provides an evaluation protocol, extensive analyses, and qualitative results, and releases code to foster future research in continual learning for unknown objects in lidar data.

Abstract

Addressing Lidar Panoptic Segmentation (LPS) is crucial for the safe deployment of autonomous vehicles. LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes, including thing classes of countable objects (e.g., pedestrians and vehicles) and stuff classes of amorphous regions (e.g., vegetation and road). Importantly, LPS requires segmenting individual thing instances (e.g., every single vehicle). Current LPS methods make the unrealistic assumption that the semantic class vocabulary is fixed in the real open world; in fact, class ontologies usually evolve over time as robots encounter instances of novel classes that are considered unknowns w.r.t. the pre-defined class vocabulary. To address this unrealistic assumption, we study LPS in the Open World (LiPSOW): we train models on a dataset with a pre-defined semantic class vocabulary and study their generalization to a larger dataset where novel instances of thing and stuff classes can appear. This experimental setting leads to interesting conclusions. While prior art trains class-specific instance segmentation methods and obtains state-of-the-art results on known classes, methods based on class-agnostic bottom-up grouping perform favorably on classes outside of the initial class vocabulary (i.e., unknown classes). Unfortunately, these methods do not perform on par with fully data-driven methods on known classes. Our work suggests a middle ground: we perform class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification, akin to a Region Proposal Network [1]. We obtain the final point cloud segmentation by computing a cut in the weighted hierarchical tree of point segments, independently of semantic classification. Remarkably, this unified approach leads to strong performance on both known and unknown classes.
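To make the idea of cutting a weighted hierarchical tree of point segments concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each tree node already carries an objectness score in [0, 1], that the children of a node partition its points, and that the cut should maximize the worst (minimum) score among the selected segments. That aggregation rule, the `Segment` class, and the `best_cut` function are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """A node in a hypothetical hierarchical segmentation tree."""
    points: Tuple[int, ...]   # indices of the lidar points in this segment
    score: float              # objectness score in [0, 1], assumed given
    children: List["Segment"] = field(default_factory=list)


def best_cut(node: Segment) -> Tuple[float, List[Segment]]:
    """Return (worst-case score, segments) of the best cut below `node`.

    A cut either keeps `node` as one segment or combines the best cuts of
    all its children. Assumed objective: maximize the minimum score.
    """
    if not node.children:
        return node.score, [node]
    child_scores: List[float] = []
    child_segments: List[Segment] = []
    for child in node.children:
        child_score, segments = best_cut(child)
        child_scores.append(child_score)
        child_segments.extend(segments)
    split_score = min(child_scores)
    # Ties favor the coarser segment, which avoids needless over-segmentation.
    if node.score >= split_score:
        return node.score, [node]
    return split_score, child_segments


# Toy tree over six points; all scores are made up for illustration.
a = Segment((0, 1, 2), 0.9, [Segment((0, 1), 0.4), Segment((2,), 0.5)])
b = Segment((3, 4, 5), 0.2, [Segment((3, 4), 0.8), Segment((5,), 0.7)])
root = Segment((0, 1, 2, 3, 4, 5), 0.3, [a, b])

score, segments = best_cut(root)
instances = [seg.points for seg in segments]
```

In this toy tree the cut keeps segment `a` whole, because it scores higher than the best split of its children, and splits `b` into its two children. Every point ends up in exactly one selected segment, which is the unique point-to-instance assignment property that a tree cut provides.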

Paper Structure (26 sections, 7 figures, 5 tables, 1 algorithm)

Figures (7)

  • Figure 1: We study Lidar Panoptic Segmentation (LPS) in an Open World (LiPSOW). In each of the 2$\times$3 subfigures, the left panel visualizes segmented points colored by semantic class, where red encodes unknown; the right panel visualizes segmented thing instances of known and unknown classes. For autonomous navigation, one should evaluate LPS methods in the presence of novel thing object instances and stuff regions, which are usually termed unknown. We call this the LiPSOW setting, in which methods should, in particular, segment points into unknown object instances that are outside of the $K$-way semantic classes in the predefined vocabulary. For example, given the predefined vocabulary of SemanticKITTI [Behley19ICCV], the unknown objects can be trailers, containers, signaling structures, highway bridge foundations, and buses, as visualized in this figure.
  • Figure 2: Open-World Lidar Panoptic Segmentation (OWL): We first apply a $(K{+}1)$-class semantic segmentation network to a point cloud (a--b) and classify points as stuff, things, and unknown (point color encodes semantic classes; red points represent unknown). Then we construct a hierarchical tree of "all possible" segments for thing and unknown points (c) and train a segment-scoring function to cut the tree (d), finally producing instance and semantic segmentation results (e).
  • Figure 3: We base the LiPSOW setup on the SemanticKITTI [Behley19ICCV, Behley21icra] and KITTI360 [Liao2021ARXIV] datasets. We train and validate models using SemanticKITTI, and re-purpose the KITTI360 dataset, which contains classes not labeled in SemanticKITTI (i.e., unknown), as a test set. A detailed list of our taxonomy is provided in the appendix (Table \ref{tab:sup:task_split}).
  • Figure 4: The extended confusion matrix for OWL trained on SemanticKITTI and evaluated in-domain (on SemanticKITTI), using Vocabulary 1. On the left side, we see the confusion among known classes. On the right, we can see which known classes are confused with classes that form the other class. For known classes, we observe confusion between the related classes terrain and vegetation. We also observe that several other points are misclassified as known. Class other-vehicle is often misclassified as car or truck, while other-ground and parking are commonly misclassified as sidewalk and road. This explains the low IoU observed in Table 2 (main paper) on other in SemanticKITTI.
  • Figure 5: The extended confusion matrix for OWL trained on SemanticKITTI and evaluated in the cross-domain setting (on KITTI-360), using Vocabulary 1. On the left side, we see the confusion among known classes. On the right, we can see which known classes are confused with classes that form the other class. In contrast to the in-domain case, we observe more confusion within known classes. For instance, the car and truck classes are often confused. The class sidewalk is often misclassified as terrain, while almost all known classes are confused with vegetation. There is also confusion between known and unknown classes. Ground and parking are often predicted as road and sidewalk. Class wall (a novel other-stuff class) is confused with fence, building, and vegetation, presumably due to their geometric similarity. Class trailer is frequently confused with class car. As demonstrated, cross-domain semantic segmentation is a challenging problem.
  • ...and 2 more figures