Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes

Zelong Zeng; Kaname Tomite

Abstract

In anomaly segmentation for complex driving scenes, state-of-the-art approaches use anomaly scoring functions to compute per-pixel anomaly scores. For these functions, accurately predicting the inlier-class logits of each pixel is crucial for inferring the anomaly score precisely. However, in real-world driving scenarios, the diversity of scenes often distorts the manifolds of pixel embeddings in the embedding space. This distortion makes it unreliable to use the pixel embeddings directly for logit prediction during inference, a concern overlooked by existing methods. To address this problem, we propose a novel method called Random Walk on Pixel Manifolds (RWPM). RWPM uses random walks to reveal the intrinsic relationships among pixels and refine the pixel embeddings. The refined pixel embeddings alleviate the distortion of the manifolds, improving the accuracy of anomaly scores. Our extensive experiments show that RWPM consistently improves the performance of existing anomaly segmentation methods and achieves the best results. Code is available at: https://github.com/ZelongZeng/RWPM

Paper Structure (22 sections, 10 equations, 6 figures, 10 tables)

Introduction
Related Work
Proposed Method
Preliminaries
Graph Construction
Random Walk Process
Partial Random Walk
Experiment
Effectiveness of RWPM
Ablation Study
Visualization of Pixel Embedding Distribution
Qualitative Result
Conclusions and Discussions
The Architecture of Pixel-based and Mask-based Networks
Anomaly Scoring Function Details
...and 7 more sections

Figures (6)

Figure 1: A toy example of pixel-embedding manifolds. The dashed circles indicate regions with a high predicted logit for the corresponding inlier class. Pixels in these regions receive small anomaly scores. The prototype is the vector of classifier weights for the corresponding inlier class. (i) An ideal manifold structure. All inlier pixels lie within the corresponding high-logit region, whereas all outlier pixels lie outside these regions. The anomaly score derived from the anomaly scoring function effectively discriminates between them. (ii) The manifolds are affected by the diversity of data, causing some inlier pixels to deviate from the corresponding high-logit region, while some outlier pixels approach these regions. This results in false positives/negatives in the anomaly scores. However, pixels of the same class still lie on the same manifold, indicating that the manifold structure can be used to reveal the intrinsic relationships between pixels. (iii) Our RWPM uses random walks to capture the manifold structure and to diffuse and update the pixel embeddings. Embeddings within the same manifold tend to become more similar after the update. The visualization of embedding distributions (Figure 3) and the qualitative results on real examples are consistent with this toy example.
Figure 1: The architecture of pixel-based and mask-based networks. (a) depicts the architecture of the pixel-based network used in the main paper, namely Deeplabv3+ with WideResNet38. (b) illustrates the architecture of the mask-based network used in the main paper, which is Mask2Former with Swin-L. The red arrows indicate the position of the pixel embeddings map used by RWPM in different network architectures.
Figure 2: Overview. This figure illustrates the application of RWPM to an existing anomaly segmentation framework during the inference phase. The red dashed box highlights the RWPM part. First, the encoder-decoder extracts the pixel embedding map of the input image. The pixel embedding map is then partitioned into $n^2$ sub-maps. For each sub-map, we update its pixel embeddings using random walks to obtain a refined sub-map. Finally, the refined sub-maps are concatenated to form the refined embedding map, which is then fed into the subsequent network structure. Notably, RWPM can be integrated directly into existing frameworks without extra training or changes to the network structure.
Figure 2: The process of updating pixel embeddings. (a) shows the distribution of pixel embeddings; all data points belong to the same category. (b) shows a pixel embedding (red circle) that deviates from the cluster, together with its $5$ neighbors (blue circles). The number inside each neighbor (blue circle) is its similarity to the pixel embedding (red circle). (c) shows that, after the weighted ensemble is computed, the updated pixel embedding moves closer to the cluster.
Figure 3: Visualization of the pixel embedding distribution. All embeddings are extracted from the PEBAL model. We use cosine distance as the distance metric. We observe that RWPM improves the distribution of pixel embeddings in the embedding space.
...and 1 more figure
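The pipeline described in the two Figure 2 captions (partition the embedding map into $n^2$ sub-maps, update each pixel embedding from its similarity-weighted neighbors, then concatenate the refined sub-maps) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, which is available in the repository linked in the abstract. The function names `refine_submap` and `rwpm_like_refine`, the parameters `k`, `alpha` and `steps`, the top-`k` cosine-similarity graph, and the restart-style update rule are all assumptions made for this sketch.

```python
import numpy as np


def refine_submap(emb, k=5, alpha=0.5, steps=10):
    """Refine (N, C) embeddings with a random walk on a kNN graph.

    Assumed update (not taken from the paper): x <- alpha * P @ x + (1 - alpha) * x0,
    where P is a row-normalized top-k cosine-similarity matrix.
    """
    emb = np.asarray(emb, dtype=float)
    n = emb.shape[0]
    if n < 2:
        return emb.copy()
    k = min(k, n - 1)

    # Cosine similarity between all pairs; the diagonal is masked so a
    # pixel is never selected as its own neighbor.
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)

    # Keep only the k most similar neighbors per row, with non-negative weights.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    weights = np.zeros((n, n))
    weights[rows, idx] = np.clip(sim[rows, idx], 0.0, None)

    # Rows without any positively similar neighbor fall back to a self-loop,
    # so every row of the transition matrix sums to one.
    isolated = np.flatnonzero(weights.sum(axis=1) <= 1e-12)
    weights[isolated] = 0.0
    weights[isolated, isolated] = 1.0
    trans = weights / weights.sum(axis=1, keepdims=True)

    refined = emb.copy()
    for _ in range(steps):
        refined = alpha * (trans @ refined) + (1.0 - alpha) * emb
    return refined


def rwpm_like_refine(emb_map, n=2, **kwargs):
    """Split an (H, W, C) map into n x n sub-maps, refine each, and reassemble."""
    h, w, c = emb_map.shape
    out = np.empty((h, w, c), dtype=float)
    row_edges = np.linspace(0, h, n + 1, dtype=int)
    col_edges = np.linspace(0, w, n + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            block = emb_map[r0:r1, c0:c1].reshape(-1, c)
            refined = refine_submap(block, **kwargs)
            out[r0:r1, c0:c1] = refined.reshape(r1 - r0, c1 - c0, c)
    return out


# Toy case in the spirit of Figure 2 (b)/(c): 15 identical embeddings
# and one embedding that deviates from the cluster.
toy = np.array([[1.0, 0.0]] * 15 + [[1.0, 1.0]])
refined_toy = refine_submap(toy, k=5, alpha=0.5, steps=10)
```

In the toy case, the deviating embedding is pulled toward its five neighbors while the cluster itself stays in place. No training is involved, which matches the caption's remark that RWPM is applied at inference time. Processing sub-maps independently keeps each similarity matrix small, at the cost of ignoring relationships between pixels in different sub-maps.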
