Self-Supervised Likelihood Estimation with Energy Guidance for Anomaly Segmentation in Urban Scenes

Yuanpeng Tu, Yuxi Li, Boshen Zhang, Liang Liu, Jiangning Zhang, Yabiao Wang, Cai Rong Zhao

TL;DR

SLEEG addresses self-supervised anomaly segmentation in urban scenes. It avoids labeled anomaly data by modeling anomaly likelihood with two energy-guided estimators and refining pseudo anomalies through a context-aware copy-paste scheme. It decouples a task-agnostic detector and a task-oriented residual estimator within a joint-energy framework to maximize the likelihoods $p_a(o|z)$ and $p_t(o|z)$, using a dynamic margin to stabilize learning. An adaptive patch refinement mechanism uses anomaly scores $\mathcal{A}(x)$ to produce informative training samples, yielding competitive results on Fishyscapes and Road Anomaly without re-training the segmentation backbone. The work demonstrates that context-aware self-supervision can achieve performance comparable to supervised anomaly segmentation while reducing data labeling and modeling overhead.

Abstract

Robust autonomous driving requires agents to accurately identify unexpected areas (anomalies) in urban scenes. To this end, some critical issues remain open: how to design a suitable metric to measure anomalies, and how to properly generate training samples of anomaly data. Classical efforts in anomaly detection usually resort to pixel-wise uncertainty or sample synthesis, which ignore contextual information and sometimes require auxiliary data with fine-grained annotations. In contrast, in this paper we exploit the strong context-dependent nature of the segmentation task and design an energy-guided self-supervised framework for anomaly segmentation, which optimizes an anomaly head by maximizing the likelihood of self-generated anomaly pixels. For this purpose, we design two estimators to model anomaly likelihood: one is a task-agnostic binary estimator, and the other expresses the likelihood as the residual of a task-oriented joint energy. Based on the proposed estimators, we devise an adaptive self-supervised training framework, which exploits the contextual reliance and estimated likelihood to refine mask annotations in anomaly areas. We conduct extensive experiments on the challenging Fishyscapes and Road Anomaly benchmarks, demonstrating that, without any auxiliary data or synthetic models, our method can still achieve performance comparable to supervised competitors. Code is available at https://github.com/yuanpengtu/SLEEG.
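To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common free-energy score $E(x) = -\log\sum_k \exp(f_k(x))$ over per-pixel class logits as a stand-in for the paper's energy-based estimators, and a plain rectangular copy-paste to create self-generated anomaly pixels. The function names `free_energy` and `paste_patch`, the list-of-lists image layout, and the rectangular patch shape are illustrative assumptions; the dynamic margin and the likelihood-based mask refinement are not shown.

```python
import math


def free_energy(logits):
    """Free energy -log(sum_k exp(logit_k)), computed in a numerically stable way.

    Assumption: a higher value is read as "more anomalous", since confident
    in-distribution pixels tend to have one large logit and thus low energy.
    """
    m = max(logits)
    return -(m + math.log(sum(math.exp(v - m) for v in logits)))


def paste_patch(image, patch, top, left):
    """Paste `patch` into a copy of `image` at (top, left).

    Returns the new image and a binary mask marking the pasted pixels,
    which serve as pseudo anomaly labels before any refinement.
    """
    out = [row[:] for row in image]
    mask = [[0] * len(image[0]) for _ in image]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
            mask[top + i][left + j] = 1
    return out, mask


# Hypothetical usage: a 4x4 "image" of zeros and a 2x2 patch of nines.
image = [[0] * 4 for _ in range(4)]
patch = [[9, 9], [9, 9]]
pasted, mask = paste_patch(image, patch, top=1, left=2)

# A confident pixel (one dominant logit) has lower energy than an uncertain one.
confident = free_energy([10.0, 0.0, 0.0])
uncertain = free_energy([0.0, 0.0, 0.0])
```

In this sketch the pasted mask is used as-is. The framework described above instead refines such masks with the estimated likelihood and the surrounding context, so that pasted pixels which still look normal are not all treated as anomalies.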

Paper Structure (17 sections, 11 equations, 10 figures, 11 tables)


Figures (10)

  • Figure 1: Illustration of contextual reliance in anomaly segmentation tasks. The left column shows an image with a random patch pasted at different positions. The right column shows the corresponding entropy distribution of the segmentation results from the DeepLab segmentation model [chen2018encoder]. Different paste positions result in different uncertainty distributions within the patch.
  • Figure 2: Illustration of the proposed SLEEG framework. An OoD head is added and trained in a self-supervised manner to give a pretrained segmentation model anomaly detection ability.
  • Figure 3: Visualization of generated patches with random shapes on the Cityscapes training set. The red mask denotes anomaly pixels after mask refinement; the blue area represents the ignored pixels from pasted patches.
  • Figure 4: Influence of the patch number $N$ (left) and the margin value $\gamma$ (right) on AP and false positive rate on the FS Lost & Found validation set.
  • Figure 5: Visualization of anomaly maps on the FS Lost & Found validation set. Compared with JEM, predictions from SLEEG show higher responses for anomalous instances and lower values for normal pixels.
  • ...and 5 more figures