Self-Supervised Likelihood Estimation with Energy Guidance for Anomaly Segmentation in Urban Scenes
Yuanpeng Tu, Yuxi Li, Boshen Zhang, Liang Liu, Jiangning Zhang, Yabiao Wang, Cai Rong Zhao
TL;DR
Self-supervised anomaly segmentation in urban scenes is addressed by SLEEG, which avoids labeled anomaly data by modeling anomaly likelihood with two energy-guided estimators and refining pseudo anomalies through a context-aware copy-paste scheme. It decouples a task-agnostic detector from a task-oriented residual estimator within a joint-energy framework to maximize the likelihoods $p_a(o|z)$ and $p_t(o|z)$, using a dynamic margin to stabilize learning. An adaptive patch refinement mechanism uses anomaly scores $\mathcal{A}(x)$ to produce informative training samples, yielding competitive results on Fishyscapes and Road Anomaly without re-training the segmentation backbone. The work demonstrates that context-aware self-supervision can achieve performance comparable to supervised anomaly segmentation methods while reducing data labeling and modeling overhead.
Abstract
Robust autonomous driving requires agents to accurately identify unexpected areas (anomalies) in urban scenes. Two critical issues remain open: how to design a suitable metric for measuring anomalies, and how to properly generate training samples of anomaly data. Classical approaches to anomaly detection usually resort to pixel-wise uncertainty or sample synthesis, which ignore contextual information and sometimes require auxiliary data with fine-grained annotations. In contrast, in this paper we exploit the strongly context-dependent nature of the segmentation task and design an energy-guided self-supervised framework for anomaly segmentation, which optimizes an anomaly head by maximizing the likelihood of self-generated anomaly pixels. For this purpose, we design two estimators to model anomaly likelihood: one is a task-agnostic binary estimator, and the other expresses the likelihood as a residual of the task-oriented joint energy. Based on the proposed estimators, we devise an adaptive self-supervised training framework, which exploits the contextual reliance and the estimated likelihood to refine mask annotations in anomaly areas. We conduct extensive experiments on the challenging Fishyscapes and Road Anomaly benchmarks, demonstrating that without any auxiliary data or synthetic models, our method can still achieve performance comparable to that of supervised competitors. Code is available at https://github.com/yuanpengtu/SLEEG.
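As a rough illustration of the two ingredients named above, the following minimal sketch computes a per-pixel energy from segmentation logits and pastes a patch into an image to create a pseudo-anomaly mask. It is not the authors' implementation: the free-energy form (negative log-sum-exp of the class logits) is a common choice in energy-based out-of-distribution detection and is assumed here, the paste location is supplied by the caller rather than chosen adaptively, and the function names `free_energy` and `paste_patch` are hypothetical.

```python
import numpy as np

def free_energy(logits):
    """Per-pixel free energy, -log sum_k exp(logit_k), over class axis 0.

    logits: array of shape (num_classes, H, W).
    Assumed form: higher energy suggests a less in-distribution pixel.
    """
    m = logits.max(axis=0)  # subtract the max for numerical stability
    return -(m + np.log(np.exp(logits - m).sum(axis=0)))

def paste_patch(image, mask, patch, top, left):
    """Copy `patch` into `image` and mark the pasted pixels as pseudo anomalies.

    image: (H, W, C) array, mask: (H, W) array of 0/1 labels.
    Returns new arrays; the inputs are left unmodified.
    """
    h, w = patch.shape[:2]
    out_img, out_mask = image.copy(), mask.copy()
    out_img[top:top + h, left:left + w] = patch
    out_mask[top:top + h, left:left + w] = 1
    return out_img, out_mask
```

In this sketch, pixels for which the segmentation head is confident receive low energy, while pixels with flat logits receive higher energy; a refinement step such as the one described in the abstract would then use estimated scores to adjust which pasted pixels are kept as anomaly labels.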
