Semantic Scene Segmentation for Robotics
Juana Valeria Hurtado, Abhinav Valada
TL;DR
Semantic Scene Segmentation for Robotics surveys the task of dense pixel-wise scene labeling and its central role in robot perception. It systematically reviews traditional and deep learning architectures, with emphasis on encoder–decoder designs, multi-scale context modules such as ASPP/PSPNet, and real-time variants, while also covering multi-input settings including video, point clouds, and multimodal fusion. The chapter catalogs outdoor, indoor, and general-purpose datasets and standard metrics (e.g., $PA$, $IoU$, $mIoU$, PRC-based measures) and discusses computational considerations like runtime and FLOPs. It highlights challenges such as data labeling bottlenecks and the need for real-time, robust models, pointing toward future directions in weakly/self-supervised learning, transfer learning, and holistic scene representations like panoptic segmentation for robotics applications.
Abstract
Comprehensive scene understanding is a critical enabler of robot autonomy. Semantic segmentation is one of the key scene understanding tasks which is pivotal for several robotics applications including autonomous driving, domestic service robotics, last mile delivery, amongst many others. Semantic segmentation is a dense prediction task that aims to provide a scene representation in which each pixel of an image is assigned a semantic class label. Therefore, semantic segmentation considers the full scene context, incorporating the object category, location, and shape of all the scene elements, including the background. Numerous algorithms have been proposed for semantic segmentation over the years. However, the recent advances in deep learning combined with the boost in the computational capacity and the availability of large-scale labeled datasets have led to significant advances in semantic segmentation. In this chapter, we introduce the task of semantic segmentation and present the deep learning techniques that have been proposed to address this task over the years. We first define the task of semantic segmentation and contrast it with other closely related scene understanding problems. We detail different algorithms and architectures for semantic segmentation and the commonly employed loss functions. Furthermore, we present an overview of datasets, benchmarks, and metrics that are used in semantic segmentation. We conclude the chapter with a discussion of challenges and opportunities for further research in this area.
