
Image Segmentation via Divisive Normalization: dealing with environmental diversity

Pablo Hernández-Cámara, Jorge Vila-Tomás, Paula Dauden-Oliver, Nuria Alabau-Bosque, Valero Laparra, Jesús Malo

TL;DR

Improvements in segmentation performance are explained by quantifying the invariance of the responses that incorporate Divisive Normalization, and by illustrating the adaptive nonlinearity of the different layers that depends on the local activity.

Abstract

Autonomous driving is a challenging scenario for image segmentation because environmental conditions are uncontrolled and failures can have catastrophic consequences. Previous work suggested that a biologically motivated computation, the so-called Divisive Normalization, could help deal with image variability, but its effects have not been systematically studied across different data sources and environmental factors. Here we test segmentation U-Nets augmented with Divisive Normalization far from their training conditions to find where this adaptation is most critical. We categorize the scenes according to their radiance level and dynamic range (day/night), and according to their achromatic/chromatic contrasts. We also consider video game (synthetic) images to broaden the range of environments. We check the performance in the extreme percentiles of this categorization. Then, we push the limits further by artificially modifying the images along perceptually and environmentally relevant dimensions: luminance, contrast, and spectral radiance. Results show that neural networks with Divisive Normalization perform better in all scenarios, and that their performance remains more stable with respect to the environmental factors considered and the nature of the source. Finally, we explain the improvements in segmentation performance in two ways: (1) by quantifying the invariance of the responses that incorporate Divisive Normalization, and (2) by illustrating the adaptive nonlinearity of the different layers, which depends on the local activity.
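
The invariance mentioned in the abstract can be illustrated with a toy example. The sketch below is not the layer used in the paper: it assumes a simplified form of Divisive Normalization in which each response is divided by the RMS activity pooled over channels at the same spatial location, plus a constant. The function names, the pooling choice, and the value of `beta` are all assumptions made for illustration.

```python
import numpy as np

def divisive_normalization(x, beta=0.1):
    """Simplified divisive normalization (illustrative assumption).

    x    : array of shape (height, width, channels) holding linear responses.
    beta : hypothetical semisaturation constant that avoids division by zero.
    """
    # Pool the activity over the channel axis at each spatial location (RMS energy).
    local_activity = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True))
    # Divide each response by the pooled activity, so the gain adapts locally.
    return x / (beta + local_activity)

def relative_change(a, b):
    """Norm of the difference between a and b, relative to the norm of a."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

# Synthetic feature map and a low-contrast version of the same map.
rng = np.random.default_rng(0)
features = 10.0 * rng.normal(size=(8, 8, 4))
dimmed = 0.2 * features

# Without normalization the responses change by 80% (0.8 by construction).
change_linear = relative_change(features, dimmed)

# With normalization the change is much smaller, since the divisor scales too.
change_dn = relative_change(divisive_normalization(features),
                            divisive_normalization(dimmed))

print(f"relative change, linear responses:     {change_linear:.3f}")
print(f"relative change, normalized responses: {change_dn:.3f}")
```

Because the divisor grows and shrinks with the local activity, rescaling the input moves the normalized responses far less than the linear ones. This is the kind of stability that the invariance analysis quantifies, here under much simpler assumptions than a learned layer inside a U-Net.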

Paper Structure (23 sections, 2 equations, 15 figures, 7 tables)


Figures (15)

  • Figure 1: Motivation: colors and the energy of visual textures change with the environment and the nature of the data, and strongly affect the segmentation results. Histograms of luminance (left histogram column), achromatic contrast (middle histogram column) and chromatic contrasts (right histogram columns) for 7 different datasets (rows). Vertical orange lines in the histograms mark their median values. The definition of contrasts (energy of spatial modulation of luminance and color) is given in Section \ref{results_global_extreme}. The last two columns show an example from each dataset and the segmentation result of a U-Net model [ronneberger2015u] trained on natural, daytime, clean images. The first row shows the distributions for the natural daytime scenes from Cityscapes [Cordts2016Cityscapes]. Rows 2 to 4 correspond to the same scenes modified to include different fog levels [fog_cityscapes]. Row 5 corresponds to real urban night images [daytime_2_nighttime]. Row 6 corresponds to scenes from the video game GTA-V [gta], and the last row corresponds to computer-generated scenes from the virtual-reality framework CARLA [Carla_dataset].
  • Figure 2: U-Net for segmentation using four Divisive Normalization layers (4-DN). Divisive Normalization layers are shown in green. Numbers above the layers indicate the number of features in each block, and black arrows represent the skip connections. The no-DN model has none of the green layers, i.e. it has no Divisive Normalization layers. Adding the DN layers increases the number of parameters by only 1.8% relative to the no-DN model. Image from [HERNANDEZCAMARA202364], reproduced with the authors' permission.
  • Figure 3: Example images and segmentation ground truth from the datasets. From left to right: Cityscapes, Nighttime Driving, CARLA Simulator and GTA-V.
  • Figure 4: Example of fog severities. An image from Cityscapes and its corresponding versions in Foggy Cityscapes with different severities.
  • Figure 5: Representative extreme images of the dataset partitions. From left to right: the images with the lowest and highest mean luminance, achromatic contrast and chromatic contrast.
  • ...and 10 more figures