Table of Contents
Fetching ...

Ensembling convolutional neural networks for human skin segmentation

Patryk Kuban, Michal Kawulok

TL;DR

This work tackles human skin segmentation by fusing color and grayscale information through a two-level CNN ensemble. It trains multiple base Skinny CNNs on diverse feature streams, including color, grayscale, and BC-based region stratifications, and uses a second-level CNN to integrate their outputs into a final segmentation map. On the ECU skin dataset, the proposed Ensemble-S achieves the best F-score and outperforms single models and voting ensembles, demonstrating that heterogeneous feature fusion yields robust segmentation. The approach highlights the value of sequential fusion in semantic segmentation and paves the way for multi-level ensembles and broader applications beyond skin detection.

Abstract

Detecting and segmenting human skin regions in digital images is an intensively explored topic of computer vision with a variety of approaches proposed over the years that have been found useful in numerous practical applications. The first methods were based on pixel-wise skin color modeling and they were later enhanced with context-based analysis to include the textural and geometrical features, recently extracted using deep convolutional neural networks. It has been also demonstrated that skin regions can be segmented from grayscale images without using color information at all. However, the possibility to combine these two sources of information has not been explored so far and we address this research gap with the contribution reported in this paper. We propose to train a convolutional network using the datasets focused on different features to create an ensemble whose individual outcomes are effectively combined using yet another convolutional network trained to produce the final segmentation map. The experimental results clearly indicate that the proposed approach outperforms the basic classifiers, as well as an ensemble based on the voting scheme. We expect that this study will help in developing new ensemble-based techniques that will improve the performance of semantic segmentation systems, reaching beyond the problem of detecting human skin.

Ensembling convolutional neural networks for human skin segmentation

TL;DR

This work tackles human skin segmentation by fusing color and grayscale information through a two-level CNN ensemble. It trains multiple base Skinny CNNs on diverse feature streams, including color, grayscale, and BC-based region stratifications, and uses a second-level CNN to integrate their outputs into a final segmentation map. On the ECU skin dataset, the proposed Ensemble-S achieves the best F-score and outperforms single models and voting ensembles, demonstrating that heterogeneous feature fusion yields robust segmentation. The approach highlights the value of sequential fusion in semantic segmentation and paves the way for multi-level ensembles and broader applications beyond skin detection.

Abstract

Detecting and segmenting human skin regions in digital images is an intensively explored topic of computer vision with a variety of approaches proposed over the years that have been found useful in numerous practical applications. The first methods were based on pixel-wise skin color modeling and they were later enhanced with context-based analysis to include the textural and geometrical features, recently extracted using deep convolutional neural networks. It has been also demonstrated that skin regions can be segmented from grayscale images without using color information at all. However, the possibility to combine these two sources of information has not been explored so far and we address this research gap with the contribution reported in this paper. We propose to train a convolutional network using the datasets focused on different features to create an ensemble whose individual outcomes are effectively combined using yet another convolutional network trained to produce the final segmentation map. The experimental results clearly indicate that the proposed approach outperforms the basic classifiers, as well as an ensemble based on the voting scheme. We expect that this study will help in developing new ensemble-based techniques that will improve the performance of semantic segmentation systems, reaching beyond the problem of detecting human skin.
Paper Structure (8 sections, 6 figures, 3 tables)

This paper contains 8 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Architecture of the exploited Skinny network for human skin segmentation.
  • Figure 2: The BC-based skin segmentation employed to split the input images into skin and non-skin regions, from which two different Skinny models are trained.
  • Figure 3: A flowchart presenting skin segmentation using Ensemble-S. The probability maps retrieved with three Skinny models trained from the grayscale image and from two exclusive regions in the color image, indicated as skin and non-skin by the BC, are aggregated using the second-level Skinny model to obtain the final segmentation outcome.
  • Figure 4: Precision-recall curves for the selected base models and ensemble classifiers for the ECU test set.
  • Figure 5: Examples of skin segmentation performed using different techniques. False positives are indicated with red color and false negatives with blue color, and F-score is provided under each outcome (the best score for each example is boldfaced).
  • ...and 1 more figures