Occlusion-Aware Deep Convolutional Neural Network via Homogeneous Tanh-transforms for Face Parsing

Jianhua Qiu, Weihua Liu, Chaochao Lin, Jiaojiao Li, Haoping Yu, Said Boumaraf

TL;DR

Occluded face parsing is challenging because occlusions distort both local features and surrounding context. The authors propose FTNet, a four-point transform network that warps the face RoI into four Tanh-polar views anchored at the bounding box corners, enabling rotation- and translation-aware feature fusion via Four-point Blocks and an occlusion-aware loss. They also introduce the Sheltered Face Parsing Dataset (SFPD) to benchmark occlusion-robust parsing and to facilitate future research. Across extensive experiments, FTNet consistently outperforms state-of-the-art methods on both occluded and non-occluded faces, demonstrating strong robustness to a range of occlusions and improved boundary delineation, with practical implications for real-world face analysis under occlusion.

Abstract

Face parsing infers a pixel-wise label map for each semantic facial component. Previous methods generally work well for unoccluded faces; however, they overlook facial occlusion and ignore some contextual areas outside a single face, a notable gap now that facial occlusion has become common during the COVID-19 epidemic. Inspired by everyday lighting, where four distinct lamps illuminate a scene more uniformly than a single central light source, we propose a novel homogeneous tanh-transform for image preprocessing, made up of four tanh-transforms that fuse central and peripheral vision. The proposed method addresses the dilemma of face parsing under occlusion and compresses more information from the surrounding context. Based on homogeneous tanh-transforms, we propose an occlusion-aware convolutional neural network for occluded face parsing. It combines information from both Tanh-polar space and Tanh-Cartesian space, which enhances the receptive fields. Furthermore, we introduce an occlusion-aware loss that focuses on the boundaries of occluded regions. The network is simple, flexible, and can be trained end-to-end. To facilitate future research on occluded face parsing, we also contribute a new cleaned face parsing dataset. This dataset is manually cleaned from several academic and industrial datasets, including CelebAMask-HQ, Short-video Face Parsing, and the Helen dataset, and will be made public. Experiments demonstrate that our method surpasses state-of-the-art methods in face parsing under occlusion.
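
To make the idea of a loss that "focuses on the boundaries of occluded regions" concrete, the following is a minimal NumPy sketch of a boundary-weighted cross-entropy. It is not the paper's formulation, which this page does not give. The function names `boundary_band` and `boundary_weighted_ce`, the binary `occlusion_mask` input, and the `boost` and `width` parameters are all assumptions made for illustration.

```python
import numpy as np


def boundary_band(mask, width=1):
    """Pixels within `width` 4-neighbour steps of the mask boundary."""
    mask = mask.astype(bool)

    def dilate(m):
        # One step of binary dilation with a plus-shaped (4-neighbour) element.
        p = np.pad(m, 1, mode="edge")
        return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:])

    grown, shrunk = mask, mask
    for _ in range(width):
        grown = dilate(grown)
        shrunk = ~dilate(~shrunk)  # erosion written as a dilation of the complement
    return grown & ~shrunk


def boundary_weighted_ce(probs, labels, occlusion_mask, boost=2.0, width=1):
    """Cross-entropy with extra weight near the occlusion boundary.

    probs:  (H, W, C) per-pixel class probabilities (already softmaxed)
    labels: (H, W) integer class indices
    boost:  hypothetical extra weight added inside the boundary band
    """
    h, w = labels.shape
    # Probability assigned to the true class at every pixel.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    ce = -np.log(np.clip(p_true, 1e-12, 1.0))
    weights = 1.0 + boost * boundary_band(occlusion_mask, width)
    return float((weights * ce).sum() / weights.sum())
```

With this weighting, a misclassified pixel inside the boundary band contributes `1 + boost` times as much as the same error elsewhere, which is one simple way to steer training toward occlusion edges.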

Paper Structure (19 sections, 10 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Examples from our Sheltered Face Parsing Dataset and the color-coded labels predicted by our proposed method, which parses each facial component precisely even when objects cover the face.
  • Figure 2: Overview of the proposed FTNet structure. Given an input image and a bounding box (in green), the four-point transform converts the RoI (in red) of the input image to a new representation. The warped origin of the input image (in white light) is defined as a corner of the bounding box. The proposed occlusion-aware CNN is mainly composed of multiple Four-point Blocks (FPBs), which extract features from the warped image. An occlusion-aware loss is designed to guide the warped parsing label prediction. Finally, the warped label is restored to the regular coordinate system.
  • Figure 3: The homogeneous Tanh-transforms pipeline, namely the four-point transform. (1) Input image, bounding box, and RoI area. (2) Four warping regions, one per corner of the bounding box. (3) An example of the homogeneous Tanh-transforms in one warping area. (4) The image in the Tanh-Cartesian coordinate system and the warping output in the Tanh-polar coordinate system. Note that the semi-transparent area outside the RoI does not participate in the calculation.
  • Figure 4: Comparison of the original image and the restored image. Note that some information is lost outside the RoI area.
  • Figure 5: Qualitative comparisons with state-of-the-art methods.
  • ...and 4 more figures
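
The four-point transform described in the Figure 3 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: it uses an isotropic mapping ρ = tanh(r / scale) around each bounding-box corner, picks `scale` as half the box diagonal, and samples with nearest-neighbour lookup. The names `tanh_polar_grid`, `four_point_grids`, and `sample_nearest` are hypothetical.

```python
import numpy as np


def tanh_polar_grid(origin, scale, out_h=8, out_w=8, eps=1e-3):
    """Source (x, y) coordinates read by one tanh-polar view.

    Rows index the angle theta in [0, 2*pi); columns index the
    normalised radius rho in [0, 1).
    """
    theta = np.linspace(0.0, 2.0 * np.pi, out_h, endpoint=False)
    rho = np.linspace(0.0, 1.0 - eps, out_w)
    # Invert rho = tanh(r / scale): small rho samples densely near the
    # origin, while rho close to 1 reaches far into the periphery.
    r = scale * np.arctanh(rho)
    xs = origin[0] + r[None, :] * np.cos(theta)[:, None]
    ys = origin[1] + r[None, :] * np.sin(theta)[:, None]
    return np.stack([xs, ys], axis=-1)  # shape (out_h, out_w, 2)


def four_point_grids(box, **kwargs):
    """One sampling grid per bounding-box corner; box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    scale = 0.5 * np.hypot(x1 - x0, y1 - y0)  # assumed choice of scale
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return [tanh_polar_grid(c, scale, **kwargs) for c in corners]


def sample_nearest(image, grid):
    """Nearest-neighbour warp; samples outside the image are left at zero."""
    h, w = image.shape[:2]
    xs = np.rint(grid[..., 0]).astype(int)
    ys = np.rint(grid[..., 1]).astype(int)
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out = np.zeros(grid.shape[:2] + image.shape[2:], dtype=image.dtype)
    out[inside] = image[ys[inside], xs[inside]]
    return out
```

Calling `four_point_grids((10, 20, 50, 80))` returns four grids, and passing each one to `sample_nearest` yields the four warped views of the same image. A practical version would likely use bilinear interpolation and mask out the area beyond the RoI, as the caption notes.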