OccFace: Unified Occlusion-Aware Facial Landmark Detection with Per-Point Visibility

Xinhao Xiang, Zhengxin Li, Saurav Dhakad, Theo Bancroft, Jiawei Zhang, Weiyang Li

TL;DR

OccFace tackles the challenge of facial landmark detection under occlusion for universal human-like faces by predicting 100 dense landmarks along with per-point visibility. It combines a heatmap-based localization backbone with an occlusion module that fuses local evidence and cross-landmark context via a gated mechanism, and it is trained with a landmark-aware masking strategy to generate pseudo visibility signals. The authors introduce Genie-Face, a diverse dataset annotated with 100-point landmarks and per-point visibility across real, rendered, and stylized faces, enabling occlusion-aware evaluation. The method achieves strong robustness to external occlusion and rotation-driven self-occlusion while maintaining accuracy on visible landmarks, and demonstrates practical benefits for downstream tasks like avatar animation. Together, the unified layout, visibility-aware training, and dataset offer a practical framework for reliable landmark reasoning in broad, real-world scenarios where occlusion and pose vary widely.
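The gated combination of local evidence and cross-landmark context can be illustrated with a minimal sketch. The summary does not give the actual formulation, so everything here is an assumption made for illustration: the function name `fuse_visibility`, the use of per-landmark logits, and the convex mix controlled by a sigmoid gate.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse_visibility(local_logits, context_logits, gate_logits):
    """Hypothetical gated fusion of two visibility cues.

    For each landmark, a gate g in (0, 1) decides how much to trust the
    local-evidence logit versus the cross-landmark context logit. The
    fused logit is then squashed into a visibility probability.
    """
    probs = []
    for local, context, gate in zip(local_logits, context_logits, gate_logits):
        g = sigmoid(gate)                         # gate weight (assumed form)
        fused = g * local + (1.0 - g) * context   # convex mix of the two cues
        probs.append(sigmoid(fused))
    return probs


# Three landmarks: the gate favors local evidence, is neutral, favors context.
visibility = fuse_visibility(
    local_logits=[2.0, 2.0, 2.0],
    context_logits=[-2.0, -2.0, -2.0],
    gate_logits=[50.0, 0.0, -50.0],
)
```

With a strongly positive gate the output follows the local cue, with a strongly negative gate it follows the context cue, and a neutral gate averages the two. A learned gate of this kind would let the model lean on context for correlated self-occlusion and on local evidence for external occluders.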

Abstract

Accurate facial landmark detection under occlusion remains challenging, especially for human-like faces with large appearance variation and rotation-driven self-occlusion. Existing detectors typically localize landmarks while handling occlusion implicitly, without predicting the per-point visibility that downstream applications can benefit from. We present OccFace, an occlusion-aware framework for universal human-like faces, including humans, stylized characters, and other non-human designs. OccFace adopts a unified dense 100-point layout and a heatmap-based backbone, and adds an occlusion module that jointly predicts landmark coordinates and per-point visibility by combining local evidence with cross-landmark context. Visibility supervision mixes manual labels with landmark-aware masking that derives pseudo visibility from mask-heatmap overlap. We also create an occlusion-aware evaluation suite that reports NME separately on visible and occluded landmarks and benchmarks visibility prediction with Occ AP, F1@0.5, and ROC-AUC, together with a dataset annotated with 100-point landmarks and per-point visibility. Experiments show improved robustness under external occlusion and large head rotations, especially on occluded regions, while preserving accuracy on visible landmarks.
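As a rough illustration of how pseudo visibility might be derived from mask-heatmap overlap, the sketch below marks a landmark as occluded when most of its heatmap mass falls under a synthetic occlusion mask. The abstract does not specify the rule, so the Gaussian heatmap, the binary mask, the overlap ratio, the threshold `tau`, and all function names are assumptions.

```python
import math


def gaussian_heatmap(cx, cy, size, sigma=1.5):
    """Build a size x size Gaussian heatmap centered on (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def pseudo_visibility(heatmap, mask, tau=0.5):
    """Return 1 (visible) or 0 (occluded) for a single landmark.

    The overlap ratio is the fraction of heatmap mass covered by the
    mask. The threshold tau is an assumed hyperparameter.
    """
    total = sum(sum(row) for row in heatmap)
    if total <= 0.0:
        return 1  # assumed default: an empty heatmap cannot be covered
    covered = sum(
        h * m
        for h_row, m_row in zip(heatmap, mask)
        for h, m in zip(h_row, m_row)
    )
    return 0 if covered / total > tau else 1


size = 16
# Synthetic occluder covering the left half of the image (columns 0..7).
mask = [[1 if x < 8 else 0 for x in range(size)] for _ in range(size)]

left_label = pseudo_visibility(gaussian_heatmap(3, 8, size), mask)    # under the mask
right_label = pseudo_visibility(gaussian_heatmap(12, 8, size), mask)  # outside the mask
```

Here the landmark at column 3 lies under the occluder and is labeled occluded, while the one at column 12 stays visible. Thresholding a soft overlap rather than testing only the landmark's center pixel is one way to keep labels stable when a mask edge passes near a landmark.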

Paper Structure (16 sections, 10 equations, 6 figures, 8 tables)

This paper contains 16 sections, 10 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: (a) We study universal human-like face alignment beyond real humans, covering diverse domains. (b) The setting is harder under occlusion, including both external occluders and rotation-driven self-occlusion under large head poses. (c) To make landmarks consistent across such domains, we introduce a unified 100-point layout that adds expressive structures (e.g., ear contours, inner mouth, fine-grained eyes). (d) We design the OccFace model to keep the strong spatial inductive bias of heatmap localization while explicitly predicting per-point visibility: auxiliary point/edge evidence maps provide geometric cues that gate heatmap localization and support two visibility cues, local evidence (for external occlusion) and cross-landmark context (for correlated self-occlusion), which are combined by a gated fusion module. (e)-(f) We stabilize visibility learning by proposing an additional occlusion-aware training objective and an evaluation suite. (g) The predicted visibility can benefit various downstream face-centric applications.
  • Figure 2: Overview of the OccFace framework. We keep heatmap-based landmark localization and explicitly predict per-point visibility, enabling downstream modules to distinguish true occlusion from ambiguous predictions.
  • Figure 3: Our unified 100-point layout extends a standard 68-point schema (COCO) by adding (i) fine-grained eye anchors (incl. pupil/iris), (ii) inner-mouth structure, and (iii) explicit ear contours.
  • Figure 4: Samples from the Genie-Face dataset with annotations across diverse human-like domains. Landmarks are color-coded by visibility: yellow for visible and blue for invisible (occluded) points.
  • Figure 5: Qualitative comparison of landmark localization and visibility prediction on Genie-Face. Each triplet shows (left) ground truth, (middle) OccFace prediction, and (right) ORFormer prediction. Landmarks are color-coded by visibility: yellow for visible and blue for occluded. Since ORFormer does not natively predict visibility, we attach a prediction head to its final feature space and train it with our visibility objective. Besides localizing landmarks more accurately, OccFace produces more accurate visibility estimates, particularly for rotation-driven self-occlusion on far-side landmarks.
  • ...and 1 more figure