Lightweight Facial Attractiveness Prediction Using Dual Label Distribution

Shu Liu, Enquan Huang, Ziyu Zhou, Yan Xu, Xiaoyan Kui, Tao Lei, Hongying Meng

TL;DR

This work tackles facial attractiveness prediction under practical constraints by introducing a lightweight, end-to-end model that leverages a dual-label distribution within a label distribution learning framework. Attractiveness is encoded as a distribution through a Laplace-based construction from the ground-truth score and its standard deviation, while a separate rating distribution captures the individual human ratings. Both distributions are learned jointly with a score regression loss on a MobileNetV2 backbone. The approach achieves state-of-the-art or competitive accuracy on SCUT-FBP5500 with high efficiency (only about 2.28 million parameters and 0.31G MAdds), enabling deployment in resource-limited settings. Visualizations reveal interpretable attention on facial regions, and extending the approach to dynamic or multi-view data is identified as future work.
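To make the Laplace-based construction concrete, here is a minimal Python sketch that turns a ground-truth score and its standard deviation into a discrete attractiveness distribution. The label grid (1.0 to 5.0 in steps of 0.1), the function name `laplace_label_distribution`, and the Laplace scale $b = \sigma/\sqrt{2}$ (chosen so the Laplace standard deviation equals $\sigma$) are assumptions for illustration, not details taken from the paper.

```python
import math


def laplace_label_distribution(score, std, labels, min_std=1e-6):
    """Discretized, normalized Laplace distribution centred on `score`.

    Assumption (not from the paper): the scale b is set so the Laplace
    standard deviation equals the annotators' std, i.e. Var = 2 * b**2,
    so b = std / sqrt(2).
    """
    b = max(std, min_std) / math.sqrt(2.0)  # guard against a zero std
    dists = [abs(y - score) for y in labels]
    d_min = min(dists)
    # exp(-(d - d_min) / b) is proportional to the Laplace density; shifting by
    # d_min keeps the nearest bin at weight 1, so the sum never underflows to 0.
    weights = [math.exp(-(d - d_min) / b) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical label grid: scores 1.0, 1.1, ..., 5.0 (41 bins).
LABELS = [round(1.0 + 0.1 * i, 1) for i in range(41)]

if __name__ == "__main__":
    p = laplace_label_distribution(score=3.2, std=0.6, labels=LABELS)
    print(round(sum(p), 6))  # 1.0
    print(LABELS[max(range(len(p)), key=p.__getitem__)])  # 3.2
```

Subtracting the smallest distance before exponentiating cancels out after normalization, so the result is unchanged, but it means a near-zero standard deviation cannot drive every weight to zero.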

Abstract

Facial attractiveness prediction (FAP) aims to assess facial attractiveness automatically based on human aesthetic perception. Previous methods based on deep convolutional neural networks have improved performance, but their large model sizes limit flexibility. In addition, most methods do not take full advantage of the information in the dataset. In this paper, we present a novel end-to-end FAP approach that integrates a dual label distribution with a lightweight design. To make the best use of the dataset, the manual ratings, attractiveness score, and standard deviation are explicitly aggregated to construct a dual label distribution consisting of an attractiveness distribution and a rating distribution. These distributions, together with the attractiveness score, are optimized in a joint learning framework based on the label distribution learning (LDL) paradigm. For a lightweight design, data processing is kept to a minimum and MobileNetV2 is used as the backbone. Extensive experiments on two benchmark datasets show that our approach achieves promising results and strikes a good balance between performance and efficiency. Ablation studies demonstrate that the carefully designed learning modules are indispensable and interrelated. Additionally, visualizations indicate that our approach perceives facial attractiveness and captures attractive facial regions, which facilitates semantically meaningful predictions. The code is available at https://github.com/enquan/2D_FAP.
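The rating distribution can be sketched as a normalized histogram of one image's individual ratings, compared with a prediction through KL divergence, a loss commonly used in label distribution learning. This sketch rests on assumptions: integer ratings on a 1-5 scale and the hypothetical helper names `rating_distribution` and `kl_divergence`. Whether the paper uses exactly this divergence is not stated here.

```python
import math
from collections import Counter


def rating_distribution(ratings, levels=(1, 2, 3, 4, 5)):
    """Normalized histogram of the individual human ratings of one image.

    Assumption: ratings are integers on a 1-5 scale.
    """
    counts = Counter(ratings)
    n = len(ratings)
    return [counts.get(level, 0) / n for level in levels]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); bins where the target is 0 contribute nothing."""
    return sum(t * math.log(t / max(q, eps))
               for t, q in zip(target, predicted) if t > 0)


if __name__ == "__main__":
    r = rating_distribution([3, 3, 4, 2, 3, 4, 5, 3])
    print(r)  # [0.0, 0.125, 0.5, 0.25, 0.125]
    print(kl_divergence(r, r))  # 0.0
```

The `eps` floor only guards against taking the log of zero when the prediction assigns no mass to a bin the raters actually used.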

Paper Structure (36 sections, 12 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Overview of our framework. For a given facial image, its manual ratings, attractiveness score, and standard deviation are aggregated explicitly. From the human ratings $\{y_1,y_2,\dots,y_r\}$, its attractiveness distribution $\boldsymbol p$ is generated using the Laplace distribution, while its rating distribution $\boldsymbol r$ is derived directly. Then, the facial image and $\boldsymbol p$ are fed into MobileNetV2 to output the predicted attractiveness distribution $\hat{\boldsymbol{p}}$, which is then used to compute the predicted attractiveness score $\hat{y}$ and to obtain the predicted rating distribution $\hat{\boldsymbol r}$. Finally, $\hat{\boldsymbol{p}}$, $\hat{\boldsymbol r}$, and $\hat{y}$ are jointly optimized under the dual label distribution and score regression learning modules.
  • Figure 2: The building block of MobileNetV2, which consists of linear bottlenecks and an inverted residual structure.
  • Figure 3: Comparison of the proposed $L_{score}$ with the $L_1$ and $L_2$ losses. When the absolute error between the ground-truth and predicted scores reaches 1, $L_{score}$ exceeds the $L_1$ and $L_2$ losses by $e-2 \approx 0.72$, thus refining the score prediction more vigorously.
  • Figure 4: Heatmap visualization, where warmer colors (e.g., red) indicate higher intensities and cooler colors (e.g., blue) indicate lower intensities. Each row corresponds to a distinct degree of attractiveness. The left three columns ((a)-(d)) and the rightmost column ((e)-(h)) show examples of good and poor predictions, respectively. The pair below each image gives $<$ground-truth score, predicted score$>$.
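Figure 1 states that the predicted distribution $\hat{\boldsymbol p}$ is used both to compute $\hat{y}$ and to obtain $\hat{\boldsymbol r}$. A common choice in label distribution learning is to take $\hat{y}$ as the expectation of $\hat{\boldsymbol p}$ over the score grid. The sketch below does that and, in addition, uses a hypothetical mapping (`to_rating_distribution`) that pools each score bin's probability into the nearest integer rating. The score grid, the helper names, and the mapping are assumptions, not the paper's stated procedure.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expected_score(p_hat, labels):
    """Predicted score y_hat as the expectation of the predicted distribution."""
    return sum(p * y for p, y in zip(p_hat, labels))


def to_rating_distribution(p_hat, labels, levels=(1, 2, 3, 4, 5)):
    """Hypothetical mapping from p_hat to r_hat: each score bin's mass is moved
    to the nearest integer rating level (halves round up). Assumes `levels`
    are consecutive integers.
    """
    r_hat = [0.0] * len(levels)
    for p, y in zip(p_hat, labels):
        idx = int(y + 0.5) - levels[0]
        idx = min(len(levels) - 1, max(0, idx))  # clamp to the valid range
        r_hat[idx] += p
    return r_hat


LABELS = [round(1.0 + 0.1 * i, 1) for i in range(41)]  # hypothetical grid

if __name__ == "__main__":
    p_hat = softmax([0.0] * len(LABELS))  # uniform prediction
    print(round(expected_score(p_hat, LABELS), 6))  # 3.0
    print([round(v, 4) for v in to_rating_distribution(p_hat, LABELS)])
```

Because $\hat{y}$ is a weighted average of the grid, it is differentiable with respect to $\hat{\boldsymbol p}$, which is what allows a score regression loss to be optimized jointly with the distribution losses.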
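The inverted residual block in Figure 2 can be summarized by a simple cost model, which helps explain why a MobileNetV2 backbone stays small. The sketch below counts convolution weights and multiply-adds for one block, following the standard MobileNetV2 layout: a 1×1 expansion with ReLU6, a depthwise 3×3 convolution, a linear 1×1 projection, and an identity shortcut only when the stride is 1 and the channel counts match. BatchNorm parameters are ignored, and `bottleneck_cost` is a hypothetical name. The sketch does not reproduce the paper's reported totals of about 2.28M parameters and 0.31G MAdds.

```python
import math


def bottleneck_cost(h, w, c_in, c_out, t, stride=1, k=3):
    """Weights and multiply-adds of one MobileNetV2 inverted-residual block.

    Layers: 1x1 expansion conv (skipped when t == 1), k x k depthwise conv
    with the given stride, then a linear 1x1 projection. BatchNorm and bias
    parameters are ignored; 'same' padding is assumed.
    """
    hidden = t * c_in
    h_out, w_out = math.ceil(h / stride), math.ceil(w / stride)
    params = madds = 0
    if t != 1:  # expansion: c_in -> hidden channels, followed by ReLU6
        params += c_in * hidden
        madds += h * w * c_in * hidden
    params += k * k * hidden  # depthwise: one k x k filter per channel
    madds += h_out * w_out * k * k * hidden
    params += hidden * c_out  # linear bottleneck: no activation afterwards
    madds += h_out * w_out * hidden * c_out
    use_residual = stride == 1 and c_in == c_out  # identity shortcut
    return params, madds, use_residual


if __name__ == "__main__":
    print(bottleneck_cost(56, 56, 24, 24, t=6))  # (8208, 25740288, True)
```

For stride 1 with expansion, the multiply-adds reduce to $h \cdot w \cdot t d' (d' + k^2 + d'')$ for $d'$ input and $d''$ output channels, so cost grows linearly in the expansion factor $t$ rather than quadratically in the wide hidden width.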
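The numbers in the Figure 3 caption can be checked with a short worked calculation. Writing $y$ for the ground-truth score and $\hat{y}$ for the prediction (notation assumed), and taking $L_1 = |d|$ and $L_2 = d^2$ without a $\tfrac{1}{2}$ factor (an assumption; otherwise the two baselines would not coincide at $|d| = 1$), the caption implies:

```latex
\begin{aligned}
d &= y - \hat{y}, \qquad |d| = 1 \;\Rightarrow\; L_1 = |d| = 1, \quad L_2 = d^2 = 1,\\
L_{score}\big|_{|d|=1} &= 1 + (e - 2) = e - 1 \approx 1.718 .
\end{aligned}
```

This only restates the caption's comparison at a single error value; it does not determine the functional form of $L_{score}$.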