Pseudo-Labeling by Multi-Policy Viewfinder Network for Image Cropping

Zhiyu Pan, Kewei Wang, Yizheng Wu, Liwen Xiao, Jiahao Cui, Zhicheng Wang, Zhiguo Cao

TL;DR

This paper addresses the scarcity of labeled reframing boxes for automatic image cropping by adopting omni-supervised learning, which exploits unlabeled images through pseudo-labeling. It introduces MPV-Net, a multi-policy viewfinder network whose heads offer diverse policies for rectifying teacher-generated pseudo labels, together with a policy-selection mechanism based on stability under box jittering. The teacher models are updated from their students by a mean-teacher-style exponential moving average (EMA), and the students are trained with the combined loss $\ell = \ell_s^c + \ell_s^f + \lambda(\ell_u^c + \ell_u^f)$, where the subscripts $s$ and $u$ mark the supervised and unsupervised terms; the trusted pseudo label $\hat{y}^i_u$ is the one rectified by the policy with the lowest variance. Experiments on FCDB and FLMS show state-of-the-art performance among regression-based methods and clear gains over standard pseudo-labeling baselines, supporting the usefulness of omni-supervised learning for aesthetic cropping.
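
To make the two update rules in the summary concrete, the following is a minimal sketch, not the authors' implementation. It assumes model weights can be treated as a flat list of floats, and the names `ema_update`, `combined_loss`, `decay`, and `lam` are hypothetical; the decay value and the value of $\lambda$ are placeholders, since neither is stated here. The four loss arguments simply stand for $\ell_s^c$, $\ell_s^f$, $\ell_u^c$, and $\ell_u^f$ as already-computed scalars.

```python
def ema_update(teacher, student, decay=0.999):
    """Mean-teacher style update: move each teacher weight toward the student.

    `teacher` and `student` are flat lists of floats (an assumption made
    for illustration); `decay` is a placeholder value.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def combined_loss(sup_c, sup_f, unsup_c, unsup_f, lam=1.0):
    """Combined loss: l = l_s^c + l_s^f + lambda * (l_u^c + l_u^f).

    The arguments are scalar loss values; `lam` plays the role of lambda.
    """
    return sup_c + sup_f + lam * (unsup_c + unsup_f)


# Toy usage with made-up numbers.
new_teacher = ema_update([1.0, 2.0], [0.0, 4.0], decay=0.5)
total = combined_loss(1.0, 2.0, 3.0, 4.0, lam=0.5)
```

Because the teacher is only ever updated through this average, no gradient flows into it, which matches the stop-gradient ("sg") marking described for the pipeline figure.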

Abstract

Automatic image cropping models predict reframing boxes to enhance image aesthetics. Yet the scarcity of labeled data hinders progress on this task. To overcome this limitation, we explore using labeled and unlabeled data together to expand the scale of training data for image cropping models. This idea can be implemented by pseudo-labeling: a teacher model produces pseudo labels for unlabeled data, and a student model is trained with these pseudo labels. However, the student may learn from the teacher's mistakes. To address this issue, we propose the multi-policy viewfinder network (MPV-Net), which offers diverse refining policies to rectify the mistakes in the original pseudo labels from the teacher. The most reliable policy is selected to generate trusted pseudo labels, and the reliability of each policy is evaluated by its robustness against box jittering. We measure the efficacy of our method by its improvement over the supervised baseline, which uses only labeled data. Notably, MPV-Net outperforms off-the-shelf pseudo-labeling methods, yielding the most substantial improvement over the supervised baseline. Furthermore, our approach achieves state-of-the-art results on both the FCDB and FLMS datasets.

Paper Structure (13 sections, 8 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: The role that MPV-Net plays in pseudo-labeling for image cropping. (a) With vanilla pseudo-labeling, the student may learn from the mistakes of the teacher, a problem known as confirmation bias [tarvainen2017mean]. Confirmation bias may cause the pseudo labels to deteriorate over the iterations. (b) We propose the multi-policy viewfinder network (MPV-Net) to rectify the mistakes of the teacher, which makes the pseudo labels trustworthy.
  • Figure 2: The technical pipeline of the proposed pseudo-labeling framework for image cropping. In this work, the teacher branch includes the reframing box regression model, a.k.a. the composer, and the multi-policy viewfinder network (MPV-Net). Each head of the MPV-Net provides a rectifying policy for the original pseudo labels predicted by the teacher composer. The policies are selected according to their stability, and the most stable policy is used to generate the trusted pseudo labels. Each teacher model is updated from its student model via exponential moving average (EMA) [tarvainen2017mean]. The student models are optimized by both the supervised and unsupervised losses. "sg" in the figure means stop gradient.
  • Figure 3: The architecture of the MPV-Net. The green box is the original pseudo label. The multiple heads provide different rectifying policies for the original pseudo labels. The heads are initialized and trained independently.
  • Figure 4: The policy-selection mechanism. The original pseudo labels from the teacher composer are jittered $m$ times. The jittered boxes and the original box are fed into the teacher MPV-Net, and each policy of the MPV-Net generates a set of rectified boxes. We then calculate the variance [xu2021end] of the rectified boxes from each policy to measure its stability, and rank the policies by their variance. The policy with the lowest variance is selected to rectify the original pseudo label, and its output is taken as the trusted pseudo label. The solid boxes are the rectified results of the dashed boxes in the same color.
  • Figure 5: Qualitative comparison with SOTA image cropping methods. CACNet results tagged with $\dag$ are obtained without the extra annotations from the composition classification dataset [lee2018photographic].
  • ...and 3 more figures
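
The policy-selection mechanism described for Figure 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: boxes are `(x1, y1, x2, y2)` tuples, jittering is uniform noise proportional to box size, and the stability score is the sum of per-coordinate population variances, whereas the source only says "variance" with a citation. The function names, the values of `m` and `scale`, and the two toy policies standing in for MPV-Net heads are all hypothetical.

```python
import random
from statistics import pvariance


def jitter_box(box, scale, rng):
    """Perturb each coordinate by uniform noise proportional to box size (assumed scheme)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (
        x1 + rng.uniform(-scale, scale) * w,
        y1 + rng.uniform(-scale, scale) * h,
        x2 + rng.uniform(-scale, scale) * w,
        y2 + rng.uniform(-scale, scale) * h,
    )


def policy_variance(boxes):
    """Sum of per-coordinate population variances over a set of boxes (assumed score)."""
    return sum(pvariance(coord) for coord in zip(*boxes))


def select_policy(pseudo_box, policies, m=8, scale=0.1, seed=0):
    """Jitter the pseudo label m times, rectify every box with every policy,
    and return the index of the lowest-variance policy, its rectified box
    for the original pseudo label, and all variances."""
    rng = random.Random(seed)
    inputs = [pseudo_box] + [jitter_box(pseudo_box, scale, rng) for _ in range(m)]
    variances = [policy_variance([policy(b) for b in inputs]) for policy in policies]
    best = min(range(len(policies)), key=variances.__getitem__)
    return best, policies[best](pseudo_box), variances


# Toy stand-ins for MPV-Net heads (hypothetical, for illustration only).
def identity_policy(box):
    """Leaves the box unchanged, so it inherits all of the jitter."""
    return box


def anchored_policy(box, target=(0.1, 0.1, 0.9, 0.9)):
    """Pulls the box strongly toward a fixed target, so it is stable under jitter."""
    return tuple(0.1 * v + 0.9 * t for v, t in zip(box, target))


best, trusted_box, variances = select_policy(
    (0.2, 0.2, 0.8, 0.8), [identity_policy, anchored_policy]
)
```

In this toy setup the anchored policy damps the jitter and is therefore selected. In the actual method the policies are learned heads, so stability reflects how consistently a head maps nearby boxes to the same rectified crop.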