ProCrop: Learning Aesthetic Image Cropping from Professional Compositions

Ke Zhang, Tianyu Ding, Jiachen Jiang, Tianyi Chen, Ilya Zharkov, Vishal M. Patel, Luming Liang

TL;DR

ProCrop tackles aesthetic image cropping by leveraging professional compositions through retrieval augmentation and a large-scale, weakly-annotated CAD dataset generated via outpainting. It retrieves compositionally similar professional images, fuses their features with the input query, and uses a transformer decoder to produce multiple high-quality crop proposals with explicit aesthetic scores. The composition-aware CAD dataset, built with ControlNet, GPT-4 dual-space prompts, and SAM masks, enables robust training in both supervised and weakly-supervised settings, achieving state-of-the-art performance and strong transferability. The authors provide code and data to support further research in image aesthetics and composition analysis.

Abstract

Image cropping is crucial for enhancing the visual appeal and narrative impact of photographs, yet existing rule-based and data-driven approaches often lack diversity or require annotated training data. We introduce ProCrop, a retrieval-based method that leverages professional photography to guide cropping decisions. By fusing features from professional photographs with those of the query image, ProCrop learns from professional compositions, significantly boosting performance. Additionally, we present a large-scale dataset of 242K weakly-annotated images, generated by outpainting professional images and iteratively refining diverse crop proposals. This composition-aware dataset generation yields diverse, high-quality crop proposals guided by aesthetic principles, and the resulting dataset is the largest publicly available dataset for image cropping. Extensive experiments show that ProCrop significantly outperforms existing methods in both supervised and weakly-supervised settings. Notably, when trained on the new dataset, ProCrop surpasses previous weakly-supervised methods and even matches fully supervised approaches. Both the code and dataset will be made publicly available to advance research in image aesthetics and composition analysis.
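
The retrieval-and-fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, the cosine-similarity ranking, the function names (`retrieve_top_k`, `fuse_mean`), and the mean-based fusion are all assumptions chosen for illustration, and the summary does not specify how ProCrop actually measures compositional similarity or fuses features.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, database, k=2):
    """Return the names of the k database entries most similar to the query.

    `database` maps an image name to its (hypothetical) composition feature.
    """
    scored = [(cosine_similarity(query, feat), name)
              for name, feat in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def fuse_mean(query, references):
    """Placeholder fusion: element-wise mean of the query and reference features.

    Assumption for illustration only; the real fusion module is not described here.
    """
    vectors = [query] + list(references)
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


# Toy professional-image database with 2-D features (hypothetical values).
professional_db = {"pro_a": [1.0, 0.0], "pro_b": [0.0, 1.0], "pro_c": [1.0, 1.0]}
query_feature = [1.0, 0.1]

neighbors = retrieve_top_k(query_feature, professional_db, k=2)
fused = fuse_mean(query_feature, [professional_db[n] for n in neighbors])
```

In a full system, the fused representation would be passed to a decoder that predicts crop boxes and aesthetic scores; that stage is omitted here because the summary gives no detail about it.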

Paper Structure

This paper contains 32 sections, 5 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: Overview of ProCrop's retrieval-based aesthetic cropping approach. (a) Construction of a professional image database and retrieval of images with similar compositional layouts. (b) Demonstration of ProCrop's cropping process, where a compositionally similar reference image guides the generation of aesthetically pleasing crop results.
  • Figure 2: The pipeline of ProCrop. Given an input image, ProCrop retrieves compositionally similar professional images and generates a textual description; together, these guide the model to produce aesthetically enhanced crops along with corresponding aesthetic scores.
  • Figure 3: Composition-aware dataset generation. Professional images undergo three stages to create diverse image-crop pairs.
  • Figure 4: Outpainting results with three variations: (1) BLIP-based composition understanding, (2) GPT-4 with solely within-image compositional descriptions, and (3) GPT-4 with the proposed dual-space composition understanding. The results show that our dual-space approach, through GPT-4, yields significantly more coherent and visually realistic outpainting outcomes.
  • Figure 5: Examples of outpainting results and crop proposals. Multiple crop proposals serve as high-quality pseudo-labels generated through the model-in-the-loop process.
  • ...and 10 more figures