Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field

Sheyang Tang, Armin Shafiee Sarvestani, Jialu Xu, Xiaoyu Xu, Zhou Wang

TL;DR

This work introduces a 3D aesthetic field that enables geometry-grounded aesthetic reasoning in 3D from sparse captures, allowing efficient viewpoint suggestion without costly RL searches. It also proposes a two-stage search pipeline that combines coarse viewpoint sampling with gradient-based refinement.

Abstract

The aesthetic quality of a scene depends strongly on camera viewpoint. Existing approaches to aesthetic viewpoint suggestion fall into two groups: single-view methods, which predict limited camera adjustments from a single image without understanding scene geometry, and 3D exploration methods, which rely on dense captures or prebuilt 3D environments coupled with costly reinforcement learning (RL) searches. In this work, we introduce the notion of a 3D aesthetic field that enables geometry-grounded aesthetic reasoning in 3D from sparse captures, allowing efficient viewpoint suggestion in contrast to costly RL searches. We learn this 3D aesthetic field using a feedforward 3D Gaussian Splatting network that distills high-level aesthetic knowledge from a pretrained 2D aesthetic model into 3D space, enabling aesthetic prediction for novel viewpoints from only sparse input views. Building on this field, we propose a two-stage search pipeline that combines coarse viewpoint sampling with gradient-based refinement, efficiently identifying aesthetically appealing viewpoints without dense captures or RL exploration. Extensive experiments show that our method consistently suggests viewpoints with superior framing and composition compared to existing approaches, establishing a new direction toward 3D-aware aesthetic modeling.
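
The two-stage search described above (coarse sampling, then gradient-based refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_score` is a hypothetical stand-in for the learned aesthetic field, the 3-parameter pose, the search bounds, and all hyperparameters are assumptions chosen for illustration, and finite differences replace whatever differentiation the actual system uses.

```python
import numpy as np

# Hypothetical stand-in for a learned aesthetic field: a smooth score over a
# 3-parameter camera pose (e.g. azimuth, elevation, distance), peaking at PEAK.
PEAK = np.array([0.3, -0.2, 0.5])


def toy_score(pose):
    return float(np.exp(-np.sum((pose - PEAK) ** 2)))


def numerical_grad(f, x, eps=1e-5):
    """Central finite differences (assumption: a real system would likely use autodiff)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def suggest_viewpoint(score_fn, low, high, n_samples=64, top_k=4,
                      steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: coarse sampling of candidate poses inside the search bounds.
    candidates = rng.uniform(low, high, size=(n_samples, low.size))
    scores = np.array([score_fn(c) for c in candidates])
    best = candidates[np.argsort(scores)[-top_k:]]
    # Stage 2: refine each kept candidate locally by gradient ascent,
    # clipping so the pose never leaves the search bounds.
    refined = []
    for pose in best:
        pose = pose.copy()
        for _ in range(steps):
            pose = np.clip(pose + lr * numerical_grad(score_fn, pose), low, high)
        refined.append(pose)
    # Return the refined pose with the highest score.
    return max(refined, key=score_fn)


low, high = -np.ones(3), np.ones(3)
best_pose = suggest_viewpoint(toy_score, low, high)
```

Keeping several top candidates before refinement, rather than only the single best sample, is a design choice in this sketch that reduces the chance of gradient ascent settling in a poor local optimum when the score landscape has multiple peaks.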

Paper Structure (20 sections, 2 equations, 10 figures, 6 tables)

Figures (10)

  • Figure 1: Given sparse scene captures (left), our method learns a 3D aesthetic field that encodes spatially varying aesthetic cues. This field enables geometry-grounded aesthetic reasoning in 3D, allowing efficient discovery of appealing camera viewpoints (right).
  • Figure 2: We distill aesthetic features into a feedforward Gaussian Splatting network (top). At inference, to search for aesthetic viewpoints, we adopt a two-stage pipeline: coarse sampling to find good candidates (bottom left) and local refinement by gradient ascent (bottom right).
  • Figure 3: (a) Aesthetic score predictions over consecutive frames. Our method produces scores closer to the ground truth while being smoother and more consistent across nearby views. The dashed line and box mark the regions visualized in (b) and (c), respectively. (b) Given the same aesthetic model and viewpoint, rendering artifacts in the predicted view (bottom) bias the RGB-scoring approach toward lower scores. (c) Ground truth scores fluctuate noticeably across nearly identical nearby views.
  • Figure 4: Aesthetic viewpoint suggestion examples with 4 input views in RE10k (top 2 rows) and DL3DV (bottom 3 rows). Red boxes in the last row show that single-view methods fail to remove distracting objects due to limited adjustment range. Zoom in for more details.
  • Figure 5: Visualization of sampled viewpoints colored by aesthetic score, with representative renderings shown alongside.
  • ...and 5 more figures