SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards

Yuan Gao, Jin Song

TL;DR

The paper introduces SA-BENCH, described as the first large-scale benchmark for interior spatial aesthetics, which scores images along four dimensions (layout, harmony, lighting, and distortion) using Mean Opinion Scores (MOS). The authors then present SA-IQA, a multimodal, expert-guided IQA model built by supervised fine-tuning on SA-BENCH, whose per-dimension predictions are fused with a Bradley–Terry objective into a calibrated overall score. Experiments show SA-IQA outperforming traditional methods, deep-learning-based methods, and commercial MLLMs in PLCC and SRCC across all four dimensions, and demonstrate practical value when SA-IQA is used as a reward in GRPO-based reinforcement learning and as a Best-of-N re-ranking signal. The authors state that the code and dataset will be open-sourced to support domain-specific IQA for interior design and AI-generated imagery.
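PLCC and SRCC, the two agreement metrics named in the summary, are the Pearson linear correlation and the Spearman rank correlation between predicted scores and human MOS. The following is a minimal pure-Python sketch of both, not the authors' evaluation code; the function names and the sample values are assumptions made for illustration and are not taken from SA-BENCH.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical MOS labels and model predictions, for illustration only.
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
pred = [1.0, 2.0, 3.5, 3.9, 5.0]
plcc = pearson(mos, pred)
srcc = spearman(mos, pred)
```

PLCC rewards predictions that track MOS linearly, while SRCC only checks that the ordering of images is preserved, which is why IQA papers commonly report both.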

Abstract

In recent years, Image Quality Assessment (IQA) for AI-generated images (AIGI) has advanced rapidly; however, existing methods primarily target portraits and artistic images, lacking a systematic evaluation of interior scenes. We introduce Spatial Aesthetics, a paradigm that assesses the aesthetic quality of interior images along four dimensions: layout, harmony, lighting, and distortion. We construct SA-BENCH, the first benchmark for spatial aesthetics, comprising 18,000 images and 50,000 precise annotations. Employing SA-BENCH, we systematically evaluate current IQA methodologies and develop SA-IQA, through MLLM fine-tuning and a multidimensional fusion approach, as a comprehensive reward framework for assessing spatial aesthetics. We apply SA-IQA to two downstream tasks: (1) serving as a reward signal integrated with GRPO reinforcement learning to optimize the AIGC generation pipeline, and (2) Best-of-N selection to filter high-quality images and improve generation quality. Experiments indicate that SA-IQA significantly outperforms existing methods on SA-BENCH, setting a new standard for spatial aesthetics evaluation. Code and dataset will be open-sourced to advance research and applications in this domain.
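As a rough illustration of the Best-of-N use case mentioned in the abstract, the sketch below scores each candidate image on the four dimensions, fuses the scores, and keeps the top candidate. It is a sketch under stated assumptions, not the authors' implementation: `score_fn`, `fuse`, `best_of_n`, the weighted-average fusion, and the example scores are all hypothetical stand-ins (the paper's fusion is learned with a Bradley–Terry objective, which is not reproduced here).

```python
from typing import Callable, Dict, Sequence, Tuple

DIMENSIONS = ("layout", "harmony", "lighting", "distortion")


def fuse(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores.

    Assumption: a plain weighted average stands in for the learned fusion.
    """
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total


def best_of_n(
    candidates: Sequence[str],
    score_fn: Callable[[str], Dict[str, float]],
    weights: Dict[str, float],
) -> Tuple[str, float]:
    """Return the candidate with the highest fused score, and that score."""
    scored = [(c, fuse(score_fn(c), weights)) for c in candidates]
    # max() keeps the first candidate when fused scores tie.
    return max(scored, key=lambda pair: pair[1])


# Hypothetical per-dimension scores on a 1-5 scale, for illustration only.
fake_scores = {
    "a.png": {"layout": 3.0, "harmony": 3.0, "lighting": 3.0, "distortion": 3.0},
    "b.png": {"layout": 4.0, "harmony": 4.0, "lighting": 5.0, "distortion": 3.0},
    "c.png": {"layout": 2.0, "harmony": 5.0, "lighting": 4.0, "distortion": 4.0},
}
equal_weights = {d: 1.0 for d in DIMENSIONS}
best, best_score = best_of_n(list(fake_scores), fake_scores.__getitem__, equal_weights)
```

In practice `score_fn` would wrap a call to the quality model for each generated image; a dictionary lookup is used here only so the example runs on its own.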

Paper Structure

This paper contains 46 sections, 8 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: Visualizing the SA-BENCH dataset. From top to bottom, each row shows results from a different generation model in the dataset: the first row uses SD1.5-Inpaint [sd], the second SDXL-BrushNet [brushnet], the third FLUX-Inpaint [labs2025flux], and the fourth FLUX EasyControl [zhang2025easycontrol].
  • Figure 2: Overview of the SA-IQA Framework. The left panel depicts our three-stage workflow: establishing Spatial Aesthetics dimensions and constructing SA-BENCH, developing the SA-IQA reward model via MLLM fine-tuning, and deploying it for GRPO-based prompt optimization and Best-of-N selection. The right panel details the SA-IQA model's principle, showing how it processes an image and a dimension-conditioned query to predict multi-dimensional MOS scores, which are then calibrated and fused into a single spatial aesthetic score.
  • Figure 3: Sample annotation examples from SA-BENCH. For each quality dimension (Layout, Harmony, Lighting, and Distortion), five representative examples are presented, spanning quality levels from bad (1) to excellent (5). These examples serve as visual guidelines for human annotators, ensuring consistent and high-quality scoring throughout our benchmark.
  • Figure 4: MOS Distribution on SA-BENCH. The plot shows the probability distribution of Mean Opinion Scores (MOS) for each of the four dimensions (Layout, Harmony, Lighting, and Distortion), illustrating the range and concentration of scores from 1 (bad) to 5 (excellent).
  • Figure 5: Qualitative Visualization of RL Improvement. This figure presents generated background examples from the intermediate training results of our RL process.
  • ...and 8 more figures