MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences

Weitao Wang; Haoran Xu; Yuxiao Yang; Zhifang Liu; Jun Meng; Haoqian Wang

TL;DR

This work tackles the misalignment between automatic metrics and human preferences in image-to-3D evaluation by constructing a standardized prompt-and-annotation pipeline and introducing MVReward, a BLIP-based multi-view encoder reward model trained on 16k expert pairwise comparisons. It also proposes MVP, a plug-and-play tuning strategy that uses MVReward to align multi-view diffusion models with human preferences, improving geometry and texture quality across methods. Empirical results show MVReward outperforms traditional metrics in predicting human judgments, and MVP consistently enhances baseline multi-view diffusion models like Wonder3D and Era3D. The framework enables fair, transparent evaluation and more aligned generation in image-driven 3D synthesis, with potential for broader adoption in 3D content creation pipelines.

Abstract

Recent years have witnessed remarkable progress in 3D content generation. However, corresponding evaluation methods struggle to keep pace. Automatic metrics have proven difficult to align with human preferences, and the mixed comparison of text- and image-driven methods often leads to unfair evaluations. In this paper, we present a comprehensive framework to better align and evaluate multi-view diffusion models with human preferences. We first collect and filter a standardized image prompt set from DALL·E and Objaverse, which we then use to generate multi-view assets with several multi-view diffusion models. Through a systematic ranking pipeline on these assets, we obtain a human annotation dataset with 16k expert pairwise comparisons and train a reward model, named MVReward, to effectively encode human preferences. With MVReward, image-driven 3D methods can be evaluated against each other in a fairer and more transparent manner. Building on this, we further propose Multi-View Preference Learning (MVP), a plug-and-play multi-view diffusion tuning strategy. Extensive experiments demonstrate that MVReward can serve as a reliable metric and that MVP consistently enhances the alignment of multi-view diffusion models with human preferences.
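
The abstract does not state the objective used to train MVReward on the pairwise comparisons. As a minimal sketch only, the snippet below assumes a Bradley-Terry style logistic loss, a common choice for reward models trained on preference pairs; the function names and scalar scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Iterable, Tuple


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the preferred asset outranks the rejected one.

    Assumption: a Bradley-Terry model, P(preferred > rejected) = sigmoid(margin),
    where margin is the difference between the two reward scores.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (preferred, rejected) score pairs ranked in the human order."""
    pairs = list(pairs)
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return correct / len(pairs)
```

Under this assumption, the loss shrinks as the reward model scores the human-preferred multi-view asset higher than the rejected one, and the pairwise accuracy gives one simple way to check how often a metric agrees with the annotators.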
