Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Shivam Duggal; Yushi Hu; Oscar Michel; Aniruddha Kembhavi; William T. Freeman; Noah A. Smith; Ranjay Krishna; Antonio Torralba; Ali Farhadi; Wei-Chiu Ma

TL;DR

Eval3D proposes an interpretable, fine-grained evaluation framework for text- and image-driven 3D generation that uses a diverse set of foundation models as probes. It defines five complementary metrics (Geometric, Semantic, Structural, Text-3D Alignment, and Aesthetic) plus 3D artifact localization, which pinpoints inconsistencies at the pixel level and in 3D space. The framework is validated on a curated Eval3D Benchmark with dense human annotations and aligns more closely with human judgments than prior open- or closed-source metrics. Results reveal that many top-performing 3D generators still suffer from geometric or semantic inconsistencies, and that image guidance can improve semantic and structural coherence. Eval3D is open source and modular, promoting reliable evaluation and potential feedback-driven improvements in 3D generation systems.

Abstract

Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistency of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.
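
To make the idea of "measuring the consistency among various foundation models and tools" concrete, here is a minimal sketch of a pixel-wise inconsistency check between two probes that each produce a surface-normal map for the same rendered view. It is only an illustration and not the paper's actual metric: the function names, the nested-list data layout, the foreground mask, and the 30-degree threshold are all assumptions chosen for this example.

```python
import math

def angle_deg(n1, n2):
    """Angle in degrees between two 3D vectors (need not be unit length)."""
    dot = sum(x * y for x, y in zip(n1, n2))
    norm = math.sqrt(sum(x * x for x in n1)) * math.sqrt(sum(y * y for y in n2))
    if norm == 0.0:
        return 0.0  # degenerate normal: treat as no evidence of disagreement
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def inconsistency_map(normals_a, normals_b, mask, threshold_deg=30.0):
    """Compare two H x W grids of normals from two hypothetical probes.

    Returns (fraction_flagged, flagged), where flagged is an H x W grid of
    booleans marking foreground pixels whose normals disagree by more than
    threshold_deg (an arbitrary value picked for this sketch).
    """
    flagged = [
        [bool(m) and angle_deg(a, b) > threshold_deg
         for a, b, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(normals_a, normals_b, mask)
    ]
    valid = sum(bool(m) for row in mask for m in row)
    hits = sum(f for row in flagged for f in row)
    return (hits / valid if valid else 0.0), flagged

# Toy 2 x 2 view: the probes disagree at one foreground pixel.
up = (0.0, 0.0, 1.0)
probe_a = [[up, up], [up, up]]
probe_b = [[up, (1.0, 0.0, 0.0)], [up, up]]
mask = [[True, True], [True, False]]  # bottom-right pixel is background
score, flagged = inconsistency_map(probe_a, probe_b, mask)
```

The boolean grid gives a per-pixel localization of where the two probes disagree, while the scalar summarizes the view; a real pipeline would presumably use array libraries and aggregate over many rendered viewpoints.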
