SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation

Sheng Liu, Tianyu Luan, Phani Nuney, Xuelu Feng, Junsong Yuan

TL;DR

SRAM introduces a no-reference metric for 3D shape realism by leveraging a 3D-aware language-model bridge to map mesh information to perceptual realism without ground-truth references. It encodes shapes with Point-BERT, uses a realism-focused LLM pipeline, and is trained on the RealismGrading dataset of human-annotated realism scores for real-world distortions. The approach achieves strong correlation with human judgments and outperforms a PointNet baseline, with ablations validating finetuning and prompt strategies. RealismGrading and SRAM together provide a practical tool for evaluating realism in no-reference 3D shape scenarios across reconstruction and generation tasks.

Abstract

3D generation and reconstruction techniques have been widely used in computer games, film, and other content creation areas. As these applications grow, so does the demand for 3D shapes that look truly realistic. Traditional evaluation methods rely on a ground truth to measure mesh fidelity. However, in many practical cases, a shape's realism does not depend on having a ground truth reference. In this work, we propose a Shape-Realism Alignment Metric that leverages a large language model (LLM) as a bridge between mesh shape information and realism evaluation. To achieve this, we adopt a mesh encoding approach that converts 3D shapes into the language token space. A dedicated realism decoder is designed to align the language model's output with human perception of realism. Additionally, we introduce a new dataset, RealismGrading, which provides human-annotated realism scores without the need for ground truth shapes. Our dataset includes shapes generated by 16 different algorithms on over a dozen objects, making it more representative of practical 3D shape distributions. We validate our metric's performance and generalizability through k-fold cross-validation across different objects. Experimental results show that our metric correlates well with human perception, outperforms existing methods, and generalizes well.

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Full reference 3D shape evaluation vs. no reference 3D shape evaluation. Left: Traditional metrics require ground truth references to evaluate the fidelity of 3D shapes. Right: Our metric can evaluate 3D shape realism without a reference. In this paper, we refer to "no reference fidelity" as "realism", since fidelity is typically based on comparisons while realism is not.
  • Figure 2: The pipeline of our Shape-Realism Alignment Metric (SRAM). Our metric can take a mesh shape as input and measure its realism without a ground truth mesh shape reference. It uses a language model as a bridge to achieve alignment from 3D shape to realism score. The language model bridge has 3 inputs: text tokens from the system prompt, 3D shape tokens from the 3D shape encoder, and another part of text tokens from the realism prompt. In the output part of our model, we design a token-based realism decoder to align language tokens with realism scores.
  • Figure 3: We show example meshes along with their human-annotated realism score from our RealismGrading dataset. Methods used to produce these meshes are shown as well, e.g., "One2345pp". We observe that as the realism of a mesh increases, its annotated realism score also goes up.
  • Figure 4: The point-cloud-based ad-hoc baseline design. This baseline employs PointNet (Qi et al., 2017) to extract shape features, which are then fed into the same realism decoder to produce a realism evaluation score.
  • Figure 5: We present the realism scores from our metric alongside human-annotated realism scores for various meshes. The results show that our metric assigns high realism scores to realistic meshes, while severely distorted meshes receive low scores. Our metric correlates well with human annotations, which reflects how human annotators perceive mesh realism.
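
The abstract and Figure 5 describe two evaluation ideas: agreement between metric scores and human-annotated realism scores, and k-fold cross-validation across different objects. The following is a minimal sketch of how such an evaluation is commonly set up, not the authors' code. The summary does not say which correlation coefficient is reported, so the use of Pearson and Spearman here is an assumption, and the function names, object labels, and scores are all hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def object_folds(object_ids, k):
    """Split sample indices into k folds so every object lands in exactly one fold.

    Holding out whole objects means a fold never tests on an object seen in training.
    """
    unique = sorted(set(object_ids))
    folds = [[] for _ in range(k)]
    for pos, obj in enumerate(unique):
        folds[pos % k].extend(i for i, o in enumerate(object_ids) if o == obj)
    return folds


# Hypothetical data: human-annotated scores and metric predictions for five meshes.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [0.2, 0.1, 0.5, 0.7, 0.9]
mesh_objects = ["chair", "chair", "lamp", "car", "lamp"]

rank_agreement = spearman(human_scores, metric_scores)
folds = object_folds(mesh_objects, k=2)
```

Splitting by object rather than by individual mesh is what lets a cross-validation score say something about generalization to unseen objects; a per-mesh split would place distorted variants of the same object in both training and test folds.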