Skel3D: Skeleton Guided Novel View Synthesis

Aron Fóthi, Bence Fazekas, Natabara Máté Gyöngyössy, Kristian Fenech

TL;DR

This paper presents an approach for monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model, and outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively.

Abstract

In this paper, we present an approach for monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model. Building upon a baseline that utilizes a pre-trained 2D image generator, our method takes advantage of the Objaverse dataset, which includes animated objects with bone structures. By introducing a skeleton guide layer following the existing ray conditioning normalization (RCN) layer, our approach enhances pose accuracy and multi-view consistency. The skeleton guide layer provides detailed structural information for the generative model, improving the quality of synthesized views. Experimental results demonstrate that our skeleton-guided method significantly enhances consistency and accuracy across diverse object categories within the Objaverse dataset. Our method outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively, without relying on explicit 3D representations.

Paper Structure

This paper contains 21 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Using the predicted skeleton of the object as guide for novel view synthesis.
  • Figure 2: Architecture of our skeleton-guided model for NVS. Given a single input image, we introduce a Skeleton Conditioning Normalization (red) that utilizes the skeleton image embedding, enhancing the model's capability to capture more precise views. For full details of the diffusion UNet architecture, see Rombach et al. (2022).
  • Figure 3: The first column shows the source image for NVS, followed by the target view in the second column. The third column presents the skeleton guidance used in the process. The fourth column shows the output of our model, with green values marking its superior performance. The final column shows the Free3D results, with red values marking where our model outperforms it.
  • Figure 4: Average improvement in score depending on the quality of the skeleton. The x-axis represents the IoU of the bounding boxes of the object and the skeleton, which measures how well the skeleton fits the object. The y-axis shows the average improvement in metric scores, with error bars given by the bootstrapped estimate of the standard error. Metrics where lower values are better (L1 Loss, LPIPS) were inverted by multiplying by $-1$, and PSNR was scaled by a factor of $0.01$ for ease of visualization.
  • Figure 5: When the guidance skeleton is insufficient, our model's performance drops compared to the baseline model. Best viewed online due to the small skeleton sizes compared to the object models.
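
The quantities plotted in Figure 4 can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the (x_min, y_min, x_max, y_max) box format, the number of resamples, and the reading of "improvement" as our score minus the baseline score are all assumptions made for illustration.

```python
import random
import statistics


def bbox_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x_min, y_min, x_max, y_max).

    The box format is an assumption; the caption only says that the bounding
    boxes of the object and of the skeleton are compared.
    """
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def bootstrap_standard_error(values, n_resamples=1000, seed=0):
    """Bootstrapped standard error of the mean.

    Resamples `values` with replacement and returns the standard deviation
    of the resampled means. `n_resamples` and `seed` are hypothetical defaults.
    """
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)


def signed_improvement(ours, baseline, lower_is_better=False, scale=1.0):
    """Improvement of our score over the baseline, oriented so larger is better.

    Lower-is-better metrics (L1 loss, LPIPS) are multiplied by -1, and a
    scale such as 0.01 for PSNR can be applied, as described in the caption.
    """
    diff = (ours - baseline) * scale
    return -diff if lower_is_better else diff
```

For example, `bbox_iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates an overlap of area 1 against a union of area 7. Binning samples by this IoU and applying `bootstrap_standard_error` to the per-sample improvements in each bin would give points and error bars of the kind the caption describes.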