Orientation Matters: Making 3D Generative Models Orientation-Aligned

Yichong Lu, Yuzhuo Tian, Zijin Jiang, Yikun Zhao, Yuanbo Yang, Hao Ouyang, Haoji Hu, Huimin Yu, Yujun Shen, Yiyi Liao

TL;DR

This work tackles orientation misalignment in 3D generative models by introducing the task of orientation-aligned 3D object generation and a large-scale dataset, Objaverse-OA, containing 14,832 orientation-aligned models across 1,008 categories. By fine-tuning two representative backbones on Objaverse-OA, Trellis (3D-VAE) and Wonder3D (multi-view diffusion), the authors obtain Trellis-OA and Wonder3D-OA, which produce canonical, orientation-consistent 3D objects that generalize to unseen categories; the approach outperforms post-hoc alignment baselines. They also demonstrate two practical downstream applications: zero-shot model-free object orientation estimation and efficient arrow-based rotation manipulation in AR/3D software. The work provides a pathway toward reliable cross-category 3D generation with consistent pose priors, benefiting downstream perception, AR, and robotics tasks.

Abstract

Humans intuitively perceive object shape and orientation from a single image, guided by strong priors about canonical poses. However, existing 3D generative models often produce misaligned results due to inconsistent training data, limiting their usability in downstream tasks. To address this gap, we introduce the task of orientation-aligned 3D object generation: producing 3D objects from single images with consistent orientations across categories. To facilitate this, we construct Objaverse-OA, a dataset of 14,832 orientation-aligned 3D models spanning 1,008 categories. Leveraging Objaverse-OA, we fine-tune two representative 3D generative models based on multi-view diffusion and 3D variational autoencoder frameworks to produce aligned objects that generalize well to unseen objects across various categories. Experimental results demonstrate the superiority of our method over post-hoc alignment approaches. Furthermore, we showcase downstream applications enabled by our aligned object generation, including zero-shot object orientation estimation via analysis-by-synthesis and efficient arrow-based object rotation manipulation.
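
The analysis-by-synthesis idea behind the orientation estimation application can be illustrated with a small sketch. Per the Figure 4 caption, the generated orientation-aligned object serves as a template: it is rendered from multiple views, each view is refined, and the viewpoint that best matches the query image (scored with DINO features) is selected. Because the template sits in a canonical frame, the winning viewpoint can be read as the object's orientation. The Python snippet below keeps only the render-and-select loop. The viewpoint grid, `toy_features` (a hypothetical stand-in for rendering plus DINO embedding) and cosine similarity as the matching score are assumptions for illustration, and the FoundationPose refinement step is omitted; this is not the authors' implementation.

```python
import numpy as np

def sample_viewpoints(n_azimuth=8, elevations=(-30.0, 0.0, 30.0)):
    """Candidate (azimuth, elevation) pairs in degrees around the canonical object (assumed grid)."""
    return [(360.0 * i / n_azimuth, el) for el in elevations for i in range(n_azimuth)]

def toy_features(viewpoint):
    """Hypothetical stand-in for render-then-embed (e.g. a DINO-style encoder).

    Maps a viewpoint to its unit viewing direction, so the cosine similarity
    of two feature vectors equals the cosine of the angle between the views.
    """
    az, el = np.radians(viewpoint[0]), np.radians(viewpoint[1])
    return np.array([np.cos(az) * np.cos(el), np.sin(az) * np.cos(el), np.sin(el)])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def estimate_orientation(query_feat, viewpoints, embed=toy_features):
    """Analysis-by-synthesis: score every rendered candidate and keep the best match.

    The per-view pose refinement step (FoundationPose in the paper) is omitted here.
    """
    scores = [cosine_similarity(query_feat, embed(v)) for v in viewpoints]
    best = int(np.argmax(scores))
    return viewpoints[best], scores[best]

viewpoints = sample_viewpoints()
query = toy_features((90.0, 30.0))  # pretend the query image was captured from this viewpoint
best_view, score = estimate_orientation(query, viewpoints)
print(best_view, round(score, 4))
```

In a real pipeline, `embed` would render the generated mesh at each viewpoint and encode the rendered image; a denser viewpoint grid or the refinement step trades extra compute for finer angular precision.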

Paper Structure

This paper contains 27 sections, 25 figures, 9 tables.

Figures (25)

  • Figure 1: Objaverse-OA for Orientation-Aligned Generation. We construct a new dataset named Objaverse-OA, which contains orientation-aligned 3D models across 1008 categories (top). Using Objaverse-OA, we make existing 3D generative models orientation-aligned, which can further be used for zero-shot model-free orientation estimation (bottom left) and efficient arrow-based 3D object rotation manipulation (bottom right).
  • Figure 2: VLM's Performance in Orientation Estimation. Using our manually curated dataset as ground truth (GT), we show the error rate of the VLM's orientation estimates across different categories. We observe that (1) the VLM has particular difficulty recognizing the front-facing orientation of stick-like objects, and (2) a significant portion of errors occur on objects with inherently unclear or ambiguous frontal views. These challenges highlight the necessity of our manual curation.
  • Figure 3: Trellis-OA and Wonder3D-OA. We fine-tune two representative methods: Trellis Xiang2024Structured3L, based on a 3D-VAE backbone (top), and Wonder3D Long2023Wonder3DSI, based on a multi-view diffusion backbone (bottom). For the 3D-VAE, we find that fine-tuning only the sparse structure generator is sufficient to produce orientation-aligned objects. For the multi-view diffusion model, we adopt LoRA as a lightweight domain adapter to enable the generation of orientation-aligned target images.
  • Figure 4: Zero-Shot Orientation Estimation. Our orientation-aligned 3D object acts as a template for pose estimation: we render it from multiple views, refine each view, and select the best-matching viewpoint. No training is performed for this downstream task; the pose refinement module is taken directly from FoundationPose Wen2023FoundationPoseU6, and the pose selection module uses the pre-trained DINO feature extractor Oquab2023DINOv2LR.
  • Figure 5: Qualitative Results on the Multi-View Diffusion Backbone, Wonder3D. For each input image, we show two views with the same index from the multi-view predictions.
  • ...and 20 more figures
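
Figure 3 notes that Wonder3D-OA uses LoRA as a lightweight domain adapter. The summary does not state the rank, scaling factor, or which layers are adapted, so the NumPy sketch below only illustrates the standard LoRA update on a single linear layer: the pretrained weight W stays frozen, and a low-rank product B·A, scaled by alpha/r, is learned on top of it. The class name `LoRALinear`, the rank of 4, alpha of 4.0 and the initialization scale are illustrative assumptions, not values from the paper.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        # Illustrative hyperparameters (assumed, not taken from the paper).
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weight, shape (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, in_dim); output has shape (batch, out_dim).
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the adapter into a single matrix so inference needs no extra matmuls."""
        return self.weight + self.scale * self.B @ self.A

    def trainable_parameters(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 32))

# With B zero-initialized, the adapted layer starts out identical to the frozen one.
assert np.allclose(layer(x), x @ W.T)

layer.B = rng.normal(size=layer.B.shape)  # pretend some training steps have updated B
assert np.allclose(layer(x), x @ layer.merged_weight().T)
print(layer.trainable_parameters(), W.size)  # 384 trainable vs 2048 frozen weights
```

Zero-initializing B means fine-tuning starts exactly from the pretrained model, and merging the adapter into W after training leaves inference cost unchanged; in this toy layer only 384 of 2,048 weights would be trained.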