CanoVerse: 3D Object Scalable Canonicalization and Dataset for Generation and Pose

Li Jin, Yuchen Yang, Weikai Chen, Yujie Wang, Dehao Hao, Tanghui Jia, Yingda Yin, Zeyu Hu, Runze Zhang, Keyang Luo, Li Yuan, Long Quan, Xin Wang, Xueying Qin

TL;DR

CanoVerse improves 3D generation stability, enables precise cross-modal 3D shape retrieval, and unlocks zero-shot point-cloud orientation estimation, even on out-of-distribution data. It also turns canonicalization from manual curation into a high-throughput data generation pipeline.

Abstract

3D learning systems implicitly assume that objects occupy a coherent reference frame. In practice, however, every asset arrives with an arbitrary global rotation, and models are left to resolve directional ambiguity on their own. This persistent misalignment suppresses pose-consistent generation and blocks the emergence of stable directional semantics. To address this issue, we construct CanoVerse, a massive canonical 3D dataset of 320K objects over 1,156 categories, an order-of-magnitude increase over prior work. At this scale, directional semantics become statistically learnable: CanoVerse improves 3D generation stability, enables precise cross-modal 3D shape retrieval, and unlocks zero-shot point-cloud orientation estimation, even on out-of-distribution data. This is achieved by a new canonicalization framework that reduces alignment from minutes to seconds per object via compact hypothesis generation and lightweight human discrimination, transforming canonicalization from manual curation into a high-throughput data generation pipeline. The CanoVerse dataset will be publicly released upon acceptance. Project page: https://github.com/123321456-gif/Canoverse

Paper Structure (20 sections, 4 equations, 9 figures, 6 tables)


Figures (9)

  • Figure 1: We present CanoVerse, a large-scale canonical dataset standardizing object orientation, size, and position with intra- and inter-category alignment. It contains 1,156 categories and 320k objects, making it the largest canonical dataset to date, an order of magnitude larger than existing ones.
  • Figure 2: Scale and category coverage of different canonical datasets. Our dataset has the largest number of categories and objects among existing canonical datasets, exceeding each of them individually and even their combined total.
  • Figure 3: 3D canonicalization pipeline. Left: prior manual alignment ($\sim$101.4 s/object) yields small-scale datasets. Right: our candidate generation ($\sim$3 s) plus one-click selection ($\sim$2.7 s) gives $\sim$37.5$\times$ faster annotation and enables a 320K canonical dataset.
  • Figure 4: For symmetric categories, we annotate the symmetry axis and angle (left). Due to diverse object shapes, unified annotation is impractical. We thus design separate standards based on object characteristics and generate candidate poses in the vertical (middle) and horizontal (right) directions.
  • Figure 5: Annotation statistics. Of 750K annotated objects, 51$\%$ were discarded for quality, 6$\%$ for pose errors, and 43$\%$ were retained (middle). The main discard types include thin-shell meshes and meaningless structures (left). Selection distribution over the 5 candidates (right).
  • ...and 4 more figures
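
The numbers quoted in the captions of Figures 3 and 5 can be cross-checked with a few lines of arithmetic. The sketch below is only a reading of those captions, not the authors' own calculation: the variable and function names are hypothetical, and the interpretation that the $\sim$37.5$\times$ speedup compares manual time against the one-click selection time alone (rather than generation plus selection) is an assumption that happens to fit the quoted values.

```python
# Values copied from the Figure 3 and Figure 5 captions.
MANUAL_S = 101.4      # prior manual alignment, seconds per object
GENERATION_S = 3.0    # automatic candidate generation, seconds per object
SELECTION_S = 2.7     # one-click human selection, seconds per object

ANNOTATED = 750_000   # objects annotated in total
RETAINED_PCT = 43     # percentage retained after filtering


def speedup(baseline_s: float, new_s: float) -> float:
    """Ratio of baseline time to new time (hypothetical helper)."""
    return baseline_s / new_s


# Assumption: the quoted ~37.5x counts only human selection time.
human_only = speedup(MANUAL_S, SELECTION_S)

# For comparison: speedup if candidate generation time is also counted.
end_to_end = speedup(MANUAL_S, GENERATION_S + SELECTION_S)

# Integer arithmetic avoids floating-point rounding on the object count.
retained = ANNOTATED * RETAINED_PCT // 100

print(f"human-only speedup: {human_only:.1f}x")
print(f"end-to-end speedup: {end_to_end:.1f}x")
print(f"retained objects:   {retained}")
```

Under this reading, the human-only ratio lands close to the quoted $\sim$37.5$\times$, and 43$\%$ of 750K is roughly in line with the 320K dataset size stated in the abstract.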