From One to More: Contextual Part Latents for 3D Generation

Shaocong Dong; Lihe Ding; Xiao Chen; Yaokun Li; Yuxin Wang; Yucheng Wang; Qi Wang; Jaehyeok Kim; Chenjian Gao; Zhanpeng Huang; Zibin Wang; Tianfan Xue; Dan Xu

TL;DR

CoPart targets high-quality, controllable generation of 3D objects composed of multiple independent parts. It represents each object as a set of contextual part latents, with a geometric token and an image token per part, and denoises them through synchronized diffusion with mutual guidance between parts and between modalities. A global guidance branch provides cross-part coherence, and part-level bounding-box conditioning provides explicit local control. PartVerse, a semi-automatically constructed dataset of 91k parts from 12k objects, enables training at scale. The framework supports part-based editing, articulated object generation, and mini-scene composition, and is reported to produce finer detail on small parts and to generalize better than holistic 3D generators. In short, CoPart pairs a scalable part dataset with a diffusion-based, part-aware generation scheme that aligns geometry and appearance while allowing precise part-level control.

Abstract

Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground-truth data. Despite this progress, three key limitations persist: (1) single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) holistic latent coding neglects the part independence and interrelationships that are critical for compositional design; (3) global conditioning mechanisms lack fine-grained controllability. Inspired by human 3D design workflows, we propose CoPart, a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. This paradigm offers three advantages: (i) it reduces encoding complexity through part decomposition; (ii) it enables explicit modeling of part relationships; (iii) it supports part-level conditioning. We further develop a mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising, ensuring geometric coherence while retaining foundation model priors. To enable large-scale training, we construct PartVerse, a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations. Extensive experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.
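
The idea of jointly denoising several part latents, so that each part is updated with context from the others, can be illustrated with a toy sketch. This is not the CoPart implementation: the function names (`cross_part_attention`, `toy_denoise_step`), the `step_size` and `guidance` parameters, and the stand-in per-part prediction are all assumptions made for illustration, and the split into geometric and image tokens is ignored. It only shows how one synchronized update might combine a per-part term with an attention-pooled cross-part term.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_part_attention(latents):
    """For each part latent, return an attention-weighted average of all part latents."""
    dim = len(latents[0])
    pooled = []
    for query in latents:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in latents])
        pooled.append(
            [sum(w * key[d] for w, key in zip(weights, latents)) for d in range(dim)]
        )
    return pooled


def toy_denoise_step(latents, step_size=0.1, guidance=0.5):
    """One synchronized update of every part latent (hypothetical, illustrative only)."""
    context = cross_part_attention(latents)
    updated = []
    for z, ctx in zip(latents, context):
        # Stand-in for a pretrained denoiser's per-part prediction (assumption):
        # simply shrink the latent toward zero.
        per_part = [-v for v in z]
        # Cross-part term: pull the latent toward the context pooled from all parts.
        mutual = [c - v for c, v in zip(ctx, z)]
        updated.append(
            [v + step_size * (p + guidance * m) for v, p, m in zip(z, per_part, mutual)]
        )
    return updated


# Three parts, each with a 2-dimensional latent; all parts are updated together.
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
parts = toy_denoise_step(parts)
```

Because all parts are updated in the same step from the same pooled context, no part is denoised in isolation, which is the property the abstract attributes to joint part latent denoising.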
