
SPLICE: Part-Level 3D Shape Editing from Local Semantic Extraction to Global Neural Mixing

Jin Zhou, Hongliang Yang, Pengfei Xu, Hui Huang

TL;DR

SPLICE introduces a part-level neural implicit representation that decouples each part's geometry from its pose, enabling direct, semantically meaningful edits. Each part's pose is encoded by the six vertices of a Gaussian ellipsoid (the endpoints of its principal axes), and an attention-guided Transformer decoder fuses the parts into a coherent global reconstruction with reduced inter-part leakage. An optional diffusion-based refinement step further improves robustness and shape completion under extreme or global edits. On the PartNet and ShapeNet datasets, SPLICE outperforms leading baselines in editing flexibility, reconstruction fidelity, and resilience to sequential edits. The approach offers a practical, modular pathway to editable 3D designs with strong semantic coherence.
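
The geometry/pose split can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that each part's pose is an ellipsoid given by a center c, a rotation R whose columns are the principal axes, and semi-axis lengths s, so that the six vertices are c ± s_k R[:, k]. `PartProxy`, `vertices`, and the edit helpers are illustrative names, not the authors' code; the point is that every edit rewrites only the pose (or swaps a latent), never the geometry code itself.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True, eq=False)
class PartProxy:
    """Hypothetical per-part proxy: a geometry latent plus an ellipsoid pose."""
    z: np.ndarray         # geometry latent code (size is an assumption)
    center: np.ndarray    # ellipsoid center, shape (3,)
    rotation: np.ndarray  # 3x3 rotation; columns are the principal axes
    scales: np.ndarray    # semi-axis lengths, shape (3,)


def vertices(p: PartProxy) -> np.ndarray:
    """Return the six axis endpoints c +/- s_k * R[:, k] as a (6, 3) array."""
    axes = p.rotation * p.scales  # broadcasting scales column k by s_k
    return np.stack([p.center + sign * axes[:, k] for k in range(3) for sign in (1.0, -1.0)])


def rotation_z(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Pose edits touch only the ellipsoid; the geometry latent z is left unchanged.
def translate(p, t):
    return replace(p, center=p.center + np.asarray(t, dtype=float))


def rotate(p, r_edit):
    # Rotates the part about its own center.
    return replace(p, rotation=r_edit @ p.rotation)


def scale(p, factor):
    return replace(p, scales=p.scales * factor)


# Structural edits operate on the list of part proxies.
def delete(parts, i):
    return parts[:i] + parts[i + 1:]


def duplicate(parts, i, offset):
    return parts + [translate(parts[i], offset)]


def mix(p, donor):
    # One possible "mix" (hypothetical): keep this part's pose, borrow the donor's geometry latent.
    return replace(p, z=donor.z)
```

Under a rigid edit, the six vertices move exactly as points do (translating by t shifts every vertex by t), which plausibly makes them a convenient pose input for a learned encoder.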

Abstract

Neural implicit representations of 3D shapes have shown great potential in 3D shape editing due to their ability to capture high-level semantics and represent geometry continuously. However, existing methods often suffer from limited editability, lack of part-level control, and unnatural results when modifying or rearranging shape parts. In this work, we present SPLICE, a novel part-level neural implicit representation of 3D shapes that enables intuitive, structure-aware, and high-fidelity shape editing. By encoding each shape part independently and positioning it with a parameterized Gaussian ellipsoid, SPLICE isolates part-specific features while discarding global context that may hinder flexible manipulation. A global attention-based decoder then integrates the parts coherently, aided by an attention-guiding filtering mechanism that prevents information leakage across symmetric or adjacent components. Through this architecture, SPLICE supports various part-level editing operations, including translation, rotation, scaling, deletion, duplication, and cross-shape part mixing. These operations let users flexibly explore design variations while preserving semantic consistency and structural plausibility. Extensive experiments demonstrate that SPLICE outperforms existing approaches both qualitatively and quantitatively across a diverse set of shape-editing tasks.
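
The abstract does not spell out how the attention-guiding filter works. One simple mechanism with the same intent, assumed here purely for illustration, is to mask the cross-attention logits so that a query point can only attend to parts whose Gaussian ellipsoid lies near it (squared Mahalanobis distance below a cutoff). In this NumPy sketch, `filtered_cross_attention`, the `radius` cutoff, and the nearest-part fallback are hypothetical choices, not the paper's formulation.

```python
import numpy as np


def mahalanobis_sq(points, center, cov):
    """Squared Mahalanobis distance of each point (Q, 3) to one Gaussian."""
    diff = points - center                    # (Q, 3)
    sol = np.linalg.solve(cov, diff.T).T      # rows are cov^{-1} @ diff_q
    return np.sum(diff * sol, axis=1)         # (Q,)


def filtered_cross_attention(queries, keys, values, points, centers, covs, radius=3.0):
    """Cross-attention from Q query points to P part embeddings with a spatial mask.

    queries: (Q, d) query features; keys, values: (P, d) per-part keys/values;
    points: (Q, 3) query positions; centers: (P, 3); covs: (P, 3, 3).
    radius: hypothetical Mahalanobis cutoff beyond which a part is masked out.
    """
    d = queries.shape[1]
    logits = queries @ keys.T / np.sqrt(d)    # (Q, P) scaled dot-product scores
    dist = np.stack([mahalanobis_sq(points, c, s) for c, s in zip(centers, covs)], axis=1)
    allowed = dist <= radius ** 2
    # Always keep the nearest part so no query ends up with every part masked.
    allowed[np.arange(len(points)), np.argmin(dist, axis=1)] = True
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

Because such a mask is spatial, a query on a left chair leg cannot pull features from the geometrically identical right leg, which is the kind of leakage across symmetric components the filter is meant to prevent.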

Paper Structure

This paper contains 27 sections, 16 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Part-level editing results produced by SPLICE. Our method supports a wide range of intuitive editing operations, including sequential edits, copy, move, delete, rotate, scale, and mix, without requiring manual post-adjustment. The edited shapes remain structurally coherent and visually plausible. Additionally, SPLICE exhibits strong robustness under multi-step editing, consistently maintaining high reconstruction quality throughout the editing process.
  • Figure 2: Overview of our SPLICE pipeline. Given a 3D shape decomposed into parts, we first apply Part Feature Extraction using a shared convolutional encoder $f_{\mathrm{enc}}$ to obtain per-part geometry latent codes $\{\mathbf{z}_i\}$ and Gaussian proxies $\{\mathbf{g}_i\}$. In User Editing and Diffusion-based Refinement, these proxies can be directly modified by user operations (e.g., move, scale, mix) or adjusted by a latent diffusion model $f_{\mathrm{adj}}$ to restore global coherence. The resulting updated proxies $\{(\mathbf{z}'_i, \mathbf{g}'_i)\}$ are then processed in Pose Feature Mixing, where each $\mathbf{g}'_i$ is encoded by a SIREN-based pose encoder $\phi$, and combined with $\mathbf{z}'_i$ via a multilayer perceptron $f_{\mathrm{MLP}}$ to obtain the final part embedding $\mathbf{h}_i$. Finally, in Attention-based Shape Decoding, sampled query points attend to $\{\mathbf{h}_i\}$ through a cross-attention transformer $f_{\mathrm{dec}}$, followed by occupancy decoding to reconstruct the final shape via marching cubes.
  • Figure 3: Comparison of the copy editing results between our method and SPAGHETTI.
  • Figure 4: Comparison of delete editing results between our method and SPAGHETTI. The gray parts are deleted during the editing.
  • Figure 5: Comparison of move editing results between our method, SPAGHETTI, and DualSDF.
  • ...and 12 more figures
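
The Pose Feature Mixing stage in Figure 2 (a SIREN-based pose encoder φ followed by f_MLP) can be sketched as below. Assumptions not taken from the paper: the pose proxy g'_i is fed to φ as its six ellipsoid vertices flattened to 18 numbers, z'_i and φ(g'_i) are combined by concatenation before f_MLP, and all layer widths, ω0 = 30 (the default from the original SIREN paper), and the weight initializations are illustrative. Weights are random, so the output only demonstrates shapes and data flow.

```python
import numpy as np


def siren_layers(dims, omega0, rng):
    """SIREN-style weight init: U(-1/n, 1/n) for the first layer,
    U(-sqrt(6/n)/omega0, sqrt(6/n)/omega0) afterwards (n = fan-in)."""
    layers = []
    for i, (n_in, n_out) in enumerate(zip(dims[:-1], dims[1:])):
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / omega0
        layers.append((rng.uniform(-bound, bound, (n_in, n_out)),
                       rng.uniform(-bound, bound, n_out)))  # bias bound reused for simplicity
    return layers


def siren_forward(x, layers, omega0=30.0):
    """phi: sine activations sin(omega0 * (x W + b)) on hidden layers, linear last layer."""
    for w, b in layers[:-1]:
        x = np.sin(omega0 * (x @ w + b))
    w, b = layers[-1]
    return x @ w + b


def dense_layers(dims, rng):
    # Gaussian init scaled by 1/sqrt(fan_in); an arbitrary choice for this sketch.
    return [(rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)), np.zeros(b)) for a, b in zip(dims[:-1], dims[1:])]


def mlp_forward(x, layers):
    """f_MLP: ReLU between layers, linear output."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def part_embeddings(z, g, pose_layers, fuse_layers):
    """z: (P, dz) geometry latents; g: (P, 6, 3) ellipsoid vertices -> h: (P, dh)."""
    pose_feat = siren_forward(g.reshape(len(g), -1), pose_layers)  # flatten 6x3 -> 18
    return mlp_forward(np.concatenate([z, pose_feat], axis=1), fuse_layers)
```

Since each h_i depends only on its own (z'_i, g'_i), editing one part's pose changes only that part's embedding; parts interact only later, in the cross-attention decoder f_dec.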