SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Elisabetta Fedele, Francis Engelmann, Ian Huang, Or Litany, Marc Pollefeys, Leonidas Guibas

TL;DR

SpaceControl introduces a training-free, test-time approach that embeds 3D spatial constraints directly into a pre-trained generative model, enabling geometry-aware 3D asset creation from inputs ranging from coarse primitives to detailed meshes. It uses a two-stage Trellis-based pipeline (structure, then appearance) guided by voxelized geometry and text or optional image prompts, with a fidelity-realism tradeoff controlled by the parameter $\tau_0$. Quantitative benchmarks and a user study show better geometric faithfulness than training-based and optimization-based baselines while preserving high visual quality. An interactive UI supports online editing of superquadrics and their direct conversion into textured 3D assets. The method supports geometry-centric workflows for artists and developers, allowing rapid, controllable 3D content creation without additional model training.

Abstract

Generative methods for 3D assets have recently achieved remarkable progress, yet providing intuitive and precise control over the object geometry remains a key challenge. Existing approaches predominantly rely on text or image prompts, which often fall short in geometric specificity: language can be ambiguous, and images are cumbersome to edit. In this work, we introduce SpaceControl, a training-free test-time method for explicit spatial control of 3D generation. Our approach accepts a wide range of geometric inputs, from coarse primitives to detailed meshes, and integrates seamlessly with modern pre-trained generative models without requiring any additional training. A controllable parameter lets users trade off between geometric fidelity and output realism. Extensive quantitative evaluation and user studies demonstrate that SpaceControl outperforms both training-based and optimization-based baselines in geometric faithfulness while preserving high visual quality. Finally, we present an interactive user interface that enables online editing of superquadrics for direct conversion into textured 3D assets, facilitating practical deployment in creative workflows. Find our project page at https://spacecontrol3d.github.io/

Paper Structure

This paper contains 40 sections, 5 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: SpaceControl enables spatially controlled 3D asset generation from simple geometric primitives such as superquadrics (light blue) and other geometry types such as polygon meshes. Top: rapid asset generation. From quick 3D sketches and brief text prompts, we can generate high-quality assets. Bottom: fine-grained editing, including adjusting a chair’s backrest and adding armrests (left) or precisely controlling a sofa’s dimensions and pillow arrangements (right).
  • Figure 2: Model Overview. Given an input conditioning that includes a spatial control, a text prompt, and an optional image, SpaceControl produces realistic 3D assets. First, the conditioning inputs are encoded in a latent space. Specifically, the spatial control is voxelized and encoded by Trellis' encoder $\mathcal{E}$, the text is encoded by a CLIP encoder $\mathcal{E}_{CLIP}$, and the image (if present) is encoded by a DINOv2 encoder $\mathcal{E}_{DINO}$. The resulting latents $\mathbf{z}_{0, c}$ are noised up to $t_0$ to obtain $\mathbf{z}_{t_0}$. From $t_0$ to $t=0$, the noised latents $\mathbf{z}_{t_0}$ are denoised by the Structure Flow Model (FM), guided by the text prompt features. The clean latents $\mathbf{z}_{0}$ are then fed into the decoder $\mathcal{D}$, which outputs the voxel grid $\mathbf{x}_0$. Next, the active voxels are augmented with point-wise noisy latent features, which are denoised by the Appearance Flow Model (FM) using either text or image conditioning. The clean latents can then be decoded into several output formats, such as 3D Gaussians (GS), radiance fields (RF), and meshes (M), via the corresponding decoders $\mathcal{D}_{O}=\{\mathcal{D}_{GS}, \mathcal{D}_{RF}, \mathcal{D}_M\}$.
  • Figure 3: Realism-faithfulness tradeoff. The hyperparameter $\tau_0$ allows smooth adjustment of the strength of the spatial control. The left figure shows quantitatively how variations of $\tau_0$ affect the generations, in terms of Chamfer distance to the spatial control (lower means more faithful) and FID score (lower means more realistic). The right figure shows the same effect qualitatively: higher values of $\tau_0$ lead to assets whose geometry more closely matches the control. For conciseness, we only show the untextured geometry.
  • Figure 4: Qualitative Comparison of Spatially Conditioned Generation. We show generations obtained by conditioning SpaceControl and the baselines on text prompts and superquadrics from the Toys4K dataset. While other methods either fail to follow the conditioning (e.g., the antenna of the radio generated by Spice-E is wrongly placed) or fail to generate visually appealing 3D assets (e.g., the chicken generated by Spice-E-T exhibits anatomically incorrect body part placements), SpaceControl exhibits a good balance between realism and faithfulness.
  • Figure 5: User Study Results. The bar plots present the proportion of favorable comparisons achieved by SpaceControl against the baselines on overall appearance, faithfulness to spatial control, and realism, respectively.
  • ...and 9 more figures
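
The structure stage described in the Figure 2 caption (encode the control, noise the latent up to $t_0$, then denoise back to $t=0$) can be illustrated with a small toy. The sketch below is not the authors' implementation. It assumes a rectified-flow style interpolation $\mathbf{z}_t = (1-t)\,\mathbf{z}_0 + t\,\boldsymbol{\epsilon}$, replaces the Structure Flow Model with a closed-form velocity field for 1D Gaussian data, and uses plain Euler steps. The names `noise_to`, `toy_velocity`, `denoise_from`, and `spatially_guided_sample` are hypothetical, and text guidance is omitted.

```python
import random


def noise_to(z0, t0, rng):
    """Assumed forward process: z_t0 = (1 - t0) * z0 + t0 * eps, eps ~ N(0, 1)."""
    return [(1.0 - t0) * z + t0 * rng.gauss(0.0, 1.0) for z in z0]


def toy_velocity(z, t, mu=0.0, sigma=1.0):
    """Stand-in for a learned flow model (assumption, not the paper's network).

    Closed-form expected velocity E[eps - z0 | z_t = z] when the data
    distribution is the 1D Gaussian N(mu, sigma^2).
    """
    var_t = (1.0 - t) ** 2 * sigma ** 2 + t ** 2
    gain = (t - (1.0 - t) * sigma ** 2) / var_t
    return -mu + gain * (z - (1.0 - t) * mu)


def denoise_from(z_t0, t0, steps=50, velocity=toy_velocity):
    """Euler integration of dz/dt = v(z, t) from t = t0 down to t = 0."""
    z = list(z_t0)
    if t0 <= 0.0:
        return z  # nothing to denoise: the control latent is returned as-is
    dt = t0 / steps
    for k in range(steps):
        t = t0 - k * dt
        z = [zi - dt * velocity(zi, t) for zi in z]
    return z


def spatially_guided_sample(z0_control, t0, seed=0):
    """Noise the encoded control up to t0, then denoise back to t = 0."""
    rng = random.Random(seed)
    return denoise_from(noise_to(z0_control, t0, rng), t0)


if __name__ == "__main__":
    control = [2.0, -1.0, 0.5]  # hypothetical encoded spatial control
    for t0 in (0.0, 0.3, 0.7, 1.0):
        print(t0, spatially_guided_sample(control, t0))
```

In this toy, $t_0 = 0$ returns the control latent unchanged, while larger $t_0$ injects more noise and lets the (toy) model prior reshape the result. How $t_0$ maps to the user-facing parameter $\tau_0$ is not stated in the captions above, so the sketch does not assume any particular relation.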