
MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation

Hyunwoo Kim, Itai Lang, Noam Aigerman, Thibault Groueix, Vladimir G. Kim, Rana Hanocka

TL;DR

MeshUp deforms a 3D mesh toward multiple target concepts with region-specific control. It introduces Blended Score Distillation (BSD), which runs one diffusion branch per target, blends the resulting activation maps, and injects them into a single denoising branch whose gradients drive the deformation via Score Distillation Sampling (SDS). Localization is achieved by building a probabilistic Region of Interest (ROI) on the mesh surface from self-attention maps and converting it into 3D-consistent masks that constrain where each concept is expressed. The deformation itself is parameterized by per-triangle Jacobians $J_i$, which are optimized to realize the blend. The method accepts text, image, and mesh targets, supports an arbitrary number of them, and is evaluated on multi-target, localized, and texture-transfer results, along with ablations and user studies. Overall, MeshUp offers a diffusion-guided framework for semantically controlled mesh deformation aimed at creative 3D modeling.

Abstract

We propose MeshUp, a technique that deforms a 3D mesh towards multiple target concepts, and intuitively controls the region where each concept is expressed. Conveniently, the concepts can be defined as either text queries, e.g., "a dog" and "a turtle," or inspirational images, and the local regions can be selected as any number of vertices on the mesh. We can effectively control the influence of the concepts and mix them together using a novel score distillation approach, referred to as the Blended Score Distillation (BSD). BSD operates on each attention layer of the denoising U-Net of a diffusion model as it extracts and injects the per-objective activations into a unified denoising pipeline from which the deformation gradients are calculated. To localize the expression of these activations, we create a probabilistic Region of Interest (ROI) map on the surface of the mesh, and turn it into 3D-consistent masks that we use to control the expression of these activations. We demonstrate the effectiveness of BSD empirically and show that it can deform various meshes towards multiple objectives. Our project page is at https://threedle.github.io/MeshUp.
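As a rough illustration of what "mixing" per-objective activations could look like, the following sketch blends two activation vectors with user-chosen weights. It assumes the blend is a normalized weighted sum, which the abstract does not spell out; the function name `blend_activations` and the toy values are hypothetical and are not taken from the MeshUp code.

```python
from typing import List, Sequence


def blend_activations(
    activations: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return a weighted average of per-objective activation vectors.

    Assumption: the blend is a convex combination of the activations.
    This is an illustrative stand-in, not the authors' implementation.
    """
    if not activations or len(activations) != len(weights):
        raise ValueError("need one weight per activation vector")
    length = len(activations[0])
    if any(len(a) != length for a in activations):
        raise ValueError("all activation vectors must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize so the weights sum to 1 (e.g. 70% "dog", 30% "turtle").
    normalized = [w / total for w in weights]
    return [
        sum(w * a[k] for w, a in zip(normalized, activations))
        for k in range(length)
    ]


# Toy activations for two targets (hypothetical values).
dog = [1.0, 0.0, 2.0]
turtle = [0.0, 4.0, 2.0]
blended = blend_activations([dog, turtle], [0.7, 0.3])
print(blended)
```

In an actual pipeline the vectors would be attention-layer tensors and the blended result would be injected into the shared denoising branch; the list arithmetic here only shows how the weights control each concept's influence.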

Paper Structure (9 sections, 20 equations, 24 figures, 3 tables)

This paper contains 9 sections, 20 equations, 24 figures, 3 tables.

Figures (24)

  • Figure 1: MeshUp is capable of deforming a source mesh into various concepts and into their weighted blends. The target objectives can be text prompts, images, or even meshes. Users can also input a set of control vertices to explicitly define where on the mesh the particular concepts should be expressed (Figure \ref{fig:local_def1}). The colors on the mesh visualize the point-wise correspondence between the source and the deformed mesh.
  • Figure 2: Overview of Concept Blending. MeshUp takes as input a 3D mesh and several target objectives, such as the text "Sea turtle" and "Bulldog." We deform the source mesh by optimizing the per-triangle Jacobians of the mesh. At each iteration, we render the mesh and apply the same random noise for each target objective. Then, we pass the noised renderings and the text input through the U-Net of a pretrained text-to-image model and store the activations associated with each objective (the Target Branch). In the Blending Branch, we feed the noised rendering of the mesh to the U-Net, but condition it on the null-text embedding. We blend and inject the activations stored at each target branch into the blending branch. The gradients from the blending branch are then backpropagated via Score Distillation Sampling (SDS). After running this process iteratively, the mesh is deformed into a blend of "Sea turtle" and "Bulldog."
  • Figure 3: Overview of Blended Score Distillation (BSD). For each attention layer in the denoising U-Net, we inject the activation maps from Target Branch 1 and Target Branch 2 into the Blending Branch (top), blending the feature representations for each target. The Localization Mask* (bottom) is an additional mask that we optionally apply over the cross-attention maps for localization control. The mask identifies local regions described by the selected control vertices, and different weights are assigned to each of these regions. For more details, please see Figure \ref{fig:system_local} and the Localization Control part of Section \ref{sec:method}.
  • Figure 4: Results Gallery. We present a diverse set of 1-way, 2-way, and 3-way blending results of MeshUp. MeshUp can operate on various kinds of source shapes, such as human bodies, faces, or animals, and can deform them into a blend of multiple concepts.
  • Figure 5: Interpolation Between Two Objectives. We show that we can vary the ratio between two objectives (e.g., going from Hippo 100% on the left to Hippo 70%-Dachshund 30%, Hippo 30%-Dachshund 70%, and finally Dachshund 100% on the right), effectively interpolating between the shapes of the two targets.
  • ...and 19 more figures
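
The localization mask described in the Figure 3 caption can be pictured as a per-location weighting of each target's activations. The sketch below is a minimal illustration under stated assumptions: masks hold values in [0, 1] per location, the blend is normalized by the sum of mask weights, and locations that no mask covers fall back to an equal average. The name `localized_blend`, the fallback rule, and the toy values are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def localized_blend(
    activations: Sequence[Sequence[float]],
    masks: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> List[float]:
    """Blend per-target activations with per-location mask weights.

    activations[i][p] is target i's activation at location p, and
    masks[i][p] in [0, 1] says how strongly target i is expressed there.
    Assumption: this normalization is illustrative, not the authors' rule.
    """
    if not activations or len(activations) != len(masks):
        raise ValueError("need one mask per activation map")
    size = len(activations[0])
    if any(len(row) != size for row in list(activations) + list(masks)):
        raise ValueError("all maps must have the same number of locations")

    blended = []
    for p in range(size):
        weight_sum = sum(m[p] for m in masks)
        if weight_sum < eps:
            # Assumed fallback: unclaimed locations average all targets equally.
            blended.append(sum(a[p] for a in activations) / len(activations))
        else:
            weighted = sum(m[p] * a[p] for m, a in zip(masks, activations))
            blended.append(weighted / weight_sum)
    return blended


# Four locations, two targets (hypothetical values).
turtle = [10.0, 10.0, 10.0, 10.0]
bulldog = [2.0, 2.0, 2.0, 2.0]
turtle_mask = [1.0, 0.5, 0.0, 0.0]
bulldog_mask = [0.0, 0.5, 1.0, 0.0]
result = localized_blend([turtle, bulldog], [turtle_mask, bulldog_mask])
print(result)
```

Setting every mask to the same constant reduces this to a global ratio between targets, which is the kind of control the Figure 5 caption describes with its Hippo-Dachshund percentages.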