CLIPtortionist: Zero-shot Text-driven Deformation for Manufactured 3D Shapes

Xianghao Xu; Srinath Sridhar; Daniel Ritchie

TL;DR

A zero-shot text-driven 3D shape deformation system that deforms an input 3D mesh of a manufactured object to fit an input text description by maximizing an objective function based on the widely used pre-trained vision-language model CLIP.

Abstract

We propose a zero-shot text-driven 3D shape deformation system that deforms an input 3D mesh of a manufactured object to fit an input text description. To do this, our system optimizes the parameters of a deformation model to maximize an objective function based on the widely used pre-trained vision-language model CLIP. We find that CLIP-based objective functions exhibit many spurious local optima; to circumvent them, we parameterize deformations using a novel deformation model called BoxDefGraph, which our system automatically computes from an input mesh. The BoxDefGraph is designed to capture the object-aligned rectangular and circular geometric features of most manufactured objects. We then use the CMA-ES global optimization algorithm to maximize our objective, which we find works better than popular gradient-based optimizers. We demonstrate that our approach produces appealing results and outperforms several baselines.
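
The optimization loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It replaces the CLIP-based objective with a hypothetical toy score (`toy_score`) that has one global peak and many ripples acting as spurious local optima. It replaces the BoxDefGraph with three per-axis scale factors. And it replaces CMA-ES with a much simpler evolution strategy (isotropic Gaussian sampling, elite averaging, a fixed step-size decay) that omits covariance adaptation. The names `TARGET`, `toy_score`, and `simple_es` are assumptions made for illustration only.

```python
import math
import random

# Hypothetical per-axis scale factors (x, y, z) that the "text prompt" prefers.
# In the real system this preference would come from a CLIP-based score.
TARGET = (1.5, 0.8, 1.2)


def toy_score(scales):
    """Stand-in objective: a smooth peak at TARGET plus cosine ripples
    that create many spurious local optima."""
    bowl = -sum((s - t) ** 2 for s, t in zip(scales, TARGET))
    ripples = 0.3 * sum(math.cos(8.0 * (s - t)) for s, t in zip(scales, TARGET))
    return bowl + ripples


def simple_es(score, start, sigma=0.5, population=16, elite=4,
              generations=60, seed=0):
    """Simplified evolution strategy (NOT full CMA-ES): sample around a mean,
    keep the best samples, move the mean to their average, shrink the step."""
    rng = random.Random(seed)
    mean = list(start)
    best, best_score = list(start), score(start)
    for _ in range(generations):
        # Draw a population of candidate deformation parameters.
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(population)]
        samples.sort(key=score, reverse=True)
        top = samples[:elite]
        # Track the best candidate seen so far.
        top_score = score(top[0])
        if top_score > best_score:
            best, best_score = top[0], top_score
        # Recombine: the new mean is the average of the elite samples.
        mean = [sum(column) / elite for column in zip(*top)]
        # Fixed decay stands in for CMA-ES step-size adaptation.
        sigma *= 0.95
    return best, best_score


start = [1.0, 1.0, 1.0]  # undeformed shape: unit scale on every axis
best, best_score = simple_es(toy_score, start)
print("best scales:", [round(v, 3) for v in best])
print("best score:", round(best_score, 3))
```

Because the search is population-based and uses no gradients, the ripples in the toy score do not trap it the way they can trap a local gradient step. This is the general motivation the abstract gives for preferring CMA-ES over gradient-based optimizers.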
