
PartStickers: Generating Parts of Objects for Rapid Prototyping

Mo Zhou, Josh Myers-Dean, Danna Gurari

TL;DR

PartStickers tackles the challenge of generating isolated object parts on neutral backgrounds from text prompts using a diffusion-based approach. It fine-tunes a diffusion model with LoRA on a dataset of part stickers created from part segmentation masks. It uses a two-stage pipeline that pastes masked parts onto a gray canvas, and it pairs each sticker with a prompt of the form $\text{a [PART] of a [OBJECT]}$. The diffusion process operates over $T$ timesteps with a variance schedule $\{\beta_t\}_{t=1}^T$ and optimizes the denoising loss $L_{\text{denoise}}$ to produce realistic, part-focused outputs that align with the text prompts. Empirically, PartStickers achieves better realism and text-part alignment than baselines on PartImageNet-derived data. It reliably isolates the part, keeps the background neutral, and centers the generated content, which suits rapid prototyping and could streamline design workflows and remix-driven creativity.
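The TL;DR mentions a variance schedule over $T$ timesteps and a denoising loss without spelling them out. The sketch below uses the standard DDPM formulation: forward noising $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, and a mean-squared error between the true noise $\epsilon$ and the predicted noise. That this matches the paper's exact equations is an assumption. The linear schedule endpoints and the placeholder `zero_eps_model` (standing in for the LoRA-adapted denoiser) are also hypothetical and do not come from the authors' code.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Variance schedule {beta_t}, t = 1..T (linear DDPM-style; endpoints are assumed)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, noise):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise (t is 0-indexed)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def denoise_loss(eps_model, x0, prompt, alpha_bar, rng):
    """One Monte Carlo sample of the epsilon-prediction MSE loss."""
    t = rng.integers(0, len(alpha_bar))        # random timestep
    noise = rng.standard_normal(x0.shape)      # epsilon ~ N(0, I)
    x_t = q_sample(x0, t, alpha_bar, noise)    # noised input
    eps_pred = eps_model(x_t, t, prompt)       # model's noise estimate
    return float(np.mean((noise - eps_pred) ** 2))


def zero_eps_model(x_t, t, prompt):
    """Hypothetical placeholder for the text-conditioned, LoRA-adapted denoiser."""
    return np.zeros_like(x_t)


T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)            # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 64, 64))          # stand-in for a part-sticker latent
loss = denoise_loss(zero_eps_model, x0, "a head of a bird", alpha_bar, rng)
```

With the all-zero placeholder, the loss reduces to the mean of squared Gaussian noise, so it sits near 1. A trained denoiser should push it lower.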

Abstract

Design prototyping involves creating mockups of products or concepts to gather feedback and iterate on ideas. While prototyping often requires specific parts of objects, such as when constructing a novel creature for a video game, existing text-to-image methods tend to only generate entire objects. To address this, we propose a novel task and method of "part sticker generation", which entails generating an isolated part of an object on a neutral background. Experiments demonstrate our method outperforms state-of-the-art baselines with respect to realism and text alignment, while preserving object-level generation capabilities. We publicly share our code and models to encourage community-wide progress on this new task: https://partsticker.github.io.

Paper Structure

This paper contains 31 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Given text prompts describing parts of objects (left), we show results from (a) a baseline, Stable Diffusion (SD) 1.5, whose generated parts can 1) be unrealistic (top), 2) include unwanted information, such as obfuscated content (middle), and 3) show the entire object rather than just the part of interest (bottom). In contrast, (b) our method consistently produces realistic parts on neutral backgrounds, isolated from their parent objects.
  • Figure 2: Overview of our proposed PartStickers framework. We train a base model on 'part stickers' (i.e., masked-out parts of an object pasted onto a neutral background) paired with text prompts describing each part. Both the stickers and the prompts are derived from existing part segmentation datasets. Text prompts combine the part class labels with their object-level superclasses using the template "a [PART] of a [OBJECT]". We use LoRA hu2022lora for parameter-efficient fine-tuning.
  • Figure 3: Qualitative results showing examples of generated images given text prompts (left) and the average image of 100 generated samples from a given method (right). Overall, we observe that PartStickers is the only method that consistently generates only the requested part on a neutral background with a high degree of realism. The bottom three rows represent out-of-distribution scenarios for PartStickers: generation of an object and of two out-of-distribution parts. (SD stands for Stable Diffusion.)
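Figure 2 describes how training pairs are built. A part is masked out of an image from a part segmentation dataset, pasted onto a neutral background, and paired with a prompt from the template "a [PART] of a [OBJECT]". Below is a minimal NumPy sketch of that step. It assumes a mid-gray value of 128, cropping to the part's bounding box, and centering on a square canvas. These choices and the helper names `make_part_sticker` and `part_prompt` are hypothetical and do not describe the authors' actual pipeline.

```python
import numpy as np

GRAY = 128  # assumed neutral gray level; the source only says "gray canvas"


def make_part_sticker(image, mask, canvas_size=512, gray=GRAY):
    """Paste the masked part of `image` onto a centered gray square canvas.

    image: (H, W, 3) uint8 array; mask: (H, W) bool array for one part.
    Assumes the part's bounding box fits inside the canvas (no resizing here).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty part mask")
    # Crop image and mask to the part's tight bounding box.
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop, crop_mask = image[y0:y1, x0:x1], mask[y0:y1, x0:x1]
    h, w = crop_mask.shape
    if h > canvas_size or w > canvas_size:
        raise ValueError("part larger than canvas; resize first")
    # Neutral background, then copy only the part's pixels into the center.
    canvas = np.full((canvas_size, canvas_size, 3), gray, dtype=np.uint8)
    top, left = (canvas_size - h) // 2, (canvas_size - w) // 2
    region = canvas[top:top + h, left:left + w]  # view into canvas
    region[crop_mask] = crop[crop_mask]
    return canvas


def part_prompt(part, obj):
    """Fill the 'a [PART] of a [OBJECT]' template."""
    return f"a {part} of a {obj}"


# Toy usage: an all-red image with a 3x4 "part" mask.
image = np.zeros((10, 10, 3), dtype=np.uint8)
image[..., 0] = 255
mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 5:9] = True
sticker = make_part_sticker(image, mask, canvas_size=8)
prompt = part_prompt("head", "bird")
```

Only the masked pixels are copied, so anything else in the part's bounding box stays gray. This matches the "isolated part on a neutral background" target that Figure 1 contrasts with the baseline.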