OSTAF: A One-Shot Tuning Method for Improved Attribute-Focused T2I Personalization
Ye Wang, Zili Yi, Rui Ma
TL;DR
The paper addresses the challenge of capturing fine-grained visual attributes from a single reference image in text-to-image personalization. It proposes OSTAF, a hypernetwork-guided, one-shot fine-tuning framework that modulates attention weights in the U-Net (encoder or decoder) to learn attribute-specific features such as appearance, shape, and style. A lightweight hypernetwork predicts the weight offsets, and an intensity parameter $\lambda$ controls how strongly they are applied. On an Attribute Benchmark and in extensive quantitative and qualitative evaluations, OSTAF shows better attribute identification and customization quality than the DreamBooth, ProSpect, IP-Adapter, and ControlNet baselines, while maintaining text controllability and reasonable efficiency. The method enables precise, efficient attribute-focused personalization from a single image; faster tuning and video content are left as possible extensions for future work.
Abstract
Personalized text-to-image (T2I) models not only produce lifelike and varied visuals but also allow users to tailor the images to fit their personal taste. These personalization techniques can grasp the essence of a concept through a collection of images, or adjust a pre-trained text-to-image model with a specific image input for subject-driven or attribute-aware guidance. Yet, accurately capturing the distinct visual attributes of an individual image poses a challenge for these methods. To address this issue, we introduce OSTAF, a novel parameter-efficient one-shot fine-tuning method which utilizes only one reference image for T2I personalization. A novel hypernetwork-powered attribute-focused fine-tuning mechanism is employed to achieve precise learning of various attribute features (e.g., appearance, shape or drawing style) from the reference image. Compared to existing image customization methods, our method shows significant superiority in attribute identification and application, and achieves a good balance between efficiency and output quality.
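
The following is a minimal illustrative sketch of the idea summarized in the TL;DR: a small hypernetwork predicts an offset for a frozen attention weight matrix, and $\lambda$ scales that offset. It is not the authors' implementation. The additive form `W + lam * delta_W`, the two-layer MLP, the zero initialization of the output layer, and all names (`init_hypernetwork`, `predict_offset`, `modulated_weight`, the embedding `z`) are assumptions made only for illustration.

```python
import numpy as np


def init_hypernetwork(embed_dim, hidden_dim, out_rows, out_cols, seed=0):
    """Create parameters for a hypothetical two-layer MLP hypernetwork."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.02, size=(embed_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        # Assumption: zero-initialized output layer, so the predicted
        # offset starts at zero and the frozen weights are unchanged.
        "w2": np.zeros((hidden_dim, out_rows * out_cols)),
        "b2": np.zeros(out_rows * out_cols),
        "shape": (out_rows, out_cols),
    }


def predict_offset(params, z):
    """Map an attribute embedding z to a weight offset matrix."""
    hidden = np.maximum(z @ params["w1"] + params["b1"], 0.0)  # ReLU
    flat = hidden @ params["w2"] + params["b2"]
    return flat.reshape(params["shape"])


def modulated_weight(w_frozen, params, z, lam):
    """Combine frozen weights with the lam-scaled predicted offset."""
    return w_frozen + lam * predict_offset(params, z)


# Usage: a 4x4 stand-in for one attention projection matrix.
rng = np.random.default_rng(1)
w_frozen = rng.normal(size=(4, 4))
hyper = init_hypernetwork(embed_dim=8, hidden_dim=16, out_rows=4, out_cols=4)
z = rng.normal(size=8)  # hypothetical attribute embedding
w_eff = modulated_weight(w_frozen, hyper, z, lam=0.5)
```

In this sketch, setting `lam` to 0 recovers the pre-trained weights exactly, and larger values apply the learned attribute offset more strongly, which is one plausible reading of the "controllable intensity" described above.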
