A LoRA is Worth a Thousand Pictures
Chenxi Liu, Towaki Takikawa, Alec Jacobson
TL;DR
This work demonstrates that Low-Rank Adaptation (LoRA) weights, trained on small art datasets, can function as standalone style embeddings for artist-style analysis. By extracting a compact representation via PCA from concatenated LoRA components and applying a calibration step to mitigate projection drift, the authors show that LoRA-based embeddings yield well-separated style clusters that align with art-historical knowledge and outperform traditional image-based features in clustering and retrieval tasks. The approach enables direct retrieval of LoRA models by style without requiring image generation or access to original training data, offering practical benefits for online LoRA communities and implications for zero-shot fine-tuning and model attribution. The method also reveals which LoRA sub-networks contribute most to style representation, with feed-forward components delivering strong performance at reduced dimensionality. Overall, the work provides a new, efficient way to analyze, organize, and attribute customized diffusion models at scale, with potential applications in attribution, compensation, and trustworthy sharing of stylized AI resources.
Abstract
Recent advances in diffusion models and parameter-efficient fine-tuning (PEFT) have made text-to-image generation and customization widely accessible, with Low-Rank Adaptation (LoRA) able to replicate an artist's style or subject using minimal data and computation. In this paper, we examine the relationship between LoRA weights and artistic styles, demonstrating that LoRA weights alone can serve as an effective descriptor of style, without the need for additional image generation or knowledge of the original training set. Our findings show that LoRA weights yield better performance in clustering of artistic styles compared to traditional pre-trained features, such as CLIP and DINO, with strong structural similarities between LoRA-based and conventional image-based embeddings observed both qualitatively and quantitatively. We identify various retrieval scenarios for the growing collection of customized models and show that our approach enables more accurate retrieval in real-world settings where knowledge of the training images is unavailable and image-based methods would otherwise require additional generation. We conclude with a discussion on potential future applications, such as zero-shot LoRA fine-tuning and model attribution.
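To make the weights-as-embedding idea concrete, the following is a minimal sketch, not the authors' implementation. It flattens and concatenates the low-rank factors of each LoRA, reduces the resulting vectors with PCA, and retrieves same-style LoRAs by cosine similarity. The data are synthetic, and the function names, matrix shapes, noise level, and the choice of cosine similarity are all assumptions made for illustration. The calibration step mentioned in the TL;DR is not reproduced here.

```python
import numpy as np

def flatten_lora(layers):
    """Concatenate every low-rank factor of one LoRA into a single 1-D vector.

    `layers` is a list of (A, B) matrix pairs; this in-memory layout is a
    hypothetical stand-in for a real LoRA checkpoint.
    """
    return np.concatenate([m.ravel() for pair in layers for m in pair])

def fit_pca(X, k):
    """Return (mean, components) for the top-k principal directions of X's rows."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def embed(X, mean, components):
    """Project rows of X onto the fitted principal directions."""
    return (X - mean) @ components.T

def retrieve(query, gallery, top=3):
    """Indices of the `top` gallery rows most cosine-similar to `query`."""
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(g @ q))[:top]

# Synthetic stand-in data (assumption): two "styles", five LoRAs per style.
# Each LoRA is its style's base weights plus small Gaussian noise.
rng = np.random.default_rng(0)
rank, dim, n_layers, per_style = 4, 16, 2, 5

def random_layers():
    return [(rng.normal(size=(rank, dim)), rng.normal(size=(dim, rank)))
            for _ in range(n_layers)]

vectors = []
for _ in range(2):                      # one pass per style
    base = flatten_lora(random_layers())
    for _ in range(per_style):
        vectors.append(base + 0.05 * rng.normal(size=base.shape))
X = np.stack(vectors)                   # shape: (10, 256)

mean, components = fit_pca(X, k=4)
emb = embed(X, mean, components)        # compact style embeddings, shape (10, 4)

# Query with the first LoRA; its nearest neighbours should share its style.
hits = retrieve(emb[0], emb, top=per_style)
```

In this toy setup the first five rows belong to one style, so the retrieved indices fall within that group. Real LoRA checkpoints would first need their factors loaded and ordered consistently across models before concatenation.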
