AnoStyler: Text-Driven Localized Anomaly Generation via Lightweight Style Transfer

Yulim So; Seokho Kang

TL;DR

AnoStyler addresses the scarcity of anomaly images and the limited realism of synthetic ones by reframing zero-shot anomaly generation as text-guided style transfer on a single normal image. Its lightweight pipeline combines shape-guided mask generation (Meta-Shape Priors), two-class prompt design, and text-driven stylization with a compact U-Net that is guided by frozen CLIP encoders and optimized with mask-aware losses. The method achieves state-of-the-art zero-shot anomaly generation and downstream anomaly detection on MVTec-AD and VisA at significantly lower computational cost than diffusion-based baselines. In practice, AnoStyler offers a scalable, data-efficient way to synthesize realistic, semantically grounded anomalies for industrial defect detection without requiring large labeled anomaly sets.

Abstract

Anomaly generation has been widely explored to address the scarcity of anomaly images in real-world data. However, existing methods typically suffer from at least one of the following limitations, hindering their practical deployment: (1) lack of visual realism in generated anomalies; (2) dependence on large amounts of real images; and (3) use of memory-intensive, heavyweight model architectures. To overcome these limitations, we propose AnoStyler, a lightweight yet effective method that frames zero-shot anomaly generation as text-guided style transfer. Given a single normal image along with its category label and expected defect type, an anomaly mask indicating the localized anomaly regions and two-class text prompts representing the normal and anomaly states are generated using generalizable category-agnostic procedures. A lightweight U-Net model trained with CLIP-based loss functions is used to stylize the normal image into a visually realistic anomaly image, where anomalies are localized by the anomaly mask and semantically aligned with the text prompts. Extensive experiments on the MVTec-AD and VisA datasets show that AnoStyler outperforms existing anomaly generation methods in generating high-quality and diverse anomaly images. Furthermore, using these generated anomalies helps enhance anomaly detection performance.
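
To make the inputs and the role of the mask concrete, the following is a minimal sketch rather than the authors' implementation. It assumes hypothetical prompt templates for the two-class text prompts, and it assumes that localization can be pictured as a per-pixel blend of the stylized and normal images under the anomaly mask. The names `build_prompts` and `composite` are illustrative only. The stylization network and its CLIP-based losses are not shown.

```python
from typing import List, Tuple

# Grayscale image as rows of pixel intensities in [0, 1] (illustrative type).
Image = List[List[float]]


def build_prompts(category: str, defect: str) -> Tuple[str, str]:
    """Return (normal_prompt, anomaly_prompt).

    The templates are hypothetical; the abstract only states that prompts
    for the normal and anomaly states are derived from the category label
    and the expected defect type.
    """
    normal = f"a photo of a flawless {category}"
    anomaly = f"a photo of a {category} with {defect}"
    return normal, anomaly


def composite(normal: Image, stylized: Image, mask: Image) -> Image:
    """Blend per pixel: mask * stylized + (1 - mask) * normal.

    Assumption: pixels outside the mask (mask == 0) keep the normal image,
    pixels inside the mask (mask == 1) take the stylized appearance.
    """
    return [
        [m * s + (1.0 - m) * n for n, s, m in zip(n_row, s_row, m_row)]
        for n_row, s_row, m_row in zip(normal, stylized, mask)
    ]
```

Under this assumption, any change introduced by stylization is confined to the masked region, which is one simple way to read "anomalies are localized by the anomaly mask".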
