KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models

Dong Chen, Zhengqing Hu, Peiguang Fan, Yueting Zhuang, Yafei Li, Qidong Liu, Xiaoheng Jiang, Mingliang Xu

TL;DR

This work addresses the challenge of unsupervised vision anomaly detection by introducing Key Knowledge Augmentation (KKA), which extracts anomaly-related knowledge from large language models to generate plausible, boundary-shaping anomalies tailored to normal samples. By classifying generated anomalies into easy and hard via a confusion evaluator and iteratively enriching hard anomalies through LLM fine-tuning with Direct Preference Optimization, KKA guides detectors to learn a clearer boundary between normal and anomalous data with modest sample generation overhead. Across CIFAR-100, Oxford-102, and UCM-Caption, KKA consistently enhances AUC for multiple detectors, notably elevating SimpleNet AUC on CIFAR-100 from 74.62% to 84.04%. The method demonstrates strong generality, yielding improvements even when integrated with different anomaly detectors and settings, and offers practical benefits through reduced reliance on purely random anomaly generation.

Abstract

Vision anomaly detection, particularly in unsupervised settings, often struggles to distinguish normal samples from anomalies because anomalies vary so widely. Recently, an increasing number of studies have focused on generating anomalies to help detectors learn more effective boundaries between normal samples and anomalies. However, because the generated anomalies are often derived from random factors, they frequently lack realism. Randomly generated anomalies also offer limited support in constructing effective boundaries, as most differ substantially from normal samples and lie far from the boundary. To address these challenges, we propose Key Knowledge Augmentation (KKA), a method that extracts anomaly-related knowledge from large language models (LLMs). More specifically, KKA leverages the extensive prior knowledge of LLMs to generate meaningful anomalies based on normal samples. KKA then classifies the generated anomalies as easy or hard according to their similarity to normal samples: easy anomalies differ significantly from normal samples, whereas hard anomalies closely resemble them. KKA iteratively updates the generated anomalies, gradually increasing the proportion of hard anomalies so that the detector learns a more effective boundary. Experimental results show that the proposed method significantly improves the performance of various vision anomaly detectors while maintaining low generation costs. The code can be found at https://github.com/Anfeather/KKA.
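
To make the easy/hard distinction concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each sample is already represented by a feature vector, that "similarity to normal samples" can be approximated by cosine similarity to the centroid of the normal features, and that a fixed threshold separates hard from easy anomalies. The function names (`split_easy_hard`, `hard_ratio`), the centroid heuristic, and the threshold value are all hypothetical; the paper's own confusion evaluator may work quite differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_easy_hard(normal_feats, anomaly_feats, threshold=0.9):
    """Return (easy, hard) lists of indices into anomaly_feats.

    Assumption (not from the paper): an anomaly is 'hard' when its cosine
    similarity to the centroid of the normal features is at least
    `threshold`, and 'easy' otherwise.
    """
    center = centroid(normal_feats)
    easy, hard = [], []
    for index, feat in enumerate(anomaly_feats):
        if cosine(feat, center) >= threshold:
            hard.append(index)
        else:
            easy.append(index)
    return easy, hard


def hard_ratio(easy, hard):
    """Fraction of generated anomalies that are hard (0.0 if none exist)."""
    total = len(easy) + len(hard)
    return len(hard) / total if total else 0.0
```

In an iterative scheme like the one the abstract describes, a quantity such as `hard_ratio` could be tracked across rounds to check that the share of hard anomalies is in fact growing; how KKA itself measures this is not specified here.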

Paper Structure

This paper contains 15 sections, 8 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Unsupervised anomaly detection and anomaly detection utilizing anomaly knowledge derived from LLMs.
  • Figure 2: The overview of KKA. The purple dashed box indicates training the detector using only normal samples, while the blue dashed box indicates training the detector with both normal and generated anomalies.
  • Figure 3: The anomalies generated by KKA on Oxford-102 dataset and the corresponding iterative process.
  • Figure 4: Different quantities of initially generated anomalies and their corresponding average accuracy on Oxford-102.
  • Figure 5: Visualization of different methods. The dimensionality-reduction algorithm for visualizing high-dimensional samples is t-SNE (van der Maaten and Hinton, 2008).
  • ...and 1 more figures