Language-Assisted Feature Transformation for Anomaly Detection
EungGu Yun, Heonjin Ha, Yeongwoo Nam, Bryan Dongik Lee
TL;DR
This work addresses the challenge of defining a flexible normality boundary for anomaly detection when data are limited or biased. It introduces Language-Assisted Feature Transformation (LAFT), a training-free approach that builds concept axes from text prompts in the CLIP embedding space and transforms each visual feature $v$ into $v' = T(v)$: projecting onto the concept axes emphasizes a targeted attribute, while the orthogonal variant suppresses an unwanted one. LAFT is combined with a $k$-NN anomaly scorer (LAFT AD) for semantic anomaly detection and integrated into WinCLIP for industrial anomaly detection (WinCLIP+LAFT); in both settings the method achieves strong detection performance without additional training data. The approach is robust to prompt quality, improves detection of anomalies aligned with user knowledge, and is most useful where domain knowledge is available but labeled anomalies are scarce. Limitations include the heuristic selection of the PCA dimension $d$ and limited improvements in anomaly localization, which suggests automatic dimension selection and finer-grained modeling as future work.
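The transformation summarized above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this summary: the concept axes are taken as the top-$d$ principal directions of pairwise differences between prompt embeddings, random vectors stand in for CLIP text and image embeddings, and the scorer uses Euclidean distance. The function names `concept_axes`, `laft_transform`, and `knn_score` are hypothetical and are not the authors' API.

```python
import numpy as np

def concept_axes(text_emb: np.ndarray, d: int) -> np.ndarray:
    """Top-d principal directions of pairwise prompt differences (assumed construction)."""
    i, j = np.triu_indices(len(text_emb), k=1)
    diffs = text_emb[i] - text_emb[j]
    # Uncentered PCA via SVD; the rows of vt are orthonormal directions.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:d]  # shape (d, dim)

def laft_transform(v: np.ndarray, axes: np.ndarray, guide: bool = True) -> np.ndarray:
    """Project onto the concept subspace (guide) or onto its orthogonal complement."""
    proj = v @ axes.T @ axes
    return proj if guide else v - proj

def knn_score(train: np.ndarray, test: np.ndarray, k: int = 5) -> np.ndarray:
    """Anomaly score: mean Euclidean distance to the k nearest training features."""
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

# Synthetic stand-ins for CLIP embeddings (assumption: no real model is loaded).
rng = np.random.default_rng(0)
dim = 16
e0 = np.eye(dim)[0]                       # attribute of interest
coeffs = np.array([-1.0, -0.5, 0.5, 1.0])
text_emb = np.outer(coeffs, e0) + 0.01 * rng.normal(size=(4, dim))

scale = np.full(dim, 0.1)
scale[1] = 5.0                            # large nuisance variation on another axis
train = rng.normal(size=(200, dim)) * scale
normal_test = rng.normal(size=(50, dim)) * scale
anomaly_test = rng.normal(size=(50, dim)) * scale + 3.0 * e0

axes = concept_axes(text_emb, d=1)
guided_train = laft_transform(train, axes, guide=True)
normal_scores = knn_score(guided_train, laft_transform(normal_test, axes))
anomaly_scores = knn_score(guided_train, laft_transform(anomaly_test, axes))
print(normal_scores.mean(), anomaly_scores.mean())
```

Calling `laft_transform(..., guide=False)` with the same axes would instead remove the attribute from the features, which corresponds to the orthogonal variant. With real CLIP features, which are typically L2-normalized, a cosine-based distance may be a more natural choice for the scorer.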
Abstract
This paper introduces LAFT, a novel feature transformation method designed to incorporate user knowledge and preferences into anomaly detection using natural language. Accurately modeling the boundary of normality is crucial for distinguishing abnormal data, but this is often challenging due to limited data or the presence of nuisance attributes. While unsupervised methods that rely solely on data without user guidance are common, they may fail to detect anomalies of specific interest. To address this limitation, we propose Language-Assisted Feature Transformation (LAFT), which leverages the shared image-text embedding space of vision-language models to transform visual features according to user-defined requirements. Combined with anomaly detection methods, LAFT effectively aligns visual features with user preferences, allowing anomalies of interest to be detected. Extensive experiments on both toy and real-world datasets validate the effectiveness of our method.
