LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
Yejin Kwon, Daeun Moon, Youngje Oh, Hyunsoo Yoon
TL;DR
This work tackles logical anomaly detection in industrial settings by introducing LogicQA, a training-free, few-shot framework that uses a pre-trained Vision-Language Model (VLM) to automatically generate anomaly-relevant questions and provide natural-language explanations. LogicQA describes the normal images, summarizes their shared normal context, generates main questions from that context, and tests each query image with semantically varied sub-questions. This enables interpretable anomaly detection without task-specific training or annotations. It achieves state-of-the-art performance on MVTec LOCO AD and strong results on a real-world semiconductor SEM dataset, and it remains robust across different VLM backbones. Its practical significance lies in scalable, explainable industrial AD that needs minimal data, applies broadly across classes, and works with multiple VLMs.
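The question-answering stage described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the VLM is abstracted as a hypothetical callable `vlm(image, question)` that returns a text answer, questions are assumed to be phrased so that "yes" means normal, each main question is resolved by a majority vote over its sub-questions (an assumed aggregation rule), and an image is flagged as a logical anomaly if any main question is violated. The checklist item and the `fake_vlm` stub are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical VLM interface: (image, question) -> text answer.
# A real system would wrap a pre-trained VLM; here it is any plain callable.
VLMFn = Callable[[object, str], str]


@dataclass
class MainQuestion:
    text: str                 # checklist item describing a normal-context constraint
    sub_questions: List[str]  # semantically varied rephrasings of the same check


def constraint_holds(vlm: VLMFn, image: object, q: MainQuestion) -> bool:
    """Majority vote over sub-question answers; 'yes' is assumed to mean normal."""
    votes = [vlm(image, sq).strip().lower().startswith("yes") for sq in q.sub_questions]
    return sum(votes) > len(votes) / 2


def check_image(vlm: VLMFn, image: object,
                checklist: List[MainQuestion]) -> Tuple[bool, List[str]]:
    """Return (is_anomalous, violated checklist items) for one query image."""
    violated = [q.text for q in checklist if not constraint_holds(vlm, image, q)]
    return len(violated) > 0, violated


# Toy stand-in for a VLM: each "image" is a dict of canned answers per question,
# and any question not in the dict is answered "Yes".
def fake_vlm(image: dict, question: str) -> str:
    return image.get(question, "Yes")


# Illustrative (hypothetical) checklist with one main question and three rephrasings.
checklist = [
    MainQuestion(
        "Is there exactly one item in each compartment?",
        [
            "Is there exactly one item in each compartment?",
            "Does every compartment hold a single item?",
            "Are all compartments occupied by one item each?",
        ],
    )
]

normal_img = {}  # every sub-question answered "Yes"
bad_img = {      # two of three sub-questions answered "No"
    "Is there exactly one item in each compartment?": "No",
    "Does every compartment hold a single item?": "No",
}

print(check_image(fake_vlm, normal_img, checklist))  # (False, [])
print(check_image(fake_vlm, bad_img, checklist))
# (True, ['Is there exactly one item in each compartment?'])
```

Returning the list of violated items alongside the verdict mirrors the goal of giving operators an explanation rather than only a binary label; majority voting is one simple way to make the verdict less sensitive to how any single question is phrased.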
Abstract
Anomaly Detection (AD) focuses on detecting samples that deviate from the normal pattern, making it a vital tool in process control. Logical anomalies may appear visually normal yet violate predefined constraints on object presence, arrangement, or quantity, so detecting them requires reasoning and explainability. We introduce LogicQA, a framework that enhances AD by providing industrial operators with explanations for logical anomalies. LogicQA compiles automatically generated questions into a checklist and collects the responses to identify violations of logical constraints. LogicQA is training-free, annotation-free, and operates in a few-shot setting. We achieve state-of-the-art (SOTA) logical AD performance on the public benchmark MVTec LOCO AD, with an AUROC of 87.6% and an F1-max of 87.0%, along with explanations of the anomalies. Our approach also performs strongly on corporate semiconductor SEM data, further validating its effectiveness in industrial applications.
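The abstract reports image-level AUROC and F1-max. As a reference for what these two numbers measure, the sketch below computes them from per-image anomaly scores using their standard definitions: AUROC via the pairwise (Mann-Whitney) formulation, and F1-max as the best F1 over all score thresholds. It is not the paper's evaluation code, how LogicQA converts checklist answers into a continuous score is not specified here, and the scores in the example are made-up toy values rather than results from the paper.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random anomalous sample outscores a random normal one.

    labels: 1 = anomalous, 0 = normal. Ties count as 0.5 (Mann-Whitney U).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_max(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Best F1 over all thresholds, predicting 'anomalous' when score >= threshold."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 is 0 when nothing anomalous is caught
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy scores: two normal images (label 0) and two anomalous images (label 1).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auroc(scores, labels))            # 0.75
print(round(f1_max(scores, labels), 4))  # 0.8
```

The pairwise AUROC loop is quadratic in the number of samples, which is fine for illustration; for large evaluation sets a sort-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.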
