Sequential PatchCore: Anomaly Detection for Surface Inspection using Synthetic Impurities
Runzhou Mao, Juraj Fulir, Christoph Garth, Petra Gospodnetić
TL;DR
This work addresses two problems in automated surface inspection: the degradation caused by surface impurities and the memory bottleneck of PatchCore on high-resolution images. It introduces Sequential PatchCore, which builds and updates the coreset sequentially, reducing memory from $O(NP)$ to $O(|\mathcal{M}|)$ and enabling training on consumer hardware. It also introduces coreset melding for transfer learning across datasets: coresets trained on different dataset versions are melded into a refined model without reprocessing large datasets. A photorealistic water-stain generation method based on Perlin noise and jittered sampling creates synthetic impurities, enabling impurity-aware pre-training with two dataset variants, Synth and Synth$_{WS}$. Empirical evaluation on a dual aluminum-plate dataset shows that impurities can lower pixel-wise recall, while defect-wise recall remains informative. Finetuning on real data substantially improves performance, and coreset melding offers fast, scalable transfer learning. Overall, the paper demonstrates that synthetic impurities, paired with memory-efficient coresets and transfer learning, enable practical high-resolution anomaly detection, reported with industrially meaningful metrics and insights into impurity effects.
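The idea of a sequentially updated coreset can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that patch features arrive in batches, that greedy k-center (farthest-point) selection is used for subsampling, and that melding amounts to pooling two coresets and subsampling again. All function names (`greedy_kcenter`, `update_coreset`, `build_sequentially`, `meld`, `anomaly_score`) and the fixed `budget` parameter are hypothetical, and plain Python lists stand in for feature tensors.

```python
import math
import random

def greedy_kcenter(points, k, rng):
    """Return k points chosen by farthest-point (greedy k-center) selection."""
    if k >= len(points):
        return list(points)
    first = rng.randrange(len(points))
    chosen = [points[first]]
    # Distance from every point to its nearest chosen centre so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(points[nxt])
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen

def update_coreset(coreset, new_features, budget, rng):
    """Pool the current coreset with new features, then shrink to the budget."""
    return greedy_kcenter(list(coreset) + list(new_features), budget, rng)

def build_sequentially(batches, budget, seed=0):
    """Assumed sequential scheme: only the coreset and one batch are held at once."""
    rng = random.Random(seed)
    coreset = []
    for batch in batches:
        coreset = update_coreset(coreset, batch, budget, rng)
    return coreset

def meld(coreset_a, coreset_b, budget, seed=0):
    """Assumed melding: treat a second coreset like one more batch of features."""
    return update_coreset(coreset_a, coreset_b, budget, random.Random(seed))

def anomaly_score(coreset, patch):
    """Distance from a patch feature to its nearest coreset member."""
    return min(math.dist(patch, c) for c in coreset)
```

Under these assumptions the working set never exceeds the budget plus one batch, which is the intuition behind a memory cost that depends on the coreset size rather than on the full set of patch features.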
Abstract
Surface impurities (e.g., water stains, fingerprints, stickers) are a frequently cited cause of degradation in automated visual inspection systems. At the same time, synthetic data generation techniques for visual surface inspection have focused primarily on generating perfect examples and defects, disregarding impurities. This study highlights the importance of considering impurities when generating synthetic data. We introduce a procedural method to include photorealistic water stains in synthetic data. The synthetic datasets are generated to correspond to real datasets and are then used to train an anomaly detection model and to investigate the influence of water stains. The high-resolution images used for surface inspection lead to memory bottlenecks during anomaly detection training. To address this, we introduce Sequential PatchCore, a method that builds coresets sequentially and makes training on large images tractable on consumer-grade hardware. This allows us to perform transfer learning using coresets pre-trained on different dataset versions. Our results show the benefits of using synthetic data for pre-training an explicit coreset anomaly model and the additional performance benefits of finetuning the coreset using real data. We observe that impurities and labelling ambiguity lower model performance, and we additionally report defect-wise recall to provide an industrially relevant perspective on model performance.
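The difference between pixel-wise and defect-wise recall can be made concrete with a small sketch. The definition used here is an assumption, not taken from the paper: a ground-truth defect is a 4-connected component of the label mask, and it counts as found if at least `min_overlap` of its pixels are predicted anomalous. The function names and the `min_overlap` parameter are hypothetical.

```python
from collections import deque

def label_defects(mask):
    """Split a binary mask into 4-connected components (lists of (row, col))."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    defects = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill of one defect
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                defects.append(pixels)
    return defects

def pixel_recall(truth, pred):
    """Fraction of labelled defect pixels that are predicted anomalous."""
    positives = [(r, c) for r, row in enumerate(truth)
                 for c, v in enumerate(row) if v]
    hits = sum(1 for r, c in positives if pred[r][c])
    return hits / len(positives) if positives else 1.0

def defect_recall(truth, pred, min_overlap=1):
    """Fraction of labelled defects hit by at least min_overlap predicted pixels."""
    defects = label_defects(truth)
    found = sum(1 for pixels in defects
                if sum(1 for r, c in pixels if pred[r][c]) >= min_overlap)
    return found / len(defects) if defects else 1.0

truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1]]
pred = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
print(pixel_recall(truth, pred), defect_recall(truth, pred))
```

In this toy example only 2 of the 6 labelled pixels are predicted, so pixel-wise recall is 1/3, yet both defects are touched by a prediction, so defect-wise recall is 1.0. This illustrates why a defect-level metric can stay informative when pixel-level overlap is penalised by ambiguous label boundaries.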
