Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection

Tianwu Lei; Silin Chen; Bohan Wang; Zhengkai Jiang; Ningmu Zou

TL;DR

Adapted-MoE combines a routing network with a series of expert models to handle multiple distributions of same-category samples by divide and conquer, and uses test-time adaption to eliminate the bias between the representation of an unseen test sample and the feature distribution learned by the expert model.

Abstract

Most unsupervised anomaly detection methods, which distinguish anomalies using representations of normal samples, have recently made remarkable progress. However, existing methods learn only a single decision boundary for the samples in the training dataset, neglecting the real-world variation in the feature distribution of normal samples, even within the same category. Furthermore, they do not account for the distribution bias that still exists between the test set and the training set. Therefore, we propose Adapted-MoE, which contains a routing network and a series of expert models to handle multiple distributions of same-category samples by divide and conquer. Specifically, we propose a routing network based on representation learning to route same-category samples into subclass feature spaces. Then, a series of expert models learn the representations of the various normal samples and construct several independent decision boundaries. We also propose test-time adaption to eliminate the bias between the representation of an unseen test sample and the feature distribution learned by the expert model. Our experiments are conducted on the Texture AD benchmark, a dataset that provides multiple subclasses from three categories. Adapted-MoE significantly improves the performance of the baseline model, achieving increases of 2.18%-7.20% in I-AUROC and 1.57%-16.30% in P-AUROC, and outperforms the current state-of-the-art methods. Our code is available at https://github.com/.
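
The inference flow described in the abstract (route a sample to an expert, calibrate its features, then score it) can be illustrated with a small sketch. This is not the authors' implementation: the nearest-center routing rule, the mean/std alignment used for calibration, the distance-based toy expert, and all names (`route`, `calibrate`, `anomaly_score`, `expert_stats`) are assumptions made for illustration only.

```python
import math

def route(feature, centers):
    """Return the index of the subclass center closest to `feature` (assumed routing rule)."""
    distances = [math.dist(feature, c) for c in centers]
    return distances.index(min(distances))

def calibrate(features, target_mean, target_std, eps=1e-6):
    """Align the per-dimension mean/std of `features` with an expert's statistics.

    Assumed stand-in for test-time adaption: standardize the routed features,
    then rescale them to the statistics the expert saw during training.
    """
    n, dim = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(dim)]
    std = [math.sqrt(sum((f[j] - mean[j]) ** 2 for f in features) / n)
           for j in range(dim)]
    return [[(f[j] - mean[j]) / (std[j] + eps) * target_std[j] + target_mean[j]
             for j in range(dim)] for f in features]

def anomaly_score(feature, expert_mean):
    """Toy expert: distance to the expert's mean stands in for a learned boundary."""
    return math.dist(feature, expert_mean)

# Hypothetical toy data: two subclass centers in a 2-D feature space,
# each with the (mean, std) its expert is assumed to have learned.
centers = [[0.0, 0.0], [10.0, 10.0]]
expert_stats = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 10.0], [1.0, 1.0])]

# A test batch from the second subclass, shifted by an unseen offset.
test_batch = [[12.0, 13.0], [14.0, 13.0], [13.0, 12.0], [13.0, 14.0]]
expert_id = route(test_batch[0], centers)
mean, std = expert_stats[expert_id]
calibrated = calibrate(test_batch, mean, std)
scores = [anomaly_score(f, mean) for f in calibrated]
```

In this toy setting the shifted batch is routed to the second expert, and calibration moves its features back onto that expert's statistics, so the scores are no longer inflated by the offset alone. Aligning batch statistics this way is only one plausible form of calibration; the abstract does not specify the actual procedure.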

Paper Structure (13 sections, 6 equations, 6 figures, 4 tables)

Introduction
Related Works
Method
Mixture of Experts
Normalization.
Test-Time Adaption
Experiments
Datasets and Metrics
Implementation Details
Comparisons with State-Of-The-Arts
Ablation Studies
Conclusion
Limitation.

Figures (6)

Figure 1: Existing methods construct a single decision boundary by learning representations of normal samples, ignoring variations in the feature distribution of samples within the same category, as shown for Texture AD-Cloth [texturead]. Moreover, the test dataset still contains a large population of unseen samples. Existing datasets (e.g., the MVTec AD dataset [bergmann2019mvtec]), in which similar samples all share the same distribution, are illustrated by Sample-$a$, Sample-$b$, and Sample-$c$.
Figure 2: Overview of Adapted-MoE. First, a frozen backbone extracts features from the samples. Subsequently, a routing network assigns the extracted feature embeddings to different expert models for training, where the training loss consists of the routing loss $L_{routing}$ and the expert loss $L_{expert}$. In the testing phase, Test-Time Adaption calibrates the routed features to eliminate distribution bias before anomaly detection.
Figure 3: Mixture of Experts. For a mini-batch of feature embeddings, the routing network uses the center loss to divide them into different subclasses during training. Simple expert models construct multiple decision boundaries in independent feature spaces for the different subclasses.
Figure 4: Test-Time Adaptation. Since the test samples do not appear in the training phase, the distribution of samples at test time is biased relative to the distribution learned by the expert model. Test-Time Adaptation eliminates the distance between the two distributions, unifying the position of the decision boundary.
Figure 5: Ablation experiments for subclasses on the cloth dataset. For the I-AUROC metric, our method improves on some unseen subclasses by $3.05\%$-$25.78\%$. For the P-AUROC metric, our method improves on all unseen subclasses by $5.82\%$-$25.48\%$.
...and 1 more figure
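
The Figure 3 caption mentions a center loss for dividing embeddings into subclasses but does not give its form. As a rough illustration, the sketch below uses the commonly cited center-loss form (half the squared distance from each embedding to its center). It rests on several assumptions: each embedding is assigned to its nearest center, only the centers are updated (the paper trains a routing network), and the function names and the update rate `lr` are hypothetical.

```python
def center_loss(features, centers):
    """Mean of 0.5 * squared distance from each feature to its nearest center.

    Returns the loss and the nearest-center assignment of every feature.
    """
    total = 0.0
    assignments = []
    for f in features:
        sq = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centers]
        k = sq.index(min(sq))  # assumed rule: nearest center defines the subclass
        assignments.append(k)
        total += 0.5 * sq[k]
    return total / len(features), assignments

def update_centers(features, assignments, centers, lr=0.5):
    """Move each center a fraction `lr` toward the mean of its assigned features."""
    new_centers = []
    for k, c in enumerate(centers):
        members = [f for f, a in zip(features, assignments) if a == k]
        if not members:
            new_centers.append(list(c))  # no members: leave the center unchanged
            continue
        mean = [sum(col) / len(members) for col in zip(*members)]
        new_centers.append([ci + lr * (mi - ci) for ci, mi in zip(c, mean)])
    return new_centers

# Hypothetical mini-batch with two visible groups of 2-D embeddings.
features = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
centers = [[2.0, 2.0], [8.0, 8.0]]

loss_before, assign = center_loss(features, centers)
centers = update_centers(features, assign, centers)
loss_after, _ = center_loss(features, centers)
```

One update step pulls each center toward its group, so the loss drops and the two groups separate more cleanly. In the paper's setting the routing loss would be minimized jointly with the expert loss, per the Figure 2 caption, which this sketch does not model.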
