ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification

Congjing Zhang, Feng Lin, Xinyi Zhao, Pei Guo, Wei Li, Lin Chen, Chaoyue Zhao, Shuai Huang

TL;DR

ALARM addresses uncertainty in MLLM-based visual anomaly detection in complex environments by decomposing the decision process into three stages (Data Comprehension, Analytical Thinking, and Reflection) and by learning an optimized uncertainty score $S$. It combines MLLM ensembles with probabilistic matrix factorization-based UQ and a quality-assurance abstention mechanism to achieve robust performance in smart-home and wound image classification scenarios. The paper demonstrates that the stage-wise UQ signals carry complementary information, which enables effective selective classification and practical human-in-the-loop collaboration. The results show that ALARM outperforms baselines and remains robust as the rejection rate and the number of models vary, which supports its applicability to diverse domains.
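The selective classification idea mentioned above (abstain on the most uncertain cases and hand them to a human) can be illustrated with a minimal sketch. This is a generic abstention rule, not the paper's learned score $S$ or its pipeline: the function name `selective_accuracy`, its parameters, and the toy data are all hypothetical, and the uncertainty scores are assumed to be given, with higher values meaning less confidence.

```python
from typing import List, Sequence, Tuple


def selective_accuracy(
    scores: Sequence[float],
    preds: Sequence[int],
    labels: Sequence[int],
    rejection_rate: float,
) -> Tuple[float, List[int]]:
    """Reject the most uncertain fraction of cases and score the rest.

    Hypothetical helper for illustration only. `scores` holds one
    uncertainty value per case (higher = more uncertain).
    Returns (accuracy on kept cases, sorted indices of rejected cases).
    """
    n = len(scores)
    # Number of cases to abstain on (floor of rejection_rate * n).
    n_reject = int(rejection_rate * n)
    # Indices ordered from most to least uncertain.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rejected = sorted(order[:n_reject])
    kept = order[n_reject:]
    if not kept:
        return float("nan"), rejected
    correct = sum(1 for i in kept if preds[i] == labels[i])
    return correct / len(kept), rejected


# Toy data (made up): the two wrong predictions carry the highest scores.
scores = [0.9, 0.1, 0.8, 0.2]
preds = [1, 0, 0, 1]
labels = [0, 0, 1, 1]

acc_all, _ = selective_accuracy(scores, preds, labels, 0.0)
acc_half, rejected = selective_accuracy(scores, preds, labels, 0.5)
print(acc_all, acc_half, rejected)
```

In this toy case, rejecting half of the cases removes both misclassifications, so accuracy on the retained cases rises. Whether a real uncertainty score behaves this way depends on how well it ranks errors above correct predictions.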

Abstract

Advances in Large Language Models (LLMs) have greatly stimulated research interest in multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that anomalies in these environments are sometimes highly contextual and ambiguous, so uncertainty quantification (UQ) is a crucial capability for an MLLM-based VAD system to succeed. In this paper, we introduce ALARM, our UQ-supported MLLM-based VAD framework. ALARM integrates UQ with quality-assurance techniques such as reasoning chains, self-reflection, and MLLM ensembles for robust and accurate performance, and it is built on a rigorous probabilistic inference pipeline and computational process. Extensive empirical evaluations on real-world smart-home benchmark data and wound image classification data show ALARM's superior performance and its generic applicability across different domains for reliable decision-making.

Paper Structure

This paper contains 41 sections, 3 theorems, 34 equations, 17 figures, 2 tables.

Key Result

Lemma 1

For any instance $\boldsymbol{q}$, the value of the loss function $\ell_g(\boldsymbol{q},y)$ is a monotonic function of the uncertainty score $S(\boldsymbol{q})$.

Figures (17)

  • Figure 1: Overview of the ALARM framework.
  • Figure 2: Overview of LLM reasoning chain in ALARM for UQ in VAD.
  • Figure 3: Illustration of the variation analysis to extract $S_{task}$ from the total variability in $\text{Var}[\boldsymbol{z}|\mathcal{T}]$.
  • Figure 4: The ratio of detected misclassification in the rejected cases.
  • Figure 5: Metric trends of different uncertainty scores in ALARM as the rejection rate $P$ varies in smart home.
  • ...and 12 more figures

Theorems & Definitions (9)

  • Definition 1: $S_{data}$
  • Definition 2: $S_{task}$
  • Definition 3: $S_{ref}$
  • Lemma 1: Monotonicity
  • Theorem 1: Effectiveness
  • Theorem 2
  • proof
  • proof
  • proof