Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge

Marcella Astrid, Abdelrahman Shabayek, Djamila Aouada

TL;DR

This work tackles anomaly detection in battery thermal images without battery-specific training data by using zero-shot Visual Question Answering (VQA) with prior knowledge of normal thermal behavior. By encoding normal characteristics into prompts and evaluating three pretrained VQA models, the approach performs competitively against state-of-the-art methods that are trained on battery data, highlighting the potential of prompt-driven zero-shot learning for safety-critical monitoring. The study also exposes sensitivity to prompt wording and variability across repeated trials, and suggests preprocessing and occasional normal-context supplementation as practical enhancements. Overall, the findings indicate that VQA-based zero-shot anomaly detection can be a viable, data-efficient alternative for battery safety and efficiency monitoring, with clear avenues for improvement.

Abstract

Batteries are essential for various applications, including electric vehicles and renewable energy storage, making safety and efficiency critical concerns. Anomaly detection in battery thermal images helps identify failures early, but traditional deep learning methods require extensive labeled data, which is difficult to obtain, especially for anomalies, because of safety risks and high data collection costs. To overcome this, we explore zero-shot anomaly detection using Visual Question Answering (VQA) models, which leverage pretrained knowledge and text-based prompts to generalize across vision tasks. By incorporating prior knowledge of normal battery thermal behavior, we design prompts to detect anomalies without battery-specific training data. We evaluate three VQA models (ChatGPT-4o, LLaVa-13b, and BLIP-2), analyzing their robustness to prompt variations, repeated trials, and qualitative outputs. Despite the lack of fine-tuning on battery data, our approach demonstrates competitive performance compared to state-of-the-art models that are trained on battery data. Our findings highlight the potential of VQA-based zero-shot learning for battery anomaly detection and suggest future directions for improving its effectiveness.
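
As an illustration of what "robustness to repeated trials" could mean in practice, the following minimal sketch summarizes how much a model's score varies when the same prompt and images are submitted several times. The metric (plain accuracy), the function names, and the toy data are assumptions made for this example; the abstract does not state which metric or aggregation the authors use.

```python
from statistics import mean, pstdev
from typing import Dict, List


def accuracy(predictions: List[bool], labels: List[bool]) -> float:
    """Fraction of images whose predicted label matches the ground truth."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def summarize_trials(trials: List[List[bool]], labels: List[bool]) -> Dict[str, object]:
    """Score each repeated trial and report the mean and spread of the scores.

    `trials` holds one prediction list per repeated run of the same prompt
    (True = anomaly, False = normal). Accuracy is an assumption chosen for
    illustration, not necessarily the metric used in the paper.
    """
    scores = [accuracy(trial, labels) for trial in trials]
    return {"per_trial": scores, "mean": mean(scores), "std": pstdev(scores)}


if __name__ == "__main__":
    # Toy data: four images, two repeated trials of the same prompt.
    labels = [True, False, True, False]
    trials = [
        [True, False, True, False],  # all four correct
        [True, True, True, False],   # one false alarm
    ]
    print(summarize_trials(trials, labels))
```

A small spread across trials suggests the model answers consistently for a given prompt; a large spread suggests that a single run is not a reliable estimate of its performance.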

Paper Structure

This paper contains 17 sections, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: (a) Our proposed zero-shot anomaly detection method takes a text prompt and a thermal image as input. The text prompt includes information about the type of image provided (introduction), the colormap range and color, prior knowledge of normal battery characteristics, and the query. (b) Illustration of the colormap used in the thermal image.
  • Figure 2: Samples from the test set proposed in [shabayek2025ai]. It consists of normal images and three types of anomalies: overheating, reflection, and spatial tape. (a) Normal images show a smooth gradient and a temperature below the threshold. (b) Overheating images exhibit high overall temperatures, even without distinct hot or cold spots. (c) Reflection images display an uneven distribution with hot spots and abnormally high temperatures. (d) Spatial tape cases show cold spots.
  • Figure 3: Output examples for normal and anomalous data from the three VQA models: ChatGPT-4o [hurst2024gpt] (Prompt 2), LLaVa-13b [liu2024improved] (Prompt 3), and BLIP-2 [li2023blip] (Prompt 5). ChatGPT-4o and LLaVa-13b provide explanations in addition to their predictions, while BLIP-2 only generates the final prediction.
  • Figure 4: Samples of incorrect predictions in ChatGPT-4o, with the detected condition from the prior knowledge indicated in its explanation. Red text represents incorrect predictions, while green text represents correct predictions.
  • Figure 5: Samples of correct predictions in ChatGPT-4o, with incorrectly predicted conditions from the prior knowledge indicated in its explanation. Red text represents incorrect predictions, while green text represents correct predictions.
  • ...and 2 more figures
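
The prompt layout described in the Figure 1 caption (introduction, colormap range and color, prior knowledge of normal battery characteristics, and the query) can be sketched as a simple string-assembly step. The code below is a hedged illustration, not the authors' implementation: the class and function names, the exact wording of each part, the temperature values, and the yes/no answer format are all hypothetical. Sending the prompt and image to a VQA model is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromptParts:
    """The four prompt components named in the Figure 1 caption."""
    introduction: str           # what kind of image is provided
    colormap: str               # colormap range and color description
    prior_knowledge: List[str]  # characteristics of a normal battery
    query: str                  # the question put to the VQA model


def build_prompt(parts: PromptParts) -> str:
    """Join the components into one text prompt, in the caption's order."""
    prior = "\n".join(f"- {item}" for item in parts.prior_knowledge)
    return "\n\n".join([
        parts.introduction,
        parts.colormap,
        "A normal battery has the following characteristics:\n" + prior,
        parts.query,
    ])


def parse_answer(reply: str) -> Optional[bool]:
    """Map a reply starting with yes/no to True (anomaly) or False (normal).

    Returns None when the reply does not start with a clear yes or no.
    """
    match = re.match(r"\W*(yes|no)\b", reply.strip().lower())
    if match is None:
        return None
    return match.group(1) == "yes"


if __name__ == "__main__":
    # All wording and numeric values below are placeholders for illustration.
    parts = PromptParts(
        introduction="This is a thermal image of a battery.",
        colormap="Colors run from dark (20 degrees C) to bright (60 degrees C).",
        prior_knowledge=[
            "The temperature changes smoothly across the surface.",
            "The temperature stays below 45 degrees C.",
            "There are no isolated hot or cold spots.",
        ],
        query="Is this battery anomalous? Answer yes or no, then explain.",
    )
    print(build_prompt(parts))
    print(parse_answer("Yes, there is a hot spot near the top edge."))
```

Keeping the parts separate makes it straightforward to vary one component at a time, which is one way to probe the prompt sensitivity mentioned in the TL;DR.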