Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning

Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman

TL;DR

This work tackles anomaly detection in computational workflows where traditional rule-based and statistical methods struggle with novel patterns. It evaluates two LLM-based paradigms—supervised fine-tuning (SFT) and in-context learning (ICL)—for converting workflow logs into anomaly labels, with Flow-Bench as the testbed. The study analyzes bias mitigation, catastrophic forgetting, transfer learning, and interpretability via chain-of-thought prompting, and demonstrates real-time online detection capabilities. The results indicate that SFT yields strong, generalizable performance while ICL offers flexible, data-efficient detection and useful explanations, underscoring the practical potential of LLMs for reliable HPC workflow management.

Abstract

Anomaly detection in computational workflows is critical for ensuring system reliability and security. However, traditional rule-based methods struggle to detect novel anomalies. This paper leverages large language models (LLMs) for workflow anomaly detection by exploiting their ability to learn complex data patterns. Two approaches are investigated: 1) supervised fine-tuning (SFT), where pre-trained LLMs are fine-tuned on labeled data for sentence classification to identify anomalies, and 2) in-context learning (ICL), where prompts containing task descriptions and examples guide LLMs in few-shot anomaly detection without fine-tuning. The paper evaluates the performance, efficiency, and generalization of SFT models, and explores zero-shot and few-shot ICL prompts as well as interpretability enhancement via chain-of-thought prompting. Experiments across multiple workflow datasets demonstrate the promising potential of LLMs for effective anomaly detection in complex executions.
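
To make the ICL setup concrete, the following is a minimal sketch of how a parsed log record might be rendered as a sentence and wrapped into a zero-shot or few-shot prompt. It is not the paper's template: the function names, the feature names (`runtime`, `wms_delay`), the "Job:/Label:" layout, and the label strings are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Sequence, Tuple


def log_to_sentence(record: Mapping[str, float]) -> str:
    """Render one parsed job record as a single sentence.

    The feature names are whatever keys the caller supplies; the ones used
    in the usage example below are hypothetical.
    """
    return ", ".join(f"{name} is {value}" for name, value in record.items()) + "."


def build_prompt(
    task: str,
    examples: Sequence[Tuple[Mapping[str, float], str]],
    query: Mapping[str, float],
) -> str:
    """Assemble a prompt: task description, labeled examples, then the query.

    An empty `examples` sequence gives a zero-shot prompt; one or more
    labeled examples give a few-shot prompt.
    """
    lines = [task, ""]
    for record, label in examples:
        lines.append(f"Job: {log_to_sentence(record)}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query is left unlabeled so the model completes the label.
    lines.append(f"Job: {log_to_sentence(query)}")
    lines.append("Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    task = "Classify each workflow job as normal or abnormal."
    shots = [({"runtime": 12.5, "wms_delay": 6.0}, "normal")]
    print(build_prompt(task, shots, {"runtime": 250.0, "wms_delay": 6.0}))
```

The resulting string would be sent to an LLM without any weight updates, whereas the SFT route would instead train a classification head on the same sentence representation using labeled data.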

Paper Structure (19 sections, 14 figures, 4 tables)

Figures (14)

  • Figure 1: Supervised Fine-tuning and In-Context Learning for anomaly detection
  • Figure 2: Template for converting a parsed log into a sentence.
  • Figure 3: Template of in-context learning.
  • Figure 4: Reported accuracy from pre-trained models and SFT models on 1000 Genome dataset.
  • Figure 5: Training time vs. number of parameters.
  • ...and 9 more figures