
CrashEventLLM: Predicting System Crashes with Large Language Models

Priyanka Mudgal, Bijan Arbab, Swaathi Sampath Kumar

TL;DR

This work offers preliminary insights into prompt-based large language models for log-based event prediction. It builds on a large language model framework that uses historical data, informed by expert annotations, to forecast future crash events.

Abstract

As dependence on computer systems expands across domains spanning personal, industrial, and large-scale applications, there is a compelling need to enhance their reliability in order to sustain business operations and ensure user satisfaction. System logs generated by these devices serve as valuable repositories of historical trends and past failures. Machine learning techniques for failure prediction have become commonplace, extracting insights from past data to anticipate future behavior. Recently, large language models have demonstrated remarkable capabilities in tasks including summarization, reasoning, and event prediction. In this paper, we therefore investigate the potential of large language models for predicting system failures, using insights learned from past failure behavior to inform reasoning and decision-making. Our approach uses system crash logs from the Intel Computing Improvement Program (ICIP) to identify significant events and develop CrashEventLLM. This model, built upon a large language model framework, serves as our foundation for crash event prediction. Specifically, it uses historical data, informed by expert annotations, to forecast future crash events. Beyond prediction, it also offers insights into the potential causes of each crash event. This work provides preliminary insights into prompt-based large language models for the log-based event prediction task.

Paper Structure (13 sections, 2 figures)

This paper contains 13 sections, 2 figures.

Figures (2)

  • Figure 1: Our framework uses large language models to forecast upcoming system crashes and their causes through a two-stage process. In the first stage, an event sequence generation model sifts through raw event logs, identifying system crashes and capturing essential details such as timestamp, bug check code, and parameters. In the second stage, custom prompts are formulated to fine-tune another large language model, which examines historical events to detect patterns. The fine-tuned model then uses these patterns to predict both the occurrence of the next crash event and its underlying cause.
  • Figure 2: Prediction performance of Llama2-7B and Llama2-13B. The left panel shows the ROUGE scores for crash time prediction, the middle panel for crash cause prediction, and the right panel for full prediction. Both LLMs perform comparably on time and full prediction, but Llama2-7B, surprisingly, performs better on crash cause prediction.
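
The two captions above describe a pipeline (extract crash events, build a prompt from the history) and a ROUGE-based evaluation. The following is a minimal Python sketch of that flow, not the authors' implementation. The raw log line format, the prompt wording, and the names `CrashEvent`, `extract_crash_events`, `build_prompt`, and `rouge1_f1` are all assumptions made for illustration: the source does not show the ICIP log schema, the actual prompt template, or which ROUGE variants were reported. A plain unigram-overlap ROUGE-1 F1 is used here as a simplified stand-in.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Hypothetical raw-log line format, invented for this sketch only, e.g.
#   2023-01-05T10:22:31 BugCheck code=0x9F params=0x3,0x1
LINE_RE = re.compile(
    r"^(?P<ts>\S+) BugCheck code=(?P<code>0x[0-9A-Fa-f]+) params=(?P<params>\S+)$"
)


@dataclass
class CrashEvent:
    timestamp: datetime
    bug_check_code: str
    parameters: list


def extract_crash_events(lines):
    """Stage 1 (sketch): keep only crash records, ordered by timestamp."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a crash record in the assumed format
        events.append(
            CrashEvent(
                timestamp=datetime.fromisoformat(match["ts"]),
                bug_check_code=match["code"],
                parameters=match["params"].split(","),
            )
        )
    return sorted(events, key=lambda e: e.timestamp)


def build_prompt(history):
    """Stage 2 (sketch): turn the crash history into a text prompt.

    The wording is an assumption; the source does not give the template.
    """
    rows = [
        f"{i}. time={e.timestamp.isoformat()} code={e.bug_check_code} "
        f"params={','.join(e.parameters)}"
        for i, e in enumerate(history, start=1)
    ]
    return (
        "Past crash events for this system:\n"
        + "\n".join(rows)
        + "\nPredict the time and the likely cause of the next crash."
    )


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    raw = [
        "2023-01-05T10:22:31 BugCheck code=0x9F params=0x3,0x1",
        "2023-01-05T10:23:00 Info service started",
        "2023-01-02T08:00:00 BugCheck code=0x1A params=0x41790",
    ]
    print(build_prompt(extract_crash_events(raw)))
```

In a real setup, the prompt would be paired with the next observed crash as the target text for fine-tuning, and the model's generated answer would be scored against that target. A maintained ROUGE implementation would be preferable to the hand-rolled function for reporting results.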