KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis

Mayur Akewar, Sandeep Madireddy, Dongsheng Luo, Janki Bhimani

TL;DR

KORAL presents a knowledge-driven framework that combines Large Language Models (LLMs) with two Knowledge Graphs (KGs) to enable end-to-end SSD operational analysis from fragmented telemetry. Stage I constructs a Literature KG aligned to an SSD taxonomy, with provenance and controlled vocabulary growth; Stage II materializes a Data KG from production telemetry and grounds analysis by retrieving literature context, delivering descriptive, predictive, prescriptive, and what-if outputs with transparent justification. Empirical evaluation on Google and Alibaba datasets shows expert-level diagnostics and actionable recommendations, with improved reasoning transparency and reduced manual effort. The approach achieves grounded, multi-modal analysis at device and fleet scales, and the authors release the SSD-specific KG to support reproducible research and broader adoption in knowledge-based storage system analysis.
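The Stage II grounding step (retrieving literature context for what the Data KG observed) can be illustrated with a minimal Python sketch. It assumes both graphs are stored as plain subject, relation, object triples that carry a provenance field, and that literature context is retrieved by simple concept overlap with the Data KG. KORAL's actual retrieval strategy, relation vocabulary, and prompt format are not given in this summary, so `Triple`, `retrieve_context`, `build_prompt`, the relation labels, the source IDs, and the drive ID `ssd-042` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    source: str  # provenance: "telemetry" or a (hypothetical) literature sentence ID


# Hypothetical Literature KG fragment (Stage I output); the first two claims echo Figure 3.
LITERATURE_KG = [
    Triple("Temperature", "impacts_more", "WriteIO", "lit:sentence-1"),
    Triple("Humidity", "impacts_more", "WriteIO", "lit:sentence-1"),
    Triple("Firmware", "affects", "ReadLatency", "lit:sentence-2"),
]

# Hypothetical Data KG fragment (Stage II output) for a single drive.
DATA_KG = [
    Triple("ssd-042", "observed_high", "Temperature", "telemetry"),
    Triple("ssd-042", "observed_drop", "WriteIO", "telemetry"),
]


def concepts(triples):
    """Collect every node (subject or object) mentioned in a list of triples."""
    return {t.subject for t in triples} | {t.obj for t in triples}


def retrieve_context(data_kg, literature_kg):
    """Return literature triples that share at least one node with the Data KG."""
    seen = concepts(data_kg)
    return [t for t in literature_kg if t.subject in seen or t.obj in seen]


def render(triples):
    """Format triples as bullet lines that keep their provenance tag."""
    return "\n".join(f"- {t.subject} {t.relation} {t.obj} [{t.source}]" for t in triples)


def build_prompt(question, data_kg, literature_kg):
    """Assemble a grounded prompt: observed facts, retrieved evidence, then the question."""
    evidence = retrieve_context(data_kg, literature_kg)
    return (
        f"Observed:\n{render(data_kg)}\n"
        f"Literature:\n{render(evidence)}\n"
        f"Question: {question}"
    )
```

On the sample data, the Temperature and Humidity claims are retrieved because they share `Temperature` or `WriteIO` with the observed facts, while the Firmware claim is not. Keeping a `source` tag on every triple is one simple way to let the final answer cite either telemetry or a specific literature sentence.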

Abstract

Solid State Drives (SSDs) are critical to datacenters, consumer platforms, and mission-critical systems. Yet diagnosing their performance and reliability is difficult because data are fragmented and time-disjoint, and existing methods demand large datasets and expert input while offering only limited insights. Degradation arises not only from shifting workloads and evolving architectures but also from environmental factors such as temperature, humidity, and vibration. We present KORAL, a knowledge-driven reasoning framework that integrates Large Language Models (LLMs) with structured Knowledge Graphs (KGs) to generate insights into SSD operations. Unlike traditional approaches that require extensive expert input and large datasets, KORAL generates a Data KG from fragmented telemetry and integrates a Literature KG that already organizes knowledge from literature, reports, and traces. This turns unstructured sources into a queryable graph and telemetry into structured knowledge, and both graphs guide the LLM to deliver evidence-based, explainable analysis aligned with the domain vocabulary and constraints. Evaluation using real production traces shows that KORAL delivers expert-level diagnosis and recommendations, supported by grounded explanations that improve reasoning transparency, guide operator decisions, reduce manual effort, and provide actionable insights to improve service quality. To our knowledge, this is the first end-to-end system that combines LLMs and KGs for full-spectrum SSD reasoning, including Descriptive, Predictive, Prescriptive, and What-if analysis. We release the generated SSD-specific KG to advance reproducible research in knowledge-based storage system analysis. GitHub Repository: https://github.com/Damrl-lab/KORAL
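To make "generates a Data KG from fragmented telemetry" concrete, here is a minimal Python sketch that buckets time-disjoint readings into per-drive frames and applies a small rule table mapping frame summaries to graph relations. The frame definition (a fixed window of days with mean aggregation), the attribute names, the thresholds, and the helpers `build_frames` and `frames_to_triples` are illustrative assumptions, not KORAL's published rule repository.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telemetry rows: (drive_id, day, attribute, value).
# Attributes arrive on different days, mirroring the time-disjoint signals in Figure 1.
TELEMETRY = [
    ("ssd-042", 1, "temperature_c", 49.0),
    ("ssd-042", 2, "temperature_c", 55.0),
    ("ssd-042", 3, "write_mbps", 110.0),
    ("ssd-042", 5, "write_mbps", 70.0),
    ("ssd-007", 2, "temperature_c", 35.0),
]

# Hypothetical rule table: (attribute, predicate on the frame mean, relation, concept).
RULES = [
    ("temperature_c", lambda v: v >= 50.0, "observed_high", "Temperature"),
    ("write_mbps", lambda v: v < 100.0, "observed_drop", "WriteIO"),
]


def build_frames(rows, window_days):
    """Group rows into (drive, window) frames and average each attribute per frame."""
    buckets = defaultdict(list)
    for drive, day, attr, value in rows:
        buckets[(drive, day // window_days, attr)].append(value)
    frames = defaultdict(dict)
    for (drive, window, attr), values in buckets.items():
        frames[(drive, window)][attr] = mean(values)
    return dict(frames)


def frames_to_triples(frames, rules):
    """Apply every rule to every frame and emit (drive, relation, concept) triples."""
    triples = set()
    for (drive, _window), summary in frames.items():
        for attr, predicate, relation, concept in rules:
            if attr in summary and predicate(summary[attr]):
                triples.add((drive, relation, concept))
    return triples
```

With `window_days=7`, all sample readings for `ssd-042` fall into one frame (mean temperature 52.0, mean write rate 90.0), so `frames_to_triples(build_frames(TELEMETRY, window_days=7), RULES)` yields `("ssd-042", "observed_high", "Temperature")` and `("ssd-042", "observed_drop", "WriteIO")`, while `ssd-007` yields nothing. Keeping the rules as data rather than code reflects the idea, stated for Figure 5, that the rule base is the expert-authored part and the steps after it are automated.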

Paper Structure (44 sections, 2 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Timeline of SSD signals across days by telemetry type. Cross-hatched bars denote dynamic, time-varying SMART telemetry (e.g., workload and external factors), while dotted bars denote static descriptors (e.g., flash type, firmware).
  • Figure 2: A slice of the SSD taxonomy used as shared vocabulary. The taxonomy is seeded by experts; KORAL may propose new concepts from literature, which are added after validation.
  • Figure 3: Automatically generated Literature KG fragment from Stage I that encodes the claim that "temperature and humidity impact write I/O more than read I/O". Nodes and relations are produced by the automated extraction and ontology alignment pipeline, with provenance linked to the supporting sentence.
  • Figure 4: KORAL Stage I Flow (RQ1). Document parsing, claim extraction, entity linking, and Literature KG construction are automated. When an out-of-vocabulary concept is detected, the system generates a concept proposal that is manually reviewed and validated before it is added to the vocabulary.
  • Figure 5: KORAL Stage II Flow (RQ2, RQ3 & RQ4). The only expert-authored component is the curated rule repository that defines how telemetry is summarized into frames and mapped to graph relations. Given this rule base, intermediate representation construction, Data KG creation, node linking, summarization, retrieval, and generation are automated.
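The controlled vocabulary growth described for Figures 2 to 4 can be sketched as follows: an extracted claim enters the Literature KG only when both endpoints link to taxonomy concepts, and any unmatched mention is queued as a proposal until a reviewer approves it. This is a minimal Python sketch under stated assumptions: linking is a case-insensitive exact match (a real pipeline would likely use richer entity linking), `approve` stands in for the manual validation step, and `Vocabulary`, `Claim`, `add_claim`, and the relation names are hypothetical. Encoding the Figure 3 claim as a single `impacts_more` edge to `WriteIO` is also a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Expert-seeded controlled vocabulary; new concepts enter only after review."""
    concepts: set
    pending: set = field(default_factory=set)

    def link(self, mention):
        """Return the canonical concept for a mention, or queue it as a proposal."""
        key = mention.strip().lower()
        for concept in self.concepts:
            if concept.lower() == key:
                return concept
        self.pending.add(mention.strip())
        return None

    def approve(self, mention):
        """Stand-in for manual validation: move a pending proposal into the vocabulary."""
        self.pending.discard(mention)
        self.concepts.add(mention)


@dataclass(frozen=True)
class Claim:
    subject: str
    relation: str
    obj: str
    provenance: str  # supporting sentence, kept so the claim stays explainable


def add_claim(kg, vocab, subject, relation, obj, sentence):
    """Append a claim only if both endpoints link; unmatched mentions become proposals."""
    s, o = vocab.link(subject), vocab.link(obj)
    if s is None or o is None:
        return False
    kg.append(Claim(s, relation, o, sentence))
    return True
```

For example, with a vocabulary seeded with `Temperature`, `Humidity`, `WriteIO`, and `ReadIO`, calling `add_claim(kg, vocab, "temperature", "impacts_more", "WriteIO", sentence)` succeeds and stores the canonical `Temperature`. A claim about `Vibration` returns `False` and leaves `Vibration` in `vocab.pending` until `vocab.approve("Vibration")` is called, after which the same claim can be added.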