RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems

Biao Ouyang, Yingying Zhang, Hanyin Cheng, Yang Shu, Chenjuan Guo, Bin Yang, Qingsong Wen, Lunting Fan, Christian S. Jensen

TL;DR

RCRank addresses slow-query diagnosis in cloud database systems by ranking the impact of internal root causes using multimodal evidence from SQL statements, execution plans, execution logs, and KPIs. It introduces a self-supervised, cross-modal Transformer-based architecture that adaptively fuses modalities and a unified objective that jointly identifies root causes and estimates their impacts as $\hat{y}_{ij}$ for each root cause $RC_j$ on query $X_i$. Experiments on real Hologres data and synthetic benchmarks show consistent improvements over state-of-the-art RC identification and ranking baselines, with substantial end-to-end run-time reductions for revised queries. The results demonstrate that integrating four modalities and impact-aware training yields more effective and cost-efficient slow-query revisions in practice.
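As a rough illustration of what ranking by estimated impact means for a single query $X_i$, the sketch below sorts a set of already-computed impact scores $\hat{y}_{ij}$ and keeps those above a cutoff. It does not reproduce the RCRank model. The root cause names, the score values, the `threshold` parameter, and the function `rank_root_causes` are all hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple


def rank_root_causes(
    impacts: Dict[str, float], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep root causes whose estimated impact exceeds `threshold`,
    ordered from highest to lowest impact.

    `impacts` maps a (hypothetical) root cause name RC_j to an estimated
    impact score y_hat_ij for one query X_i. The cutoff is an assumption
    made for this sketch, not a value taken from the paper.
    """
    kept = [(name, score) for name, score in impacts.items() if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical impact estimates for a single slow query.
estimated = {
    "missing_index": 0.62,
    "outdated_statistics": 0.05,
    "suboptimal_join_order": 0.31,
    "resource_contention": 0.0,
}

ranking = rank_root_causes(estimated, threshold=0.1)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Under these assumed scores, a user revising the query would address the highest-ranked root cause first, which is the prioritization that ranking adds over identification alone.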

Abstract

With the continued migration of storage to cloud database systems, the impact of slow queries in such systems on services and user experience is increasing. Root-cause diagnosis plays an indispensable role in facilitating slow-query detection and revision. This paper proposes a method capable of both identifying possible root cause types for slow queries and ranking these according to their potential for accelerating slow queries. This enables prioritizing root causes with the highest impact, in turn improving slow-query revision effectiveness. To enable more accurate and detailed diagnoses, we propose the multimodal Ranking for the Root Causes of slow queries (RCRank) framework, which formulates root cause analysis as a multimodal machine learning problem and leverages multimodal information from query statements, execution plans, execution logs, and key performance indicators. To obtain expressive embeddings from its heterogeneous multimodal input, RCRank integrates self-supervised pre-training that enhances cross-modal alignment and task relevance. Next, the framework integrates root-cause-adaptive cross Transformers that enable adaptive fusion of multimodal features with varying characteristics. Finally, the framework offers a unified model that features an impact-aware training objective for identifying and ranking root causes. We report on experiments on real and synthetic datasets, finding that RCRank is capable of consistently outperforming the state-of-the-art methods at root cause identification and ranking according to a range of metrics.

Paper Structure

This paper contains 30 sections, 17 equations, 8 figures, 9 tables, 1 algorithm.

Figures (8)

  • Figure 1: Root cause identification (RCI) vs. root cause ranking (RCR). (i) RCI often utilizes partial observability, whereas RCR utilizes multimodal, full observability. (ii) RCI only identifies possible root causes, whereas RCR ranks root causes according to their potential impact, enabling users to identify the most significant root causes.
  • Figure 2: Overview of the multimodal diagnosis framework for root causes of slow queries.
  • Figure 3: Prompt template with context and revision sections for LLM-based slow-query revision.
  • Figure 4: Overview of the multimodal learning model for root cause diagnosis, which is composed of three main modules: (1) input embedding module, (2) multimodal fusion module, and (3) root cause estimation module.
  • Figure 5: Illustration of self-supervised pre-training for multimodal encoders $Enc_S$, $Enc_Q$, $Enc_L$, and $Enc_I$.
  • ...and 3 more figures