R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning

Yilun Liu, Ziang Chen, Song Xu, Minggui He, Shimin Tao, Weibin Meng, Yuming Xie, Tao Han, Chunguang Zhao, Jingzhou Du, Daimeng Wei, Shenglin Zhang, Yongqian Sun

TL;DR

R-Log introduces a reasoning-first learning paradigm for log analysis that pairs human-aligned reasoning trajectories with reinforcement learning in a simulated O&M environment. Instead of mapping a log input directly to an answer (X→Y), the model produces a reasoning trajectory before the answer (X→(R,Y)). Training has two stages: a cold-start SFT stage on (X, R, Y) triples, followed by RL with a joint reward. R-Log achieves state-of-the-art results across five log-analysis tasks and generalizes strongly to unseen tasks. The method relies on 13 reasoning templates, a Log Reasoning Dataset of 2k+ trajectories, and a GRPO-based reward mechanism that aligns model behavior with expert practice. In practice, this improves the reliability and interpretability of log analysis: R-Log is deployed in Huawei’s auto-troubleshooting systems, and a fast variant, R-Log-fast, is available for latency-constrained environments.

Abstract

The growing complexity of log data in modern software systems has prompted the use of Large Language Models (LLMs) for automated log analysis. Current approaches typically rely on direct supervised fine-tuning (SFT) on log-label pairs. However, this exacerbates the domain discrepancy between general-purpose LLMs and specialized log data, causing overfitting. Furthermore, SFT's imbalanced loss computation often allows lengthy contexts to overwhelm critical, concise details in model answers, leading to hallucinations. To address these limitations, we propose R-Log, a novel reasoning-based paradigm that mirrors the structured, step-by-step analytical process of human engineers. This approach enhances generalizability by learning the underlying rules behind conclusions. We further employ Reinforcement Learning (RL) to optimize the model within a simulated O&M environment, thereby reducing hallucinations by directly rewarding correct outcomes. R-Log is first cold-started on a curated dataset of 2k+ reasoning trajectories, guided by 13 strategies from manual O&M practices, to establish an initial reasoning capability. This ability is then refined via RL using a joint reward function. Empirical evaluations on real-world logs show that R-Log outperforms existing methods across five log analysis tasks, particularly in unseen scenarios (by 228.05%). We also designed R-Log-fast with 5x speedup while keeping 93% of the efficacy.
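The RL stage can be pictured with a minimal sketch. The abstract does not spell out the components of the joint reward, so everything specific here is an assumption made for illustration: the `<think>`/`<answer>` output tags, the split into a format term and an answer term, the exact-match scoring, and the weights `w_format` and `w_answer`. The only part taken from GRPO as it is commonly described is the group-relative advantage, which normalizes each sampled output's reward by the mean and standard deviation of its group (the population standard deviation is used here; implementations differ on this detail).

```python
import re
from statistics import mean, pstdev

# Hypothetical output format (an assumption, not taken from the paper):
# reasoning inside <think>...</think>, followed by the answer inside <answer>...</answer>.
OUTPUT_PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def joint_reward(output: str, reference: str,
                 w_format: float = 0.2, w_answer: float = 0.8) -> float:
    """Illustrative joint reward: a format term plus an answer-correctness term.

    The weights and the exact-match check are placeholders for whatever
    task-specific scoring a real system would use.
    """
    match = OUTPUT_PATTERN.match(output.strip())
    if match is None:
        return 0.0  # malformed output earns nothing
    answer = match.group(2).strip()
    answer_score = 1.0 if answer == reference.strip() else 0.0
    return w_format * 1.0 + w_answer * answer_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled outputs for the same input, echoing the Figure 1 case
# where the correct response time is "1ms" and "a minute" is a hallucination.
group = [
    "<think>The log reports the response time directly.</think><answer>1ms</answer>",
    "<think>The request seems slow.</think><answer>a minute</answer>",
    "The response time is a minute.",  # no tags: fails the format check
]
rewards = [joint_reward(o, "1ms") for o in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the well-formed, correct output receives the only positive advantage, so a policy update would push probability toward it and away from both the hallucinated and the malformed outputs.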

Paper Structure

This paper contains 36 sections, 2 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: A real case from the evaluation illustrating the reasoning-based nature of R-Log. Through human-like step-by-step reasoning, R-Log avoided the “a minute” value hallucinated by existing LLMs and correctly identified the short response time “1ms” from the error log.
  • Figure 2: Illustration of the construction of the human-aligned Log Reasoning Dataset and the two-stage training of R-Log. Highlighted parts reflect the “conceptual” or “procedural” nature of the thinking strategies, and the real-world logs used in instantiation.
  • Figure 3: Ablation study on (a) training stages and (b) human reasoning templates. Scores are averaged across domains.
  • Figure 4: “Think-before-answer” (R-Log) vs. “Answer-before-think” (R-Log-fast), a trade-off between efficacy and efficiency. The radar chart shows each baseline's average performance as a percentage of R-Log's.
  • Figure 5: R-Log is deployed at Huawei in an auto-troubleshooting application that emphasizes interpretability. The generated cards visualize R-Log's reasoning trajectories as it handles logged errors. The user data shown is synthesized for demo purposes.