ORACL: Optimized Reasoning for Autoscaling via Chain of Thought with LLMs for Microservices

Haoyu Bai, Muhammed Tawfiqul Islam, Minxian Xu, Rajkumar Buyya

TL;DR

ORACL addresses the challenge of resource management in microservices by unifying root-cause analysis and autoscaling through an LLM-based chain-of-thought framework. It converts runtime telemetry into semantic prompts, uses a structured CoT reasoning process, and employs offline supervised fine-tuning plus GSPO to improve robustness and generalization without deployment-specific retraining. Empirical results show improved RCA recall/precision/accuracy and competitive autoscaling QoS, with superior tail latency behavior and stable throughput in dynamic environments. The approach offers a practical, interpretable, and data-efficient pathway to adaptive cloud autoscaling, albeit with acknowledged limits in inference cost and data variety that future work aims to address.

Abstract

Applications are moving away from monolithic designs to microservice and serverless architectures, where fleets of lightweight and independently deployable components run on public clouds. Autoscaling serves as the primary control mechanism for balancing resource utilization and quality of service, yet existing policies are either opaque learned models that require substantial per-deployment training or brittle hand-tuned rules that fail to generalize. We investigate whether large language models can act as universal few-shot resource allocators that adapt across rapidly evolving microservice deployments. We propose ORACL, Optimized Reasoning for Autoscaling via Chain of Thought with LLMs for Microservices, a framework that leverages prior knowledge and chain-of-thought reasoning to diagnose performance regressions and recommend resource allocations. ORACL transforms runtime telemetry, including pods, replicas, CPU and memory usage, latency, service-level objectives, and fault signals, into semantic natural-language state descriptions and invokes an LLM to produce an interpretable intermediate reasoning trace. This reasoning identifies likely root causes, prunes the action space, and issues safe allocation decisions under policy constraints. Experiments on representative open-source microservice workloads show that ORACL improves root-cause identification accuracy by 15 percent, accelerates training by up to 24x, and improves quality of service by 6 percent in short-term scenarios, without deployment-specific retraining.
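The telemetry-to-text step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ServiceTelemetry` schema, the field names, the sentence template, and the `clamp_replicas` helper are all assumptions made for illustration, since the abstract names only the kinds of signals involved (replicas, CPU and memory usage, latency, SLOs, fault signals) and the existence of policy constraints.

```python
from dataclasses import dataclass


@dataclass
class ServiceTelemetry:
    # Hypothetical schema: the paper does not specify these field names.
    name: str
    replicas: int
    cpu_util: float          # fraction of requested CPU in use, 0.0-1.0
    mem_util: float          # fraction of requested memory in use, 0.0-1.0
    p95_latency_ms: float
    slo_latency_ms: float
    fault_signals: tuple[str, ...] = ()


def describe_state(t: ServiceTelemetry) -> str:
    """Render one service's telemetry as a plain-English state description."""
    slo_status = "violating" if t.p95_latency_ms > t.slo_latency_ms else "meeting"
    faults = ", ".join(t.fault_signals) if t.fault_signals else "none"
    return (
        f"Service '{t.name}' runs {t.replicas} replicas. "
        f"CPU utilization is {t.cpu_util:.0%} and memory utilization is {t.mem_util:.0%}. "
        f"p95 latency is {t.p95_latency_ms:.0f} ms against an SLO of "
        f"{t.slo_latency_ms:.0f} ms, so the service is {slo_status} its SLO. "
        f"Fault signals: {faults}."
    )


def clamp_replicas(proposed: int, min_replicas: int, max_replicas: int) -> int:
    """Keep a proposed replica count inside assumed policy bounds."""
    return max(min_replicas, min(proposed, max_replicas))


if __name__ == "__main__":
    cart = ServiceTelemetry("cart", 3, 0.92, 0.40, 480.0, 300.0, ("pod_restart",))
    print(describe_state(cart))
    # A model-proposed replica count would be clamped before being applied.
    print(clamp_replicas(12, min_replicas=1, max_replicas=8))
```

In a pipeline of this shape, the rendered description would form part of the LLM prompt, and any allocation the model recommends would pass through a bounds check such as `clamp_replicas` before being applied. How ORACL itself enforces its policy constraints is not detailed in the abstract.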

Paper Structure (43 sections, 18 equations, 9 figures, 3 tables, 3 algorithms)

Figures (9)

  • Figure 1: Convergence of the DRPC Algorithm
  • Figure 2: ORACL System Architecture
  • Figure 3: ORACL System Prototype Diagram
  • Figure 4: An Expected Output From LLM
  • Figure 5: System-wide operation with a Dual Objective Procedure
  • ...and 4 more figures