Hestia: Hyperthread-Level Scheduling for Cloud Microservices with Interference-Aware Attention

Dingyu Yang; Fanyong Kong; Jie Dai; Shiyou Qian; Shuangwei Li; Jian Cao; Guangtao Xue; Gang Chen

TL;DR

This work presents Hestia, a hyperthread-level, interference-aware scheduling framework powered by self-attention. It reduces the 95th-percentile service latency by up to 80%, lowers overall CPU consumption by 2.3% under the same workload, and surpasses five state-of-the-art schedulers by up to 30.65% across diverse contention scenarios.

Abstract

Modern cloud servers routinely co-locate multiple latency-sensitive microservice instances to improve resource efficiency. However, the diversity of microservice behaviors, coupled with mutual performance interference under simultaneous multithreading (SMT), makes large-scale placement increasingly complex. Existing interference-aware schedulers and isolation techniques rely on coarse core-level profiling or static resource partitioning, leaving asymmetric hyperthread-level heterogeneity and SMT contention dynamics largely unmodeled. We present Hestia, a hyperthread-level, interference-aware scheduling framework powered by self-attention. Through an extensive analysis of production traces encompassing 32,408 instances across 3,132 servers, we identify two dominant contention patterns, sharing-core (SC) and sharing-socket (SS), and reveal strong asymmetry in their impact. Guided by these insights, Hestia incorporates (1) a self-attention-based CPU usage predictor that models SC/SS contention and hardware heterogeneity, and (2) an interference scoring model that estimates pairwise contention risks to guide scheduling decisions. We evaluate Hestia through large-scale simulation and a real production deployment. Hestia reduces the 95th-percentile service latency by up to 80%, lowers overall CPU consumption by 2.3% under the same workload, and surpasses five state-of-the-art schedulers by up to 30.65% across diverse contention scenarios.
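To make the SC/SS terminology concrete, the following is a minimal sketch, not the Hestia implementation. It assumes that sharing-core means two hyperthreads on the same physical core and sharing-socket means two hyperthreads on the same socket but different cores. The topology table, the penalty weights, and the function names (`relation`, `placement_score`, `pick_hyperthread`) are all hypothetical; in particular, the fixed weights stand in for the learned interference scores described in the abstract.

```python
from typing import Dict, Set, Tuple

# Hypothetical topology: logical CPU id -> (socket id, physical core id).
# Two sockets, two cores per socket, two hyperthreads per core.
TOPOLOGY: Dict[int, Tuple[int, int]] = {
    0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1),
    4: (1, 0), 5: (1, 0), 6: (1, 1), 7: (1, 1),
}

# Illustrative penalties only; these values are assumptions, not measurements.
WEIGHTS: Dict[str, float] = {"SC": 1.0, "SS": 0.3}


def relation(topo: Dict[int, Tuple[int, int]], a: int, b: int) -> str:
    """Classify two distinct hyperthreads as 'SC', 'SS', or 'none'."""
    if a == b:
        raise ValueError("expected two distinct hyperthreads")
    socket_a, core_a = topo[a]
    socket_b, core_b = topo[b]
    if socket_a != socket_b:
        return "none"
    return "SC" if core_a == core_b else "SS"


def placement_score(topo: Dict[int, Tuple[int, int]], occupied: Set[int],
                    candidate: int, weights: Dict[str, float]) -> float:
    """Sum the assumed penalties from every occupied neighbor of `candidate`."""
    return sum(weights.get(relation(topo, candidate, cpu), 0.0)
               for cpu in occupied)


def pick_hyperthread(topo: Dict[int, Tuple[int, int]], occupied: Set[int],
                     weights: Dict[str, float]) -> int:
    """Return the free hyperthread with the lowest summed penalty."""
    free = [cpu for cpu in sorted(topo) if cpu not in occupied]
    if not free:
        raise RuntimeError("no free hyperthread available")
    return min(free, key=lambda cpu: placement_score(topo, occupied, cpu, weights))


if __name__ == "__main__":
    # With hyperthread 0 busy, its sibling (1) is an SC neighbor and
    # hyperthreads 2 and 3 are SS neighbors; the other socket is unaffected.
    print(pick_hyperthread(TOPOLOGY, {0}, WEIGHTS))
```

A scheduler in the spirit of the abstract would replace the constant weights with per-pair risk estimates, since the contention impact is reported to be asymmetric.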

Paper Structure (17 sections, 1 equation, 10 figures, 2 tables)

Introduction
Background and Motivation
Microservices and Server Architecture
Observation and Motivation
Observations
Motivation
Design of Hestia
Overview
Topology-Aware Selector
Attention-Guided Prediction Model
Interference Scoring Mechanism
Evaluation
Experimental Setup
Overall Performance Evaluation
Evaluation in a Production Cluster
...and 2 more sections

Figures (10)

Figure 1: Dual-socket server with co-located LS instances.
Figure 2: Spearman analysis and CPU quota coverage.
Figure 3: Correlation between CPU util. and latency.
Figure 4: Interference between Ordering instances and their SC/SS neighbors.
Figure 5: The overview of Hestia.
...and 5 more figures
