Causal Inference for Quantifying Noisy Neighbor Effects in Multi-Tenant Cloud Environments

Philipe S. Schiavo, João P. S. Milanezi, Moisés R. N. Ribeiro, Víctor M. G. Martínez, João Henrique Corrêa, José Marcos Nogueira, Fernando Frota Redigolo, Tereza C. Carvalho, Flávio de Oliveira Silva

Abstract

Resource sharing in multi-tenant cloud environments enables cost efficiency but introduces the Noisy Neighbor problem, i.e., co-located workloads that unpredictably degrade each other's performance. Despite extensive research on detecting such effects, there are no explainable methodologies for quantifying the severity of impact and establishing causal relationships among tenants. We propose an analytical methodology that combines controlled experimentation with multi-stage causal inference, and we validate it across 10 independent rounds in a Kubernetes testbed. Our methodology not only quantifies severe performance degradations (e.g., up to 67% in I/O-bound workloads under combined stress) but also statistically establishes causality through Granger causality analysis, revealing a 75% increase in causal links when the noisy neighbor activates. Furthermore, we identify unique "degradation signatures" for each resource contention vector (i.e., CPU, memory, disk, network), enabling diagnostic capabilities that go beyond anomaly detection. This work transforms the Noisy Neighbor from an elusive problem into a quantifiable, diagnosable phenomenon, providing cloud operators with actionable insights for SLA management and smart resource allocation.

Paper Structure

This paper contains 16 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Impact heatmaps showing mean percentage change for victim tenants across experimental phases and metrics. Red indicates degradation; blue indicates increase.
  • Figure 2: Causal link density per phase. The noisy neighbor's activation dramatically increases causal connections, proving directional influence.
  • Figure 3: Consolidated ECDFs showing unique distributional shifts for each noise type. Different contention vectors produce distinct "degradation signatures."
  • Figure 4: Tenant Coupling Index across phases and metrics. The Noisy Neighbor (tenant-nsy) consistently exhibits the highest coupling values, confirming its role as the dominant interference source.
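
The Granger causality analysis named in the abstract and in Figure 2 can be illustrated with a minimal sketch. This is not the authors' pipeline: the lag order (1), the synthetic series, and the names `granger_f`, `noisy`, and `victim` are all assumptions made for illustration. The idea is to compare a model that predicts a victim metric from its own past against one that also sees the noisy neighbor's past; a large F statistic suggests the neighbor's history carries predictive information about the victim.

```python
import random


def _ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution for the coefficient vector.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = aug[i][k] - sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / aug[i][i]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(cause, effect, lag=1):
    """F statistic: do past values of `cause` improve prediction of `effect`?"""
    n = len(effect)
    targets = effect[lag:]
    # Restricted model: intercept plus the effect's own lags.
    restricted = [
        [1.0] + [effect[t - k] for k in range(1, lag + 1)] for t in range(lag, n)
    ]
    # Full model: restricted regressors plus lags of the candidate cause.
    full = [
        row + [cause[t - k] for k in range(1, lag + 1)]
        for row, t in zip(restricted, range(lag, n))
    ]
    rss_r = _ols_rss(restricted, targets)
    rss_f = _ols_rss(full, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_f) / lag) / (rss_f / dof)


# Synthetic data (assumption): the victim metric depends on the neighbor's past.
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(300)]
victim = [0.0]
for t in range(1, 300):
    victim.append(0.5 * victim[t - 1] + 0.8 * noisy[t - 1] + rng.gauss(0.0, 1.0))

f_forward = granger_f(noisy, victim)  # neighbor -> victim
f_reverse = granger_f(victim, noisy)  # victim -> neighbor
print(f"neighbor->victim F = {f_forward:.1f}, victim->neighbor F = {f_reverse:.1f}")
```

In a real analysis the F statistic would be compared against an F distribution to obtain a p-value, and a causal link would be counted for each tenant pair and metric that passes the chosen significance level. Counting those links per experimental phase gives a link density of the kind Figure 2 reports.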