MicroRacer: Detecting Concurrency Bugs for Cloud Service Systems

Zhiling Deng, Juepeng Wang, Zhuangbin Chen

TL;DR

MicroRacer addresses the challenge of detecting concurrency bugs in cloud-native microservice systems by using non-intrusive runtime instrumentation to collect end-to-end traces, then identifying conflicting request pairs and validating them via automated interleaving tests. It constructs end-to-end request flows by instrumenting libraries to capture data and request spans, maps requests to shared state, and prunes the search space with flow causality before rigorous testing. Experimental results on four popular benchmarks with replicated industrial bugs show high bug-detection accuracy and substantial reduction in candidate pairs, with practical mechanisms to reduce false positives. The work demonstrates a scalable, automated approach to improving reliability in multi-datastore, asynchronous microservice environments.

Abstract

Modern cloud applications delivering global services are often built on distributed systems with a microservice architecture. In such systems, end-to-end user requests traverse multiple different services and machines, exhibiting intricate interactions. Consequently, cloud service systems are vulnerable to concurrency bugs, which pose significant challenges to their reliability. Existing methods for concurrency bug detection often fall short due to their intrusive nature and inability to handle the architectural complexities of microservices. To address these limitations, we propose MicroRacer, a non-intrusive and automated framework for detecting concurrency bugs in such environments. By dynamically instrumenting widely-used libraries at runtime, MicroRacer collects detailed trace data without modifying the application code. Such data are utilized to analyze the happened-before relationship and resource access patterns of common operations within service systems. Based on this information, MicroRacer identifies suspicious concurrent operations and employs a three-stage validation process to test and confirm concurrency bugs. Experiments on open-source microservice benchmarks with replicated industrial bugs demonstrate MicroRacer's effectiveness and efficiency in accurately detecting and pinpointing concurrency issues.
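
The step of identifying suspicious concurrent operations can be illustrated with a minimal sketch. This is not MicroRacer's implementation: the `DataSpan` record, the `reachable` and `candidate_pairs` helpers, and the flow and resource names are all hypothetical, and the happened-before relation is assumed to be given as a set of flow-to-flow causal edges. Two data spans are kept as a candidate pair only if they come from different request flows, touch the same shared resource, include at least one write, and are not ordered by causality. The later validation stages described in the abstract are not covered.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DataSpan:
    flow: str       # hypothetical request-flow identifier
    name: str       # hypothetical span label
    resource: str   # shared state touched, e.g. "orders:42"
    is_write: bool  # True if the span modifies the resource


def reachable(src, dst, edges):
    """Return True if flow dst is reachable from flow src via causal edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(t for s, t in edges if s == node)
    return False


def candidate_pairs(spans, edges):
    """List span pairs that conflict and are not ordered by happened-before."""
    pairs = []
    for a, b in combinations(spans, 2):
        if a.flow == b.flow:
            continue  # spans in the same flow are ordered by that flow
        if a.resource != b.resource:
            continue  # no shared state, so no conflict
        if not (a.is_write or b.is_write):
            continue  # two reads cannot race
        if reachable(a.flow, b.flow, edges) or reachable(b.flow, a.flow, edges):
            continue  # pruned: one flow causally precedes the other
        pairs.append((a.name, b.name))
    return pairs


# Illustrative data only: three flows, with flow B assumed to causally precede flow C.
spans = [
    DataSpan("A", "a1", "orders:42", True),
    DataSpan("B", "b1", "orders:42", False),
    DataSpan("B", "b2", "seats:7", True),
    DataSpan("C", "c1", "seats:7", True),
    DataSpan("C", "c2", "orders:42", False),
]
edges = {("B", "C")}

print(candidate_pairs(spans, edges))
```

In this toy input, the two writes to `seats:7` are dropped because their flows are causally ordered, which is the kind of pruning that shrinks the set of pairs sent on to interleaving tests.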

Paper Structure

This paper contains 32 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Service call graph of the illustrating example
  • Figure 2: The architecture of MicroRacer
  • Figure 3: Request flows of the illustrating example. The flow A (1→2) and the flow B (3→4→5→6→7) have race conditions at data spans b and d.
  • Figure 4: Request flows when a service asynchronously invokes other services. The left diagram shows the actual request flow in the system, while the right diagram shows the request flow split by MicroRacer during analysis.
  • Figure 5: SQL statements executed in TrainTicket during runtime.
  • ...and 1 more figure