MicroRacer: Detecting Concurrency Bugs for Cloud Service Systems
Zhiling Deng, Juepeng Wang, Zhuangbin Chen
TL;DR
MicroRacer addresses the challenge of detecting concurrency bugs in cloud-native microservice systems by using non-intrusive runtime instrumentation to collect end-to-end traces, then identifying conflicting request pairs and validating them via automated interleaving tests. It constructs end-to-end request flows by instrumenting libraries to capture data and request spans, maps requests to shared state, and prunes the search space with flow causality before running the interleaving tests. Experimental results on four popular benchmarks with replicated industrial bugs show high bug-detection accuracy and a substantial reduction in candidate pairs, with practical mechanisms to reduce false positives. The work demonstrates a scalable, automated approach to improving reliability in multi-datastore, asynchronous microservice environments.
Abstract
Modern cloud applications delivering global services are often built on distributed systems with a microservice architecture. In such systems, end-to-end user requests traverse multiple services and machines, exhibiting intricate interactions. Consequently, cloud service systems are vulnerable to concurrency bugs, which pose significant challenges to their reliability. Existing methods for concurrency bug detection often fall short because they are intrusive and cannot handle the architectural complexities of microservices. To address these limitations, we propose MicroRacer, a non-intrusive and automated framework for detecting concurrency bugs in such environments. By dynamically instrumenting widely used libraries at runtime, MicroRacer collects detailed trace data without modifying the application code. These data are used to analyze the happened-before relationship and resource access patterns of common operations within service systems. Based on this information, MicroRacer identifies suspicious concurrent operations and employs a three-stage validation process to test and confirm concurrency bugs. Experiments on open-source microservice benchmarks with replicated industrial bugs demonstrate MicroRacer's effectiveness and efficiency in accurately detecting and pinpointing concurrency issues.
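
To make the idea of "suspicious concurrent operations" concrete, the following is a minimal sketch, not MicroRacer's actual implementation. It assumes each traced operation records the shared resource it touches and whether it writes, and that causality between operations is available as a list of directed edges. The names `Op`, `happened_before`, and `candidate_pairs` are hypothetical and chosen only for illustration. Two operations are reported as a candidate pair when they access the same resource, at least one of them writes, and neither is ordered before the other by the happened-before relationship.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Op:
    """A traced operation (hypothetical schema, for illustration only)."""
    op_id: str
    request: str    # identifier of the end-to-end request that issued it
    resource: str   # shared state touched, e.g. a datastore key
    is_write: bool


def happened_before(edges):
    """Build a reachability check over causal edges (earlier_op, later_op)."""
    successors = {}
    for earlier, later in edges:
        successors.setdefault(earlier, set()).add(later)

    def reaches(src, dst):
        # Iterative depth-first search along causal edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        return False

    return reaches


def candidate_pairs(ops, edges):
    """Return pairs of op ids that conflict and are causally unordered."""
    reaches = happened_before(edges)
    pairs = []
    for a, b in combinations(ops, 2):
        if a.resource != b.resource:
            continue  # different shared state: no conflict
        if not (a.is_write or b.is_write):
            continue  # two reads never conflict
        if reaches(a.op_id, b.op_id) or reaches(b.op_id, a.op_id):
            continue  # ordered by causality: pruned from the search space
        pairs.append((a.op_id, b.op_id))
    return pairs


ops = [
    Op("r1.read", "req-1", "cart:42", False),
    Op("r1.write", "req-1", "cart:42", True),
    Op("r2.write", "req-2", "cart:42", True),
    Op("r3.read", "req-3", "order:7", False),
]
edges = [("r1.read", "r1.write")]  # req-1 reads before it writes

print(candidate_pairs(ops, edges))
# [('r1.read', 'r2.write'), ('r1.write', 'r2.write')]
```

In this toy trace, the read and write inside `req-1` are causally ordered and therefore pruned, while both race with the unordered write from `req-2`. Candidates found this way are only suspects; confirming a bug would still require a validation step such as the interleaving tests described above.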
