An Empirical Study of Cross-Language Interoperability in Replicated Data Systems
Provakar Mondal, Eli Tilevich
TL;DR
The paper addresses cross-language interoperability in replicated data systems by empirically comparing two integration strategies: foreign-function interfaces (FFIs) with monolingual replicated data libraries (RDLs), and a multilingual RDL that coordinates replicas through a common data format (CDF). It introduces Hermes, a CDF-based RDL implemented in Go, JavaScript, and Java, with plug-in extensibility for adding features in a single language while preserving multilingual integration. Results show that CDF-based integration yields better software quality and performance (lower latency and memory usage, and higher throughput), and Hermes demonstrates cross-language coordination and extensibility through protobuf-based updates and plug-ins. The work provides practical guidance for designing RDLs in multilingual distributed environments and a concrete, extensible reference implementation for future research and development.
Abstract
BACKGROUND: Modern distributed systems replicate data across multiple execution sites. Business requirements and resource constraints often necessitate mixing different languages across replica sites. To facilitate the management of replicated data, modern software engineering practices integrate special-purpose replicated data libraries (RDLs) that provide read-write access to the data and ensure its synchronization. Irrespective of the implementation languages, an RDL typically uses a single language or offers bindings to a designated one. Hence, integrating existing RDLs in multilingual environments requires special-purpose code, whose software quality and performance characteristics are poorly understood. AIMS: We aim to bridge this knowledge gap to understand the software quality and performance characteristics of RDL integration in multilingual environments. METHOD: We conduct an empirical study of two key strategies for integrating RDLs in the context of multilingual replicated data systems: foreign-function interface (FFI) and a common data format (CDF); we measure and compare their respective software metrics and performance to understand their suitability for the task at hand. RESULTS: Our results reveal that adopting CDF for cross-language interaction offers software quality, latency, memory consumption, and throughput advantages. We further validate our findings by (1) creating a CDF-based RDL for mixing compiled, interpreted, and managed languages; and (2) enhancing our RDL with plug-in extensibility that enables adding functionality in a single language while maintaining integration within a multilingual environment. CONCLUSIONS: With modern distributed systems utilizing multiple languages, our findings provide novel insights for designing RDLs in multilingual replicated data systems.
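To make the CDF strategy concrete, the following is a minimal sketch, not the authors' implementation. It assumes JSON as a stand-in for the common data format and a last-writer-wins merge rule chosen purely for illustration. The names `encode_update` and `apply_update`, the message fields, and the replica identifiers are all hypothetical. The point it illustrates is that replicas exchange only language-neutral bytes, so each site can decode and merge updates in its own language without calling into foreign code.

```python
import json


def encode_update(replica_id, key, value, timestamp):
    """Serialize one update into language-neutral bytes.

    JSON is an assumed stand-in for the common data format (CDF).
    """
    msg = {"replica": replica_id, "key": key, "value": value, "ts": timestamp}
    return json.dumps(msg, sort_keys=True).encode("utf-8")


def apply_update(state, payload):
    """Decode a CDF payload and merge it into the local replica state.

    The merge rule is last-writer-wins per key, with the replica id as a
    tie-breaker. This rule is an illustrative assumption only.
    """
    msg = json.loads(payload.decode("utf-8"))
    current = state.get(msg["key"])
    incoming = (msg["ts"], msg["replica"])
    if current is None or incoming > (current["ts"], current["replica"]):
        state[msg["key"]] = {
            "value": msg["value"],
            "ts": msg["ts"],
            "replica": msg["replica"],
        }
    return state


# Two hypothetical sites produce updates; any language that can parse the
# CDF could have produced these bytes.
u1 = encode_update("go-node", "cart", ["apple"], 1)
u2 = encode_update("js-node", "cart", ["apple", "pear"], 2)

replica_a = {}
replica_b = {}
for payload in (u2, u1):  # delivered out of order
    apply_update(replica_a, payload)
for payload in (u1, u2):  # delivered in order
    apply_update(replica_b, payload)
```

Under these assumptions, both replicas end in the same state regardless of delivery order, because the merge depends only on the decoded message contents.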
