An Empirical Study of Cross-Language Interoperability in Replicated Data Systems

Provakar Mondal, Eli Tilevich

TL;DR

The paper addresses the challenge of cross-language interoperability in replicated data systems by empirically comparing two integration strategies: foreign-function interfaces (FFIs) with monolingual replicated data libraries (RDLs) versus a multilingual RDL that coordinates replicas via a common data format (CDF). It introduces Hermes, a CDF-based RDL implemented in Go, JavaScript, and Java, with plug-in extensibility that allows features to be added in a single language while preserving multilingual integration. Results show that CDF-based integration yields better software quality and performance (lower latency, lower memory usage, and higher throughput), and Hermes demonstrates effective cross-language coordination and extensibility through protobuf-based updates and plug-ins. The work provides practical guidance for designing RDLs in multilingual distributed environments and establishes a concrete, extensible reference implementation for future research and development.

Abstract

BACKGROUND: Modern distributed systems replicate data across multiple execution sites. Business requirements and resource constraints often necessitate mixing different languages across replica sites. To facilitate the management of replicated data, modern software engineering practices integrate special-purpose replicated data libraries (RDLs) that provide read-write access to the data and ensure its synchronization. Irrespective of the implementation languages, an RDL typically uses a single language or offers bindings to a designated one. Hence, integrating existing RDLs in multilingual environments requires special-purpose code, whose software quality and performance characteristics are poorly understood. AIMS: We aim to bridge this knowledge gap to understand the software quality and performance characteristics of RDL integration in multilingual environments. METHOD: We conduct an empirical study of two key strategies for integrating RDLs in the context of multilingual replicated data systems: foreign-function interface (FFI) and a common data format (CDF); we measure and compare their respective software metrics and performance to understand their suitability for the task at hand. RESULTS: Our results reveal that adopting CDF for cross-language interaction offers software quality, latency, memory consumption, and throughput advantages. We further validate our findings by (1) creating a CDF-based RDL for mixing compiled, interpreted, and managed languages; and (2) enhancing our RDL with plug-in extensibility that enables adding functionality in a single language while maintaining integration within a multilingual environment. CONCLUSIONS: With modern distributed systems utilizing multiple languages, our findings provide novel insights for designing RDLs in multilingual replicated data systems.

Paper Structure

This paper contains 19 sections, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: A Typical Plug-in Structure
  • Figure 2: Ambient Data: Collect, Persist, and Visualize
  • Figure 3: Cross-Language Integration Strategies
  • Figure 4: Average Latency for Strategy I and II RDLs
  • Figure 5: Average Peak Memory Usage for Strategy I and II RDLs
  • ...and 2 more figures