Semi-Automated Design of Data-Intensive Architectures

Arianna Dragoni, Alessandro Margara

TL;DR

The paper tackles the challenge of designing data-intensive architectures in the presence of diverse data formats, rates, and users. It proposes a scenario-to-architecture methodology based on a Scenario Description Language (SDL) and an Architecture Description Language (ADL), complemented by a system taxonomy to automatically derive architectures and map them to concrete systems. Key contributions include the SDL, the ADL, a cost-based optimization for selecting component types (state-centric, batch, stream) and link persistency, and empirical validation on reference architectures and a Facebook use case. The approach provides explicit rationale for architectural decisions and supports interactive exploration of design trade-offs, with demonstrated scalability to complex scenarios.

Abstract

Today, data guides the decision-making process of most companies. Effectively analyzing and manipulating data at scale to extract and exploit relevant knowledge is a challenging task, due to data characteristics such as its size, the rate at which it changes, and the heterogeneity of formats. To address this challenge, software architects resort to building complex data-intensive architectures that integrate highly heterogeneous software systems, each offering vertically specialized functionalities. Designing a suitable architecture for the application at hand is crucial to enable high quality of service and efficient exploitation of resources. However, the design process entails a series of decisions that demand technical expertise and in-depth knowledge of individual systems and their synergies. To assist software architects in this task, this paper introduces a development methodology for data-intensive architectures, which guides architects in (i) designing a suitable architecture for their specific application scenario, and (ii) selecting an appropriate set of concrete systems to implement the application. To do so, the methodology builds on (1) a language to precisely define an application scenario in terms of characteristics of data and requirements of stakeholders; (2) an architecture description language for data-intensive architectures; (3) a classification of systems based on the functionalities they offer and their performance trade-offs. We show that the description languages we adopt can capture the key aspects of data-intensive architectures proposed by researchers and practitioners, and we validate our methodology by applying it to real-world case studies documented in the literature.

Paper Structure

This paper contains 16 sections, 5 equations, 7 figures.

Figures (7)

  • Figure 1: Overview of the methodology.
  • Figure 2: UML class diagram for the Scenario Description Language (SDL) and the Architecture Description Language (ADL).
  • Figure 3: Scenario (top) and architecture (bottom) descriptions for reference architectures.
  • Figure 4: Facebook use case: scenario description.
  • Figure 5: Facebook use case: components for each data flow.
  • ...and 2 more figures