Architecting Data-Intensive Applications: From Data Architecture Design to Its Quality Assurance
Moamin Abughazala
TL;DR
This work addresses the challenge of growing data volumes and heterogeneous data sources by proposing an Architecture Description Framework for Data-Intensive Applications (DAF) and a supporting toolchain: DAT, a graphical modeling tool, and PyDaQu, a Python code generator that automates data-quality checks by mapping data architectures to Great Expectations. The study combines qualitative case studies and action research to validate the framework across multiple industrial domains, reporting improvements in modeling expressiveness, data-quality monitoring, and workflow efficiency. The findings highlight the framework’s adaptability to data pipelines, data warehouses, and Lambda/Kappa architectures, and its potential to reduce data silos and improve decision quality in data-driven organizations.
Abstract
Context - The exponential growth of data is becoming a significant concern. Managing this data is increasingly challenging, especially when it arrives from various sources, in different formats, and at different speeds. Moreover, ensuring data quality has become crucial for effective decision-making and operational processes. Data architecture plays a central role in describing, collecting, storing, processing, and analyzing data to meet business needs, and an abstract view of data-intensive applications is essential to ensure that data is transformed into valuable information. These challenges must be addressed if organizations are to manage and use their data effectively.
Objective - To establish an architecture framework that enables a comprehensive description of the data architecture and streamlines data quality monitoring.
Method - The architecture framework uses Model-Driven Engineering (MDE) techniques. Its support for describing data-intensive architectures enables the automated generation of data quality checks.
Result - The framework offers a comprehensive solution for data-intensive applications to model their architecture efficiently and monitor the quality of their data. It automates the path from architecture model to data quality checks, helping ensure precision and consistency in the data. With DAT, architects and analysts gain a tool that simplifies their workflow and supports informed decisions based on reliable data insights.
Conclusion - We have evaluated DAT on more than five cases across various industry domains, demonstrating its adaptability and effectiveness.
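To make the idea of generating data quality checks from an architecture description more concrete, the following is a minimal sketch, not PyDaQu's actual implementation. It assumes a hypothetical model in which each data store lists its columns together with quality criteria (nullability, uniqueness, value range, allowed values), and it translates each criterion into an expectation entry. The classes ColumnModel and DataStoreModel and the function generate_expectation_suite are illustrative names invented here; the expectation names are real Great Expectations expectations, but the dictionary layout only approximates the legacy (0.x) expectation-suite JSON and should be checked against the Great Expectations version in use.

```python
"""Hypothetical sketch: map a modeled data store to an expectation-suite dict.

The model classes and generate_expectation_suite() are illustrative
assumptions, not PyDaQu's API. The output approximates the legacy (0.x)
Great Expectations JSON layout; verify it against your GE version.
"""
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ColumnModel:
    # Quality criteria an architect might attach to a column in the model.
    name: str
    nullable: bool = True
    unique: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: Optional[list[Any]] = None


@dataclass
class DataStoreModel:
    # A data store element from the architecture model (hypothetical).
    name: str
    columns: list[ColumnModel] = field(default_factory=list)


def _expectation(expectation_type: str, **kwargs: Any) -> dict[str, Any]:
    # One expectation entry: its type plus the keyword arguments it takes.
    return {"expectation_type": expectation_type, "kwargs": kwargs}


def generate_expectation_suite(store: DataStoreModel) -> dict[str, Any]:
    """Translate each modeled criterion into one expectation entry."""
    expectations: list[dict[str, Any]] = []
    for col in store.columns:
        if not col.nullable:
            expectations.append(
                _expectation("expect_column_values_to_not_be_null", column=col.name))
        if col.unique:
            expectations.append(
                _expectation("expect_column_values_to_be_unique", column=col.name))
        if col.min_value is not None or col.max_value is not None:
            # A None bound is left in place and means "unbounded" on that side.
            expectations.append(_expectation(
                "expect_column_values_to_be_between",
                column=col.name, min_value=col.min_value, max_value=col.max_value))
        if col.allowed_values is not None:
            expectations.append(_expectation(
                "expect_column_values_to_be_in_set",
                column=col.name, value_set=list(col.allowed_values)))
    return {"expectation_suite_name": f"{store.name}_suite",
            "expectations": expectations}


if __name__ == "__main__":
    orders = DataStoreModel("orders", [
        ColumnModel("order_id", nullable=False, unique=True),
        ColumnModel("amount", min_value=0),
        ColumnModel("status", allowed_values=["new", "paid", "shipped"]),
    ])
    print(json.dumps(generate_expectation_suite(orders), indent=2))
```

Keeping the generator a pure function from model to suite means the same model can be regenerated whenever the architecture changes, which is the consistency benefit the abstract attributes to automation.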
