Foundations and Scoping of Data Science
M. Tamer Özsu
TL;DR
This paper presents an inclusive, end-to-end framing of data science, arguing that it rests on four core pillars (Data Engineering, Data Analytics, Data Protection, and Data Ethics) that interact with diverse application domains. It advocates a process-driven data science lifecycle that emphasizes data preparation, governance, provenance, and responsible deployment, and it anchors system design in reference architectures such as the NIST Big Data Reference Architecture (NBDRA) while recognizing multi-layer security and privacy challenges. By surveying applications in sustainability, energy, biomedicine, health, digital humanities, and finance, the work demonstrates data science's broad reach and the critical need for interoperability and interdisciplinarity. The proposed framework aims to unify the field, guide policy and education, and catalyze cross-domain collaboration to realize data-driven decision making at scale.
Abstract
There has been increasing recognition of the value of data and of data-based decision making. As a consequence, the development of data science as a field of study has intensified in recent years. However, there is no systematic and comprehensive treatment and understanding of data science. This article describes a systematic, end-to-end framing of the field based on an inclusive definition. It identifies the core components that make up the data science ecosystem, presents a lifecycle that models the development process, and argues for the field's interdisciplinarity.
