Foundations and Scoping of Data Science

M. Tamer Özsu

TL;DR

This paper presents an inclusive, end-to-end framing of data science, arguing that it rests on four core pillars (Data Engineering, Data Analytics, Data Protection, and Data Ethics) that interact with diverse application domains. It advocates a process-driven data science lifecycle that emphasizes data preparation, governance, provenance, and responsible deployment. It anchors system design in reference architectures such as the NIST Big Data Reference Architecture (NBDRA) while recognizing multi-layer security and privacy challenges. By surveying applications across sustainability, energy, biomedicine, health, digital humanities, and finance, the work demonstrates data science's broad reach and the critical need for interoperability and interdisciplinarity. The proposed framework aims to unify the field, guide policy and education, and catalyze cross-domain collaboration to realize data-driven decision making at scale.

Abstract

There has been increasing recognition of the value of data and of data-based decision making. As a consequence, the development of data science as a field of study has intensified in recent years. However, the field still lacks a systematic and comprehensive treatment. This article describes a systematic, end-to-end framing of the field based on an inclusive definition. It identifies the core components that make up the data science ecosystem, presents a lifecycle that models the development process, and argues for the field's interdisciplinarity.

Paper Structure

This paper contains 52 sections and 19 figures.

Figures (19)

  • Figure 1: Trending of data science-relevant terms. Source: Google Trends, January 2023
  • Figure 2: Data Science Word Cloud
  • Figure 3: Data Science Building Blocks.
  • Figure 4: Alternative integration architectures
  • Figure 5: Four Types of Data Analytics and their Relationships. From Maydon07
  • ...and 14 more figures