Data-Centric Artificial Intelligence
Johannes Jakubik, Michael Vössing, Niklas Kühl, Jannis Walk, Gerhard Satzger
TL;DR
The paper argues that progress in AI has been overly model-centric and has underused the potential of systematic data design. It proposes data-centric AI as a complementary paradigm focused on refining and extending data (R1–R6 and E1–E3) to improve both model performance and maintainability, and it presents a detailed framework together with implications for Business & Information Systems Engineering (BISE) at the individual, organizational, and cross-organizational levels. Key contributions include clarifying terminology, outlining a two-dimensional data framework, and highlighting practical IS implications, governance needs, and tool support. The work underscores the strategic value of data work, domain knowledge, and data governance in real-world AI deployments and calls for BISE research to advance these practices.
Abstract
Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm emphasizing that the systematic design and engineering of data is essential for building effective and efficient AI-based systems. The objective of this article is to introduce practitioners and researchers from the field of Information Systems (IS) to data-centric AI. We define relevant terms, provide key characteristics to contrast the data-centric paradigm with the model-centric one, and introduce a framework for data-centric AI. We distinguish data-centric AI from related concepts and discuss its longer-term implications for the IS community.
