What is a Digital Twin Anyway? Deriving the Definition for the Built Environment from over 15,000 Scientific Publications

Mahmoud Abdelrahman; Edgardo Macatulad; Binyu Lei; Matias Quintana; Clayton Miller; Filip Biljecki

TL;DR

This paper tackles the lack of a unified digital twin (DT) definition for the built environment (BE) by conducting a large-scale NLP-based analysis of over 15,000 publications and integrating insights from a 52-expert Delphi survey. It identifies domain-specific DT components, reveals a fundamental dichotomy between Long-Term Decision Support (LTDS) and High-Performance Real-Time (HPRT) DTs, and derives BE-focused definitions for Building/Architecture DTs and Urban/City DTs, anchored by CITYSTEPS maturation levels. The analysis shows significant cross-domain variation and finds that BE DTs are generally long-term decision-support systems in which real-time and AI/ML capabilities are not yet mature. By providing a data-driven, consensus-building framework and an open methodology, the work offers practical guidance for BE practitioners and standards bodies, and lays a foundation for periodic re-definition as the field evolves.

Abstract

The concept of digital twins has attracted significant attention across various domains, particularly within the built environment. However, the sheer volume of definitions means that terminological consensus remains out of reach. The lack of a universally accepted definition leads to ambiguities in the conceptualization and implementation of digital twins, and may cause miscommunication among researchers and practitioners. We employed Natural Language Processing (NLP) techniques to systematically extract and analyze definitions of digital twins from a corpus of more than 15,000 full-text articles spanning diverse disciplines. We compare these findings with insights from a survey of 52 experts. The study identifies the components on which there is agreement about what comprises a "Digital Twin" from a practical perspective across various domains, and contrasts them with components on which there is no such agreement, to identify deviations. We investigate the evolution of digital twin definitions over time and across different scales, including manufacturing, building, and urban/geospatial perspectives. We extracted the main components of Digital Twins using Text Frequency Analysis and N-gram analysis. Subsequently, we identified components that appeared in the literature and conducted a Chi-square test to assess the significance of each component in different domains. Our analysis identified key components of digital twins and revealed significant variations in definitions based on application domains, such as manufacturing, building, and urban contexts. The analysis of DT components reveals two major groups of DT types: High-Performance Real-Time (HPRT) DTs and Long-Term Decision Support (LTDS) DTs. Contrary to common assumptions, we found that components such as simulation, AI/ML, real-time capabilities, and bi-directional data flow are not yet fully mature in the digital twins of the built environment.
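
The Chi-square step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each domain is summarized by how many definitions mention a given component and how many do not. The counts, the component ("real-time"), and the function names below are hypothetical and are not taken from the paper; the abstract does not specify the exact test configuration the authors used.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of domain and mention.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Chi-square survival function; this closed form holds only for dof == 2."""
    return math.exp(-stat / 2.0)


# Hypothetical counts (NOT from the paper): for each domain,
# [definitions mentioning "real-time", definitions not mentioning it].
counts = {
    "manufacturing": [80, 20],
    "building": [50, 50],
    "urban": [40, 60],
}

table = list(counts.values())
stat, dof = chi_square_independence(table)
p = p_value_dof2(stat)  # valid here because a 3 x 2 table has dof == 2
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.2e}")
```

A small p-value in this toy setting would suggest that how often the component is mentioned depends on the domain. For tables of other shapes, a library routine such as `scipy.stats.chi2_contingency` would typically replace the hand-written p-value.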

Paper Structure (16 sections, 2 equations, 13 figures, 2 tables)

Introduction
Related work
Methodology
Sample Development
Full-text articles dataset
The expert survey dataset
The definitions dataset
Digital Twin components identification
Component Identification and Analysis
Results
Component Analysis Across Domains
Derived Definitions and Implications
Building and Architecture Digital Twin (BDT and ADT) definitions
Urban and City Digital Twin (UDTs and CDTs) definitions
Discussion and Future Directions
...and 1 more sections

Figures (13)

Figure 1: Illustration of commonly discussed Digital Twin components.
Figure 2: Our novel approach to infer a DT definition consists of deriving significance of each DT component from literature using Natural Language Processing (NLP) and statistical analysis.
Figure 3: Overview of the methodology.
Figure 4: The flowchart represents the definition extraction and clustering methods.
Figure 5: Most common publication sources in the dataset, spanning multiple communities, scales, and domains.
...and 8 more figures

Theorems & Definitions (2)

Definition 1
Definition 2
