Towards Defect Phase Diagrams: From Research Data Management to Automated Workflows
Khalil Rejiba, Sang-Hyeok Lee, Christina Gasper, Martina Freund, Sandra Korte-Kerzel, Ulrich Kerzel
TL;DR
Defect phase diagrams map the most stable defect states as a function of chemical potential $\mu$ and enable materials design at the atomic scale. The paper presents an integrated RDM infrastructure that connects heterogeneous experimental and simulation data across distributed groups. It is built on two components: openBIS, which serves as the core electronic laboratory notebook and laboratory information management system (ELN/LIMS), and a companion application for cloud storage, automated metadata extraction, and provenance visualization. Key contributions include tailored openBIS schemas for samples, instruments, and protocols; QR-code sample tracking; extended provenance graphs; metadata parsers for vendor file formats; and automated reports and baseline analyses that improve reproducibility. This approach accelerates the construction of defect phase diagrams and supports end-to-end traceability and data reuse across institutions.
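The idea of "the most stable defect state as a function of $\mu$" can be illustrated with a minimal sketch. It assumes a simplified linear model in which each candidate defect state $i$ has an energy $E_i(\mu) = E_i^0 - n_i \mu$, where $n_i$ is a number of excess solute atoms. The most stable state at a given $\mu$ is then the one with the lowest $E_i(\mu)$. The state names, energies, and excess values are hypothetical, and the code is not the paper's workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectState:
    """Hypothetical candidate defect state in a simplified linear model."""
    name: str
    e0: float        # assumed energy at mu = 0 (eV)
    n_excess: float  # assumed number of excess solute atoms

    def energy(self, mu: float) -> float:
        # Linear model: E(mu) = e0 - n_excess * mu
        return self.e0 - self.n_excess * mu


def most_stable(states, mu):
    """Return the state with the lowest energy at chemical potential mu."""
    return min(states, key=lambda s: s.energy(mu))


def phase_diagram_1d(states, mus):
    """Map each sampled mu to the name of the most stable defect state."""
    return [(mu, most_stable(states, mu).name) for mu in mus]


# Hypothetical example values, chosen only to produce three stability regions.
states = [
    DefectState("clean", 1.0, 0.0),
    DefectState("segregated", 1.2, 2.0),
    DefectState("complexion", 2.0, 5.0),
]

if __name__ == "__main__":
    for mu, name in phase_diagram_1d(states, [0.0, 0.2, 0.5]):
        print(f"mu = {mu:.1f} eV -> {name}")
```

With these assumed values, the stable state switches from "clean" to "segregated" at $\mu = 0.1$ eV and from "segregated" to "complexion" at $\mu \approx 0.27$ eV. A real defect phase diagram rests on simulated and measured energies rather than invented constants.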
Abstract
Defect phase diagrams provide a unified description of crystal defect states for materials design and are central to the scientific objectives of the Collaborative Research Centre (CRC) 1394. Their construction requires the systematic integration of heterogeneous experimental and simulation data across research groups and locations. In this setting, research data management (RDM) enables new scientific insight by linking distributed research activities and making complex data reproducible and reusable. To address the challenge of heterogeneous data sources and formats, a comprehensive RDM infrastructure has been established that links experiment, data, and analysis in a seamless workflow. The system combines: (1) a joint electronic laboratory notebook and laboratory information management system, (2) easy-to-use large-object data storage, (3) automatic metadata extraction from heterogeneous and proprietary file formats, (4) interactive provenance graphs for data exploration and reuse, and (5) automated reporting and analysis workflows. The two key technological elements are the openBIS electronic laboratory notebook and laboratory information management system, and a newly developed companion application that extends openBIS with large-scale data handling, automated metadata capture, and federated access to distributed research data. This integrated approach reduces friction in data capture and curation and yields traceable, reusable datasets that accelerate the construction of defect phase diagrams across institutions.
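Provenance graphs like those in item (4) can be sketched as a directed graph in which each entry records its direct parents. The sketch below assumes plain Python dictionaries in place of any real openBIS or companion-application API. Its object codes are hypothetical stand-ins for QR-coded samples, protocols, measurements, and analyses. Walking parent links backwards recovers the full lineage of a result.

```python
from collections import deque

# Hypothetical provenance records: each object lists its direct parents,
# mirroring parent-child links between ELN/LIMS entries (codes are invented).
PARENTS = {
    "SAMPLE-001": [],
    "PROTOCOL-HEAT-01": [],
    "SAMPLE-001-A": ["SAMPLE-001", "PROTOCOL-HEAT-01"],
    "SEM-IMAGE-17": ["SAMPLE-001-A"],
    "ANALYSIS-GRAINSIZE-3": ["SEM-IMAGE-17"],
}


def lineage(obj, parents):
    """Return all ancestors of obj in breadth-first order, without duplicates."""
    order = []
    seen = set()
    queue = deque(parents.get(obj, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared ancestors are listed only once
        seen.add(node)
        order.append(node)
        queue.extend(parents.get(node, []))
    return order


if __name__ == "__main__":
    # Trace an analysis result back to its sample and processing protocol.
    print(lineage("ANALYSIS-GRAINSIZE-3", PARENTS))
```

Breadth-first order lists the nearest ancestors first, which suits a display that expands the graph one generation at a time. An interactive viewer could render the same parent links as edges.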
