Generating CodeMeta through declarative mapping rules: An open-ended approach using ShExML
Herminio García-González
TL;DR
The paper addresses the challenge of producing interoperable, FAIR-compliant metadata for research software across heterogeneous provider formats. It proposes declarative mapping rules in ShExML to generate CodeMeta by merging crosswalks from GitHub, Maven, and Zenodo, followed by JSON-LD framing and validation with SHACL/ShEx. The approach is demonstrated on the ShExML engine and DMAOG, with an automated GitHub Actions workflow that minimizes manual intervention during releases. The work contributes a flexible, automatable CodeMeta generation workflow that can be extended to other projects, promoting broader CodeMeta adoption and improved discoverability of research software.
Abstract
Software is now one of the cornerstones of research in several scientific fields that employ computer-based methodologies to answer new research questions. However, for these experiments to be fully reproducible, research software should comply with the FAIR principles, yet its metadata can follow different data models and be spread across different locations. To bring some cohesion to the field, CodeMeta was proposed as a vocabulary for representing research software metadata in a unified and standardised manner. While existing tools can help users generate CodeMeta files for some specific use cases, they fall short in flexibility and adaptability. Hence, in this work, I propose the use of declarative mapping rules to generate CodeMeta files, illustrated through the implementation of three crosswalks in ShExML, which are then expanded and merged to cover the generation of CodeMeta files for two existing research software artefacts. Moreover, the outputs are validated using SHACL and ShEx, and the whole generation workflow is automated, requiring minimal user intervention upon a new version release. This work can therefore serve as an example that other developers can build on to include a CodeMeta generation workflow in their repositories, facilitating the adoption of CodeMeta and, ultimately, increasing research software FAIRness.
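To make the idea of merging crosswalks concrete, the following is a minimal Python sketch, not the ShExML rules used in this work. It assumes each crosswalk can be reduced to a flat table from a provider field to a CodeMeta property. The field selection, the sample records, the `setdefault` merge policy, and the helper names `apply_crosswalk` and `merge_to_codemeta` are all hypothetical and chosen for illustration only.

```python
import json

# Hypothetical, heavily reduced crosswalks: provider field -> CodeMeta property.
# Real crosswalks cover many more fields and nested structures.
CROSSWALKS = {
    "github": {
        "name": "name",
        "description": "description",
        "html_url": "codeRepository",
    },
    "maven": {"version": "version"},
    "zenodo": {"doi": "identifier"},
}


def apply_crosswalk(record, crosswalk):
    """Rename the fields of one provider record to CodeMeta properties."""
    return {
        target: record[source]
        for source, target in crosswalk.items()
        if source in record
    }


def merge_to_codemeta(records):
    """Merge the mapped provider records into one CodeMeta-style document."""
    doc = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    for provider, record in records.items():
        mapped = apply_crosswalk(record, CROSSWALKS[provider])
        for prop, value in mapped.items():
            # Assumed policy: the first provider to supply a property wins.
            doc.setdefault(prop, value)
    return doc


# Made-up sample records standing in for the three provider responses.
records = {
    "github": {
        "name": "example-tool",
        "description": "An example research tool",
        "html_url": "https://example.org/example-tool",
    },
    "maven": {"version": "1.0.0"},
    "zenodo": {"doi": "10.0000/example"},
}

codemeta = merge_to_codemeta(records)
print(json.dumps(codemeta, indent=2))
```

In the approach proposed here, this renaming and merging is expressed declaratively as ShExML rules rather than in imperative code, and the result is then framed as JSON-LD and validated with SHACL and ShEx.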
