The Software Observatory: aggregating and analysing software metadata for trend computation and FAIR assessment
Eva Martín del Pico, Josep Lluís Gelpí, Salvador Capella-Gutiérrez
TL;DR
The paper addresses the challenge of fragmented and inconsistent software metadata in the Life Sciences by introducing the Software Observatory, a scalable platform that aggregates metadata from diverse registries, enriches and normalises it, and provides automated FAIR assessments through the FAIRsoft Evaluator. Its modular pipeline performs ingestion, EDAM/SPDX harmonisation, external enrichment, and a multi-stage disambiguation process (conservative grouping, conflict detection, rescue heuristics, and LLM-assisted resolution) to produce a deduplicated metadata corpus. The authors demonstrate the approach with a Proteomics case study that reveals pronounced gaps in Findability and licensing, and they discuss how the FAIRsoft Evaluator supports improvement workflows and yields actionable insights for developers, curators, and policy-makers. The work highlights the Observatory's potential to guide better software metadata practices. It also outlines limitations and future directions, such as author disambiguation, document-based metadata mining, and improved visualisation of indicator weights, with practical implications for the sustainability and FAIR adherence of research software.
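To make the first two disambiguation stages concrete, the following is a minimal sketch, not the Observatory's implementation. It assumes that grouping keys on a normalised software name and that a group is flagged as conflicting when its members point to more than one distinct code repository. The Record schema, the helper names, the registry names and the URLs are all hypothetical. Flagged groups would then go on to the rescue heuristics and LLM-assisted resolution, which this sketch does not model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Record:
    """One software entry as harvested from a registry (hypothetical schema)."""
    source: str
    name: str
    repository: Optional[str] = None


def normalise_name(name: str) -> str:
    """Lower-case and drop non-alphanumerics, so 'Tool-X' and 'toolx' share a key."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def normalise_repo(url: Optional[str]) -> Optional[str]:
    """Reduce a repository URL to host/path form, ignoring scheme, case, '.git' and a trailing '/'."""
    if not url:
        return None
    url = url.strip().lower().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url


def group_records(records: List[Record]) -> Dict[str, List[Record]]:
    """Grouping step (assumed): records sharing a normalised name form one candidate group."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[normalise_name(record.name)].append(record)
    return dict(groups)


def find_conflicts(groups: Dict[str, List[Record]]) -> Dict[str, Set[str]]:
    """Conflict detection (assumed): flag groups whose members cite more than one repository."""
    conflicts: Dict[str, Set[str]] = {}
    for key, members in groups.items():
        repos = {r for r in (normalise_repo(m.repository) for m in members) if r}
        if len(repos) > 1:
            conflicts[key] = repos
    return conflicts


# Illustrative records from two hypothetical registries.
records = [
    Record("registry_a", "Tool-A", "https://example.org/git/tool-a"),
    Record("registry_b", "tool_a", "https://example.org/git/tool-a.git/"),
    Record("registry_a", "Tool-X", "https://example.org/git/toolx"),
    Record("registry_b", "toolx", "https://example.com/other/toolx"),
]
groups = group_records(records)
conflicts = find_conflicts(groups)  # "toolx" is flagged; the two "toola" records agree
```

Separating the grouping step from conflict detection keeps the first pass cheap and predictable. Only the ambiguous groups are passed on to the more expensive resolution steps.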
Abstract
In the ever-changing realm of research software development, it is crucial for the scientific community to understand current trends in order to identify gaps that may hinder scientific progress. Adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles can serve as a proxy for understanding those trends and provides a basis for proposing specific actions. The Software Observatory at OpenEBench (https://openebench.bsc.es/observatory) is a novel web portal that consolidates software metadata from various sources and offers comprehensive insights into critical aspects of research software. Our platform enables users to analyse trends, identify patterns and advancements within the Life Sciences research software ecosystem, and understand its evolution over time. It also evaluates research software against the FAIR principles for research software, providing scores for different indicators. Users can visualise this metadata at different levels of granularity: the entire software landscape, specific communities, and, through the FAIRsoft Evaluator, individual software entries. The FAIRsoft Evaluator streamlines the assessment process, helping developers efficiently evaluate their software's FAIRness and obtain guidance to improve it. The Software Observatory is a valuable resource for researchers, software developers and other stakeholders, promoting better software development practices and adherence to the FAIR principles for research software.
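As an illustration of how per-indicator results could roll up into principle-level scores, here is a minimal sketch. It is not the FAIRsoft Evaluator's actual scoring scheme. It assumes that each indicator is a pass/fail check with a weight, that each indicator ID begins with the letter of its FAIR principle, and that a principle's score is the weighted share of its indicators that pass. The indicator IDs and weights below are hypothetical.

```python
from typing import Dict

PRINCIPLES = "FAIR"


def principle_scores(results: Dict[str, bool], weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted share of passed indicators for each FAIR principle, in [0, 1].

    Assumes each indicator ID starts with its principle letter (e.g. "F1", "R2");
    indicators missing from `results` count as failed.
    """
    totals = {p: 0.0 for p in PRINCIPLES}
    earned = {p: 0.0 for p in PRINCIPLES}
    for indicator, weight in weights.items():
        principle = indicator[0]
        if principle not in totals:
            raise ValueError(f"unknown principle in indicator {indicator!r}")
        totals[principle] += weight
        if results.get(indicator, False):
            earned[principle] += weight
    # A principle with no weighted indicators scores 0.0 instead of dividing by zero.
    return {p: (earned[p] / totals[p] if totals[p] else 0.0) for p in PRINCIPLES}


# Hypothetical indicator weights and pass/fail results for one software entry.
weights = {"F1": 0.5, "F2": 0.3, "F3": 0.2, "A1": 1.0,
           "I1": 0.6, "I2": 0.4, "R1": 0.7, "R2": 0.3}
results = {"F1": True, "F2": True, "F3": False, "A1": True,
           "I1": False, "I2": True, "R1": False, "R2": True}
scores = principle_scores(results, weights)  # roughly F=0.8, A=1.0, I=0.4, R=0.3
```

Because each principle is normalised by its own total weight, the four scores remain comparable even when principles have different numbers of indicators. This is one reason that making the weights visible, as the authors propose, helps users interpret the scores.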
