Can we measure the impact of a database?
Peter Buneman, Dennis Dosso, Matteo Lissandrini, Gianmaria Silvello, He Sun
TL;DR
This work extends the h-index to hierarchical database structures. It measures database impact over antichains (sets of nodes in which no node is an ancestor of another), which prevents double-counting of citations, and it gives a polynomial-time algorithm to compute the index. Applying the method to DrugBank, GtoPdb, and NCBI Taxonomy shows that the hierarchy-based index can exceed the flat, leaf-only index and that transformations such as lifting can increase the measure further when appropriate. The resulting decomposition offers a principled way to credit curators and contributors, although data citation practice and classification schemes are still evolving. Overall, the paper establishes a practical, scalable framework for quantifying database impact, with potential implications for data crediting and curation.
Abstract
In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases.
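To make the idea of an h-index over a hierarchy concrete, here is a minimal sketch in Python. It is not the authors' algorithm, and it rests on assumptions that the abstract does not state: the hierarchy is a rooted tree, a node's citation count is its own count plus the counts of all its descendants, and the index is the largest h for which some antichain of h nodes each has at least h citations. The function names and the toy data are hypothetical.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]  # node -> list of child nodes (hypothetical layout)


def subtree_citations(children: Tree, own: Dict[str, int], root: str) -> Dict[str, int]:
    """Assumed aggregation: a node's total is its own count plus its descendants'."""
    total: Dict[str, int] = {}
    stack = [(root, False)]
    while stack:  # iterative post-order traversal
        node, visited = stack.pop()
        kids = children.get(node, [])
        if visited:
            total[node] = own.get(node, 0) + sum(total[c] for c in kids)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in kids)
    return total


def largest_antichain(children: Tree, total: Dict[str, int], root: str, h: int) -> int:
    """Size of the largest antichain whose nodes all have at least h citations.

    Totals never decrease towards the root, so the qualifying nodes form a
    subtree containing the root; its largest antichain is its set of leaves.
    """
    if total[root] < h:
        return 0
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        qualifying = [c for c in children.get(node, []) if total[c] >= h]
        if qualifying:
            stack.extend(qualifying)
        else:
            count += 1  # no qualifying child: this node is a lowest qualifying node
    return count


def hierarchical_h_index(children: Tree, own: Dict[str, int], root: str) -> int:
    """Largest h such that an antichain of h nodes each has >= h citations."""
    total = subtree_citations(children, own, root)
    best = 0
    for h in range(1, len(total) + 1):  # simple scan, quadratic in the worst case
        if largest_antichain(children, total, root, h) >= h:
            best = h
    return best


def flat_h_index(counts: List[int]) -> int:
    """Ordinary h-index over a flat list of citation counts."""
    ranked = sorted(counts, reverse=True)
    return sum(1 for i, c in enumerate(ranked) if c >= i + 1)


# Toy hierarchy (made-up numbers): a database with two sections and four leaf entries.
children = {"db": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
own = {"a1": 3, "a2": 1, "b1": 1, "b2": 1}

flat = flat_h_index([own[n] for n in ("a1", "a2", "b1", "b2")])
hier = hierarchical_h_index(children, own, "db")
```

On this toy data the leaf-only index is 1, while the hierarchical index is 2: the antichain {a1, B} has two nodes with at least two citations each, because section B accumulates the citations of its two entries. Under the stated assumptions, this is one way the hierarchical index can exceed the flat one.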
