CARDS: A collection of package, revision, and miscellaneous dependency graphs
Euxane Tran-Girard, Laurent Bulteau, Pierre-Yves David
TL;DR
CARDS addresses the need for a unified, large-scale corpus of dependency graphs spanning package managers, version-control revision graphs, and other DAGs, so that their structure can be analyzed and algorithms can be benchmarked at scale. The authors collect and harmonize 11 system-package graphs, 13 language-library graphs, hundreds of millions of revision graphs, and 45 miscellaneous graphs. They encode these in the .deps and .tdag formats and provide both acyclic and cyclic variants. Key contributions include a comprehensive dataset compiled from diverse sources, explicitly specified data formats and processing pipelines, and a topological-sort-based method with cycle elimination to support DAG research. In practice, the work enables cross-domain DAG research and the benchmarking of reachability and other DAG-processing algorithms, and it provides openly licensed data and tooling for the community.
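The paper's own cycle-elimination procedure is not detailed here. As a generic illustration of how a topological sort can be combined with cycle removal, the sketch below runs an iterative depth-first search over an edge list and drops every back edge (an edge pointing to a node still on the DFS stack). The remaining edges form a DAG, and the reverse DFS postorder is a topological order of that DAG. The function name `acyclic_variant` and the plain `(u, v)` edge-list input are hypothetical. The sketch does not parse the .deps or .tdag formats, and it is not presented as the authors' pipeline.

```python
from collections import defaultdict


def acyclic_variant(edges):
    """Hypothetical sketch: break cycles by dropping DFS back edges.

    Returns (order, kept, dropped), where `order` is a topological order
    of the graph formed by `kept`, and `dropped` holds the removed edges.
    """
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    # Nodes in first-appearance order, so the result is deterministic.
    nodes = list(dict.fromkeys(x for e in edges for x in e))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on DFS stack / finished
    color = dict.fromkeys(nodes, WHITE)
    postorder, dropped = [], set()

    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(succ[root]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if color[v] == GREY:      # back edge: it closes a cycle
                    dropped.add((u, v))
                elif color[v] == WHITE:   # tree edge: descend into v
                    color[v] = GREY
                    stack.append((v, iter(succ[v])))
                    break
                # BLACK target: forward or cross edge, safe to keep
            else:                         # all successors handled
                color[u] = BLACK
                postorder.append(u)
                stack.pop()

    order = postorder[::-1]               # reverse postorder = topological order
    kept = [e for e in edges if e not in dropped]
    return order, kept, sorted(dropped)
```

For example, `acyclic_variant([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])` drops the edge `("c", "a")` and orders the nodes `a, b, c, d`. Which edges get dropped depends on the DFS visiting order, and the result is generally not a minimum set, because finding a minimum feedback arc set is NP-hard. Dropping edges can also remove reachability. Condensing strongly connected components into single nodes is a common alternative that preserves it.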
Abstract
CARDS (Corpus of Acyclic Repositories and Dependency Systems) is a collection of directed graphs that express dependency relations, extracted from diverse real-world sources such as package managers, version control systems, and event graphs. The graphs range from thousands to hundreds of millions of nodes and edges, and all are normalized into a simple, unified format. Both cyclic and acyclic variants are included, since some graphs, such as citation networks, are not entirely acyclic. The dataset is suitable for studying the structure of different kinds of dependencies, making it possible to characterize and distinguish between types of dependency graphs. It has been used to develop and test efficient algorithms that exploit the specific properties of source version control graphs. The collection is publicly available at doi.org/10.5281/zenodo.14245890.
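The abstract states that the corpus can be used to characterize and distinguish kinds of dependency graphs, but it does not say which measures are used. Purely as an illustration, the sketch below computes a few simple structural statistics that one might compare across graphs: node and edge counts, sources, sinks, and the length of the longest path. It uses Kahn's topological-sort algorithm. The function name `dag_profile`, the choice of statistics, and the plain edge-list input are assumptions for this example, not taken from the paper.

```python
from collections import defaultdict, deque


def dag_profile(edges):
    """Hypothetical sketch: basic structural statistics of a DAG.

    `edges` is a list of (u, v) pairs. Raises ValueError if a cycle is found.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: start from nodes with no incoming edges.
    queue = deque(n for n in nodes if indeg[n] == 0)
    sources = len(queue)
    sinks = sum(1 for n in nodes if not succ[n])
    depth = dict.fromkeys(nodes, 0)  # longest path (in edges) ending at each node
    visited = 0

    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    if visited < len(nodes):  # some nodes never reached in-degree 0
        raise ValueError("graph contains a cycle")

    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "sources": sources,
        "sinks": sinks,
        "longest_path": max(depth.values(), default=0),
    }
```

For the diamond-shaped graph `a -> b, a -> c, b -> d, c -> d, d -> e`, this reports one source, one sink, and a longest path of 3 edges. Running it on a cyclic variant raises an error, which is one reason why an acyclic variant is convenient for this kind of analysis.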
