RefExpo: Unveiling Software Project Structures through Advanced Dependency Graph Extraction
Vahid Haratian, Pouria Derakhshanfar, Vladimir Kovalenko, Eray Tüzün
TL;DR
RefExpo addresses the lack of reusable dependency-graph (DG) extraction tools by delivering an IntelliJ-based, multi-language DG extractor. It employs a two-pass PSI/AST traversal to identify internal references and outputs a CSV edge list with File/Class/Method locators, enabling scalable DG analysis. The authors also provide a 20-project Java/Python dataset to support reproducibility and cross-tool evaluation, and they validate RefExpo through micro-level recall tests and macro-level comparisons against existing tools. Results show high recall on micro benchmarks (92% on Judge and 100% on PyCG) and more unique and shared edges than existing tools at the macro level, underscoring RefExpo's practical value for software analytics and DG research. The work lays the groundwork for broader language support and more extensive DG datasets.
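To make the two-pass idea and the CSV edge-list output concrete, here is a minimal sketch. RefExpo itself traverses IntelliJ's PSI; this sketch swaps in Python's standard `ast` module as a stand-in. Pass 1 collects top-level function and class definitions in the project, and pass 2 records references inside each definition that resolve to something found in pass 1. The function names `extract_edges` and `to_csv` and the CSV column names are hypothetical and do not reproduce RefExpo's actual output schema. Resolution here is by bare name only, a deliberate simplification that ignores imports, scoping, and name collisions.

```python
import ast
import csv
import io


def extract_edges(files: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Return (source_file, source_scope, target_file, target_name) edges.

    Hypothetical illustration of a two-pass traversal; not RefExpo's code.
    """
    trees = {path: ast.parse(src) for path, src in files.items()}

    # Pass 1: map each top-level function/class name to its declaring file.
    # Name collisions across files simply overwrite (simplification).
    definitions: dict[str, str] = {}
    for path, tree in trees.items():
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                definitions[node.name] = path

    # Pass 2: inside each top-level definition, record every Name node that
    # matches a definition collected in pass 1 (an "internal" reference).
    edges: list[tuple[str, str, str, str]] = []
    for path, tree in trees.items():
        for node in tree.body:
            if not isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                continue
            for inner in ast.walk(node):
                if isinstance(inner, ast.Name) and inner.id in definitions:
                    edge = (path, node.name, definitions[inner.id], inner.id)
                    if edge not in edges:  # keep each edge once
                        edges.append(edge)
    return edges


def to_csv(edges: list[tuple[str, str, str, str]]) -> str:
    """Serialize edges as a CSV edge list with a (hypothetical) header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_file", "source_scope", "target_file", "target_name"])
    writer.writerows(edges)
    return buf.getvalue()
```

For example, with `util.py` defining `helper()` and `app.py` defining `main()` that calls `helper()`, `extract_edges` yields the single cross-file edge `("app.py", "main", "util.py", "helper")`. Separating collection from resolution means a reference can point to a definition in a file that has not been visited yet, which is the main reason to use two passes.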
Abstract
The dependency graph (DG) of a software project offers valuable insights for identifying its key components and has been leveraged in numerous studies. However, there is a lack of reusable tools for DG extraction. Existing tools are either outdated and difficult to configure or fail to provide accurate analysis. This study introduces RefExpo, a reusable DG extraction tool that supports multiple languages such as Java, Python, and JavaScript. RefExpo is a plugin for IntelliJ, a well-maintained and reputable IDE. We also compile an initial version of our dataset, consisting of 20 Java and Python projects. We evaluate RefExpo's validity at two levels: the micro level, which targets specific language features, and the macro level, which compares RefExpo against other tools. Our results show that RefExpo achieves 92% and 100% recall on the micro test suites Judge and PyCG for Java and Python, respectively. In macro-level experiments, RefExpo outperformed existing tools by 31% and 7% in finding unique and shared results, respectively. The installable version of RefExpo is available on the IntelliJ marketplace, and a short video describing its functionality is available on YouTube.
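The two evaluation levels rest on simple set arithmetic over edges: micro-level recall is the fraction of ground-truth edges a tool reports, and the macro-level comparison counts edges that two tools share versus edges only one of them finds. The sketch below illustrates those two computations under the assumption that each edge is a (source, target) pair; the names `recall` and `compare` and the toy edge sets are hypothetical, and the exact normalization behind the reported 31% and 7% figures is not reproduced here.

```python
Edge = tuple[str, str]  # (source locator, target locator); assumed shape


def recall(expected: set[Edge], found: set[Edge]) -> float:
    """Fraction of ground-truth edges that the tool reported."""
    if not expected:
        return 1.0  # nothing to find; treat as perfect recall
    return len(expected & found) / len(expected)


def compare(tool_a: set[Edge], tool_b: set[Edge]) -> dict[str, int]:
    """Count edges found by both tools and by each tool alone."""
    return {
        "shared": len(tool_a & tool_b),
        "only_a": len(tool_a - tool_b),
        "only_b": len(tool_b - tool_a),
    }
```

For instance, if a micro test case expects four edges and the tool reports three of them plus one spurious edge, `recall` returns 0.75; the spurious edge lowers precision but not recall, which is why recall alone is the natural metric for checking that language features are covered.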
