Mining Architectural Information: A Systematic Mapping Study
Musengamana Jean de Dieu, Peng Liang, Mojtaba Shahin, Chen Yang, Zengyang Li
TL;DR
The paper addresses the fragmentation of the literature on mining architectural information from software repositories by conducting a systematic mapping study of 104 primary studies (2006–2022). It classifies the mined architectural information into seven categories, identifies eleven categories of information sources, maps eleven architecting activities that the mined information can support, and inventories 95 mining approaches and 56 tools, while delineating four types of challenges. The study finds that architectural descriptions are the most mined information, version control systems are the most popular source, and architecture understanding is the most supported activity. It also highlights the predominance of automatic, classification-based approaches, although many approaches lack tool support. The authors discuss implications for researchers and practitioners, advocate for open benchmark datasets, and propose directions such as neural information retrieval and LLM-assisted tooling to improve industrial relevance and practical adoption. Overall, the work maps what is mined, from which sources, and with which methods, enabling targeted future research and more informed practice in software architecture mining.
Abstract
Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, it remains unclear what literature on mining architectural information is available. This makes it difficult for practitioners to understand and adopt state-of-the-art research results, for example, which approaches should be adopted to mine which architectural information in order to support architecting activities. It also keeps researchers from becoming aware of the challenges and of remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in terms of the architectural information and sources mined, the architecting activities supported, the approaches and tools used, and the challenges faced. A systematic mapping study (SMS) was conducted on the literature published between January 2006 and December 2022. Across the 104 selected primary studies, 7 categories of architectural information were mined, among which architectural description is the most mined; 11 categories of sources were leveraged for mining architectural information, among which version control systems are the most popular; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported; 95 approaches and 56 tools were proposed and employed for mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with future directions and helps practitioners become aware of which approaches and tools can be used to mine which architectural information from which sources to support various architecting activities.
