ReXCL: A Tool for Requirement Document Extraction and Classification
Paheli Bhattacharya, Manojit Chakraborty, Santhosh Kumar Arumugam, Rishabh Gupta
TL;DR
ReXCL tackles the bottleneck of manual extraction and classification in requirement engineering with a two-module pipeline that first schematizes semi-structured requirements and then labels them as Info, Header, Functional, or Non-Functional. The Extraction module combines rule-based and markdown-preserving approaches with a header-footer detector to produce a structured, exportable representation. The Classification module employs adaptive fine-tuning of encoder-based transformers, including domain-adaptive pretraining and supervised fine-tuning, to achieve substantial gains across all classes. Deployed internally, ReXCL demonstrates practical viability: it exports to tools such as IBM DOORS and Jira and supports iterative refinement from user feedback. Future work will extend it to multi-modal data such as images, tables, and Excel documents.
Abstract
This paper presents the ReXCL tool, which automates the extraction and classification processes in requirement engineering, enhancing the software development lifecycle. The tool features two main modules: Extraction, which processes raw requirement documents into a predefined schema using heuristics and predictive modeling, and Classification, which assigns class labels to requirements using adaptive fine-tuning of encoder-based models. The final output can be exported to external requirement engineering tools. Performance evaluations indicate that ReXCL significantly improves efficiency and accuracy in managing requirements, and the tool represents a novel approach to automating the schematization of semi-structured requirement documents.
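
The two-module flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the ReXCL implementation: the `Requirement` schema, the `extract`, `classify`, and `run_pipeline` functions, and the rule that treats a line repeated on every page as a header or footer are all hypothetical. In particular, the keyword rules in `classify` are only a stand-in for the fine-tuned encoder-based classifier; the sketch shows the shape of the data passing between the modules, not the actual models.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Label(str, Enum):
    # The four class labels named in the text.
    INFO = "Info"
    HEADER = "Header"
    FUNCTIONAL = "Functional"
    NON_FUNCTIONAL = "Non-Functional"


@dataclass
class Requirement:
    # Hypothetical schema: one record per extracted line.
    req_id: int
    text: str
    label: Optional[Label] = None


def extract(pages: List[List[str]]) -> List[Requirement]:
    """Toy extraction: drop lines that appear on every page (assumed
    to be running headers/footers) and number the remaining lines."""
    counts = Counter(line for page in pages for line in set(page))
    repeated = {
        line for line, n in counts.items()
        if len(pages) > 1 and n == len(pages)
    }
    records: List[Requirement] = []
    for page in pages:
        for line in page:
            stripped = line.strip()
            if stripped and line not in repeated:
                records.append(Requirement(len(records) + 1, stripped))
    return records


def classify(req: Requirement) -> Label:
    """Keyword placeholder standing in for a fine-tuned encoder model."""
    text = req.text.lower()
    if "shall" in text:
        cues = ("within", "latency", "availability", "secure")
        if any(cue in text for cue in cues):
            return Label.NON_FUNCTIONAL
        return Label.FUNCTIONAL
    if len(text.split()) <= 4 and not text.endswith("."):
        return Label.HEADER
    return Label.INFO


def run_pipeline(pages: List[List[str]]) -> List[Requirement]:
    # Extraction first, then classification of each extracted record.
    records = extract(pages)
    for record in records:
        record.label = classify(record)
    return records


if __name__ == "__main__":
    sample_pages = [
        ["ACME Spec v1", "1. Login", "The system shall authenticate users."],
        ["ACME Spec v1", "The system shall respond within 2 seconds.",
         "This section describes the login flow in detail."],
    ]
    for record in run_pipeline(sample_pages):
        print(record.req_id, record.label.value, "|", record.text)
```

Keeping the schema separate from the classifier means the placeholder rules could be swapped for a trained model without changing the extraction step or any export code that consumes the records.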
