ReXCL: A Tool for Requirement Document Extraction and Classification

Paheli Bhattacharya, Manojit Chakraborty, Santhosh Kumar Arumugam, Rishabh Gupta

TL;DR

ReXCL tackles the bottleneck of manual extraction and classification in requirement engineering with a two-module pipeline that first schematizes semi-structured requirements and then labels them as Info, Header, Functional, or Non-Functional. The Extraction module combines rule-based and markdown-preserving approaches with a header-footer detector to produce a structured, exportable representation. The Classification module applies adaptive fine-tuning of encoder-based transformers (domain-adaptive pretraining followed by supervised fine-tuning) and yields substantial gains across all classes. Deployed internally, ReXCL demonstrates practical viability: its output can be exported to tools such as IBM DOORS and Jira, and it supports iterative refinement based on user feedback. Future work will extend it to multi-modal data such as images, tables, and Excel documents.
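The header-footer detector is only named here, not specified. One common heuristic, sketched below under that assumption, treats a line as a running header or footer if it recurs at a page edge on most pages, after digit runs are normalized so that "Page 1 of 3" and "Page 2 of 3" compare equal. The function names, the `edge_lines` and `min_ratio` parameters, and the sample pages are all hypothetical; this is an illustration, not ReXCL's actual detector.

```python
import math
import re
from collections import Counter


def normalize(line: str) -> str:
    # Collapse digit runs so "Page 3 of 10" and "Page 4 of 10" compare equal.
    return re.sub(r"\d+", "#", line.strip().lower())


def detect_headers_footers(pages, edge_lines=1, min_ratio=0.6):
    """Return normalized lines that recur in the first/last `edge_lines`
    non-empty lines of at least `min_ratio` of the pages."""
    counts = Counter()
    for page in pages:
        lines = [line for line in page if line.strip()]
        edges = lines[:edge_lines] + lines[-edge_lines:]
        counts.update({normalize(line) for line in edges})  # once per page
    # Require at least two pages so a one-page document keeps its text.
    threshold = max(2, math.ceil(min_ratio * len(pages)))
    return {text for text, c in counts.items() if c >= threshold}


def remove_headers_footers(pages, edge_lines=1, min_ratio=0.6):
    """Drop repeated lines, but only where they sit at a page edge."""
    repeated = detect_headers_footers(pages, edge_lines, min_ratio)
    cleaned = []
    for page in pages:
        lines = [line for line in page if line.strip()]
        n = len(lines)
        cleaned.append([
            line for i, line in enumerate(lines)
            if not ((i < edge_lines or i >= n - edge_lines)
                    and normalize(line) in repeated)
        ])
    return cleaned


# Hypothetical three-page requirement document.
pages = [
    ["ACME Corp - Confidential", "1 Introduction", "The system shall log in users.", "Page 1 of 3"],
    ["ACME Corp - Confidential", "1.1 Scope", "The system shall export reports.", "Page 2 of 3"],
    ["ACME Corp - Confidential", "2 Performance", "Response time shall be under 2 s.", "Page 3 of 3"],
]
for page in remove_headers_footers(pages):
    print(page)
```

Restricting removal to edge positions keeps a body sentence that happens to match a header verbatim. A learned detector, as the "predictive modeling" in the abstract suggests, would replace the fixed `min_ratio` rule.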

Abstract

This paper presents the ReXCL tool, which automates the extraction and classification processes in requirement engineering, enhancing the software development lifecycle. The tool features two main modules: Extraction, which processes raw requirement documents into a predefined schema using heuristics and predictive modeling, and Classification, which assigns class labels to requirements using adaptive fine-tuning of encoder-based models. The final output can be exported to external requirement engineering tools. Performance evaluations indicate that ReXCL significantly improves efficiency and accuracy in managing requirements, marking a novel approach to automating the schematization of semi-structured requirement documents.
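The abstract says extraction maps raw documents into a predefined schema using heuristics; Figure 2 names that schema as section number, section heading, and section text. The sketch below is a hedged, heuristic-only illustration of the section-information step. It assumes headings are numbered like "3", "3.2", or "3.2.1", are short, and do not end with a period. It then writes CSV, which is a generic interchange format, not necessarily the exact format ReXCL exports. The regex, the `Section` class, and the column names are hypothetical.

```python
import csv
import io
import re
from dataclasses import dataclass, field

# Hypothetical heading pattern: "3", "3.2", "3.2.1" or "3." followed by a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


@dataclass
class Section:
    number: str
    heading: str
    text: list[str] = field(default_factory=list)


def is_heading(line: str):
    """Return (number, title) if the line looks like a numbered heading, else None."""
    m = HEADING.match(line)
    if not m:
        return None
    title = m.group(2)
    # Heuristic: headings are short and do not end like a sentence.
    if title.endswith(".") or len(title.split()) > 10:
        return None
    return m.group(1), title


def extract_sections(lines):
    sections = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        head = is_heading(line)
        if head:
            sections.append(Section(*head))
        else:
            if not sections:  # text before the first heading
                sections.append(Section("", ""))
            sections[-1].text.append(line)
    return sections


def to_csv(sections) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["section_number", "section_heading", "section_text"])
    for s in sections:
        writer.writerow([s.number, s.heading, " ".join(s.text)])
    return buf.getvalue()


doc = [
    "1 Introduction",
    "This document specifies the billing system.",
    "1.1 Scope",
    "The system shall export invoices as PDF.",
    "10 users shall be supported concurrently.",
]
print(to_csv(extract_sections(doc)))
```

The sentence-ending check keeps "10 users shall be supported concurrently." in the body of section 1.1 instead of mistaking it for a heading. Rules like this are brittle, which is one reason to pair them with a learned component.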

Paper Structure

This paper contains 9 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The ReXCL tool pipeline. The input is a customer requirement document. The extraction module parses the document into a structured tabular output, and the classification module then classifies each requirement text (row). The final output can be exported to tools such as IBM DOORS.
  • Figure 2: The extraction module workflow. The input is a raw document, and the output is a structured table containing the section number, section heading, and section text. The components are intermediate text representation, header-footer removal, section information extraction, and final output generation.
  • Figure 3: Requirement classification using adaptive fine-tuning. The input is requirement documents with or without class labels. A larger set of unlabeled, domain-relevant requirement documents is used for extended pretraining with masked language modeling; a smaller labeled set is used for task-aware fine-tuning on requirement type classification.
  • Figure 4: Heatmap of annotator scores on a scale of 0 to 5.
  • Figure 5: ReXCL tool overview. The requirement document is extracted into a structured format from Word or PDF files, and the extracted texts are then classified into the requirement types Header, Info, Functional, and Non-Functional.
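Figure 3 describes extended (domain-adaptive) pretraining with masked language modeling before task-aware fine-tuning. The masking recipe is not given here. The sketch below assumes the standard BERT scheme: select about 15% of non-special tokens, replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. It uses -100 as the "ignore" label, which matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The token ids ([MASK] = 103, [CLS] = 101, [SEP] = 102, vocabulary size 30522, as in bert-base-uncased) are assumptions for illustration.

```python
import random

MASK_ID = 103   # assumption: [MASK] id in the bert-base-uncased vocabulary
IGNORE = -100   # label ignored by the loss (PyTorch CrossEntropyLoss default)


def mask_tokens(token_ids, vocab_size, mask_id=MASK_ID, special_ids=frozenset(),
                mlm_prob=0.15, rng=None):
    """BERT-style MLM corruption: select ~mlm_prob of non-special tokens;
    replace 80% of them with [MASK], 10% with a random token, keep 10%."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mlm_prob:
            continue  # not selected: no loss at this position
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # else: leave the original token in place
    return inputs, labels


# Toy example with arbitrary token ids between [CLS] (101) and [SEP] (102).
ids = [101, 2023, 2291, 4618, 8833, 102]
inputs, labels = mask_tokens(ids, vocab_size=30522, special_ids={101, 102},
                             rng=random.Random(0))
print(inputs, labels)
```

In a full pipeline these (input, label) pairs would train an encoder's MLM head on the larger unlabeled corpus. The adapted encoder would then be fine-tuned on the smaller labeled set for the four-way Header/Info/Functional/Non-Functional classification. Libraries such as Hugging Face Transformers ship a collator (`DataCollatorForLanguageModeling`) that applies this same 80/10/10 scheme.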