Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques

Avinash Patil

TL;DR

This paper addresses the challenge of integrating LLM-based techniques with established software quality assurance (SQA) standards to ensure compliant, reliable software. It systematically surveys foundational standards (ISO/IEC 12207, ISO/IEC 25010, ISO/IEC 5055, ISO 9001/ISO/IEC 90003, CMMI, TMM), presents a technical overview of LLM capabilities and tasks in software engineering, and maps LLM-enabled SQA approaches to these standards. The work contributes an alignment framework, empirical case studies, and governance considerations, with data privacy, bias, explainability, and auditing as the core concerns. The findings offer practical guidance for deploying LLM-assisted SQA within regulatory frameworks and suggest future directions, including adaptive learning, multimodal analysis, and the evolution of standards for AI, toward scalable, standards-compliant QA workflows.

Abstract

Software Quality Assurance (SQA) is critical for delivering reliable, secure, and efficient software products. The Software Quality Assurance Process aims to provide assurance that work products and processes comply with predefined provisions and plans. Recent advancements in Large Language Models (LLMs) present new opportunities to enhance existing SQA processes by automating tasks such as requirement analysis, code review, test generation, and compliance checks. Meanwhile, established standards such as ISO/IEC 12207, ISO/IEC 25010, ISO/IEC 5055, ISO 9001/ISO/IEC 90003, CMMI, and TMM provide structured frameworks for ensuring robust quality practices. This paper surveys the intersection of LLM-based SQA methods and these recognized standards, highlighting how AI-driven solutions can augment traditional approaches while maintaining compliance and process maturity. We first review the foundational software quality standards and the technical fundamentals of LLMs in software engineering. Next, we explore LLM-based SQA applications, including requirement validation, defect detection, test generation, and documentation maintenance. We then map these applications to key software quality frameworks, illustrating how LLMs can address specific requirements and metrics within each standard. Empirical case studies and open-source initiatives demonstrate the practical viability of these methods, while challenges such as data privacy, model bias, and explainability underscore the need for deliberate governance and auditing. Finally, we propose future directions encompassing adaptive learning, privacy-focused deployments, multimodal analysis, and evolving standards for AI-driven software quality.

Paper Structure

This paper contains 108 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Number of papers published per year from 2023 to early 2025, showing a rise in 2024 as interest in LLMs for software quality assurance surged.
  • Figure 2: Distribution of dataset themes used in the surveyed literature. A significant portion of papers did not specify any dataset, while open-source projects and benchmark suites were most common among those that did.
  • Figure 3: Frequency of evaluation approaches used in the papers. Comparative studies, empirical/user evaluations, and automated performance metrics dominated the landscape.
  • Figure 4: Proportion of papers that used fine-tuned LLMs versus those that relied solely on pre-trained models. Most studies avoided fine-tuning.
  • Figure 5: Distribution of LLMs reported in the literature. GPT-4, GPT-3.5, and ChatGPT were the most commonly used, though many papers did not specify the model used.
  • ...and 1 more figures