SENAI: Towards Software Engineering Native Generative Artificial Intelligence

Mootez Saad, José Antonio Hernández López, Boqi Chen, Neil Ernst, Dániel Varró, Tushar Sharma

TL;DR

The paper argues that current large language models for code optimize for functional correctness but neglect Software Engineering (SE) design principles such as modularity, cohesion, and coupling. It outlines the status quo of pre-training objectives and data representations, highlights the gap between code-generation capability and SE design reasoning, and reviews token-based and execution-based evaluation, including the HumanEval benchmark and the pass@k metric. It then sets out a vision for Software Engineering Grounded Language Models that use SE-centric evaluation (via Bloom's Taxonomy) and SE-centric training (SE-guided multimodal data, including UML diagrams and ADRs) to internalize architectural reasoning. The proposed approach emphasizes repository-level context, architectural awareness, and RL-guided alignment to produce code that is not only correct but also maintainable, extensible, and well-designed, with practical implications for reducing technical debt. Overall, the work charts a path toward SE-native generative models and calls for revamped benchmarks that assess design quality alongside functional correctness.
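For reference, the pass@k metric mentioned above is commonly computed with the unbiased estimator introduced alongside HumanEval: given n sampled solutions of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below is a minimal illustration of that formula; the function name and argument names are illustrative assumptions and do not come from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of sampled solutions
    c: number of those samples that pass all unit tests
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The only inputs are pass/fail counts from test execution, so the score cannot distinguish a cohesive, loosely coupled solution from a tangled one that happens to pass the same tests. This is the kind of gap the paper points to.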

Abstract

Large Language Models have significantly advanced the field of code generation, demonstrating the ability to produce functionally correct code snippets. However, advancements in generative AI for code overlook foundational Software Engineering (SE) principles such as modularity and single responsibility, and concepts such as cohesion and coupling, which are critical for creating maintainable, scalable, and robust software systems. These concepts are missing from pipelines that start with pre-training and end with evaluation on benchmarks. This vision paper argues for the integration of SE knowledge into LLMs to enhance their capability to understand, analyze, and generate code and other SE artifacts in line with established SE knowledge. The aim is to propose a new direction in which LLMs move beyond mere functional accuracy to perform generative tasks that require adherence to SE principles and best practices. In addition, given the interactive nature of these conversational models, we propose using Bloom's Taxonomy as a framework to assess the extent to which they internalize SE knowledge. The proposed evaluation framework offers a sound and more comprehensive evaluation technique than existing approaches such as linear probing. Software engineering native generative models will not only overcome the shortcomings present in current models but also pave the way for the next generation of generative models capable of handling real-world software engineering.

Paper Structure

This paper contains 17 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Overview of the vision of seNAI, incorporating software engineering knowledge into the training and evaluation process of large language models.