EmbC-Test: How to Speed Up Embedded Software Testing Using LLMs and RAG

Maximilian Harnot, Sebastian Komarnicki, Michal Polok, Timo Oksanen

TL;DR

This paper presents a Retrieval-Augmented Generation (RAG) pipeline that partially automates the verification of embedded C software. Grounding a large language model in project-specific artifacts reduces hallucinations and improves project alignment.

Abstract

Manual development of automatic tests for embedded C software is a strenuous and time-consuming task that does not scale well. With the accelerating pace of software release cycles, verification increasingly becomes the bottleneck in the embedded development workflow. This paper presents a Retrieval-Augmented Generation (RAG) pipeline as a solution for partial automation of the verification process. By grounding a large language model in project-specific artifacts, the approach reduces hallucinations and improves project alignment. An industrial evaluation showed that the generated tests are 100 % syntactically correct, with 85 % successfully passing runtime validation. The proposed solution has the potential to save up to 66 % of the testing time compared to manual test writing while generating 270 tests per hour.
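
The grounding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps a learned embedding model for a plain bag-of-words cosine similarity so that it runs with the Python standard library alone, and the artifact strings, the function names (`retrieve`, `build_prompt`), and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (assumed stand-in
    for an embedding-based vector search)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(q, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(requirement, chunks, k=2):
    """Assemble a prompt that grounds the model in retrieved artifacts.
    The wording is illustrative, not the prompt used in the paper."""
    context = "\n\n".join(retrieve(requirement, chunks, k))
    return (
        f"Project context:\n{context}\n\n"
        f"Requirement:\n{requirement}\n\n"
        "Write a C unit test for this requirement."
    )


# Hypothetical project artifacts: two code chunks and one requirement.
ARTIFACTS = [
    "int pump_set_pressure(int bar); /* returns -1 if bar is out of range */",
    "void led_toggle(void); /* toggles status LED */",
    "REQ-12: pump_set_pressure shall reject pressure values above 250 bar.",
]

if __name__ == "__main__":
    query = "pump_set_pressure shall reject pressure above 250 bar"
    print(build_prompt(query, ARTIFACTS))
```

Only the two pump-related artifacts end up in the prompt, so the model sees the real function signature rather than having to guess it. That is the sense in which retrieval is meant to reduce hallucinations.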

Paper Structure

This paper contains 11 sections, 9 figures, and 1 table.

Figures (9)

  • Figure 1: AI-assisted software testing ecosystem at Hydac Software
  • Figure 2: UMAP visualization of the embedding space for fixed-size chunking (1446 chunks). The clusters overlap due to arbitrary split boundaries.
  • Figure 3: UMAP visualization of the embedding space for AST-based chunking (833 chunks). In AST-based chunking, the clusters are grouped more tightly together.
  • Figure 4: Test generation prompt structure.
  • Figure 5: Comparison of the best-performing RAG configuration and manual test baseline.
  • ...and 4 more figures