Automated Unit Test Improvement using Large Language Models at Meta

Nadia Alshahwan, Jubin Chheda, Anastasia Finegenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

TL;DR

The paper presents TestGen-LLM, an Assured Offline LLMSE system that extends existing Kotlin unit tests with automatically generated, verifiably improved test cases. It achieves this through a filtration pipeline (each candidate must build, pass reliably, and increase coverage) and an ensemble of LLMs, prompts, and hyperparameters that produce fully formed test-class improvements rather than snippets, thereby avoiding regression. In an evaluation on Instagram's Reels and Stories products, 75% of its test cases built correctly, 57% passed reliably, and 25% increased coverage. During the Instagram and Facebook test-a-thons, 73% of its recommendations were accepted for production deployment by Meta software engineers. The work argues for the practicality and safety of industrial-scale Assured LLMSE, detailing deployment experiences, quantitative outcomes, qualitative observations, and open research directions for improving automated test enhancement.

Abstract

This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms. In an evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage. During Meta's Instagram and Facebook test-a-thons, it improved 11.5% of all classes to which it was applied, with 73% of its recommendations being accepted for production deployment by Meta software engineers. We believe this is the first report on industrial scale deployment of LLM-generated code backed by such assurances of code improvement.
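The filter sequence described above (build, pass reliably, increase coverage) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Meta's implementation: the `Candidate` record, its field names, the `filter_candidates` helper, and the line identifiers such as `A.kt:10` are all hypothetical, and the measurements are supplied by hand rather than obtained from a real build or coverage tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A generated test case plus hypothetical measurements taken on it."""
    name: str
    builds: bool                    # did the extended test class compile?
    run_outcomes: Tuple[bool, ...]  # pass/fail for each repeated execution
    covered: FrozenSet[str]         # identifiers of lines the test exercises


def filter_candidates(candidates: List[Candidate],
                      baseline: FrozenSet[str]) -> List[Candidate]:
    """Keep only candidates that clear all three filters, in order."""
    kept = []
    for c in candidates:
        # Filter 1: the test class must build.
        if not c.builds:
            continue
        # Filter 2: the test must pass on every repeated run (no flakiness).
        if not c.run_outcomes or not all(c.run_outcomes):
            continue
        # Filter 3: the test must cover something the original suite did not.
        if not (c.covered - baseline):
            continue
        kept.append(c)
    return kept


# Hypothetical data: lines already covered by the human-written tests.
baseline = frozenset({"A.kt:10", "A.kt:11"})
candidates = [
    Candidate("does_not_build", False, (), frozenset()),
    Candidate("flaky", True, (True, False, True), frozenset({"A.kt:12"})),
    Candidate("redundant", True, (True, True, True), frozenset({"A.kt:10"})),
    Candidate("useful", True, (True, True, True),
              frozenset({"A.kt:10", "A.kt:12"})),
]
survivors = filter_candidates(candidates, baseline)
print([c.name for c in survivors])  # prints ['useful']
```

Each filter discards candidates before the next, more expensive check would run, and only tests that survive all three would be recommended to engineers. The sketch compares every candidate against the same fixed baseline; whether coverage gained by one accepted test should raise the bar for later ones is a design choice it leaves open.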

Paper Structure (20 sections, 2 figures, 6 tables)

This paper contains 20 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: TestGen-LLM top-level architecture (an instance of Assured Offline LLMSE [mhetal:intense24-keynote]).
  • Figure 2: Sankey diagram showing the filtration process outcomes (as percentages of all test cases) from the Experimental Study on Instagram components for Reels and Stories products, using the four prompt strategies from the prompts table (tab:prompts) and the two language models, LLM1 and LLM2.