Smarter Together: Creating Agentic Communities of Practice through Shared Experiential Learning

Valentin Tablan, Scott Taylor, Gabriel Hurtado, Kristoffer Bernhem, Anders Uhrenholt, Gabriele Farei, Karo Moilanen

TL;DR

The paper tackles the erosion of traditional developer knowledge-sharing channels in the era of autonomous code agents by proposing Spark, a shared agentic memory that aggregates experiences from coding tasks. Spark combines a knowledge base, a retrieval agent, and a continuous experiential learning loop to provide context-aware recommendations, and it lets agents contribute their discoveries back into the shared memory. Empirical results on the DS-1000 dataset show that Spark improves code quality across model sizes, that smaller open-weight models match or surpass larger state-of-the-art models when aided by Spark, and that Spark's recommendations are rated as highly useful. The work demonstrates the practical value of collective, memory-augmented reasoning for software development and outlines a path toward more adaptive, collaborative human-AI coding environments.

Abstract

The transition from human-centric to agent-centric software development practices is disrupting existing knowledge-sharing environments for software developers. Traditional peer-to-peer repositories and developer communities for shared technical knowledge and best practice have witnessed dramatic drops in participation in a short period of time. At the same time, agentic functional equivalents have yet to emerge, leaving AI agents, which already generate a significant proportion of all new software code, without access to repositories of valuable shared learning. In this paper, we introduce Spark, a novel shared agentic memory architecture designed to emulate the collective intelligence and know-how of human developer communities. Spark enables AI coding agents to both contribute to and draw from a persistent and continuously evolving experiential memory. Agents operating in the same general problem space use the Spark shared memory as a repository of new knowledge to achieve collective continual learning. We evaluate Spark as a coach for AI coding agents performing software development tasks. We demonstrate that recommendations made by Spark improve the quality of code generated by generic code generation models of varying sizes and capability tiers. Boosted by Spark, a small open-weights model with 30 billion parameters was able to match the code quality afforded by a much larger state-of-the-art model. Separately, we measure the intrinsic quality of recommendations generated by Spark against a wide range of criteria inspired by software development best practice, and achieve helpfulness levels of up to 98.2% in the top two (out of five) qualitative helpfulness bands.
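The contribute-and-draw pattern described in the abstract can be illustrated with a toy sketch. This is not Spark's implementation: the class `SharedMemory`, its methods `contribute` and `retrieve`, the agent names, and the keyword-overlap ranking are all hypothetical stand-ins chosen for brevity, whereas the paper describes a knowledge base served by a retrieval agent.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Hypothetical toy memory: agents append lessons and query them by keyword overlap."""

    entries: list = field(default_factory=list)

    def contribute(self, agent_id: str, lesson: str) -> None:
        # Any agent working in the problem space can add what it has learned.
        self.entries.append((agent_id, lesson))

    def retrieve(self, task: str, k: int = 2) -> list:
        # Rank stored lessons by the number of words shared with the task description.
        # Assumption: word overlap stands in for whatever retrieval a real system uses.
        task_tokens = set(task.lower().split())
        scored = []
        for _agent_id, lesson in self.entries:
            overlap = len(task_tokens & set(lesson.lower().split()))
            if overlap > 0:
                scored.append((overlap, lesson))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [lesson for _, lesson in scored[:k]]


memory = SharedMemory()
memory.contribute("agent-a", "use pandas groupby with agg for per-group statistics")
memory.contribute("agent-b", "prefer numpy vectorised operations over python loops")

# A third agent facing a related task draws on what the others contributed.
recommendations = memory.retrieve("compute per-group statistics with pandas")
print(recommendations)
```

In this sketch, a lesson recorded by one agent becomes available to every later agent whose task overlaps with it, which is the essence of the collective continual learning the abstract describes.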

Paper Structure

This paper contains 40 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Quality of code generated by LLMs with vs. without Spark, under two conditions. 1) NO-SPARK: baseline performance of a given LLM backbone with no access to an external Spark memory. 2) WITH-SPARK: code generation using an external Spark memory populated with raw public software documentation and curated knowledge extracted from synthetic experiential traces. Code quality, judged by an independent LLM judge, is evaluated on 1000 Python data science problems from the DS-1000 data set. Error bars represent standard error. The "Human" data point represents the quality of human-provided reference solutions as per the DS-1000 data set.
  • Figure 2: Distribution of code quality scores for the three codegen models solving coding problems in the DS-1000 data set, as evaluated by Gemini 2.5 Pro as a judge. The distribution shift from lower scores to the maximum score of 5 is visible as access to Spark's memory is made available.
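The figure captions refer to two summary statistics: the standard error shown as error bars in Figure 1 and the share of solutions reaching the maximum score of 5 in Figure 2. A minimal sketch of how such statistics are commonly computed follows. The function names and the example scores are invented for illustration, and the standard error of the mean is an assumption about what the error bars denote; none of this is taken from the paper's evaluation code.

```python
import math
import statistics


def mean_and_standard_error(scores):
    """Return the mean and the standard error of the mean for a list of scores."""
    mean = statistics.fmean(scores)
    # Standard error of the mean: sample standard deviation divided by sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem


def share_at_max(scores, max_score=5):
    """Return the fraction of scores equal to the maximum score."""
    return sum(1 for s in scores if s == max_score) / len(scores)


# Invented judge scores, for illustration only.
example_scores = [5, 4, 5, 3, 5, 4, 5, 5]
mean, sem = mean_and_standard_error(example_scores)
print(f"mean={mean:.2f}, standard error={sem:.2f}, share at 5={share_at_max(example_scores):.2f}")
```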