Experimenting with Multi-Agent Software Development: Towards a Unified Platform
Malik Abdul Sami, Muhammad Waseem, Zeeshan Rasheed, Mika Saari, Kari Systä, Pekka Abrahamsson
TL;DR
This work tackles the challenge of delivering a cohesive, AI-assisted software development lifecycle by introducing a multi-agent platform that orchestrates the path from requirements to deliverables across SDLC stages. It leverages specialized agents for requirements engineering, architecture, code generation, testing, data analysis, and compliance, with prompt engineering as a central enabling mechanism. Key contributions include automated generation and prioritization of user stories, PlantUML-based UML design from requirements, modular Python/React code generation, automated test creation, and EU security/compliance support, all demonstrated via preliminary results and GitHub-hosted source. The study suggests that such AI-driven orchestration can accelerate development, improve consistency, and provide governance, while highlighting the need for model diversification and broader validation across domains.
Abstract
Large language models are redefining software engineering by bringing AI-powered techniques to the whole software development process, including requirements gathering, software architecture, code generation, testing, and deployment. However, it is still difficult to develop a cohesive platform that consistently produces the best outcomes across all stages. The objective of this study is to develop a unified platform that uses multiple artificial intelligence agents to automate the transformation of user requirements into well-organized deliverables. These deliverables include user stories, their prioritization, and UML sequence diagrams, along with modular APIs, unit tests, and end-to-end tests. Additionally, the platform organizes tasks, performs security and compliance checks following European standards, and suggests design patterns, design optimizations, and improvements for non-functional requirements. Users can control and manage each phase according to their preferences. We use multiple models, such as GPT-3.5, GPT-4, and Llama3, to enable the generation of modular code as per user choice. The research also highlights limitations and directions for future research to improve the overall software development life cycle. The source code for our unified platform is hosted on GitHub, enabling additional experimentation and supporting both research and practical uses.
