Conversational AI Multi-Agent Interoperability, Universal Open APIs for Agentic Natural Language Multimodal Communications
Diego Gosmar, Deborah A. Dahl, Emmett Coin
TL;DR
The paper addresses fragmentation in conversational AI by presenting the OVON framework, a set of universal open APIs for agentic natural language multimodal communications. It introduces the Interoperable Conversation Envelope and manifest-based discovery, supported by state-diagram modeling that maps inter-agent exchanges across human and AI participants. The main contributions are a technology-agnostic, loosely coupled architecture, open repositories, and a Sandbox for experimentation, demonstrated through end-to-end use cases such as Smart Errands and Smart Library. The paper also outlines security, ethics, and accountability enhancements as future work toward safe and reliable cross-platform AI collaboration.
Abstract
This paper analyses Conversational AI multi-agent interoperability frameworks and describes the novel architecture proposed by the Open Voice Interoperability initiative (Linux Foundation AI & Data), known for short as OVON (Open Voice Network). The new approach is illustrated together with its main components, and the key benefits and use cases for deploying standard multimodal AI agency (or agentic AI) communications are delineated. Beginning with Universal APIs based on Natural Language, the framework enables interoperable interactions among diverse Conversational AI agents, including chatbots, voicebots, videobots, and human agents. Furthermore, a new Discovery specification framework is introduced. It is designed to efficiently look up agents that provide specific services and to obtain accurate information about those services through a standard Manifest publication, accessible via an extended set of Natural Language-based APIs. The main purpose of this contribution is to significantly enhance the capabilities and scalability of AI interactions across various platforms. The novel architecture for interoperable Conversational AI assistants is designed to generalize and to be replicable, and it is accessible via open repositories.
