OmniNova: A General Multimodal Agent Framework

Pengfei Du

TL;DR

OmniNova addresses the challenge of coordinating multiple LLM-driven agents with a hierarchical, modular framework that routes tasks and allocates LLM capabilities according to their cognitive demands. It combines seven specialized agents with a LangGraph-based workflow engine and a multi-layer LLM integration system to support efficient tool use, dynamic task routing, and robust task decomposition. On 50 complex tasks spanning research, data analysis, and web interaction, OmniNova achieves a higher overall task completion rate (about 87 percent vs. 62 percent for the baseline), roughly 41 percent lower token usage, and higher human-rated result quality (about 4.2/5 vs. 3.1/5). The work contributes both a practical open-source implementation and a theoretical framework for scalable, explainable multi-agent AI systems in automation settings.
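To make "allocating LLM capabilities according to cognitive needs" concrete, here is a minimal sketch of tiered model selection. Everything in it is a hypothetical assumption rather than OmniNova's actual configuration: the `CognitiveLoad` categories, the placeholder model names `small-fast-model` and `large-reasoning-model`, their relative costs, and the `researcher`/`coder` specialist roles. It illustrates the cost effect of routing cheap decisions to a small model; it does not reproduce the paper's 41 percent token-reduction figure.

```python
from dataclasses import dataclass
from enum import Enum


class CognitiveLoad(Enum):
    """Hypothetical categories of cognitive demand (not OmniNova's own labels)."""
    ROUTING = "routing"      # quick classification, e.g. coordinator decisions
    TOOL_USE = "tool_use"    # issuing tool calls and summarizing their output
    REASONING = "reasoning"  # multi-step planning and task decomposition


@dataclass(frozen=True)
class ModelTier:
    name: str             # placeholder model identifier, not a real model
    relative_cost: float  # illustrative cost per 1k tokens, arbitrary units


SMALL = ModelTier("small-fast-model", 1.0)
LARGE = ModelTier("large-reasoning-model", 10.0)

# Hypothetical mapping from cognitive demand to model tier.
TIER_BY_LOAD = {
    CognitiveLoad.ROUTING: SMALL,
    CognitiveLoad.TOOL_USE: SMALL,
    CognitiveLoad.REASONING: LARGE,
}

# Hypothetical agent roles; "researcher" and "coder" stand in for specialists.
LOAD_BY_AGENT = {
    "coordinator": CognitiveLoad.ROUTING,
    "planner": CognitiveLoad.REASONING,
    "supervisor": CognitiveLoad.REASONING,
    "researcher": CognitiveLoad.TOOL_USE,
    "coder": CognitiveLoad.REASONING,
}


def select_model(agent: str) -> ModelTier:
    """Pick a tier by the agent's cognitive load; unknown agents get the large model."""
    load = LOAD_BY_AGENT.get(agent, CognitiveLoad.REASONING)
    return TIER_BY_LOAD[load]


def workflow_cost(tokens_by_agent: dict[str, int], tiered: bool = True) -> float:
    """Cost of one run in arbitrary units; tiered=False sends every agent to LARGE."""
    total = 0.0
    for agent, tokens in tokens_by_agent.items():
        tier = select_model(agent) if tiered else LARGE
        total += tier.relative_cost * tokens / 1000
    return total


if __name__ == "__main__":
    run = {"coordinator": 500, "planner": 2000, "researcher": 4000}
    print(workflow_cost(run), workflow_cost(run, tiered=False))  # 24.5 65.0
```

With these made-up token counts, the tiered run costs 24.5 units against 65.0 for a single-large-model baseline. Defaulting unknown agents to the large tier is a deliberate choice in this sketch: it trades cost for safety when an agent's cognitive load is not known in advance.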

Abstract

The integration of Large Language Models (LLMs) with specialized tools creates new opportunities for intelligent automation systems. However, orchestrating multiple LLM-driven agents on complex tasks remains challenging because of coordination difficulties, inefficient resource utilization, and inconsistent information flow. We present OmniNova, a modular multi-agent automation framework that combines language models with specialized tools for web search, crawling, and code execution. OmniNova introduces three key innovations: (1) a hierarchical multi-agent architecture with distinct coordinator, planner, supervisor, and specialist agents; (2) a dynamic task routing mechanism that optimizes agent deployment based on task complexity; and (3) a multi-layered LLM integration system that allocates appropriate models to different cognitive requirements. Evaluations on 50 complex tasks in research, data analysis, and web interaction domains show that OmniNova outperforms existing frameworks in task completion rate (87% vs. 62% for the baseline), efficiency (41% lower token usage), and result quality (human evaluation score of 4.2/5 vs. 3.1/5 for the baseline). We contribute both a theoretical framework for multi-agent system design and an open-source implementation that advances the state of the art in LLM-based automation systems.
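The hierarchy and complexity-based routing described in the abstract can be pictured as a coordinator that answers simple tasks directly and escalates complex ones to a planner, whose steps a supervisor then dispatches to specialists. The sketch below is a dependency-free illustration under loudly stated assumptions: the keyword-based `estimate_complexity` score, the `threshold` of 2, the split-on-"then" planner, and the `researcher`/`coder` specialist names are all hypothetical stand-ins for what in OmniNova are LLM-driven agents. It does not use LangGraph or reflect OmniNova's actual routing criterion.

```python
from dataclasses import dataclass
from typing import Callable

Specialist = Callable[[str], str]


@dataclass
class Step:
    description: str
    specialist: str  # hypothetical specialist name, e.g. "researcher" or "coder"


# Crude keyword proxy for task complexity (an assumption, not OmniNova's method).
COMPLEX_MARKERS = ("compare", "analyze", "research", "crawl", "plot", "then")


def estimate_complexity(task: str) -> int:
    """Count how many complexity markers appear in the task text."""
    text = task.lower()
    return sum(marker in text for marker in COMPLEX_MARKERS)


def plan(task: str) -> list[Step]:
    """Planner stand-in: split on ' then ' and assign each clause to a specialist."""
    steps = []
    for clause in task.split(" then "):
        clause = clause.strip()
        needs_code = any(w in clause.lower() for w in ("plot", "compute", "analyze"))
        steps.append(Step(clause, "coder" if needs_code else "researcher"))
    return steps


def supervise(steps: list[Step], specialists: dict[str, Specialist]) -> list[str]:
    """Supervisor stand-in: dispatch each step, in order, to its specialist."""
    return [specialists[step.specialist](step.description) for step in steps]


def coordinate(task: str, specialists: dict[str, Specialist], threshold: int = 2) -> list[str]:
    """Coordinator stand-in: answer simple tasks directly, escalate complex ones."""
    if estimate_complexity(task) < threshold:
        return [f"direct answer: {task}"]
    return supervise(plan(task), specialists)


if __name__ == "__main__":
    agents = {
        "researcher": lambda d: f"researcher: {d}",
        "coder": lambda d: f"coder: {d}",
    }
    print(coordinate("What is LangGraph?", agents))
    print(coordinate("Research recent agent frameworks then plot their GitHub stars", agents))
```

The first call scores 0 and is answered directly; the second scores 3 ("research", "then", "plot"), so it is split into a research step and a plotting step handled by different specialists. Keeping the coordinator's check cheap reflects the abstract's point that full planning machinery should be deployed only when task complexity warrants it.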
