Position: Agentic Systems Constitute a Key Component of Next-Generation Intelligent Image Processing
Jinjin Gu
TL;DR
This paper identifies fundamental limitations of purely model-centric image processing in generality, adaptability, and real-world problem solving, and argues for a paradigm shift toward agentic image processing systems that orchestrate multiple tools. It introduces a framework in which agents use LLMs and modular components to plan, execute, reflect on, and adapt processing workflows, and it defines a six-level taxonomy of agentic capability. The paper outlines core design principles, cognitive architectures, knowledge infusion strategies, and human–computer interaction requirements to guide future research. It argues that agentic, tool-coordinating systems can achieve greater autonomy, generality, and creativity, leading to more robust and versatile image processing solutions.
Abstract
This position paper argues that the image processing community should broaden its focus from purely model-centric development to include agentic system design as an essential complementary paradigm. While deep learning has significantly advanced capabilities for specific image processing tasks, current approaches face critical limitations in generalization, adaptability, and real-world problem-solving flexibility. We propose that developing intelligent agentic systems, capable of dynamically selecting, combining, and optimizing existing image processing tools, represents the next evolutionary step for the field. Such systems would emulate human experts' ability to strategically orchestrate different tools to solve complex problems, overcoming the brittleness of monolithic models. The paper analyzes key limitations of model-centric paradigms, establishes design principles for agentic image processing systems, and outlines different capability levels for such agents.
