Position: Agentic Systems Constitute a Key Component of Next-Generation Intelligent Image Processing

Jinjin Gu

TL;DR

This paper identifies fundamental limitations of purely model-centric image processing in generality, adaptability, and real-world problem solving, and argues for a paradigm shift toward agentic image processing systems that orchestrate multiple tools. It introduces a framework where agents leverage LLMs and modular components to plan, execute, reflect, and adapt processing workflows, with a six-level taxonomy of agentic capability. The authors outline core design principles, cognitive architectures, knowledge infusion strategies, and human–computer interaction requirements, providing guidance for future research. The work emphasizes that agentic, tool-coordinating systems can achieve greater autonomy, generality, and creativity, with practical implications for more robust and versatile image processing solutions.

Abstract

This position paper argues that the image processing community should broaden its focus from purely model-centric development to include agentic system design as an essential complementary paradigm. While deep learning has significantly advanced capabilities for specific image processing tasks, current approaches face critical limitations in generalization, adaptability, and real-world problem-solving flexibility. We propose that developing intelligent agentic systems, capable of dynamically selecting, combining, and optimizing existing image processing tools, represents the next evolutionary step for the field. Such systems would emulate human experts' ability to strategically orchestrate different tools to solve complex problems, overcoming the brittleness of monolithic models. The paper analyzes key limitations of model-centric paradigms, establishes design principles for agentic image processing systems, and outlines different capability levels for such agents.

Paper Structure

This paper contains 18 sections and 4 figures.

Figures (4)

  • Figure 1: The existing research paradigm focuses on developing more powerful and multi-functional image processing models. In contrast, we advocate a new research paradigm centered on building agentic systems. Our goal is to create an agent that can integrate and leverage these models to achieve higher levels of intelligence, automation, and generality.
  • Figure 2: How image processing systems can embody different levels of agentic capability to enhance their generality and intelligence.
  • Figure 3: Levels of agentic capability in image processing systems, illustrating the progression from fixed rule-based methods (Level 0) to fully autonomous and creative systems (Level 5). Each level builds on the previous one, adding layers of adaptability, reflection, self-evolution, and creativity.
  • Figure 4: Cognitive architecture for image processing systems, illustrating the iterative process of perception, scheduling, execution, reflection, and rescheduling to achieve satisfactory results.
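
The iterative loop described in the Figure 4 caption (perception, scheduling, execution, reflection, rescheduling) can be illustrated with a minimal Python sketch. This is not the paper's implementation. It is a toy under stated assumptions: an "image" is a plain list of intensities in [0, 1], the only tools are the hypothetical `brighten` and `darken` functions, and every name, threshold, and step size below is invented for illustration. A real agent would presumably use an LLM for scheduling and learned models as tools.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for an image: pixel intensities in [0, 1] (an assumption).
Image = List[float]


def brighten(img: Image) -> Image:
    """Hypothetical tool: raise every pixel by 0.2, clamped to 1.0."""
    return [min(1.0, p + 0.2) for p in img]


def darken(img: Image) -> Image:
    """Hypothetical tool: lower every pixel by 0.2, clamped to 0.0."""
    return [max(0.0, p - 0.2) for p in img]


# Tool registry the agent can choose from (illustrative only).
TOOLS: Dict[str, Callable[[Image], Image]] = {
    "brighten": brighten,
    "darken": darken,
}


def perceive(img: Image) -> float:
    """Perception: summarize the image as its mean intensity."""
    return sum(img) / len(img)


def schedule(mean: float, target: float, tol: float) -> List[str]:
    """Scheduling: pick tool names based on what was perceived."""
    if mean < target - tol:
        return ["brighten"]
    if mean > target + tol:
        return ["darken"]
    return []


def execute(img: Image, plan: List[str]) -> Image:
    """Execution: apply the scheduled tools in order."""
    for name in plan:
        img = TOOLS[name](img)
    return img


def reflect(img: Image, target: float, tol: float) -> bool:
    """Reflection: is the result close enough to the target?"""
    return abs(perceive(img) - target) <= tol


def run_agent(
    img: Image, target: float = 0.5, tol: float = 0.15, max_rounds: int = 5
) -> Tuple[Image, List[str]]:
    """Loop until reflection is satisfied or the round budget runs out."""
    history: List[str] = []
    for _ in range(max_rounds):
        if reflect(img, target, tol):
            break
        # Rescheduling: plan again from the current state of the image.
        plan = schedule(perceive(img), target, tol)
        img = execute(img, plan)
        history.extend(plan)
    return img, history


if __name__ == "__main__":
    result, steps = run_agent([0.0, 0.1, 0.2])
    print(steps, round(perceive(result), 3))
```

The point of the sketch is the control flow rather than the tools: the plan is not fixed in advance but is recomputed after each reflection step, which is what distinguishes this loop from a fixed processing pipeline.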