Efficiently Enhancing General Agents With Hierarchical-categorical Memory

Changze Qiao, Mingming Lu

TL;DR

The paper addresses how to build a general multimodal agent that learns continually without updating model parameters, a gap left by costly end-to-end training and by static tool-use approaches. It introduces EHC, a framework that combines a Hierarchical Memory Retrieval (HMR) module with a Task-Category Oriented Experience Learning (TOEL) module to provide rapid memory access, reduced storage overhead, and category-aware pattern extraction. HMR keeps memories in two pools (fast RAM and an external database) and moves entries between them under a dynamic migration policy. TOEL collects experiences, classifies them into predefined task categories, and distills actionable insights via few-shot reasoning, which improves adaptability across task types. At inference time, insights from the same category as the current task serve as few-shot context. Experiments on GQA, NLVR2, and grounding/editing benchmarks show state-of-the-art or near state-of-the-art performance, robustness, and improved interpretability, pointing to EHC's potential as a scalable, general multimodal agent with continual learning capabilities.
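To make the dual-pool idea concrete, the following is a minimal sketch of a two-tier memory with migration between a bounded fast pool and an unbounded external pool. It is not the paper's implementation: the class name `DualPoolMemory`, the string keys, and the least-recently-used migration rule are all assumptions chosen for illustration, and a plain dictionary stands in for the external database.

```python
from collections import OrderedDict
from typing import Optional


class DualPoolMemory:
    """Two-tier store: a small fast pool plus an unbounded external pool.

    Assumption: entries migrate on a least-recently-used basis. The paper's
    dynamic migration policy may use a different rule.
    """

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast_pool = OrderedDict()  # stands in for in-RAM memory
        self.external_pool = {}         # stands in for an external database

    def store(self, key: str, memory: str) -> None:
        """Write to the fast pool, demoting the oldest entries if it overflows."""
        self.external_pool.pop(key, None)  # avoid keeping a stale copy
        self.fast_pool[key] = memory
        self.fast_pool.move_to_end(key)    # mark as most recently used
        while len(self.fast_pool) > self.fast_capacity:
            old_key, old_memory = self.fast_pool.popitem(last=False)
            self.external_pool[old_key] = old_memory

    def retrieve(self, key: str) -> Optional[str]:
        """Check the fast pool first, then promote hits from the external pool."""
        if key in self.fast_pool:
            self.fast_pool.move_to_end(key)
            return self.fast_pool[key]
        if key in self.external_pool:
            memory = self.external_pool.pop(key)
            self.store(key, memory)        # promotion may demote another entry
            return memory
        return None
```

Because overflow is demoted rather than discarded, the total number of stored memories is not bounded by the fast pool's capacity, which matches the summary's point that storage is not constrained by memory capacity.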

Abstract

With large language models (LLMs) demonstrating remarkable capabilities, there has been a surge in research on leveraging LLMs to build general-purpose multi-modal agents. However, existing approaches either rely on computationally expensive end-to-end training using large-scale multi-modal data or adopt tool-use methods that lack the ability to continuously learn and adapt to new environments. In this paper, we introduce EHC, a general agent capable of learning without parameter updates. EHC consists of a Hierarchical Memory Retrieval (HMR) module and a Task-Category Oriented Experience Learning (TOEL) module. The HMR module facilitates rapid retrieval of relevant memories and continuously stores new information without being constrained by memory capacity. The TOEL module enhances the agent's comprehension of various task characteristics by classifying experiences and extracting patterns across different categories. Extensive experiments conducted on multiple standard datasets demonstrate that EHC outperforms existing methods, achieving state-of-the-art performance and underscoring its effectiveness as a general agent for handling complex multi-modal tasks.
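The category-oriented side of the design can be sketched in a similar spirit: experiences are assigned to a task category, insights are stored per category, and only same-category insights are placed in the context for a new task. Everything named here is hypothetical, including `CategoryInsightStore`, the `"counting"` and `"other"` categories, and the keyword classifier, which is only a stand-in for whatever classifier the paper uses.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class CategoryInsightStore:
    """Keeps distilled insights grouped by task category (illustrative only)."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self.classify = classify  # maps a task description to a category name
        self.insights: Dict[str, List[str]] = defaultdict(list)

    def add_insight(self, task: str, insight: str) -> str:
        """File an insight under the category of the task that produced it."""
        category = self.classify(task)
        self.insights[category].append(insight)
        return category

    def build_context(self, task: str, max_insights: int = 3) -> str:
        """Prepend the most recent same-category insights to the new task."""
        category = self.classify(task)
        stored = self.insights.get(category, [])
        selected = stored[-max_insights:] if max_insights > 0 else []
        lines = [f"Insight: {text}" for text in selected]
        lines.append(f"Task: {task}")
        return "\n".join(lines)


def keyword_classifier(task: str) -> str:
    """Hypothetical stand-in classifier with two made-up categories."""
    return "counting" if "how many" in task.lower() else "other"


store = CategoryInsightStore(keyword_classifier)
store.add_insight("How many dogs are there?", "Detect objects before counting.")
store.add_insight("Is the cup left of the plate?", "Locate both objects first.")
context = store.build_context("How many cups are on the table?")
```

Here `context` contains only the counting insight followed by the new task, so guidance from unrelated categories does not enter the prompt.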

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: The framework of EHC.
  • Figure 2: Case study of EHC on two typical example tasks.