EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024
Letian Shi, Qi Lv, Xiang Deng, Liqiang Nie
TL;DR
This paper tackles egocentric task planning by introducing EPD, a training-free framework that unifies long-term memory extraction, context-aware planning, and multi-iteration decision to predict next actions from long videos and current observations. Memory extraction uses GPT-4o to summarize action segments, while planning leverages multimodal prompts from memory and current observations with GPT-4o and Claude 3.5. A multi-iteration decision step combines multiple planning outputs to select the most plausible action, achieving 53.85% accuracy on EgoPlan-Test and outperforming both training-free and trained baselines. The work highlights the practical potential of prompt-based, multimodal LLM systems for real-world egocentric planning, while also noting limitations in fine-grained visual understanding that may benefit from future trainable refinements.
Abstract
In this technical report, we present our solution for the EgoPlan Challenge at ICML 2024. To address the real-world egocentric task planning problem, we introduce EPD, a novel planning framework comprising three stages: long-term memory Extraction, context-aware Planning, and multi-iteration Decision. Given the task goal, task progress, and current observation, the extraction model first extracts task-relevant memory information from the progress video, condensing the complex long video into summarized memory information. The planning model then combines the context of the memory information with fine-grained visual information from the current observation to predict the next action. Finally, through multi-iteration decision-making, the decision model weighs the overall task situation and current state to make the most realistic planning decision. On the EgoPlan-Test set, EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions. We have made all code available at https://github.com/Kkskkkskr/EPD .
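The three stages described above can be read as a simple pipeline. The following is a minimal sketch of that control flow only, not the released implementation: the function name `epd_predict`, the callables `extract` and `planners`, and the `n_iterations` parameter are all hypothetical, and the majority vote is an assumed stand-in for the decision model, whose actual procedure is not specified in this abstract. In practice the callables would wrap multimodal LLM calls; here they are plain Python functions so the sketch runs on its own.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical signatures (assumptions, not taken from the EPD code base):
#   extract(task_goal, progress_segments) -> memory summary text
#   plan(task_goal, memory, current_observation, candidates) -> one candidate action
Extractor = Callable[[str, Sequence[str]], str]
Planner = Callable[[str, str, str, Sequence[str]], str]


def epd_predict(
    task_goal: str,
    progress_segments: Sequence[str],
    current_observation: str,
    candidates: Sequence[str],
    extract: Extractor,
    planners: Sequence[Planner],
    n_iterations: int = 3,
) -> str:
    """Return one candidate action via Extraction -> Planning -> Decision."""
    # Stage 1 (Extraction): condense the long progress video into a summary.
    memory = extract(task_goal, progress_segments)

    # Stage 2 (Planning): each planner proposes a next action from the memory
    # summary plus the current observation; repeat to gather several proposals.
    proposals = []
    for _ in range(n_iterations):
        for plan in planners:
            proposals.append(plan(task_goal, memory, current_observation, candidates))

    # Stage 3 (Decision): ASSUMPTION: a majority vote over valid proposals
    # stands in for the decision model. Proposals outside the candidate set
    # are ignored, and ties resolve to the earliest candidate.
    votes = Counter(p for p in proposals if p in candidates)
    if not votes:
        return candidates[0]
    best = max(votes.values())
    for candidate in candidates:
        if votes[candidate] == best:
            return candidate
    return candidates[0]  # not reached; kept for completeness
```

Passing several planners mirrors the idea of combining outputs from more than one model, while `n_iterations` controls how many proposals each planner contributes before the final choice is made.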
