
MEEL: Multi-Modal Event Evolution Learning

Zhengwei Tao, Zhi Jin, Junqiang Huang, Xiancai Chen, Xiaoying Bai, Haiyan Zhao, Yifan Zhang, Chongyang Tao

TL;DR

This paper introduces Multi-Modal Event Evolution Learning (MEEL), which enables a model to grasp the mechanism of event evolution and thereby improves its multi-modal event reasoning (MMER) ability. It also proposes a guiding discrimination strategy, in which models are trained to discriminate improper evolution directions.
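The guiding discrimination idea can be pictured as building multiple-choice items in which the true evolution direction is mixed with improper ones. This is a minimal, hypothetical sketch: the paper uses two negative event mining strategies that are not detailed here, so the mining rule below (treat any graph event that is not a valid tail of the queried head and relation as a negative) and the names `Edge`, `mine_negatives`, and `discrimination_sample` are assumptions for illustration only.

```python
import random
import string
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    head: str      # source event
    relation: str  # e.g. "Result"
    tail: str      # event the head evolves into


def mine_negatives(edges: List[Edge], head: str, relation: str,
                   k: int, rng: random.Random) -> List[str]:
    """Hypothetical rule: any graph event that is not a valid `relation`
    tail of `head` (and is not `head` itself) is an improper direction."""
    positives = {e.tail for e in edges if e.head == head and e.relation == relation}
    all_events = {e.head for e in edges} | {e.tail for e in edges}
    pool = sorted(all_events - positives - {head})
    return rng.sample(pool, min(k, len(pool)))


def discrimination_sample(edges: List[Edge], target: Edge,
                          k: int = 3, seed: int = 0) -> Dict[str, object]:
    """One multiple-choice item: the true tail plus up to k mined negatives."""
    rng = random.Random(seed)
    options = mine_negatives(edges, target.head, target.relation, k, rng)
    options.append(target.tail)
    rng.shuffle(options)
    letters = string.ascii_uppercase[:len(options)]
    lines = [f"{letter}. {option}" for letter, option in zip(letters, options)]
    question = f"Which event is a plausible {target.relation} of '{target.head}'?"
    return {
        "question": "\n".join([question] + lines),
        "options": options,
        "answer": letters[options.index(target.tail)],
    }


graph = [
    Edge("hurricane landfall", "Result", "coastal flooding"),
    Edge("hurricane landfall", "Result", "power outage"),
    Edge("coastal flooding", "Result", "evacuation"),
    Edge("power outage", "Result", "food spoilage"),
]
item = discrimination_sample(graph, graph[0], k=2)
print(item["question"])
print("answer:", item["answer"])
```

In this toy graph, "power outage" never appears as a distractor because it is also a valid Result of the queried event; excluding other positives keeps each item unambiguous.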

Abstract

Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a wide range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. The gap stems from the fact that existing models fail to capture the underlying principles governing event evolution across various scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL) to enable the model to grasp the event evolution mechanism, yielding advanced MMER ability. Specifically, we begin with event diversification, which gathers seed events from a rich spectrum of scenarios. Subsequently, we employ ChatGPT to generate evolving graphs for these seed events. We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the model's comprehension of event reasoning with that of humans. Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose the guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showing competitive performance among open-source multi-modal LLMs.
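The data pipeline described in the abstract (seed event, evolving graph, instruction-tuning data) can be sketched as follows. This is a minimal illustration under stated assumptions: `generate` is a stand-in for the ChatGPT call, graph expansion is assumed to be breadth-first for a fixed number of steps, and the template wording, the toy relation table, and the function names `evolve_graph` and `encapsulate` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head event, relation, tail event)
# Hypothetical stand-in for the LLM call: (event, relation) -> generated tail events.
Generator = Callable[[str, str], List[str]]


def evolve_graph(seed: str, relations: List[str],
                 generate: Generator, max_steps: int) -> List[Triple]:
    """Breadth-first expansion: at each step, every frontier event is sent to
    `generate` once per relation, and newly seen tails form the next frontier."""
    edges: List[Triple] = []
    seen = {seed}
    frontier = [seed]
    for _ in range(max_steps):
        next_frontier: List[str] = []
        for head in frontier:
            for rel in relations:
                for tail in generate(head, rel):
                    edges.append((head, rel, tail))
                    if tail not in seen:
                        seen.add(tail)
                        next_frontier.append(tail)
        frontier = next_frontier
    return edges


def encapsulate(edges: List[Triple], caption: str,
                templates: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn each (head, relation, tail) edge into an instruction/output pair,
    skipping relations that have no template."""
    return [
        {"instruction": templates[rel].format(caption=caption, event=head),
         "output": tail}
        for head, rel, tail in edges if rel in templates
    ]


# Toy lookup table replacing real LLM generations (hypothetical data).
TOY = {
    ("hurricane landfall", "Result"): ["coastal flooding", "power outage"],
    ("coastal flooding", "Result"): ["evacuation"],
}
TEMPLATES = {"Result": "Image: {caption}\nWhat is a likely result of '{event}'?"}

edges = evolve_graph("hurricane landfall", ["Result"],
                     lambda e, r: TOY.get((e, r), []), max_steps=2)
data = encapsulate(edges, "a satellite view of a hurricane", TEMPLATES)
print(len(edges), data[0]["output"])  # prints: 3 coastal flooding
```

Keeping generation behind a plain callable makes the expansion logic testable offline; swapping in a real model call only changes `generate`.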

Paper Structure

This paper contains 19 sections, 3 equations, 7 figures, 6 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Part of the event evolution in a hurricane scenario. The queried event is shown in red. MEEL endows the model with knowledge of all events in the scenario's evolution, whereas current methods train the model only on a few clips of event reasoning, shown in green.
  • Figure 2: Overview of MEEL. We first apply Event Diversification to harvest seed events, then perform Event Graph Evolution to obtain the evolving graphs, and adapt the evolving graphs into instruction-tuning data through Instruction Encapsulation. Guiding Discrimination aims to improve evolution learning with our two negative event mining strategies.
  • Figure 3: (a) Evolving prompt. The sentence in brown appears only if $\mathcal{E}$ is the seed event; in that case, we add the caption of $\mathcal{I}$. (b) Generation of instruction templates for the Result relation, with one example of a generated template. (c) Generation of multiple-choice instruction templates for the Result relation, with one example of a generated template. {caption} is the placeholder for the image caption; {event} and {examples} are the placeholders for the event $\mathcal{E}$ and the in-context examples.
  • Figure 4: Analysis of steps of event graph evolution.
  • Figure 5: An example of an event-evolving graph. The event pointed to by the head cut is a generated tail event that satisfies the relation, indicated by color, of the head cut.
  • ...and 2 more figures
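Figure 3(a) describes an evolving prompt whose caption sentence appears only when the event is the seed event. A minimal sketch of that conditional filling follows; only the {caption}, {event}, and {examples} placeholders and the seed-only caption sentence come from the figure caption, while the prompt wording and the function name `evolving_prompt` are hypothetical.

```python
from typing import List, Optional

# Hypothetical wording; the placeholders mirror Figure 3(a).
BASE = ("Here are some examples:\n{examples}\n"
        "Generate events that may follow the event: {event}.")
SEED_ONLY = "The event happens in the following image: {caption}.\n"


def evolving_prompt(event: str, examples: List[str],
                    caption: Optional[str] = None, is_seed: bool = False) -> str:
    """Fill the evolving prompt; the caption sentence is added only for the seed event."""
    prefix = ""
    if is_seed:
        if caption is None:
            raise ValueError("the seed event needs the caption of its image")
        prefix = SEED_ONLY.format(caption=caption)
    return prefix + BASE.format(examples="\n".join(examples), event=event)


shots = ["heavy rain -> river overflow"]
seed_prompt = evolving_prompt("hurricane landfall", shots,
                              caption="a hurricane approaching the coast", is_seed=True)
child_prompt = evolving_prompt("coastal flooding", shots)
print(seed_prompt)
print(child_prompt)
```

Grounding only the seed event in the image caption means later evolution steps are conditioned on text alone, which matches the caption's note that the brown sentence exists only for the seed event.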