MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction

Wei-Chieh Huang, Cornelia Caragea

TL;DR

Implicit AVE in multimodal e-commerce remains challenging due to complex data and gaps in vision-text understanding. This paper introduces MADIAVE, a multi-agent debate framework that uses multiple MLLMs to iteratively verify and refine inferences in a zero-shot setting. Experiments on the ImplicitAVE dataset show that one to two rounds of debate significantly improve accuracy, especially for hard attributes; the paper also analyzes convergence dynamics under various agent configurations. The results suggest that concise inter-agent debate can outperform single-agent approaches and offer a scalable solution for implicit AVE in multimodal contexts.

Abstract

Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce, as it infers latent attributes from multimodal data. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data and gaps in vision-text understanding. In this work, we introduce MADIAVE, a multi-agent debate framework that employs multiple MLLM agents to iteratively refine inferences. Through a series of debate rounds, agents verify and update each other's responses, thereby improving inference performance and robustness. Experiments on the ImplicitAVE dataset demonstrate that even a few rounds of debate significantly boost accuracy, especially for attributes with initially low performance. We systematically evaluate various debate configurations, including identical or different MLLM agents, and analyze how debate rounds affect convergence dynamics. Our findings highlight the potential of multi-agent debate strategies to address the limitations of single-agent approaches and offer a scalable solution for implicit AVE in multimodal e-commerce.
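
The debate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent` signature, the `debate` and `make_stub_agent` helpers, the early stop on agreement, and the example product are all hypothetical, and the stub agents stand in for real MLLM calls so the control flow can be run without any model.

```python
from typing import Callable, List

# Hypothetical agent interface (an assumption, not from the paper):
# takes the product context and the peers' previous answers, returns an answer.
Agent = Callable[[str, List[str]], str]


def debate(agents: List[Agent], context: str, max_rounds: int = 2) -> List[List[str]]:
    """Run an initial inference plus up to max_rounds debate rounds.

    Returns the list of answers produced in each round, starting with
    the independent initial answers.
    """
    # Initial inference: every agent answers without seeing the others.
    answers = [agent(context, []) for agent in agents]
    history = [answers]
    for _ in range(max_rounds):
        # Assumed stopping rule: stop once all agents agree.
        if len(set(answers)) == 1:
            break
        # Each agent sees the other agents' previous answers and may revise.
        answers = [
            agent(context, [a for j, a in enumerate(answers) if j != i])
            for i, agent in enumerate(agents)
        ]
        history.append(answers)
    return history


def make_stub_agent(first_guess: str, stubborn: bool) -> Agent:
    """Build a toy agent; a real agent would query an MLLM here."""
    def agent(context: str, peer_answers: List[str]) -> str:
        if not peer_answers or stubborn:
            return first_guess
        return peer_answers[0]  # adopt the first peer's answer
    return agent


agents = [make_stub_agent("V-neck", True), make_stub_agent("Crew neck", False)]
history = debate(agents, "Title: Women's summer top; image: <omitted>")
print(history)
```

In this toy run the second agent adopts the first agent's answer after one round, so the loop stops before the second round. With real MLLM agents, the transcript of previous rounds would be passed in the prompt instead of a bare list of answers.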

Paper Structure

This paper contains 31 sections, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Illustration of the differences between explicit and implicit attributes. A single model may sometimes produce incorrect inferences, whereas a multi-agent setting can potentially provide more accurate reasoning and lead to correct inferences.
  • Figure 2: Overview of the MADIAVE framework. The ImplicitAVE dataset is ingested by a data loader that applies scenario steps to initialize agent roles and trigger multi-round communication and debate. After each round, the message transcript is recorded and appended to the context, serving as additional input for the next round.
  • Figure 3: Domain-level F1 scores for the inference process during debates. (a) Scenario 1: GPT-4o debating with GPT-4o; (b) Scenario 2: Llama-3.2 debating with Llama-3.2; (c) Scenario 3: Phi-3.5 debating with Phi-3.5; (d) Scenario 4: Llama-3.2 debating with GPT-4o; (e) Scenario 5: Llama-3.2 debating with Phi-3.5; (f) Scenario 6: Qwen-2.5 debating with Qwen-2.5.
  • Figure 4: Debate statistics for (a) Scenario 1: GPT-4o debating with GPT-4o; (b) Scenario 2: Llama-3.2 debating with Llama-3.2; (c) Scenario 3: Phi-3.5 debating with Phi-3.5; (d) Scenario 4: Llama-3.2 debating with GPT-4o; (e) Scenario 5: Llama-3.2 debating with Phi-3.5; (f) Scenario 6: Qwen-2.5-VL debating with Qwen-2.5-VL.
  • Figure A.1: Illustration of correct, incorrect, and no convergence outcomes observed during agent debate.
  • ...and 1 more figure