MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction
Wei-Chieh Huang, Cornelia Caragea
TL;DR
Implicit AVE in multimodal e-commerce remains challenging due to complex data and gaps in vision-text understanding. This paper introduces MADIAVE, a multi-agent debate framework that uses multiple MLLMs to iteratively verify and refine inferences in a zero-shot setting. Experiments on the ImplicitAVE dataset show that one to two rounds of debate significantly improve accuracy, especially for hard attributes, and the paper analyzes convergence dynamics under various agent configurations. The results suggest that a short inter-agent debate can outperform single-agent approaches and offer a scalable solution for implicit AVE in multimodal contexts.
Abstract
Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce, as it infers latent attributes from multimodal data. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data and gaps in vision-text understanding. In this work, we introduce MADIAVE, a multi-agent debate framework that employs multiple MLLM agents to iteratively refine inferences. Through a series of debate rounds, agents verify and update each other's responses, thereby improving inference performance and robustness. Experiments on the ImplicitAVE dataset demonstrate that even a few rounds of debate significantly boost accuracy, especially for attributes with initially low performance. We systematically evaluate various debate configurations, including identical or different MLLM agents, and analyze how debate rounds affect convergence dynamics. Our findings highlight the potential of multi-agent debate strategies to address the limitations of single-agent approaches and offer a scalable solution for implicit AVE in multimodal e-commerce.
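
As a rough illustration of the debate loop described in the abstract, the following Python sketch shows agents answering independently and then revising after seeing each other's previous answers. It is not the authors' implementation. The `Agent` signature, the `debate` function, the stop-on-agreement rule, the majority-vote aggregation, and the toy agents `stubborn` and `flexible` are all assumptions made for illustration. In a real system each agent would wrap an MLLM call over the product image and text.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical agent interface: (query, peers' previous answers) -> answer.
# A real agent would call an MLLM; here it is any plain callable.
Agent = Callable[[str, Sequence[str]], str]


def debate(agents: Sequence[Agent], query: str, max_rounds: int = 2) -> Tuple[str, int]:
    """Run up to max_rounds of debate; return (final answer, debate rounds used)."""
    # Initial round: every agent answers without seeing any peer output.
    answers: List[str] = [agent(query, []) for agent in agents]
    rounds_used = 0
    for _ in range(max_rounds):
        # Assumed stopping rule: stop as soon as all agents agree.
        if len(set(answers)) == 1:
            break
        # Each agent revises its answer given the other agents' previous answers.
        answers = [
            agent(query, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
        rounds_used += 1
    # Assumed aggregation: majority vote over the final answers.
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, rounds_used


# Toy stand-ins for MLLM agents (purely illustrative).
def stubborn(query: str, peers: Sequence[str]) -> str:
    # Never changes its answer.
    return "Crew Neck"


def flexible(query: str, peers: Sequence[str]) -> str:
    # Starts with its own guess, then adopts the most common peer answer.
    return Counter(peers).most_common(1)[0][0] if peers else "V-Neck"


if __name__ == "__main__":
    print(debate([stubborn, flexible], "What is the neckline of this shirt?"))
```

With these toy agents the two initial answers disagree, so one debate round runs and the agents then agree. With real models, agreement within `max_rounds` is not guaranteed, which is why the sketch falls back to a vote.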
