LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation

Yijun Liu, Wu Liu, Xiaoyan Gu, Yong Rui, Xiaodong He, Yongdong Zhang

TL;DR

LMAgent tackles the challenge of simulating believable multi-user behavior at very large scale, with multimodal interactions, by building a society of agents driven by multimodal LLMs. It introduces a two-tier cognition framework (internal persona, memory, and planning; external multimodal actions) together with self-consistency prompting, a fast memory mechanism, and a small-world relation network, which let it scale to 10,000 agents. The experiments show that LMAgent reaches human-like behavioral indicators and exhibits emergent herd behavior and co-purchase patterns similar to real data, while remaining efficient. These results point to applications in social science research and in large-scale simulation of complex online ecosystems.

Abstract

The believable simulation of multi-user behavior is crucial for understanding complex social systems. Recently, AI agents based on large language models (LLMs) have made significant progress, achieving human-like intelligence across various tasks. However, real human societies are often dynamic and complex, involving numerous individuals engaging in multimodal interactions. In this paper, taking e-commerce scenarios as an example, we present LMAgent, a very large-scale, multimodal agent society based on multimodal LLMs. In LMAgent, besides freely chatting with friends, the agents can autonomously browse, purchase, and review products, and even perform live-streaming e-commerce. To simulate this complex system, we introduce a self-consistency prompting mechanism to augment agents' multimodal capabilities, resulting in significantly improved decision-making performance over existing multi-agent systems. Moreover, we propose a fast memory mechanism combined with the small-world model to enhance system efficiency, which supports simulations of more than 10,000 agents in a single society. Experiments on agents' behavior show that these agents achieve performance comparable to humans on behavioral indicators. Furthermore, compared with existing LLM-based multi-agent systems, LMAgent exhibits more diverse and valuable phenomena, such as herd behavior, which demonstrates its potential for credible large-scale social behavior simulation.

Paper Structure

This paper contains 34 sections, 11 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: (a) Existing multi-agent systems are driven by text-based LLMs, enabling textual interactions among multiple agents. (b) Our LMAgent is driven by multimodal LLMs and involves a society on the scale of ten thousand agents and their multimodal interactions.
  • Figure 2: The overview of LMAgent. In this sandbox environment, each agent has its own memory and persona and can set goals and reflect based on its memory. From an external behavior perspective, agents can freely engage in multimodal social and shopping behaviors. Their internal behavior guides their external behavior, which in turn influences their internal behavior. We use the small-world model to initialize the society's relation network so that it more closely resembles real-world social networks.
  • Figure 3: Diagram of different network structures.
  • Figure 4: Efficiency impact of fast memory. The shaded areas show the range of total tokens consumed in 5 repeated experiments, where the solid lines indicate the average consumption. The pie chart shows the distribution of token consumption.
  • Figure 5: Comparison of co-purchase patterns and purchasing behaviors: (a) co-purchase correlations derived from JD user data, (b) co-purchase correlations generated by LMAgent simulations, and (c) purchasing proportions for the top-10 products across different agent scales.
  • ...and 2 more figures
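
The Figure 2 caption states that the society's relation network is initialized with a small-world model. The sketch below illustrates one common way to build such a network, the Watts-Strogatz construction (a ring lattice whose edges are randomly rewired). This summary does not say which construction the paper uses, so the choice of Watts-Strogatz, the function name `watts_strogatz`, and the parameter values `k` and `p` are all assumptions made for illustration, not the authors' implementation.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Build a small-world friendship graph as a dict of adjacency sets.

    Hypothetical helper, not taken from the paper.
    n: number of agents
    k: even number of ring neighbours each agent starts with
    p: probability of rewiring each lattice edge
    """
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}

    # Step 1: regular ring lattice, each agent linked to its k nearest neighbours.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            nb = (i + j) % n
            adj[i].add(nb)
            adj[nb].add(i)

    # Step 2: with probability p, replace edge (i, i + j) by an edge from i
    # to a random agent that is not already a friend of i.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            if rng.random() >= p:
                continue
            if len(adj[i]) >= n - 1:
                continue  # i is already connected to everyone
            old = (i + j) % n
            new = rng.randrange(n)
            while new == i or new in adj[i]:
                new = rng.randrange(n)
            adj[i].discard(old)
            adj[old].discard(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj


# Illustrative parameters only; the paper's actual settings are not given here.
society = watts_strogatz(n=10_000, k=10, p=0.1, seed=42)
```

Rewiring replaces one edge with another, so the total number of friendships stays at `n * k / 2`. A small `p` keeps most local clusters intact while adding a few long-range shortcuts, which is the property that makes small-world graphs resemble real social networks.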