Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models

Chengkai Huang, Junda Wu, Yu Xia, Zixu Yu, Ruhan Wang, Tong Yu, Ruiyi Zhang, Ryan A. Rossi, Branislav Kveton, Dongruo Zhou, Julian McAuley, Lina Yao

TL;DR

The paper argues that agentic recommender systems powered by multimodal large language models (LLMs) represent the next evolution in personalization, enabling autonomous planning, memory, and tool-enabled reasoning across multimodal inputs. It formalizes LLM-ARS with a four-component architecture (user profiling, planning, memory, action) and a formal task model, then analyzes single-agent and multi-agent frameworks, role-playing for user modeling, and interaction dynamics. Seven research questions guide the assessment, covering reasoning, user understanding, architectures, benchmarking, safety, the balance between autonomy and controllability, and lifelong personalization. The work surveys framework options (single-agent, multi-agent, human-LLM hybrids), highlights open problems in multimodal reasoning, benchmarking, and continual learning, and outlines a path toward scalable, trustworthy, and proactive recommendation systems that better align with evolving user needs.

Abstract

Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems that extend beyond the capabilities of standalone models. By empowering LLMs to perceive external environments, integrate multimodal information, and interact with various tools, these agentic systems exhibit greater autonomy and adaptability across complex tasks. This evolution brings new opportunities to recommender systems (RS): LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations, potentially reshaping the user experience and broadening the application scope of RS. Despite promising early results, fundamental challenges remain, including how to effectively incorporate external knowledge, balance autonomy with controllability, and evaluate performance in dynamic, multimodal settings. In this perspective paper, we first present a systematic analysis of LLM-ARS: (1) clarifying core concepts and architectures; (2) highlighting how agentic capabilities -- such as planning, memory, and multimodal reasoning -- can enhance recommendation quality; and (3) outlining key research questions in areas such as safety, efficiency, and lifelong personalization. We also discuss open problems and future directions, arguing that LLM-ARS will drive the next wave of RS innovation. Ultimately, we foresee a paradigm shift toward intelligent, autonomous, and collaborative recommendation experiences that more closely align with users' evolving needs and complex decision-making processes.

Paper Structure

This paper contains 26 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The rising trend in the research field of LLM-based Agents. We categorize current work into single-agent and multi-agent categories.
  • Figure 2: Different types of personalized LLM-based agents in LLM-ARS, where (i) LLM-Agent simulates user behavior, (ii) LLM-Agent acts as a recommender, and (iii) LLM-Agent functions as both user simulation and recommender.