MARC: Multimodal and Multi-Task Agentic Retrieval-Augmented Generation for Cold-Start Recommender System

Seung Hwan Cho, Yujin Yang, Danik Baeck, Minjoo Kim, Young-Min Kim, Heejung Lee, Sangjin Park

TL;DR

MARC addresses cold-start cocktail recommendation by integrating a multimodal input stream with a graph-based, agentic RAG framework. It constructs a cocktail knowledge graph in Neo4j and employs a Task Recognition Router to route each query to one of four retrieval tasks, followed by a Reflection loop that iteratively expands and quality-checks the retrieved results. Graph-based retrieval coupled with reflection yields higher-quality and more explainable recommendations than vector-based baselines, as shown by both LLM-as-a-judge and human evaluations. While promising for explainability and accuracy in the cocktail domain, the approach is limited to a single domain and a fixed top-k configuration, motivating future extensions to additional domains, warm-start settings, and image-augmented retrieval.
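The router-plus-reflection control flow described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the MARC implementation: the four task labels are placeholders (the summary does not name the tasks), `classify`, `retrieve`, and `is_good_enough` are hypothetical callables standing in for the LLM router, the graph retrieval, and the LLM quality check, and widening the retrieval window on each round is an assumed expansion strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Placeholder labels: the source mentions four retrieval tasks but does not name them here.
TASKS = ("task_1", "task_2", "task_3", "task_4")


@dataclass
class RetrievalState:
    query: str
    task: str
    results: List[str] = field(default_factory=list)
    rounds: int = 0


def route(query: str, classify: Callable[[str], str]) -> str:
    """Task Recognition Router stand-in: map a query to exactly one retrieval task."""
    task = classify(query)  # hypothetical classifier, e.g. an LLM prompt
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return task


def reflect_and_retrieve(
    query: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str, str, int], List[str]],
    is_good_enough: Callable[[str, List[str]], bool],
    top_k: int = 3,
    max_rounds: int = 3,
) -> RetrievalState:
    """Route once, then retrieve, quality-check, and expand until accepted or out of rounds."""
    state = RetrievalState(query=query, task=route(query, classify))
    for round_idx in range(max_rounds):
        state.rounds = round_idx + 1
        # Assumed expansion strategy: widen the candidate window each round.
        k = top_k * state.rounds
        state.results = retrieve(query, state.task, k)
        # Reflection step: a hypothetical judge decides whether the results are acceptable.
        if is_good_enough(query, state.results):
            break
    return state
```

Keeping the LLM-backed steps behind plain callables lets the control flow be exercised with stubs: with `top_k=3` and a stub check that accepts six or more results, the loop stops after its second round. Note that the base `top_k` itself stays fixed here, mirroring the fixed top-k limitation noted above.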

Abstract

Recent recommender system (RS) research mitigates the limitations of cold-start conditions by leveraging multimodal information or by introducing agents built on the strong reasoning capabilities of Large Language Models (LLMs). Meanwhile, food and beverage recommender systems have traditionally relied on knowledge graphs and ontologies because of the domain's distinctive data attributes and relationships. Against this background, we propose MARC, a multimodal and multi-task cocktail recommender system based on Agentic Retrieval-Augmented Generation (RAG) that uses a graph database under cold-start conditions. The proposed system generates high-quality, contextually appropriate answers through two core processes: a task recognition router and a reflection process. The graph database was constructed from cocktail data on Kaggle, and its effectiveness was evaluated on 200 manually crafted questions. Both LLM-as-a-judge and human evaluation show that answers generated with the graph database outperform those generated with a simple vector database in terms of quality. The code is available at https://github.com/diddbwls/cocktail_rec_agentrag
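To make the graph-versus-vector contrast concrete, the sketch below performs a two-hop traversal (cocktail → shared ingredient → other cocktail) over a tiny in-memory graph. It is a plain-Python stand-in for what a Neo4j query might do, not the schema MARC builds from the Kaggle data; the edge list, the single "has ingredient" relation, and ranking by the number of shared ingredients are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: (cocktail)-[HAS_INGREDIENT]->(ingredient) edges.
EDGES = [
    ("Margarita", "tequila"), ("Margarita", "triple sec"), ("Margarita", "lime juice"),
    ("Daiquiri", "white rum"), ("Daiquiri", "lime juice"), ("Daiquiri", "sugar syrup"),
    ("Sidecar", "cognac"), ("Sidecar", "triple sec"), ("Sidecar", "lemon juice"),
    ("Mojito", "white rum"), ("Mojito", "lime juice"), ("Mojito", "mint"),
]


def build_graph(edges):
    """Index the bipartite graph in both directions for cheap traversal."""
    cocktail_to_ing = defaultdict(set)
    ing_to_cocktail = defaultdict(set)
    for cocktail, ingredient in edges:
        cocktail_to_ing[cocktail].add(ingredient)
        ing_to_cocktail[ingredient].add(cocktail)
    return cocktail_to_ing, ing_to_cocktail


def similar_cocktails(seed, cocktail_to_ing, ing_to_cocktail, top_k=3):
    """Two-hop traversal: seed -> shared ingredients -> other cocktails.

    Returns (cocktail, shared ingredients) pairs ranked by overlap size, then name;
    the shared ingredients double as a human-readable explanation path.
    """
    shared = defaultdict(set)
    for ingredient in cocktail_to_ing.get(seed, ()):
        for other in ing_to_cocktail[ingredient]:
            if other != seed:
                shared[other].add(ingredient)
    ranked = sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(name, sorted(ings)) for name, ings in ranked[:top_k]]
```

For example, with the toy edges, `similar_cocktails("Daiquiri", *build_graph(EDGES))` ranks Mojito first because it shares both lime juice and white rum. Returning the shared ingredients alongside each candidate gives a traversal path that can be quoted back to the user, which illustrates the kind of explainability a graph-based retriever can offer over a bare embedding-similarity score.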

Paper Structure

This paper contains 18 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Overall workflow of the MARC
  • Figure 2: Visualization of Evaluation Results using LLM-as-a-Judge (Left: GPT-4o-mini, Right: GPT-5)