A Benchmark and Knowledge-Grounded Framework for Advanced Multimodal Personalization Study

Xia Hu; Honglei Zhuang; Brian Potetz; Alireza Fathi; Bo Hu; Babak Samari; Howard Zhou

TL;DR

This work addresses the challenge of evaluating and enabling advanced multimodal personalization by introducing Life-Bench, a fully synthetic benchmark of virtual accounts with multimodal histories, and LifeGraph, a retrieval-enhanced personal knowledge graph framework. Life-Bench probes complex relational, temporal, and aggregative reasoning over personalized histories, while LifeGraph provides a structured, graph-based retrieval mechanism that grounds multimodal data for personalized reasoning. Empirical results show substantial gaps for existing retrieval-based methods on complex tasks and demonstrate LifeGraph’s strong performance, particularly in relational-temporal reasoning, highlighting the value of graph-structured context and explicit data provenance in personalization. Together, the benchmark and framework offer a privacy-preserving, scalable pathway toward real-world personalized AI that reasons over evolving, multimodal personal histories.

Abstract

The powerful reasoning of modern Vision Language Models opens a new frontier for advanced personalization study. However, progress in this area is critically hampered by the lack of suitable benchmarks. To address this gap, we introduce Life-Bench, a comprehensive, synthetically generated multimodal benchmark built on simulated user digital footprints. Life-Bench features over questions evaluating a wide spectrum of capabilities, from persona understanding to complex reasoning over historical data. These capabilities extend far beyond those covered by prior benchmarks and reflect demands essential for real-world applications. Furthermore, we propose LifeGraph, an end-to-end framework that organizes personal context into a knowledge graph to facilitate structured retrieval and reasoning. Our experiments on Life-Bench reveal that existing methods falter significantly on complex personalized tasks, exposing large performance headroom, especially in relational, temporal, and aggregative reasoning. While LifeGraph closes this gap by leveraging structured knowledge and demonstrates a promising direction, these advanced personalization tasks remain a critical open challenge, motivating new research in this area.
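
To make the idea of structured retrieval over a personal knowledge graph concrete, here is a minimal sketch. It is not LifeGraph's actual implementation: the class `PersonalGraph`, its methods, and the example entities are hypothetical names chosen for illustration. The `depth` and `width` arguments are assumed to play a role loosely analogous to the retrieval depth $d$ and width $k$ mentioned in the caption of Figure 4.

```python
from collections import defaultdict


class PersonalGraph:
    """Toy personal knowledge graph (hypothetical, for illustration only).

    Nodes are plain strings; each directed edge stores a relation name and
    an optional timestamp string.
    """

    def __init__(self):
        # node -> list of (relation, neighbor, timestamp), in insertion order
        self.edges = defaultdict(list)

    def add_edge(self, src, relation, dst, timestamp=None):
        self.edges[src].append((relation, dst, timestamp))

    def retrieve(self, seed, depth=2, width=3):
        """Collect triples reachable from `seed` by breadth-first expansion.

        At most `width` outgoing edges are followed per node, for at most
        `depth` hops. Returns (source, relation, target, timestamp) tuples.
        """
        visited = {seed}
        frontier = [seed]
        triples = []
        for _ in range(depth):
            next_frontier = []
            for node in frontier:
                # .get avoids creating empty entries for leaf nodes
                for relation, neighbor, ts in self.edges.get(node, [])[:width]:
                    triples.append((node, relation, neighbor, ts))
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
        return triples


# Example usage with made-up entities.
graph = PersonalGraph()
graph.add_edge("user", "friend_of", "Alice")
graph.add_edge("Alice", "owns", "Max the dog")
graph.add_edge("Max the dog", "appears_in", "photo_001", timestamp="2023-05-01")

for triple in graph.retrieve("user", depth=3, width=3):
    print(triple)
```

In a sketch like this, the retrieved triples would be serialized into the model's context, so a multi-hop question such as "when was my friend's dog last photographed?" can be answered from linked facts rather than from isolated retrieved items.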

Paper Structure (30 sections, 1 equation, 12 figures, 7 tables, 2 algorithms)

Introduction
Related Work
Advanced Personalization Study
Multimodal Personalization Benchmarks
Personal Knowledge Graph
Life-Bench
Benchmark Overview
Categories
Benchmark Construction
LifeGraph for Personalization
Retrieval-Enhanced VLM Personalization
Personal Knowledge Graph Solution
LifeGraph
Structural Properties and Retrieval Efficiency
Experiments
...and 15 more sections

Figures (12)

Figure 1: Representative examples of data and tasks in Life-Bench. A Vaccount's retrievable context includes personal concepts and timestamped multimodal history. This data grounds a diverse set of evaluation tasks, categorized into Relational Concept Identification (orange) and Historical Retrieval and Understanding (blue), with questions ranging in difficulty.
Figure 1: Comparison of supported reasoning capabilities across advanced multimodal personalization benchmarks. While existing benchmarks focus on foundational concept and preference tasks, Life-Bench significantly expands evaluation by adding tasks that require understanding of detailed events and scenes, multi-hop relational reasoning, and temporal, sequential, and aggregative reasoning.
Figure 2: Data distribution by tasks and categories in Life-Bench.
Figure 3: Degree frequency statistics of LifeGraph, showing that its degree distribution approximately follows the power law of scale-free graphs (barabasi2003scale; mislov2007socialnetwork).
Figure 4: Effect of retrieval depth $d$ (left, evaluated at $k=3$) and width $k$ (right, evaluated at $d=2$) on LifeGraph performance. Tasks are denoted by their acronyms in the legend.
...and 7 more figures
