
Online-PVLM: Advancing Personalized VLMs with Online Concept Learning

Huiyu Bai, Runze Wang, Zhuoyun Du, Yiyang Zhao, Fengji Zhang, Haoyu Chen, Xiaoyong Zhu, Bo Zheng, Xuejiao Zhao

TL;DR

Personalized visual language models (VLMs) often rely on per-concept training, which hinders real-time adaptation and scalability. The authors propose Online-PVLM, a training-free online concept learning framework built from an Omni Concept Embedder, a hyperbolic discriminator, and a LoRA-based VLM, and complement it with the OP-Eval benchmark for large-scale evaluation. They report state-of-the-art results on both novel and cached concepts, validate their design choices through extensive ablations, and show practical benefits for scalable, memory-efficient personalization in multimodal systems. The work advances online personalization by enabling rapid, on-the-fly concept grounding and retrieval, with broad implications for real-world personalized AI assistants and content generation.

Abstract

Personalized Visual Language Models (VLMs) are gaining increasing attention for their ability to support interactions aligned with user-specific concepts (e.g., identifying a user's bike). Existing methods typically require learning a separate embedding for each new concept, which prevents real-time adaptation at test time. This limitation becomes particularly pronounced in large-scale scenarios, where concept embeddings cannot be retrieved efficiently. To close this gap, we propose Online-PVLM, a framework for online concept learning that leverages hyperbolic representations. Our approach enables training-free generation of concept embeddings at test time, making personalized VLMs both scalable and efficient. In addition, we develop OP-Eval, a comprehensive, large-scale benchmark comprising 1,292 concepts and over 30K high-quality instances with diverse question types, designed to rigorously assess online concept learning in realistic scenarios. Extensive experiments demonstrate the state-of-the-art performance of the proposed framework. Our source code and dataset will be made available.

Paper Structure

This paper contains 37 sections, 4 equations, 13 figures, 13 tables.

Figures (13)

  • Figure 1: An illustrative comparison of our work with existing concept learning methods, e.g., nguyen2024yo and alaluf2024myvlm. Our proposed Online-PVLM can generate personalized concept embeddings for new concepts in an online manner, without further training at test time.
  • Figure 2: The overall pipeline of Online-PVLM for the training and inference stages.
  • Figure 3: Illustration of question types and use cases for personalized VLMs. (a) Five types of concept-related tasks with varying difficulty levels. (b) Cached single-concept use case showcasing model performance when the user asks follow-up questions based on previously introduced concepts. (c) Online multi-concept use case demonstrating the model’s ability to learn multiple user-provided concept entities on the fly and answer related queries. Both (b) and (c) highlight the effectiveness of Online-PVLM across different application settings.
  • Figure 4: Ablation study on the hyperbolic curvature value.
  • Figure 5: Ablation study on the token number of personalized concept embedding.
  • ...and 8 more figures