BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu

TL;DR

This work introduces the task of seamless brand integration in T2V, that is, automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. It also proposes BrandFusion, a novel multi-agent framework comprising two synergistic phases.

Abstract

The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seamless brand integration in T2V: automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. This task confronts three core challenges: maintaining prompt fidelity, ensuring brand recognizability, and achieving contextually natural integration. To address them, we propose BrandFusion, a novel multi-agent framework comprising two synergistic phases. In the offline phase (advertiser-facing), we construct a Brand Knowledge Base by probing model priors and adapting to novel brands via lightweight fine-tuning. In the online phase (user-facing), five agents jointly and iteratively refine user prompts, leveraging the shared knowledge base and real-time contextual tracking to ensure brand visibility and semantic alignment. Experiments on 18 established and 2 custom brands across multiple state-of-the-art T2V models demonstrate that BrandFusion significantly outperforms baselines in semantic preservation, brand recognizability, and integration naturalness. Human evaluations further confirm higher user satisfaction, establishing a practical pathway for sustainable T2V monetization.
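
To make the online phase more concrete, the following is a minimal sketch of an iterative prompt-refinement loop over a shared brand knowledge entry. It is an illustration only and not the authors' implementation: the abstract does not name the five agents, so the sketch collapses them into two hypothetical roles (`propose` and `accept`), and every name here (`BrandEntry`, `Context`, `integrate_brand`, `max_rounds`, the toy functions) is an assumption introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BrandEntry:
    """Hypothetical knowledge-base record built in the offline phase."""
    name: str
    known_to_model: bool  # assumed outcome of probing the T2V model's priors
    visual_cues: List[str] = field(default_factory=list)


@dataclass
class Context:
    """Hypothetical shared state tracked across refinement rounds."""
    user_prompt: str
    current_prompt: str
    history: List[str] = field(default_factory=list)


def integrate_brand(
    user_prompt: str,
    brand: BrandEntry,
    propose: Callable[[Context, BrandEntry], str],
    accept: Callable[[Context, BrandEntry], bool],
    max_rounds: int = 3,
) -> Context:
    """Refine the prompt until the checker accepts it or rounds run out."""
    ctx = Context(user_prompt=user_prompt, current_prompt=user_prompt)
    for _ in range(max_rounds):
        candidate = propose(ctx, brand)
        ctx.history.append(candidate)
        ctx.current_prompt = candidate
        if accept(ctx, brand):
            break
    return ctx


def toy_propose(ctx: Context, brand: BrandEntry) -> str:
    # Stand-in for an LLM-backed agent: cycle through the brand's visual cues.
    cue = brand.visual_cues[len(ctx.history) % len(brand.visual_cues)]
    return f"{ctx.user_prompt} A {brand.name} {cue} is visible in the scene."


def toy_accept(ctx: Context, brand: BrandEntry) -> bool:
    # Stand-in check: the user's text survives and the brand is mentioned.
    return ctx.user_prompt in ctx.current_prompt and brand.name in ctx.current_prompt


if __name__ == "__main__":
    brand = BrandEntry("Coca-Cola", known_to_model=True, visual_cues=["billboard"])
    result = integrate_brand("A cyberpunk street at night.", brand, toy_propose, toy_accept)
    print(result.current_prompt)
```

In a real system the two callables would presumably be backed by language-model agents, and the acceptance check would score prompt fidelity, brand recognizability, and naturalness, the three challenges named in the abstract. The toy versions above only do string checks.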

Paper Structure (83 sections, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Examples of seamless brand integration by BrandFusion. Given user prompts like basketball games and cyberpunk street scenes, our framework naturally incorporates brands (Nike on jersey/banner, Coca-Cola on billboard) into generated videos. BrandFusion can integrate with multiple T2V models and handle diverse brands while preserving user intent and ensuring natural visual coherence.
  • Figure 2: Ecosystem of brand integration in T2V generation.
  • Figure 3: Overview of the BrandFusion framework: (a) Offline phase builds brand knowledge through probing and adaptation. (b) Online phase employs five collaborative agents for semantic-preserving brand integration with continuous learning.
  • Figure 4: Human evaluation results on semantic fidelity, integration naturalness, and overall acceptability.
  • Figure 5: Comparison across different prompt scene categories.
  • ...and 7 more figures