Lynx: Towards High-Fidelity Personalized Video Generation

Shen Sang, Tiancheng Zhi, Tianpei Gu, Jing Liu, Linjie Luo

TL;DR

Lynx tackles high-fidelity personalized video generation from a single image by extending a Diffusion Transformer (DiT) with two lightweight adapters. The first is an ID-adapter that uses a Perceiver Resampler to turn ArcFace-derived facial embeddings into compact identity tokens. The second is a Ref-adapter that injects dense VAE features from a frozen reference pathway. The approach uses spatio-temporal frame packing and progressive training to handle variable video lengths and resolutions, achieving robust identity preservation while maintaining temporal coherence. Evaluations on 40 subjects and 20 prompts (800 test cases) show state-of-the-art identity fidelity with competitive prompt following and high perceptual video quality, as measured by multiple face recognizers and Gemini-based metrics. Overall, Lynx demonstrates a scalable path to personalized video generation that requires no per-subject finetuning and delivers strong identity fidelity, controllability, and realism, paving the way for multi-modal and multi-subject personalization.
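
The TL;DR names spatio-temporal frame packing but gives no details. The following is a minimal sketch that assumes NaViT-style sequence packing: each video is patchified into a (frames × height × width) token grid, several videos are concatenated into one sequence, and attention is restricted to tokens of the same video. `VideoShape` and `pack_videos` are hypothetical names for illustration, not code from Lynx.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VideoShape:
    """Latent token grid of one video after patchification (hypothetical helper)."""
    frames: int  # latent frames
    height: int  # patches along height
    width: int   # patches along width

    def num_tokens(self) -> int:
        return self.frames * self.height * self.width


def pack_videos(shapes: List[VideoShape]):
    """Pack variable-size videos into one token sequence (NaViT-style sketch).

    Returns:
      sample_ids: index of the video each token belongs to
      positions:  per-token (t, h, w) indices, e.g. for a 3D positional encoding
      offsets:    cumulative token boundaries, one more entry than videos
      mask:       block-diagonal boolean attention mask (True = may attend)
    """
    sample_ids: List[int] = []
    positions: List[Tuple[int, int, int]] = []
    offsets = [0]
    for idx, s in enumerate(shapes):
        # Flatten each video in (t, h, w) raster order.
        for t in range(s.frames):
            for h in range(s.height):
                for w in range(s.width):
                    sample_ids.append(idx)
                    positions.append((t, h, w))
        offsets.append(offsets[-1] + s.num_tokens())
    n = len(sample_ids)
    # Tokens may only attend to tokens from the same video.
    mask = [[sample_ids[i] == sample_ids[j] for j in range(n)] for i in range(n)]
    return sample_ids, positions, offsets, mask


# Usage: a 2-frame 2x2 clip packed together with a 1-frame 2x3 clip.
ids, pos, offsets, mask = pack_videos([VideoShape(2, 2, 2), VideoShape(1, 2, 3)])
print(offsets)                 # [0, 8, 14]
print(mask[0][7], mask[0][8])  # True False
```

A dense n × n mask is only practical at toy sizes. Variable-length attention kernels typically consume the cumulative `offsets` directly, so videos of different lengths and resolutions can share one batch without padding.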

Abstract

We present Lynx, a high-fidelity model for personalized video synthesis from a single input image. Built on an open-source Diffusion Transformer (DiT) foundation model, Lynx introduces two lightweight adapters to ensure identity fidelity. The ID-adapter employs a Perceiver Resampler to convert ArcFace-derived facial embeddings into compact identity tokens for conditioning. The Ref-adapter integrates dense VAE features from a frozen reference pathway and injects fine-grained details into all transformer layers through cross-attention. Together, these modules enable robust identity preservation while maintaining temporal coherence and visual realism. On a curated benchmark of 40 subjects and 20 unbiased prompts, yielding 800 test cases, Lynx demonstrates superior face resemblance, competitive prompt following, and strong video quality, thereby advancing personalized video generation.
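
The abstract describes the ID-adapter only at a high level. The sketch below assumes a single-layer, single-head, Flamingo-style Perceiver Resampler, in which a fixed set of learned latent queries cross-attends to the projected face embedding concatenated with the latents themselves. It shows how one ArcFace vector (typically 512-dimensional) could become a fixed number of identity tokens. The class name, the single layer, the 16 queries, the 32-dimensional width, and the random weights are all illustrative assumptions. A trained model would learn these parameters and would usually stack several attention and feed-forward layers.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n, k) x (k, m) -> (n, m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*keys)]      # (d, Nk)
    scores = matmul(queries, keys_t)                # (Nq, Nk)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)                  # (Nq, d)


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class PerceiverResamplerSketch:
    """Hypothetical one-layer resampler: face embedding -> num_queries tokens."""

    def __init__(self, in_dim: int, dim: int, num_queries: int, seed: int = 0):
        rng = random.Random(seed)  # random stand-ins for learned weights
        self.latents = rand_matrix(num_queries, dim, rng)
        self.w_in = rand_matrix(in_dim, dim, rng)
        self.w_q = rand_matrix(dim, dim, rng)
        self.w_k = rand_matrix(dim, dim, rng)
        self.w_v = rand_matrix(dim, dim, rng)

    def __call__(self, face_embedding: List[float]) -> Matrix:
        # Project the single face vector into a length-1 context sequence.
        context = matmul([face_embedding], self.w_in)   # (1, dim)
        # Latents attend to [context; latents], as in Flamingo's resampler.
        kv_input = context + self.latents                # (1 + Nq, dim)
        q = matmul(self.latents, self.w_q)
        k = matmul(kv_input, self.w_k)
        v = matmul(kv_input, self.w_v)
        attended = cross_attention(q, k, v)
        # Residual connection keeps one output token per latent query.
        return [[a + b for a, b in zip(lat, att)]
                for lat, att in zip(self.latents, attended)]


# Usage with a stand-in for a 512-d ArcFace embedding.
emb = [math.sin(i) for i in range(512)]
tokens = PerceiverResamplerSketch(in_dim=512, dim=32, num_queries=16)(emb)
print(len(tokens), len(tokens[0]))  # 16 32
```

The output length is fixed by the number of latent queries, not by the input, which is what makes the identity tokens compact. The same `cross_attention` primitive could also illustrate the Ref-adapter's injection, with video tokens as queries and reference-pathway features as keys and values.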