OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Yiying Yang, Wei Cheng, Sijin Chen, Honghao Fu, Xianfang Zeng, Yujun Cai, Gang Yu, Xingjun Ma

TL;DR

This work introduces a well-designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters, and validates that OmniLottie can produce vivid, semantically aligned vector animations that adhere closely to multi-modal human instructions.

Abstract

OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which pose significant challenges for learning vector animation generation. We therefore introduce a well-designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie upon pretrained vision-language models so that it follows interleaved multi-modal instructions and generates high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Extensive experiments validate that OmniLottie produces vivid and semantically aligned vector animations that adhere closely to multi-modal human instructions.
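
To make the idea of turning Lottie JSON into command and parameter sequences concrete, here is a minimal Python sketch. It is not the paper's tokenizer: the token names (`<rect>`, `<fill>`, `<n127>`), the 256-bin quantization, and the helper functions `quantize`, `tokenize_shape`, and `tokenize` are all hypothetical, chosen for illustration only. It handles just static (non-animated) rectangles, fills, and groups, and it drops metadata keys such as `v` and `nm` simply by never reading them.

```python
def quantize(value, lo=0.0, hi=512.0, bins=256):
    """Map a number to an integer bin index (hypothetical scheme)."""
    clipped = min(max(value, lo), hi)
    return int((clipped - lo) / (hi - lo) * (bins - 1))


def tokenize_shape(shape):
    """Flatten one Lottie shape item into command and parameter tokens."""
    kind = shape["ty"]
    if kind == "gr":  # group: recurse into its items ("it")
        tokens = ["<group>"]
        for item in shape.get("it", []):
            tokens.extend(tokenize_shape(item))
        tokens.append("</group>")
        return tokens
    if kind == "rc":  # rectangle: position "p" and size "s", static values under "k"
        params = [*shape["p"]["k"], *shape["s"]["k"]]
        return ["<rect>"] + [f"<n{quantize(v)}>" for v in params]
    if kind == "fl":  # fill: color "c" with RGB components in [0, 1]
        rgb = shape["c"]["k"][:3]
        return ["<fill>"] + [f"<n{quantize(c, 0.0, 1.0)}>" for c in rgb]
    return [f"<unk:{kind}>"]  # anything this sketch does not model


def tokenize(animation):
    """Flatten a whole animation; structural metadata is never emitted."""
    tokens = ["<lottie>"]
    for layer in animation["layers"]:
        tokens.append("<layer>")
        for shape in layer.get("shapes", []):
            tokens.extend(tokenize_shape(shape))
        tokens.append("</layer>")
    tokens.append("</lottie>")
    return tokens


# A tiny hand-written Lottie document: one shape layer with a red square.
sample = {
    "v": "5.7.4", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
    "layers": [{
        "ty": 4, "nm": "box",
        "shapes": [{
            "ty": "gr",
            "it": [
                {"ty": "rc", "p": {"a": 0, "k": [256, 256]}, "s": {"a": 0, "k": [128, 128]}},
                {"ty": "fl", "c": {"a": 0, "k": [1, 0, 0, 1]}},
            ],
        }],
    }],
}

tokens = tokenize(sample)
print(" ".join(tokens))
```

Even in this toy form, the flattened sequence is much shorter than the JSON it came from, since keys, braces, and version or name fields never reach the model's vocabulary.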

Paper Structure

This paper contains 40 sections, 10 equations, 13 figures, 10 tables, and 1 algorithm.

Figures (13)

  • Figure 1: OmniLottie is a Versatile Auto-regressive Generative Model for High Quality Lottie Animations. With user inputs of interleaved multi-modal instructions, OmniLottie supports tasks including text-to-Lottie, text-image-to-Lottie and video-to-Lottie generation. This broad capability makes it a powerful and flexible solution for a wide range of creative and design-oriented tasks.
  • Figure 2: Overview of the Vector Animation Data Construction Pipeline. We convert SVG assets into static Lottie files and apply randomized animation effects to generate effect-tagged animated Lotties. In parallel, we gather professionally created Lottie animations from five online platforms and perform thorough filtering and cleaning. Each animation undergoes spatio-temporal normalization, followed by video rendering and random keyframe extraction. Finally, we provide multi-granularity annotations emphasizing geometric structure, color attributes, and motion characteristics.
  • Figure 3: Overview of OmniLottie. We reorganize the Lottie JSON representation, with a particular focus on the structure of its layers, including both common layer attributes and five special layer types. The hierarchical JSON format of Lottie is flattened into a sequence of function calls, which are further parameterized to define a dedicated vocabulary and token set for Lottie. Built upon this parameterization, OmniLottie extends Qwen2.5-VL with a new tokenizer and vocabulary for Lottie, and is trained on our curated Lottie dataset.
  • Figure 4: Qualitative comparison of Text-to-Lottie generation. Methods that failed to generate valid animations are omitted to ensure a clear comparison.
  • Figure 5: Qualitative comparison of Text-Image-to-Lottie generation. Methods that failed to generate valid animations are omitted to ensure a clear comparison.
  • ...and 8 more figures