Table of Contents
Fetching ...

MoVer: Motion Verification for Motion Graphics Animations

Jiaju Ma, Maneesh Agrawala

TL;DR

MoVer introduces a motion verification DSL that uses first-order logic to check spatio-temporal properties of motion graphics and integrates this with an LLM-based synthesis pipeline. By modeling animations with per-frame SVG representations and predicate matrices grounded in $\mathcal{Allen}$ interval algebra and the rectangle algebra, MoVer enables automated verification and iterative correction of prompts to produce animations that match specification. On a synthetic dataset of 5600 prompt–MoVer pairs, the LLM-based pipeline achieves 95.1% ground-truth MoVer program accuracy, with the overall success rate improving from 58.8% without iterations to 93.6% with up to 50 correction iterations, illustrating the practical value of automated verification in guiding synthesis. The work advances the reliability of text-to-motion graphics generation and suggests broader applicability of formal verification loops for visual content creation.

Abstract

While large vision-language models can generate motion graphics animations from text prompts, they regularly fail to include all spatio-temporal properties described in the prompt. We introduce MoVer, a motion verification DSL based on first-order logic that can check spatio-temporal properties of a motion graphics animation. We identify a general set of such properties that people commonly use to describe animations (e.g., the direction and timing of motions, the relative positioning of objects, etc.). We implement these properties as predicates in MoVer and provide an execution engine that can apply a MoVer program to any input SVG-based motion graphics animation. We then demonstrate how MoVer can be used in an LLM-based synthesis and verification pipeline for iteratively refining motion graphics animations. Given a text prompt, our pipeline synthesizes a motion graphics animation and a corresponding MoVer program. Executing the verification program on the animation yields a report of the predicates that failed and the report can be automatically fed back to LLM to iteratively correct the animation. To evaluate our pipeline, we build a synthetic dataset of 5600 text prompts paired with ground truth MoVer verification programs. We find that while our LLM-based pipeline is able to automatically generate a correct motion graphics animation for 58.8% of the test prompts without any iteration, this number raises to 93.6% with up to 50 correction iterations. Our code and dataset are at https://mover-dsl.github.io.

MoVer: Motion Verification for Motion Graphics Animations

TL;DR

MoVer introduces a motion verification DSL that uses first-order logic to check spatio-temporal properties of motion graphics and integrates this with an LLM-based synthesis pipeline. By modeling animations with per-frame SVG representations and predicate matrices grounded in interval algebra and the rectangle algebra, MoVer enables automated verification and iterative correction of prompts to produce animations that match specification. On a synthetic dataset of 5600 prompt–MoVer pairs, the LLM-based pipeline achieves 95.1% ground-truth MoVer program accuracy, with the overall success rate improving from 58.8% without iterations to 93.6% with up to 50 correction iterations, illustrating the practical value of automated verification in guiding synthesis. The work advances the reliability of text-to-motion graphics generation and suggests broader applicability of formal verification loops for visual content creation.

Abstract

While large vision-language models can generate motion graphics animations from text prompts, they regularly fail to include all spatio-temporal properties described in the prompt. We introduce MoVer, a motion verification DSL based on first-order logic that can check spatio-temporal properties of a motion graphics animation. We identify a general set of such properties that people commonly use to describe animations (e.g., the direction and timing of motions, the relative positioning of objects, etc.). We implement these properties as predicates in MoVer and provide an execution engine that can apply a MoVer program to any input SVG-based motion graphics animation. We then demonstrate how MoVer can be used in an LLM-based synthesis and verification pipeline for iteratively refining motion graphics animations. Given a text prompt, our pipeline synthesizes a motion graphics animation and a corresponding MoVer program. Executing the verification program on the animation yields a report of the predicates that failed and the report can be automatically fed back to LLM to iteratively correct the animation. To evaluate our pipeline, we build a synthetic dataset of 5600 text prompts paired with ground truth MoVer verification programs. We find that while our LLM-based pipeline is able to automatically generate a correct motion graphics animation for 58.8% of the test prompts without any iteration, this number raises to 93.6% with up to 50 correction iterations. Our code and dataset are at https://mover-dsl.github.io.

Paper Structure

This paper contains 16 sections, 1 equation, 8 figures, 4 tables.

Figures (8)

  • Figure 1: (top) The 13 low-level temporally relative predicates in Allen's interval algebra and the three aggregated predicates (disjunctions of the low level predicates) in MoVer: before(), while(), and after(). (bottom) The 169 spatially relative predicates in the rectangle algebra and four of the aggregated predicates in MoVer. MoVer implements 10 spatially relative aggregated predicates: top(), bottom(), left(), right(), border(), intersect(), top_border(), bottom_border(), left_border(), and right_border() as sub-regions of this table.
  • Figure 2: The input text prompt (top left) aims to move the Y and the S so that they touch the E on either side. The verification report (bottom left) shows that, although the Y does border the left side of E on frame 80 and left_border() evaluates to true, it continues its rightward motion m$_1$ until frame 120, and so the post() predicate fails. Although m$_2$, the motion of S, successfully borders the right side of E, the before() predicate fails because m$_1$ is false.
  • Figure 3: Our pipeline automatically feeds the verification report (Figure \ref{['fig:verification']}) back to the LLM animation synthesizer to correct the animation. On the first iteration, the motion of Y correctly ends adjacent to the left side of E at frame 120. But the motion incorrectly overlaps in time with the motion of S. On the second correction iteration, the sequencing of the motions is correctly updated so that Y moves first (frames 1-60) before S (frames 61-120).
  • Figure 4: The input text prompt describes an animation in which the letter H and the letter A are asked to maintain contact as they scale up and down, as shown in the frames under Correction Iteration 9 (please refer to https://mover-dsl.github.io for the animations in action). However, the initial animation produced by the LLM animation synthesizer fails to satisfy any of the border() relations as indicated by the verification report. In the first correction iteration, the letters H and A are stretched to border the bottom edges of the letters below them, instead of top edges (frame 60 and 120). From there, the synthesizer becomes stuck in a loop, producing the same animation until the eighth correction iteration, where it has corrected the scaling of the yellow H and blue A. In the ninth iteration, it eventually changes the sign of the scale factor of the two bottom letters, passing all verification checks.
  • Figure 5: The input text prompt asks the two letter W's to be rotated around the small pink circle and the letter O to be lowered so that the transformed letters form the word "mom" (animations can be found at https://mover-dsl.github.io). In the initial animation, the W's are rotated correctly, but the letter O does not move at all, rendering both dir() and post() to false. Note that bottom_border() is true because the bottom edge of O aligns with those of the W's at the beginning of the animation (frame 1). In the first correction iteration, the O moves towards the bottom but does not do so enough to align with the W's. In the second iteration, the LLM synthesizer correctly computes the translation distance for O, producing an animation that says "wow mom" at the end (frame 180).
  • ...and 3 more figures