Recovering Dynamic 3D Sketches from Videos

Jaeah Lee, Changwoon Choi, Young Min Kim, Jaesik Park

TL;DR

Liv3Stroke tackles the challenge of extracting 3D motion from videos by introducing deformable 3D strokes as a compact, interpretable representation. It first learns coarse 3D motion guidance with a point-cloud-based deformation network. It then fits 3D cubic Bézier strokes through per-stroke rigid transforms and control-point displacements in a coarse-to-fine optimization, using LPIPS and CLIP-based perceptual losses to align the rendered strokes with the video frames. The approach yields view-consistent dynamic sketches that robustly capture core motion features across moving cameras and varied environments, and it outperforms or matches baselines in both quantitative metrics and qualitative assessments. This work extends stroke-based representations to dynamic 3D scenes, offering a compact framework for motion analysis, scene flow estimation, and stroke-based control applications.
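The stroke parameterization summarized above can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation. We assume the rotation is given as an axis-angle vector (converted with Rodrigues' formula), that control-point displacements are applied before the rigid transform, and that the rotation is taken about the stroke centroid. The names `bezier_points`, `axis_angle_to_matrix`, and `deform_stroke` are hypothetical. In the paper, R, T, and the displacements are produced per time step by MLPs (Figure 5); here they are passed in directly.

```python
import numpy as np

def bezier_points(ctrl: np.ndarray, n: int = 16) -> np.ndarray:
    """Sample n points on a cubic Bezier curve from its (4, 3) control points."""
    t = np.linspace(0.0, 1.0, n)[:, None]  # (n, 1) curve parameters
    # Degree-3 Bernstein basis, shape (n, 4); each row sums to 1.
    basis = np.hstack([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])
    return basis @ ctrl  # (n, 3) points on the curve

def axis_angle_to_matrix(w: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def deform_stroke(ctrl, rot_vec, trans, delta):
    """Hypothetical per-stroke deformation at one time step.

    Assumed order: add local control-point offsets, rotate about the
    stroke centroid, then translate.
    """
    local = ctrl + delta
    center = local.mean(axis=0)
    R = axis_angle_to_matrix(rot_vec)
    return (local - center) @ R.T + center + trans

# Usage: a straight stroke along x, rotated 90 degrees about z and lifted by 1 in z.
ctrl = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
moved = deform_stroke(ctrl, rot_vec=np.array([0.0, 0.0, np.pi / 2]),
                      trans=np.array([0.0, 0.0, 1.0]), delta=np.zeros((4, 3)))
samples = bezier_points(moved, n=8)  # points that a renderer would draw
```

Separating a rigid part (R, T) from small control-point offsets keeps each stroke's gross motion low-dimensional while still allowing its shape to bend over time.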

Abstract

Understanding 3D motion from videos presents inherent challenges due to the diverse types of movement, ranging from rigid and deformable objects to articulated structures. To address this, we propose Liv3Stroke, a novel approach for abstracting objects in motion with deformable 3D strokes. The detailed movements of an object may be represented by unstructured motion vectors or by a set of motion primitives using a pre-defined articulation from a template model. Just as a free-hand sketch can intuitively convey scenes or intentions with a sparse set of lines, we use parametric 3D curves to capture spatially smooth motion elements for general objects with unknown structures. We first extract noisy 3D point-cloud motion guidance from video frames using semantic features, and then deform a set of curves to abstract the essential motion features into explicit 3D representations. Such abstraction enables an understanding of the prominent components of motion while remaining robust to environmental factors. Our approach allows direct analysis of 3D object movements from video, tackling the uncertainty that typically arises when real-world motion is translated into recorded footage. The project page is available at: https://jaeah.me/liv3stroke_web
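The point-cloud motion guidance described above (and in Figure 3, where an MLP estimates per-point motion {ΔP_i}) can be pictured as a time-conditioned deformation field. The sketch below shows only the data flow of such a field: a small, randomly initialized NumPy MLP maps each point and a time value t to an offset ΔP. The layer sizes, ReLU activations, and raw (x, y, z, t) input are our assumptions; the paper's network, its semantic-feature inputs, and its training losses are not shown, and the helpers `init_mlp`, `mlp`, and `deform_points` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random (W, b) pairs for a small fully connected network (sizes are assumptions)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass with ReLU on hidden layers and a linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def deform_points(params, points, t):
    """Map points (N, 3) at time t to P + dP, returning both the moved points and dP."""
    t_col = np.full((points.shape[0], 1), t)
    delta = mlp(params, np.hstack([points, t_col]))  # dP, shape (N, 3)
    return points + delta, delta

# Usage: an untrained field applied to a random point cloud at two time steps.
params = init_mlp([4, 64, 64, 3], rng)
pts = rng.uniform(-1.0, 1.0, size=(100, 3))
moved_t0, delta_t0 = deform_points(params, pts, 0.0)
moved_t1, delta_t1 = deform_points(params, pts, 1.0)
```

In practice such a field would be optimized so that the deformed points agree with the video at each frame; sharing one network across all points tends to make the resulting motion spatially smooth.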

Paper Structure

This paper contains 28 sections, 15 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Liv3Stroke is a novel approach that compactly represents object movements using deformable 3D strokes from videos. Our method achieves view-consistent dynamic sketch reconstruction by shifting and deforming the shape of each stroke.
  • Figure 2: Method overview. We first learn 3D motion guidance from video frames, defined as a point cloud. From this guidance, we initialize approximate stroke positions and motions. We represent movement by transforming each stroke with a rotation $R$ and a translation $T$, and by adjusting its control points $\{p_i\}$ with displacements $\{\Delta p_i\}$, thereby reconstructing a dynamic 3D sketch $\mathcal{S}_{\mathrm{3D}}$.
  • Figure 3: Framework for outlining 3D motion. To obtain the motion layout, we use a 3D point cloud and compute movements by repositioning its points. An MLP serves as the function that estimates the motion $\{\Delta P_{i}\}$ across the given video frames.
  • Figure 4: Comparison of loss functions. Compared to pixel-wise losses (L1 and L2), the LPIPS loss provides stricter structural guidance when computing differences between two images, even with RGB frames.
  • Figure 5: Pipeline for rendering sketches. We extract the motion as sketches by learning the deformation of each stroke. To do this, we decompose stroke motion into (1) per-stroke transformations, composed of a rotation $R_{i}$ and a translation $T_{i}$, and (2) shifts $\{\Delta p_{i}^{j}\}$ of each control point of the stroke. We use MLPs $\mathcal{M}_{\mathrm{R}}$, $\mathcal{M}_{\mathrm{T}}$, and $\mathcal{M}_{\mathrm{L}}$ as the deformation functions at a given time step.
  • ...and 9 more figures
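Figure 4 contrasts pixel-wise losses with LPIPS. One reason pixel-wise losses give weak guidance for thin strokes can be shown with a toy example (our own illustration, not an experiment from the paper): once a rendered one-pixel line no longer overlaps its target, the L1 and L2 losses stay constant no matter how far away the line is, so they carry no signal about the direction in which the stroke should move. LPIPS instead compares deep network features and requires pretrained weights, so it is not reproduced here. The helpers `line_image`, `mse`, and `l1` are hypothetical.

```python
import numpy as np

def line_image(col, size=32, width=1):
    """Binary image containing a vertical line starting at column `col`."""
    img = np.zeros((size, size))
    img[:, col:col + width] = 1.0
    return img

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

target = line_image(10)
for shift in (0, 1, 3, 10):
    pred = line_image(10 + shift)
    print(shift, mse(target, pred), l1(target, pred))
# 0 0.0 0.0
# 1 0.0625 0.0625
# 3 0.0625 0.0625
# 10 0.0625 0.0625
```

A stroke that is off by one pixel and a stroke that is off by ten pixels receive the same penalty, which is consistent with the caption's point that a perceptual loss offers more useful structural guidance.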