How to Sample High Quality 3D Fractals for Action Recognition Pre-Training?

Marko Putak; Thomas B. Moeslund; Joakim Bruslund Haurum

TL;DR

This work tackles pre-training for video action recognition using synthetic data generated from 3D fractals via Iterated Function Systems (IFS). It identifies that naive parameter sampling yields degenerate geometries and introduces four generation strategies, culminating in Targeted Smart Filtering (TSF), a two-stage pre-generation filter that preserves geometric diversity while eliminating degeneracy. TSF delivers roughly a 100x speedup in sampling and achieves superior downstream performance on HMDB51 and UCF101 compared to other 3D fractal filtering methods. The study demonstrates that carefully diversified, formula-driven synthetic data can provide meaningful transfer benefits for action recognition, enabling scalable and efficient on-the-fly dataset generation for pre-training.

Abstract

Synthetic datasets are increasingly recognized in deep learning as a valuable alternative to exhaustively labeled real data. One such synthetic data generation method is Formula-Driven Supervised Learning (FDSL), which can provide an unlimited amount of perfectly labeled data through a formula-driven approach, such as fractals or contours. FDSL avoids common drawbacks of real data such as manual labeling effort, privacy issues, and other ethical concerns. In this work, we generate 3D fractals using 3D Iterated Function Systems (IFS) for pre-training an action recognition model. The fractals are temporally transformed to form a video, which is used as a pre-training dataset for the downstream task of action recognition. We find that standard methods of generating fractals are slow and produce degenerate 3D fractals. Therefore, we systematically explore alternative ways of generating fractals and find that overly restrictive approaches, while generating aesthetically pleasing fractals, are detrimental to downstream task performance. We propose a novel method, Targeted Smart Filtering, to address both the generation speed and the fractal diversity issues. The method achieves roughly 100 times faster sampling and superior downstream performance compared to other 3D fractal filtering methods.
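As a rough illustration of the kind of generation the abstract describes, the sketch below samples a random 3D IFS and renders a point cloud with the Chaos Game (the algorithm named in Figure 1). Everything specific here is an assumption made for illustration, not the paper's procedure: the function names `sample_ifs` and `chaos_game`, the uniform parameter ranges, the rescaling that forces each map to be contractive, and the point counts.

```python
import numpy as np


def sample_ifs(num_maps=4, scale=0.6, rng=None):
    """Draw a hypothetical 3D IFS made of `num_maps` affine maps x -> A @ x + b."""
    rng = np.random.default_rng(rng)
    A = rng.uniform(-1.0, 1.0, size=(num_maps, 3, 3))
    # Assumption of this sketch: rescale every matrix so that its largest
    # singular value equals `scale` (< 1), which makes each map a contraction.
    top_sv = np.linalg.svd(A, compute_uv=False)[:, 0]
    A = A * (scale / top_sv)[:, None, None]
    b = rng.uniform(-1.0, 1.0, size=(num_maps, 3))
    return A, b


def chaos_game(A, b, num_points=10_000, burn_in=100, rng=None):
    """Iterate randomly chosen maps and collect the visited points."""
    rng = np.random.default_rng(rng)
    choices = rng.integers(0, len(A), size=num_points + burn_in)
    x = np.zeros(3)
    points = np.empty((num_points, 3))
    for i, k in enumerate(choices):
        x = A[k] @ x + b[k]
        # Discard the first `burn_in` iterates so the orbit settles
        # near the attractor before points are recorded.
        if i >= burn_in:
            points[i - burn_in] = x
    return points


if __name__ == "__main__":
    A, b = sample_ifs(rng=0)
    cloud = chaos_game(A, b, num_points=5_000, rng=1)
    print(cloud.shape, cloud.min(axis=0), cloud.max(axis=0))
```

Sampling without the rescaling step is closer to the naive sampling the abstract criticizes, since unconstrained matrices can produce point clouds that diverge or collapse.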

Paper Structure (24 sections, 2 equations, 6 figures, 5 tables)

INTRODUCTION
RELATED WORK
Pre-Training for Video Action Recognition
Synthetic Data and Formula-Driven Supervised Learning (FDSL)
The Evolution of Fractal Generation for Pre-Training
Our Contribution in Context
A DATA-DRIVEN APPROACH TO 3D FRACTAL QUALITY
Manual Annotation
Feature Extraction for Geometric Analysis
Analysis of "Good" vs. "Bad" Geometries
METHODOLOGY
3D Fractal Video Pipeline
Fractal Generation Strategies
Baseline: Naive Sampling + Variance Filter
Method 1: SVD-Controlled Filter
...and 9 more sections

Figures (6)

Figure 1: The Complete 3D Fractal Video Pre-Training Pipeline. The process begins with finding valid IFS parameters via one of four main methods. These parameters generate a 3D fractal point cloud using the Chaos Game, which is dynamically transformed into a video dataset. The final dataset is used for model pre-training, and the resulting weights are fine-tuned and compared against baselines on downstream action recognition tasks.
Figure 2: Examples of manually annotated fractals in a 3D projection. The "Good" samples are complex and geometrically rich and exhibit self-similarity, while the "Bad" samples are collapsed or sparse and lack structural detail and complexity.
Figure 3: Pair plot of the top five features ranked by importance, showing only the lower triangle of the matrix. The diagonal panels present Kernel Density Estimates (KDEs) for each feature, highlighting the distributions of the "Good" (green) and "Bad" (red) classes.
Figure 4: Kernel density estimate of the highest-scoring feature by importance: Sum-of-$|\det(\mathbf{A})|$. The majority of "Good" fractals lie between $0$ and $1$, with the largest mode close to $1$. The "Bad" class peaks in a similar range but has a heavy tail toward higher values, which can be effectively filtered using a threshold.
Figure 5: Three unrolled fractal transformation videos. Every other frame is rendered side-by-side to visualize how the transformations affect fractal appearance over time.
...and 1 more figures
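
The Sum-of-$|\det(\mathbf{A})|$ feature from the Figure 4 caption can be sketched as follows, assuming it is the sum of the absolute determinants of the linear parts of an IFS. The function names and the threshold `max_sum=1.0` are hypothetical choices for illustration; the caption only says that the heavy tail of "Bad" fractals can be filtered with a threshold and does not give its value.

```python
import numpy as np


def sum_abs_det(A):
    """Sum of |det(A_k)| over the linear parts of an IFS; A has shape (K, 3, 3)."""
    return float(np.abs(np.linalg.det(A)).sum())


def passes_det_filter(A, max_sum=1.0):
    """Keep an IFS only if its determinant sum does not exceed the threshold."""
    # `max_sum` is a hypothetical threshold, not a value taken from the paper.
    return sum_abs_det(A) <= max_sum


if __name__ == "__main__":
    shrinking = np.stack([0.5 * np.eye(3), 0.5 * np.eye(3)])  # det = 0.125 each
    expanding = np.stack([2.0 * np.eye(3), 0.5 * np.eye(3)])  # det = 8 and 0.125
    print(sum_abs_det(shrinking), passes_det_filter(shrinking))
    print(sum_abs_det(expanding), passes_det_filter(expanding))
```

Because this check needs only the sampled matrices, it can run before any points are generated, which fits the pre-generation filtering described in the TL;DR.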
