How to Sample High Quality 3D Fractals for Action Recognition Pre-Training?
Marko Putak, Thomas B. Moeslund, Joakim Bruslund Haurum
TL;DR
This work tackles pre-training for video action recognition using synthetic data generated from 3D fractals via Iterated Function Systems (IFS). It identifies that naive parameter sampling yields degenerate geometries and introduces four generation strategies, culminating in Targeted Smart Filtering (TSF), a two-stage pre-generation filter that preserves geometric diversity while eliminating degeneracy. TSF delivers roughly a 100x speedup in sampling and achieves superior downstream performance on HMDB51 and UCF101 compared to other 3D fractal filtering methods. The study demonstrates that carefully diversified, formula-driven synthetic data can provide meaningful transfer benefits for action recognition, enabling scalable and efficient on-the-fly dataset generation for pre-training.
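The TL;DR describes TSF only at a high level, so the sketch below does not reproduce it. Under stated assumptions, it shows what a pre-generation check on IFS parameters can look like. Each affine map x -> A x + b is tested before any points are rendered. The check rejects non-contractive maps (largest singular value of A at or above 1) and maps that nearly collapse a dimension (smallest singular value near zero), since the latter can produce flat or line-like structures. The functions `sample_ifs`, `passes_filter`, and `sample_filtered_ifs`, the uniform sampling ranges, and both thresholds are hypothetical choices for illustration, not the paper's criteria.

```python
import numpy as np


def sample_ifs(rng, n_maps=4, scale=0.5):
    """Sample a hypothetical 3D IFS of n_maps affine maps x -> A[i] @ x + b[i]."""
    A = rng.uniform(-scale, scale, size=(n_maps, 3, 3))
    b = rng.uniform(-1.0, 1.0, size=(n_maps, 3))
    return A, b


def passes_filter(A, max_sv=0.95, min_sv=0.05):
    """Parameter-level check run before any points are generated.

    Illustrative thresholds (assumptions, not the paper's values):
    - every map must be contractive: largest singular value < max_sv
    - no map may nearly collapse a dimension: smallest singular value > min_sv
    """
    # Singular values of each 3x3 matrix, sorted in descending order per map.
    sv = np.linalg.svd(A, compute_uv=False)
    return bool(np.all(sv[:, 0] < max_sv) and np.all(sv[:, -1] > min_sv))


def sample_filtered_ifs(rng, n_maps=4, max_tries=10_000):
    """Rejection-sample parameter sets until one passes the filter."""
    for _ in range(max_tries):
        A, b = sample_ifs(rng, n_maps)
        if passes_filter(A):
            return A, b
    raise RuntimeError("no parameter set passed the filter")
```

Because the check inspects a few 3x3 matrices rather than rendered point clouds, rejecting a bad parameter set is cheap. This is the general motivation for filtering before generation rather than after it.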
Abstract
Synthetic datasets are increasingly recognized in deep learning as a valuable alternative to exhaustively labeled real data. One such synthetic data generation method is Formula-Driven Supervised Learning (FDSL), which can provide an unlimited amount of perfectly labeled data through formula-driven approaches such as fractals or contours. FDSL avoids common drawbacks of real data, such as manual labeling effort, privacy, and other ethical concerns. In this work, we generate 3D fractals using 3D Iterated Function Systems (IFS) to pre-train an action recognition model. The fractals are temporally transformed to form videos, which serve as a pre-training dataset for the downstream task of action recognition. We find that standard methods of generating fractals are slow and produce degenerate 3D fractals. Therefore, we systematically explore alternative ways of generating fractals and find that overly restrictive approaches, while generating aesthetically pleasing fractals, are detrimental to downstream task performance. We propose a novel method, Targeted Smart Filtering, to address both the generation speed and fractal diversity issues. The method samples fractals roughly 100 times faster and achieves superior downstream performance compared to other 3D fractal filtering methods.
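As a minimal sketch of the generation step, the random-iteration ("chaos game") algorithm below approximates the attractor of a 3D IFS as a point cloud. A simple rotation then turns that cloud into a sequence of frames. Choosing each map with probability proportional to |det A_i| is a heuristic used in earlier fractal pre-training work and is assumed here. The rotation about the z-axis is a hypothetical stand-in for the paper's temporal transformations, which the abstract does not specify. `chaos_game` and `rotate_about_z` are illustrative names.

```python
import numpy as np


def chaos_game(A, b, n_points=10_000, burn_in=100, seed=0):
    """Approximate the attractor of the IFS {x -> A[i] @ x + b[i]} by random iteration."""
    rng = np.random.default_rng(seed)
    # Assumed heuristic: pick map i with probability proportional to |det(A[i])|.
    # Requires at least one map with a nonzero determinant.
    det = np.abs(np.linalg.det(A))
    p = det / det.sum()
    # Pre-sample the sequence of map indices for all iterations at once.
    idx = rng.choice(len(A), size=burn_in + n_points, p=p)
    x = np.zeros(3)
    points = np.empty((n_points, 3))
    for t, i in enumerate(idx):
        x = A[i] @ x + b[i]
        # Discard the first burn_in iterates so the point settles onto the attractor.
        if t >= burn_in:
            points[t - burn_in] = x
    return points


def rotate_about_z(points, n_frames=16):
    """Hypothetical temporal transform: one full turn about the z-axis over n_frames."""
    frames = []
    for k in range(n_frames):
        theta = 2.0 * np.pi * k / n_frames
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        frames.append(points @ R.T)  # rotate every point by R
    return np.stack(frames)  # shape: (n_frames, n_points, 3)
```

The resulting array of rotated point clouds would still need to be rendered into image frames to form a video clip; that rasterization step is omitted here.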
