FractalPINN-Flow: A Fractal-Inspired Network for Unsupervised Optical Flow Estimation with Total Variation Regularization
Sara Behnamian, Rasoul Khaksarinezhad, Andreas Langer
TL;DR
FractalPINN-Flow tackles unsupervised dense optical flow estimation from two grayscale frames by integrating a fractal-inspired multiscale deformation network with total variation (TV) regularization. The method minimizes a variational energy that combines $L^1$ and $L^2$ data terms with an anisotropic TV penalty on the flow field $w$, which is produced end-to-end by a Fractal Deformation Network and a lightweight flow head. Key contributions include the Fractal Deformation Network architecture, a projection layer that preserves spatial detail, and a training regime that achieves edge-preserving, coherent flow without ground-truth supervision. Empirical results on synthetic and Middlebury benchmarks demonstrate robust performance on high-resolution data, with TV regularization offering a tunable balance between detail preservation and smoothness that adapts to scene complexity.
Abstract
We present FractalPINN-Flow, an unsupervised deep learning framework for dense optical flow estimation that learns directly from consecutive grayscale frames without requiring ground truth. The architecture centers on the Fractal Deformation Network (FDN), a recursive encoder-decoder inspired by fractal geometry and self-similarity. Unlike traditional CNNs with sequential downsampling, FDN uses repeated encoder-decoder nesting with skip connections to capture both fine-grained details and long-range motion patterns. The training objective is based on a classical variational formulation with total variation (TV) regularization. Specifically, we minimize an energy functional that combines $L^1$ and $L^2$ data fidelity terms, which enforce brightness constancy, with a TV term that promotes spatial smoothness and coherent flow fields. Experiments on synthetic and benchmark datasets show that FractalPINN-Flow produces accurate, smooth, and edge-preserving optical flow fields. The model is especially effective for high-resolution data and for scenarios with limited annotations.
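To make the training objective concrete, the following is a minimal NumPy sketch of an energy of the kind described above, not the authors' implementation. It assumes a linearized brightness-constancy residual $I_x u + I_y v + I_t$, forward differences for all derivatives, and an anisotropic TV term; the function names (`forward_diff`, `flow_energy`) and the weights `alpha1`, `alpha2`, and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward_diff(a):
    """Forward differences along x (columns) and y (rows); zero at the far border."""
    dx = np.zeros_like(a)
    dy = np.zeros_like(a)
    dx[:, :-1] = a[:, 1:] - a[:, :-1]
    dy[:-1, :] = a[1:, :] - a[:-1, :]
    return dx, dy


def flow_energy(frame0, frame1, flow, alpha1=1.0, alpha2=1.0, lam=0.1):
    """Evaluate an L1 + L2 + anisotropic-TV energy for a candidate flow field.

    frame0, frame1: grayscale frames as float arrays of shape (H, W).
    flow: array of shape (2, H, W) holding the components (u, v).
    alpha1, alpha2, lam: illustrative weights (assumed, not from the paper).
    """
    u, v = flow[0], flow[1]

    # Linearized brightness constancy: I_x * u + I_y * v + I_t should be ~0.
    ix, iy = forward_diff(frame0)
    it = frame1 - frame0
    residual = ix * u + iy * v + it

    data_l1 = np.abs(residual).sum()
    data_l2 = 0.5 * np.square(residual).sum()

    # Anisotropic TV: sum of absolute forward differences of each flow component.
    tv = sum(np.abs(d).sum() for comp in (u, v) for d in forward_diff(comp))

    return alpha1 * data_l1 + alpha2 * data_l2 + lam * tv
```

In an unsupervised setup of this kind, the network output would play the role of `flow`, and the energy itself would serve as the loss, so no ground-truth flow is needed. Raising `lam` favors smoother, more piecewise-constant fields, while lowering it lets the data terms preserve finer motion detail.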
