FractalPINN-Flow: A Fractal-Inspired Network for Unsupervised Optical Flow Estimation with Total Variation Regularization

Sara Behnamian, Rasoul Khaksarinezhad, Andreas Langer

TL;DR

FractalPINN-Flow tackles unsupervised dense optical flow estimation from two grayscale frames by integrating a fractal-inspired multiscale deformation network with total variation (TV) regularization. The method optimizes a variational energy that combines $L^1$ and $L^2$ data terms with an anisotropic TV penalty, while the flow field $w$ is produced end-to-end by a Fractal Deformation Network and a lightweight flow head. Key contributions include the Fractal Deformation Network architecture, a projection layer that preserves spatial detail, and a training regime that achieves edge-preserving, coherent flow without ground-truth supervision. Empirical results on synthetic and Middlebury benchmarks demonstrate robust performance for high-resolution data, with TV regularization offering a tunable balance between detail preservation and smoothness that adapts to scene complexity.

Abstract

We present FractalPINN-Flow, an unsupervised deep learning framework for dense optical flow estimation that learns directly from consecutive grayscale frames without requiring ground truth. The architecture centers on the Fractal Deformation Network (FDN) - a recursive encoder-decoder inspired by fractal geometry and self-similarity. Unlike traditional CNNs with sequential downsampling, FDN uses repeated encoder-decoder nesting with skip connections to capture both fine-grained details and long-range motion patterns. The training objective is based on a classical variational formulation using total variation (TV) regularization. Specifically, we minimize an energy functional that combines $L^1$ and $L^2$ data fidelity terms to enforce brightness constancy, along with a TV term that promotes spatial smoothness and coherent flow fields. Experiments on synthetic and benchmark datasets show that FractalPINN-Flow produces accurate, smooth, and edge-preserving optical flow fields. The model is especially effective for high-resolution data and scenarios with limited annotations.
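The energy described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the warped brightness-constancy residual, the bilinear sampling with border clamping, the per-pixel normalization, and the weights `alpha1`, `alpha2`, and `lambda_tv` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def warp_bilinear(img, flow):
    """Sample img at (x + u, y + v) using bilinear interpolation.
    Coordinates are clamped to the image border (an assumption)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

def anisotropic_tv(flow):
    """Sum of absolute forward differences of both flow components
    along x and y (anisotropic total variation)."""
    dx = np.abs(np.diff(flow, axis=1)).sum()
    dy = np.abs(np.diff(flow, axis=0)).sum()
    return dx + dy

def energy(i1, i2, flow, alpha1=1.0, alpha2=1.0, lambda_tv=1e-3):
    """L1 + L2 data fidelity on the warped residual plus a TV penalty.
    The weights and the per-pixel normalization are hypothetical."""
    residual = warp_bilinear(i2, flow) - i1
    data = alpha1 * np.abs(residual).mean() + alpha2 * (residual ** 2).mean()
    return data + lambda_tv * anisotropic_tv(flow) / i1.size

# Usage: a horizontal ramp shifted by one pixel; the true flow is (1, 0).
h, w = 6, 8
ramp = np.tile(np.arange(w, dtype=np.float64), (h, 1))
shifted = ramp - 1.0
true_flow = np.zeros((h, w, 2))
true_flow[..., 0] = 1.0
print(energy(ramp, shifted, np.zeros((h, w, 2))), energy(ramp, shifted, true_flow))
```

In an unsupervised setting, a scalar of this kind would serve as the training loss, with `flow` replaced by the network output. Setting `lambda_tv = 0` drops the smoothness term entirely, which matches the unregularized baseline shown in the figures.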

Paper Structure

This paper contains 17 sections, 10 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Synthetic Shepp-Logan phantom experiment. Top row (left to right): original phantom, synthetic frame 1 with embedded circles, warped frame 2, and color wheel. Bottom row: ground truth flow, predicted flows for $\lambda_{\mathrm{TV}} = 0$ and $10^{-5}$, respectively, all trained for 10,000 epochs.
  • Figure 2: Training curves for the Shepp-Logan phantom experiment using total variation regularization with $\lambda_{\mathrm{TV}} = 10^{-5}$. The model is trained for 10,000 epochs. Left: Loss history showing stable convergence with a final best loss of $1.23 \times 10^{-7}$. Middle: AEE curve indicating accurate magnitude estimation of flow vectors, with a best AEE of $2.30 \times 10^{-2}$ and SDEE of $1.88 \times 10^{-1}$. Right: AAE decreasing to a final value of $7.23\times 10^{-1}$, with SDAE of $5.87$.
  • Figure 3: Middlebury Optical Flow Benchmark visualizations corresponding to the results in Table 1. Columns from left to right: $I_1$, $I_2$, ground truth optical flow, and predicted flows for $\lambda_{\mathrm{TV}} = 0$, $10^{-3}$, $10^{-2}$, $10^{-1}$. Predicted flow fields are taken from the best-loss epoch for each configuration. Benchmarks from top to bottom: Dimetrodon, Grove2, Grove3, Hydrangea, RubberWhale, Urban2, Urban3, Venus.
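
The captions above report AEE, SDEE, AAE, and SDAE. As a rough guide to how such numbers are conventionally obtained, the sketch below uses the standard definitions: endpoint error as the Euclidean distance between predicted and ground-truth flow vectors, and angular error as the angle in degrees between the space-time vectors $(u, v, 1)$. Whether the paper follows exactly these conventions (degrees, population standard deviation) is an assumption, and the function names are hypothetical.

```python
import numpy as np

def endpoint_error(pred, gt):
    """Per-pixel Euclidean distance between flow vectors, shape (H, W)."""
    return np.linalg.norm(pred - gt, axis=-1)

def angular_error_deg(pred, gt):
    """Per-pixel angle in degrees between (u, v, 1) vectors."""
    num = 1.0 + (pred * gt).sum(axis=-1)
    den = np.sqrt(1.0 + (pred ** 2).sum(axis=-1)) * np.sqrt(1.0 + (gt ** 2).sum(axis=-1))
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))

def flow_metrics(pred, gt):
    """Mean and standard deviation of endpoint and angular errors."""
    ee = endpoint_error(pred, gt)
    ae = angular_error_deg(pred, gt)
    return {"AEE": ee.mean(), "SDEE": ee.std(), "AAE": ae.mean(), "SDAE": ae.std()}

# Usage: a constant offset of (3, 4) pixels from a zero ground-truth flow.
gt = np.zeros((4, 5, 2))
pred = gt + np.array([3.0, 4.0])
print(flow_metrics(pred, gt))
```

Under these definitions AEE is measured in pixels and AAE in degrees, so the two sets of values in Figure 2 are not directly comparable in scale.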