
Towards Minimal Focal Stack in Shape from Focus

Khurram Ashfaq, Muhammad Tariq Mahmood

Abstract

Shape from Focus (SFF) is a depth reconstruction technique that estimates scene structure from focus variations observed across a focal stack, that is, a sequence of images captured at different focus settings. A key limitation of SFF methods is their reliance on densely sampled, large focal stacks, which limits their practical applicability. In this study, we propose a focal stack augmentation that enables SFF methods to estimate depth using a reduced stack of just two images, without sacrificing precision. We introduce a simple yet effective physics-based focal stack augmentation that enriches the stack with two auxiliary cues: an all-in-focus (AiF) image estimated from two input images, and Energy-of-Difference (EOD) maps, computed as the energy of differences between the AiF and input images. Furthermore, we propose a deep network that computes a deep focus volume from the augmented focal stacks and iteratively refines depth using convolutional Gated Recurrent Units (ConvGRUs) at multiple scales. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed augmentation benefits existing state-of-the-art SFF models, enabling them to achieve comparable accuracy. The results also show that our approach maintains state-of-the-art performance with a minimal stack size.

Paper Structure

This paper contains 17 sections, 1 theorem, 12 equations, 10 figures, and 5 tables.

Key Result

Proposition 1

The energy of difference $E(p)$ is directly proportional to the blur level $\sigma$ and inversely proportional to the focus measure (FM) response.
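Proposition 1 can be illustrated numerically: blurring a sharp image more heavily should raise the Energy-of-Difference against the all-in-focus (AiF) reference. The sketch below is a minimal NumPy illustration, not the paper's implementation; the function names (`gaussian_blur`, `eod_map`), the window size, and the choice of a windowed mean as the "energy" are assumptions, since the paper's exact normalization is not given here.

```python
import numpy as np

def _sep_conv(img, k):
    # Separable 2-D convolution with a 1-D kernel, 'same' padding.
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, tmp)

def gaussian_blur(img, sigma):
    # Gaussian defocus model, kernel truncated at ~3*sigma (assumption).
    r = max(1, int(round(3 * sigma)))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return _sep_conv(img, k / k.sum())

def eod_map(aif, slice_img, window=9):
    # EOD: local (windowed mean) energy of the AiF-minus-slice residual.
    box = np.ones(window) / window
    return _sep_conv((aif - slice_img) ** 2, box)

# Synthetic 64x64 textured AiF image, as in the Figure 1 setup.
rng = np.random.default_rng(0)
aif = rng.random((64, 64))

eod_mild = eod_map(aif, gaussian_blur(aif, 0.5))
eod_heavy = eod_map(aif, gaussian_blur(aif, 2.0))
```

On such a texture, `eod_heavy` has larger average energy than `eod_mild`, consistent with EOD growing with the blur level $\sigma$.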

Figures (10)

  • Figure 1: First Row: A synthetic AiF image of size $64 \times 64$ is generated and defocused images by convolving with a Gaussian with $\sigma \in\left\{0.5,1.0,1.5,2.0 \right\}$. Second Row: (left) Illustration of defocused image formation, (right) behavior of EOD and Laplacian focus measure (FM).
  • Figure 2: The proposed framework has three modules: Focal Stack Augmentation, Deep Focus Volume, and Depth Extraction.
  • Figure 3: Left: The architecture of the DFV module. For brevity, the feature encoder from which the feature volumes ${\{\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \mathbf{A}_4\}}$ are extracted is not shown. Right: The recurrent update block, where GRUs operating at coarse, medium, and fine scales exchange multi-scale information among themselves.
  • Figure 4: EOD visualization: (Row:1) GT AiF and corresponding EOD maps, (Row:2) Estimated AiF and corresponding EOD maps, (Row:3) EOD computed across the focal stack for AiFs estimated using 2 and 3 slices compared with the GT AiF.
  • Figure 5: Qualitative Results for EOD Ablation. The numbers on each map denote RMS error.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proof of Proposition 1