Looking Into the Water by Unsupervised Learning of the Surface Shape

Ori Lifschitz, Tali Treibitz, Dan Rosenbaum

TL;DR

Implicit neural representations with periodic activation functions (SIREN) effectively model the spatio-temporal surface height signal and its derivative, as required for image reconstruction. The resulting method outperforms the latest unsupervised image restoration approach.

Abstract

We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refractions at the water surface. Our approach is based on modeling the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore use unsupervised training. We show that using implicit neural representations with periodic activation functions (SIREN) leads to effective modeling of the surface height spatio-temporal signal and its derivative, as required for image reconstruction. Using both simulated and real data we show that our method outperforms the latest unsupervised image restoration approach. In addition, it provides an estimate of the water surface.
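The role of the surface-height network and its derivative can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single hidden sine layer, a frequency scale `W0 = 30`, uniform initialization bounds in the style commonly used for SIREN, and the hypothetical helper names `make_height_field`, `height`, and `height_gradient`. It only shows that the spatial gradient of a sine-activated height field is available in closed form and is itself smooth.

```python
import math
import random

W0 = 30.0  # frequency scale of the sine activation (assumed value)


def make_height_field(hidden=16, seed=0):
    """Random parameters for a one-hidden-layer sine network h(x, y, t)."""
    rng = random.Random(seed)
    n_in = 3  # inputs are (x, y, t)
    # Hidden-layer weights and biases, uniform in [-1/n_in, 1/n_in] (assumed bound).
    W = [[rng.uniform(-1 / n_in, 1 / n_in) for _ in range(n_in)]
         for _ in range(hidden)]
    b = [rng.uniform(-1 / n_in, 1 / n_in) for _ in range(hidden)]
    # Output weights, scaled down by W0 (assumed bound).
    bound = math.sqrt(6.0 / hidden) / W0
    v = [rng.uniform(-bound, bound) for _ in range(hidden)]
    return W, b, v


def height(params, x, y, t):
    """Surface height h(x, y, t) = sum_j v_j * sin(W0 * (W_j . p + b_j))."""
    W, b, v = params
    p = (x, y, t)
    return sum(
        vj * math.sin(W0 * (sum(w * q for w, q in zip(Wj, p)) + bj))
        for Wj, bj, vj in zip(W, b, v)
    )


def height_gradient(params, x, y, t):
    """Analytic spatial gradient (dh/dx, dh/dy) of the height field."""
    W, b, v = params
    p = (x, y, t)
    gx = gy = 0.0
    for Wj, bj, vj in zip(W, b, v):
        # d/dp sin(W0 * (W_j . p + b_j)) = W0 * W_j * cos(W0 * (W_j . p + b_j))
        c = vj * W0 * math.cos(W0 * (sum(w * q for w, q in zip(Wj, p)) + bj))
        gx += c * Wj[0]
        gy += c * Wj[1]
    return gx, gy


if __name__ == "__main__":
    params = make_height_field()
    print(height(params, 0.1, -0.2, 0.3))
    print(height_gradient(params, 0.1, -0.2, 0.3))
```

In a full model the gradient would normally come from automatic differentiation of a deeper network; the closed form here is only meant to make the dependence on the derivative concrete.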

Paper Structure (21 sections, 5 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 2: Our architecture. From the left, regularized 2D spatial grids $x_{\rm reg}$ and time $t$ are inputs to a SIREN network [sitzmann2020implicit] that outputs the surface height per frame. The gradient of the output heights, along with its average across $t$, is used for calculating distortions as in Eq. \ref{eq:refraction}. These are then used in another SIREN network to output the reconstructed image $I_\phi(x_{\rm reg})$ and the distorted images $I^t_{\theta,\phi}$. The predicted distorted images and the observed distorted images $I^t$ are used in the loss calculation.
  • Figure 3: According to Snell's law light passing through an interface between media with different refraction indices changes its angle (refracts). Thus, when an orthographic camera views an object submerged in water from air, the object changes its geometrical appearance as a function of the normals to the surface.
  • Figure 4: Results on the Real1 dataset [james2019restoration]. Marked squares indicate areas where our results are sharper than the baseline. Note the sharper details in the cartoon and dice sequences, as well as the straighter squares in the checkers sequence, in our method.
  • Figure 5: Examples from the simulated dataset following [thapa2020dynamic]. Our results have fewer distortions, manifesting in straighter lines. The estimated distortions match the apparent distortions in the acquired frames.
  • Figure 6: Examples of surface height reconstructions. a) Checkers set from the Real1 dataset [james2019restoration]. The strong curvatures in the surface match the strong distortions in the input image. b) A ripple wave from the simulated dataset based on [thapa2020dynamic]. We show reconstructions of four frames evolving with time, where our reconstruction closely matches the ground truth used for simulation.
  • ...and 1 more figure
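
The geometry described in the Figure 3 caption can be made concrete with a small worked example. The sketch below is not the paper's refraction equation, which is not reproduced in this excerpt. It assumes a 1D surface slope, a vertical (orthographic) viewing ray, a flat bottom at a known depth below the surface point, a refractive index of 1.33 for water, and a sign convention in which the shift points along the slope. The function name `refracted_shift` is hypothetical.

```python
import math

N_AIR = 1.0
N_WATER = 1.33  # approximate refractive index of water (assumed value)


def refracted_shift(slope, depth, n_air=N_AIR, n_water=N_WATER):
    """Horizontal shift of a vertical camera ray when it reaches a flat bottom.

    slope: surface slope dh/dx at the point where the ray enters the water.
    depth: distance from that surface point down to the bottom.
    """
    # For a vertical ray, the incidence angle equals the tilt of the surface normal.
    theta_i = math.atan(abs(slope))
    # Snell's law: n_air * sin(theta_i) = n_water * sin(theta_r).
    theta_r = math.asin(n_air / n_water * math.sin(theta_i))
    # The refracted ray deviates from the vertical by (theta_i - theta_r).
    shift = depth * math.tan(theta_i - theta_r)
    # Assumed sign convention: the shift has the same sign as the slope.
    return math.copysign(shift, slope)


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.1, 0.5):
        print(s, refracted_shift(s, depth=1.0))
```

For small slopes the shift is close to `depth * slope * (1 - n_air / n_water)`, which is why the image distortion can be tied to the gradient of the surface height.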