Refracting Reality: Generating Images with Realistic Transparent Objects

Yue Yin, Enze Tao, Dylan Campbell

TL;DR

This work tackles the challenge of rendering transparent objects with physically accurate refraction in text-to-image generation. It introduces Snellcaster, a training-free pipeline that enforces refraction and reflection physics at every generation step by synchronizing pixels inside the object with external scene pixels and with a panorama centered on the object. The method precomputes ray-traced warps through depth and object geometry, then performs cross-view synchronization and Fresnel-based blending to produce coherent refractive and reflective effects, complemented by Laplacian pyramid warping and time-travel denoising for stability. Experiments on indoor scenes show substantial improvements over baselines in both perceptual quality and fidelity to ground-truth refractions, demonstrating the approach's potential to significantly enhance the realism of transparent-object rendering in diffusion-based generation. The work further suggests promising future directions, including modeling multiple materials, native lighting, shadow handling, and video extension for optically accurate sequences.

Abstract

Generative image models can produce convincingly real images, with plausible shapes, textures, layouts and lighting. However, one domain in which they perform notably poorly is in the synthesis of transparent objects, which exhibit refraction, reflection, absorption and scattering. Refraction is a particular challenge, because refracted pixel rays often intersect with surfaces observed in other parts of the image, providing a constraint on the color. It is clear from inspection that generative models have not distilled the laws of optics sufficiently well to accurately render refractive objects. In this work, we consider the problem of generating images with accurate refraction, given a text prompt. We synchronize the pixels within the object's boundary with those outside by warping and merging the pixels using Snell's Law of Refraction, at each step of the generation trajectory. For those surfaces that are not directly observed in the image, but are visible via refraction or reflection, we recover their appearance by synchronizing the image with a second generated image -- a panorama centered at the object -- using the same warping and merging procedure. We demonstrate that our approach generates much more optically-plausible images that respect the physical constraints.
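As a rough illustration of the refraction constraint described above, the vector form of Snell's law can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function name `refract`, the convention that the normal points against the incoming ray, and the default refractive indices (air at 1.0, glass at 1.5) are all assumptions made for the example.

```python
import numpy as np

def refract(d, n, eta_i=1.0, eta_t=1.5):
    """Refract direction d at a surface with normal n (hypothetical helper).

    d: incident ray direction; n: surface normal, nominally pointing
    against d. eta_i and eta_t are the refractive indices on the incident
    and transmitted sides (assumed values: air and glass).
    Returns the unit refracted direction, or None on total internal reflection.
    """
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    cos_i = -np.dot(d, n)
    if cos_i < 0.0:
        # The ray arrives from the other side: flip the normal, swap the media.
        n, cos_i = -n, -cos_i
        eta_i, eta_t = eta_t, eta_i
    eta = eta_i / eta_t
    # Snell's law: sin(theta_t) = eta * sin(theta_i)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None  # total internal reflection, no transmitted ray
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n
```

A ray at normal incidence passes through undeviated, while a ray leaving glass for air at 60 degrees from the normal exceeds the critical angle and yields `None`. Applying such a function at the entry and exit surfaces of the object, then intersecting the exiting ray with the scene geometry, is one way the pixel correspondences could be built.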

Paper Structure

This paper contains 46 sections, 13 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Snellcaster flowchart. (Top) An initial image $I_0^-$ is generated using Flux [labs2025flux1kontextflowmatching]. Its depth map $D^-$ is estimated (not shown) and is used to place the transparent object on a horizontal surface near the optical axis. Rays cast through this object are refracted according to Snell's Law [born2013principles] and intersect with the estimated geometry, defining the warping functions. These are used to synthesize an image with correct refraction for the surfaces visible in the original image. (Middle) In the main branch, we generate the perspective image from prompt $p$, including the transparent object. (Bottom) In the auxiliary branch, we concurrently generate a panoramic image from augmented prompt $p^{360}$, centered at the transparent object's location, to consistently fill in occluded or out-of-frame surfaces. (All) At each denoising step $t$, we compute Euler estimates of the clean images for both branches $I_{0|t}$ and warp them using the precomputed geometric correspondences. These are blended with the warped original image to obtain a complete perspective and panoramic image. Finally, we combine refractive and reflective contributions using Fresnel's equations, before encoding back into latent space for the next denoising step.
  • Figure 2: Qualitative comparison across six scenes. The first column is rendered in Blender with the estimated geometry and appearance from $I_0^-$, which provides a reference for the true refractions and reflections, under the caveats that the light sources and colors are incorrect and that there is missing data where refracted surfaces are not directly observed in the image $I_0^-$. The last three columns are generated using Snellcaster (ours), a Flux inpainting model, and the standard Flux generative model using the same random seeds. Our approach conforms significantly better to the true refractions, with the expected left--right and up--down flips and radial warping.
  • Figure 3: Kitchen scene example of the synchronized object-free image $I_0^-$ (top left), the generated perspective image $I_0$ (top right), and the auxiliary generated panorama $I_0^{360}$ (bottom). The panorama extends the scene in a plausible way that is consistent with the perspective view.
  • Figure 4: Ablation study of the proposed method. We compare the full model with variants that remove individual components: detail-preserving averaging, Laplacian pyramid warping, and time travel. The results are displayed without foreground object relighting, allowing the effects of each component removal to be observed more clearly. Removing detail-preserving averaging leads to the loss of sharp details, removing Laplacian pyramid warping introduces aliasing in regions with large stretching, and removing time travel makes the cross-view blending noticeably less natural with strong artifacts. The full model avoids these issues and yields sharper and more coherent results.
  • Figure 5: Image synthesis for another object type (polygonal fox): ours (left) and Flux inpainting (right), which entirely fails.
  • ...and 6 more figures
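
Two of the per-step operations named in the Figure 1 caption, the Euler estimate of the clean image and the Fresnel combination of refracted and reflected contributions, can be sketched as below. This is a hedged sketch under stated assumptions, not the paper's code: it assumes a rectified-flow parameterization $x_t = (1 - t)\,x_0 + t\,\epsilon$ with predicted velocity $v = \epsilon - x_0$, unpolarized light at a dielectric interface, and refractive indices of 1.0 and 1.5. All function names are hypothetical.

```python
import numpy as np

def euler_clean_estimate(x_t, v_pred, t):
    """One-step Euler estimate of the clean sample from a noisy one.

    Assumes x_t = (1 - t) * x_0 + t * eps and v_pred = eps - x_0,
    so that x_0 = x_t - t * v_pred.
    """
    return x_t - t * v_pred

def fresnel_reflectance(cos_i, n1=1.0, n2=1.5):
    """Unpolarized Fresnel reflectance for incidence-angle cosine cos_i."""
    cos_i = np.clip(np.asarray(cos_i, dtype=float), 0.0, 1.0)
    sin2_t = (n1 / n2) ** 2 * (1.0 - cos_i ** 2)
    tir = sin2_t > 1.0  # total internal reflection
    cos_t = np.sqrt(np.clip(1.0 - sin2_t, 0.0, None))
    with np.errstate(divide="ignore", invalid="ignore"):
        r_s = ((n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)) ** 2
        r_p = ((n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i)) ** 2
    # Average the two polarizations; everything is reflected under TIR.
    return np.where(tir, 1.0, 0.5 * (r_s + r_p))

def fresnel_blend(refracted, reflected, reflectance):
    """Per-pixel mix of refracted and reflected colors.

    refracted, reflected: (H, W, 3) images; reflectance: (H, W) in [0, 1].
    """
    r = reflectance[..., None]
    return r * reflected + (1.0 - r) * refracted
```

In a pipeline of this kind, the clean estimates from both branches would be warped with the precomputed correspondences before `fresnel_blend` is applied inside the object mask. The warping and the latent encoding are omitted here.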