Refracting Reality: Generating Images with Realistic Transparent Objects
Yue Yin, Enze Tao, Dylan Campbell
TL;DR
This work tackles physically accurate refraction of transparent objects in text-to-image generation. It introduces Snellcaster, a training-free pipeline that enforces refraction and reflection physics at every generation step by synchronizing the pixels inside the object with the scene pixels outside it and with a panorama centered on the object. The method precomputes ray-traced warps from depth and object geometry, then applies cross-view synchronization and Fresnel-based blending to produce coherent refractive and reflective effects; Laplacian pyramid warping and time-travel denoising add stability. Experiments on indoor scenes show substantial improvements over baselines in both perceptual quality and fidelity to ground-truth refractions. The work suggests future directions including modeling multiple materials, native lighting, shadow handling, and video extension for optically accurate sequences.
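To make the Fresnel-based blending step concrete, here is a minimal sketch of how a reflected color and a refracted color could be mixed at a single surface point. It assumes the exact Fresnel equations for unpolarized light at a dielectric interface and a simple linear mix; the function names, the default refractive indices (air at 1.0, glass at 1.5), and the per-pixel tuple representation are illustrative assumptions, not details taken from the Snellcaster pipeline.

```python
import math


def fresnel_reflectance(cos_i, eta_i=1.0, eta_t=1.5):
    """Fraction of unpolarized light reflected at a dielectric interface.

    cos_i: cosine of the angle between the incoming ray and the surface normal.
    eta_i, eta_t: refractive indices on the incident and transmitted sides
    (illustrative defaults: air and glass).
    """
    cos_i = min(1.0, abs(cos_i))
    # Snell's law gives the sine of the transmitted angle.
    sin2_t = (eta_i / eta_t) ** 2 * (1.0 - cos_i * cos_i)
    if sin2_t >= 1.0:
        return 1.0  # total internal reflection: everything is reflected
    cos_t = math.sqrt(1.0 - sin2_t)
    # Amplitude coefficients for s- and p-polarized light.
    r_s = (eta_i * cos_i - eta_t * cos_t) / (eta_i * cos_i + eta_t * cos_t)
    r_p = (eta_i * cos_t - eta_t * cos_i) / (eta_i * cos_t + eta_t * cos_i)
    # Unpolarized light: average the two reflected intensities.
    return 0.5 * (r_s * r_s + r_p * r_p)


def fresnel_blend(reflected, refracted, cos_i, eta_i=1.0, eta_t=1.5):
    """Mix a reflected and a refracted RGB color by the Fresnel reflectance."""
    f = fresnel_reflectance(cos_i, eta_i, eta_t)
    return tuple(f * a + (1.0 - f) * b for a, b in zip(reflected, refracted))
```

At normal incidence on glass the reflectance is about 4%, so the refracted color dominates; toward grazing angles the reflectance approaches 1 and the reflection takes over, which is why the silhouette edges of a glass object tend to look mirror-like.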
Abstract
Generative image models can produce convincingly real images, with plausible shapes, textures, layouts and lighting. However, one domain in which they perform notably poorly is the synthesis of transparent objects, which exhibit refraction, reflection, absorption and scattering. Refraction is a particular challenge, because refracted pixel rays often intersect with surfaces observed in other parts of the image, which constrains the color of the refracted pixel. It is clear from inspection that generative models have not distilled the laws of optics sufficiently well to accurately render refractive objects. In this work, we consider the problem of generating images with accurate refraction, given a text prompt. We synchronize the pixels within the object's boundary with those outside by warping and merging the pixels using Snell's Law of Refraction, at each step of the generation trajectory. For those surfaces that are not directly observed in the image, but are visible via refraction or reflection, we recover their appearance by synchronizing the image with a second generated image -- a panorama centered at the object -- using the same warping and merging procedure. We demonstrate that our approach generates much more optically plausible images that respect the physical constraints.
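The warps described above rest on Snell's law, which relates the incident and transmitted angles by eta_i * sin(theta_i) = eta_t * sin(theta_t). As a minimal sketch of the underlying ray computation, the vector form below bends a single ray direction at a surface. The function name `refract`, the tuple-based vectors, and the example refractive indices are illustrative assumptions; the full pipeline of tracing rays through the object's geometry and depth map is not reproduced here.

```python
import math


def refract(d, n, eta_i, eta_t):
    """Refract a unit direction d at a surface with unit normal n.

    n must point toward the side the ray arrives from (so dot(d, n) <= 0).
    Returns the refracted unit direction as a tuple, or None when total
    internal reflection occurs and no refracted ray exists.
    """
    eta = eta_i / eta_t
    cos_i = -sum(a * b for a, b in zip(d, n))
    # Snell's law: sin(theta_t) = eta * sin(theta_i).
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    # Vector form: t = eta * d + (eta * cos_i - cos_t) * n.
    return tuple(eta * a + (eta * cos_i - cos_t) * b for a, b in zip(d, n))


# Example: a ray hitting a flat glass surface (normal +y) at 45 degrees from air.
theta = math.radians(45.0)
incoming = (math.sin(theta), -math.cos(theta), 0.0)
bent = refract(incoming, (0.0, 1.0, 0.0), 1.0, 1.5)
```

Here `bent` tilts toward the normal, since light slows on entering the denser medium. Applying the same function a second time at the exit surface, with the indices swapped and the normal flipped, gives the direction in which the ray leaves the object; that exit ray is what would be intersected with the scene or panorama to find the source pixel.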
