SeeClear: Reliable Transparent Object Depth Estimation via Generative Opacification

Xiaoying Wang, Yumeng He, Jingkai Shi, Jiayin Lu, Yin Yang, Ying Jiang, Chenfanfu Jiang

Abstract

Monocular depth estimation remains challenging for transparent objects, where refraction and transmission are difficult to model and break the appearance assumptions used by depth networks. As a result, state-of-the-art estimators often produce unstable or incorrect depth predictions for transparent materials. We propose SeeClear, a novel framework that converts transparent objects into generative opaque images, enabling stable monocular depth estimation for transparent objects. Given an input image, we first localize transparent regions and transform their refractive appearance into geometrically consistent opaque shapes using a diffusion-based generative opacification module. The processed image is then fed into an off-the-shelf monocular depth estimator without retraining or architectural changes. To train the opacification model, we construct SeeClear-396k, a synthetic dataset containing 396k paired transparent-opaque renderings. Experiments on both synthetic and real-world datasets show that SeeClear significantly improves depth estimation for transparent objects. Project page: https://heyumeng.com/SeeClear-web/
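The three stages named in the abstract (localize transparent regions, opacify them, run an unmodified depth estimator) can be sketched as a simple composition of functions. This is a minimal illustration, not the authors' implementation: the function name `estimate_depth_via_opacification` and the three callables `segment`, `opacify`, and `depth_model` are hypothetical placeholders for whatever segmentation model, diffusion-based opacification module, and off-the-shelf depth network are actually used.

```python
from typing import Callable

import numpy as np

# Assumed conventions (not from the paper): images are H x W x 3 arrays,
# masks and depth maps are H x W arrays.
Image = np.ndarray
Mask = np.ndarray
Depth = np.ndarray


def estimate_depth_via_opacification(
    image: Image,
    segment: Callable[[Image], Mask],
    opacify: Callable[[Image, Mask], Image],
    depth_model: Callable[[Image], Depth],
) -> Depth:
    """Run the three stages in order; every callable is a hypothetical stand-in."""
    # 1. Localize the transparent regions in the input image.
    mask = segment(image)
    # 2. Replace the refractive appearance inside the mask with an opaque one.
    opaque_image = opacify(image, mask)
    # 3. Feed the processed image to a frozen depth estimator
    #    (no retraining, no architectural changes).
    return depth_model(opaque_image)
```

Because the depth estimator only ever sees the processed image, it can be swapped for another model without touching the first two stages, which is the property the abstract emphasizes.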

Paper Structure (27 sections, 15 equations, 12 figures, 2 tables)

Figures (12)

  • Figure 1: SeeClear is a novel framework that converts transparent objects into generative opaque images, predicting stable and accurate depth for transparent objects.
  • Figure 2: Pipeline Overview. Starting from an image, we first apply a segmentation model to obtain the transparent object mask. Guided by the mask and the image, a latent diffusion model generates an opacified image of the transparent object. A mask refinement module then predicts a soft blending mask to alpha-composite the generated opaque region with the original background, producing the final composited image. The composited image is finally fed into a depth model to estimate accurate depth.
  • Figure 3: Rendering Pipeline. We build the SeeClear-396k dataset for transparent-object depth estimation. For each object–scene configuration, Blender renders paired transparent ($I^{tr}$) and opaque ($I^{op}$) images with identical geometry, camera pose, and illumination. Viewpoint, lighting, and anisotropic shape variations are systematically sampled to produce diverse training data with aligned depth, normals, and masks.
  • Figure 4: Qualitative Comparison. We evaluate transparent-object depth estimation on the ClearGrasp [sajjan2020clear] (Columns 1–3) and TransPhy3D [xu2025diffusion] (Columns 4–6) datasets. Compared with the baseline, SeeClear produces more accurate transparent-object depth. Depth maps in the circle are normalized in grayscale.
  • Figure 5: In-the-Wild Qualitative Comparison (Part I). Scenes include a pair of glasses (row 1), multiple transparent objects with liquid and inter-object occlusion (row 2), a transparent container with an out-of-distribution shape enclosing opaque chocolate objects under a plastic film (row 3), a transparent plastic drawer cabinet containing opaque objects (row 4), multiple transparent objects with inter-object occlusion and liquid contents (row 5), a single transparent glass containing liquid and plants extending from inside to outside (row 6), and multiple transparent objects with opaque labels and liquid contents (row 7). Depth4ToM produces predictions with blurrier boundaries compared with other methods; in rows 1, 5, and 6 the transparent surfaces appear translucent, and in rows 2 and 7 the depth fails to reflect object surface geometry such as cup openings and bottle shapes. Depth Anything V3 and MoGe-2 exhibit pronounced translucency across all rows. Marigold and GenPercept both show translucency in rows 1, 3, 5, 6, and 7; additionally, Marigold exhibits translucency in row 4, and GenPercept exhibits translucency in row 2. SeeClear produces no translucency across all scenes, accurately recovers object surface geometry, and preserves seamless depth transitions at material boundaries.
  • ...and 7 more figures
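
The alpha-compositing step described in the Figure 2 caption, where a soft blending mask merges the generated opaque region with the original background, can be illustrated with a standard per-pixel blend. This is a sketch under stated assumptions rather than the paper's code: the function name `alpha_composite`, the argument names, and the convention that images are float arrays of shape H x W x 3 with a mask of shape H x W in [0, 1] are all hypothetical.

```python
import numpy as np


def alpha_composite(
    original: np.ndarray, opacified: np.ndarray, soft_mask: np.ndarray
) -> np.ndarray:
    """Blend two H x W x 3 images with an H x W soft mask.

    out = alpha * opacified + (1 - alpha) * original
    """
    # Clamp the mask to [0, 1] and add a channel axis so that it
    # broadcasts across the three color channels.
    alpha = np.clip(soft_mask.astype(np.float32), 0.0, 1.0)[..., None]
    return alpha * opacified.astype(np.float32) + (1.0 - alpha) * original.astype(
        np.float32
    )
```

With a mask value of 1 a pixel comes entirely from the opacified image, with 0 it comes entirely from the original, and intermediate values give a smooth transition at the object boundary.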