LEMON: Localized Editing with Mesh Optimization and Neural Shaders
Furkan Mert Algan, Umut Yazgan, Driton Salihu, Cem Eteke, Eckehard Steinbach
TL;DR
LEMON addresses the challenge of editing polygonal meshes from multi-view images under natural language prompts while preserving the original geometry. It combines neural deferred shading with localized mesh optimization, using vertex-level importance scores and ControlNet-conditioned diffusion to drive text-guided edits, and it iteratively updates the image dataset while deforming the mesh. Evaluated on the DTU dataset, LEMON aligns better with the prompts (measured by CLIP Directional Similarity) and maintains geometric integrity while delivering faster, more consistent results than several baselines. The approach offers a practical path to precise, localized mesh editing with coordinated appearance changes. It relies on masking, however, and the authors suggest inpainting-based extensions as future work for adding new geometry.
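As a rough illustration of the CLIP Directional Similarity metric mentioned above, the sketch below computes the cosine similarity between the change in image embeddings and the change in text embeddings. It assumes the four embeddings have already been produced by a CLIP encoder and are passed in as plain lists of floats; the function and argument names are hypothetical and are not taken from the LEMON code.

```python
import math


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def clip_directional_similarity(img_src, img_edit, txt_src, txt_edit):
    """Cosine between the image edit direction and the text edit direction.

    All four arguments are assumed to be precomputed CLIP embeddings
    (hypothetical inputs): source image, edited image, source caption,
    and target caption.
    """
    image_direction = _sub(img_edit, img_src)
    text_direction = _sub(txt_edit, txt_src)
    return _cosine(image_direction, text_direction)
```

A score near 1 means the rendered views changed in the direction the prompt asked for, a score near 0 means the change is unrelated to the prompt, and a negative score means the edit moved away from it.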
Abstract
In practical use cases, editing a polygonal mesh can be faster than generating a new one, but it can still be challenging and time-consuming for users. Existing solutions to this problem tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and the view. In this work, we propose LEMON, a mesh editing pipeline that combines neural deferred shading with localized mesh optimization. Our approach begins by identifying the most important vertices in the mesh for editing, using a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. Using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. The result is a polygonal mesh that is edited according to the given text instruction and that preserves the geometric characteristics of the initial mesh, with changes concentrated in the most significant areas. We evaluate our pipeline on the DTU dataset, demonstrating that it generates finely edited meshes more rapidly than current state-of-the-art methods. We include our code and additional results in the supplementary material.
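The idea of localized mesh optimization can be sketched as a gradient step that only moves vertices whose importance score is high enough. This is a minimal illustration under stated assumptions, not the method from the paper: the abstract does not say how importance scores enter the update, so the thresholding, the scaling of the step by the score, and all names (`localized_vertex_step`, `importance`, `threshold`, `lr`) are hypothetical.

```python
def localized_vertex_step(vertices, gradients, importance, lr=0.1, threshold=0.5):
    """Apply one gradient-descent step to the important vertices only.

    vertices   : list of (x, y, z) positions
    gradients  : list of (gx, gy, gz) loss gradients, one per vertex
    importance : list of per-vertex scores in [0, 1] (assumed to come from
                 a segmentation-based selection step)
    Vertices scoring below `threshold` are left untouched, which keeps the
    geometry outside the edit region unchanged. This rule is an assumption.
    """
    updated = []
    for position, gradient, score in zip(vertices, gradients, importance):
        if score < threshold:
            # Outside the edit region: keep the original position.
            updated.append(tuple(position))
        else:
            # Inside the edit region: step against the gradient,
            # scaled by the importance score.
            updated.append(
                tuple(p - lr * score * g for p, g in zip(position, gradient))
            )
    return updated
```

For example, with `importance = [0.2, 1.0]` and the default threshold, only the second vertex is displaced, so the first keeps its original coordinates regardless of its gradient.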
