Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling
Young D. Kwon, Abhinav Mehrotra, Malcolm Chadwick, Alberto Gil Ramos, Sourav Bhattacharya
TL;DR
This work tackles high-resolution on-device image editing with diffusion models by introducing MobilePicasso, a three-stage hybrid pipeline that edits at standard resolution, projects into latent space, and upscales to 4K. A hallucination-aware loss paired with artefact filtering reduces artefacts, while Adaptive Context-Preserving Tiling (ACPT) and model-system co-design cut latency and memory usage on mobile hardware. Empirical results show speed-ups of up to 55.8× and a peak memory of 1.15 GB on a Galaxy S23, with 14–51% reductions in hallucinations and 18–48% improvements in image quality, validated by a 46-participant user study. These advances enable practical, private, on-device high-resolution editing and offer a framework applicable to broader mobile generative AI tasks.
Abstract
High-resolution (4K) image-to-image synthesis has become increasingly important for mobile applications. Existing diffusion models for image editing face significant challenges in memory usage and image quality when deployed on resource-constrained devices. In this paper, we present MobilePicasso, a novel system that enables efficient image editing at high resolutions while minimising computational cost and memory usage. MobilePicasso comprises three stages: (i) performing image editing at a standard resolution with a hallucination-aware loss, (ii) applying latent projection to avoid going through pixel space, and (iii) upscaling the edited image latent to a higher resolution with adaptive context-preserving tiling. Our user study with 46 participants reveals that MobilePicasso not only improves image quality by 18–48% but also reduces hallucinations by 14–51% over existing methods. MobilePicasso also achieves significantly lower latency, e.g., a speed-up of up to 55.8×, at the cost of only a small increase in runtime memory, e.g., a mere 9% over prior work. Surprisingly, the on-device runtime of MobilePicasso is observed to be faster than that of a server-based high-resolution image editing model running on an A100 GPU.
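
To make the tiling idea in stage (iii) concrete, the following is a minimal sketch of plain overlapping tiling over a latent grid. It is not the paper's adaptive context-preserving tiling, whose details are not given here. The function names (`tile_starts`, `tile_boxes`), the 128×128 latent size, the 64-cell tile, and the 16-cell overlap are all illustrative assumptions. The point is only that processing fixed-size tiles keeps peak memory bounded by the tile size, while the overlap gives each tile some surrounding context.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` cells. This is a generic
    fixed-overlap scheme (an assumption), not the paper's adaptive method.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # shift the last tile back so it ends at the border
    return starts


def tile_boxes(height, width, tile, overlap):
    """Return (top, left, bottom, right) boxes covering a height x width grid."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


# Hypothetical example: a 128x128 latent split into 64x64 tiles with 16 cells of overlap.
boxes = tile_boxes(128, 128, tile=64, overlap=16)
```

Each box could then be upscaled independently and the overlapping regions blended. How the tile size and overlap are chosen adaptively is exactly what a context-preserving scheme would have to decide, and this sketch leaves that open.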
