N4MC: Neural 4D Mesh Compression

Guodong Chen, Huanshuo Dong, Mallesham Dasari

TL;DR

N4MC is the first 4D neural compression framework to efficiently compress time-varying mesh sequences by exploiting their temporal redundancy. It introduces a transformer-based interpolation model that predicts intermediate mesh frames conditioned on latent embeddings derived from tracked volume centers, eliminating motion ambiguities.

Abstract

We present N4MC, the first 4D neural compression framework to efficiently compress time-varying mesh sequences by exploiting their temporal redundancy. Unlike prior neural mesh compression methods that treat each mesh frame independently, N4MC takes inspiration from inter-frame compression in 2D video codecs and learns motion compensation in long mesh sequences. Specifically, N4MC converts consecutive irregular mesh frames into regular 4D tensors to provide a uniform and compact representation. These tensors are then condensed using an auto-decoder, which captures both spatial and temporal correlations for redundancy removal. To enhance temporal coherence, we introduce a transformer-based interpolation model that predicts intermediate mesh frames conditioned on latent embeddings derived from tracked volume centers, eliminating motion ambiguities. Extensive evaluations show that N4MC outperforms state-of-the-art methods in rate-distortion performance, while enabling real-time decoding of 4D mesh sequences. The implementation of our method is available at: https://github.com/frozzzen3/N4MC.
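
The step from irregular mesh frames to a regular 4D tensor can be illustrated with a minimal sketch. It is not the TSDF-Def representation used by N4MC: as an assumption for illustration, each frame is an analytic sphere rather than a mesh, and the function names, grid resolution, and truncation distance are all hypothetical.

```python
import numpy as np


def sphere_tsdf(center, radius, resolution=32, truncation=0.1):
    """Truncated signed distance to a sphere on a regular grid over [-1, 1]^3.

    Hypothetical helper: a sphere stands in for one mesh frame.
    Values are negative inside the surface, positive outside,
    and normalized so the truncated range is [-1, 1].
    """
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    dist = np.sqrt(
        (x - center[0]) ** 2 + (y - center[1]) ** 2 + (z - center[2]) ** 2
    ) - radius
    # Clamp distances far from the surface; only the narrow band carries detail.
    return np.clip(dist / truncation, -1.0, 1.0).astype(np.float32)


def sequence_to_4d_tensor(centers, radius=0.4, resolution=32):
    """Stack per-frame TSDF grids along a leading time axis: shape (T, R, R, R)."""
    return np.stack(
        [sphere_tsdf(c, radius, resolution) for c in centers], axis=0
    )


# Five frames of a sphere translating along the x axis.
centers = [(0.1 * t - 0.2, 0.0, 0.0) for t in range(5)]
tensor = sequence_to_4d_tensor(centers)
print(tensor.shape, tensor.dtype)
```

Every frame has the same shape regardless of how many vertices the original surface had, which is what lets a single network process consecutive frames jointly.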

Paper Structure (39 sections, 10 equations, 14 figures, 6 tables, 1 algorithm)

Figures (14)

  • Figure 1: N4MC is the first neural 4D mesh compression framework. It first converts 4D meshes into a modified version of Truncated Signed Distance Function (TSDF) tensors and compresses them through an auto-encoder-decoder pair. A volume tracking-based transformer model exploits temporal redundancy by learning 3D interpolation across frames. N4MC produces lightweight, sequence-specific models optimized for each mesh sequence, enabling real-time decoding on desktops. N4MC also provides a Unity plugin for decoding and playback on mobile devices (e.g., Meta Quest 3).
  • Figure 2: N4MC overview. N4MC enables neural 4D mesh compression with four modules: (1) a TSDF-Def generation module (top left); (2) a trained auto-encoder and auto-decoder pair to condense TSDF-Def tensors (bottom left); (3) a volume tracking module and a latent code mapping network to generate interpolation priors (top right); (4) a lightweight transformer model for 3D interpolation (bottom right).
  • Figure 3: Cross-section illustration of N4MC's volume tracking on input 4D mesh frames to obtain sets of volume centers (orange), and conversion of keyframe meshes (gray) into TSDF-Def tensors. For better visualization, we show the $(4\times8\times8)$ intermediate cross-sections of the TSDF tensors. N4MC uses the tracked volume centers to extract priors that guide interpolation in regions such as the hands and basketball.
  • Figure 4: Qualitative comparison of N4MC with baselines at a bitrate of around 4 Mbps. For image-based SSIM and PSNR evaluation, lighting effects are disabled, and vertex normals are converted to RGB colors before rendering. The "Dancer" and "Basketball player" sequences appear green, while "Mitch" and "Thomas" appear pink due to their different orientations. For a fair evaluation, all datasets are shown from a front-facing viewpoint. To save space, we show full meshes for the "Dancer" dataset and only the ground truth with zoomed-in regions for "Basketball player", "Mitch", and "Thomas". N4MC achieves higher visual quality than all the baselines.
  • Figure 5: Objective rate–distortion (RD) performance comparison of image-based SSIM versus bitrate on MPEG sequences. To reach the target bitrates, the quantization parameter $qp$ is varied from 4 to 14 for TVMC [chen2025tvmc] and Draco [Draco2024]. For NeCGS [ren2024necgs] and N4MC, TSDF-Def resolutions of 64, 128, and 256 are used. For KLT [realtime2018], 16, 32, and 128 basis vectors are employed.
  • ...and 9 more figures
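
The image-based evaluation described in the Figure 4 caption (vertex normals converted to RGB colors, then PSNR on the renderings) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the linear mapping from [-1, 1] to [0, 255] is a common convention that the source does not confirm, and both function names are hypothetical.

```python
import numpy as np


def normals_to_rgb(normals):
    """Map unit normals in [-1, 1]^3 to 8-bit RGB colors.

    Assumed convention: color = (n + 1) / 2 * 255 per channel.
    Input normals must be non-zero; they are re-normalized first.
    """
    n = np.asarray(normals, dtype=np.float64)
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)
    return np.round((n + 1.0) * 0.5 * 255.0).astype(np.uint8)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


# A normal pointing along +y maps to a greenish color under this convention.
rgb = normals_to_rgb([[0.0, 1.0, 0.0]])
print(rgb)
```

Under this mapping, the dominant color of a rendering depends on which way the surface faces, which is consistent with the caption's remark that sequences with different orientations appear in different colors.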