MoReMouse: Monocular Reconstruction of Laboratory Mouse

Yuan Zhong, Jingxiang Sun, Zhongbin Zhang, Liang An, Yebin Liu

TL;DR

MoReMouse tackles monocular dense 3D reconstruction for the C57BL/6 mouse by integrating a Gaussian avatar-driven dense-view synthetic dataset, a transformer-based triplane network, and geodesic-based surface embeddings. The model is trained in two stages using NeRF and DMTet renderers to produce high-fidelity geometry and appearance from a single image, achieving superior novel-view synthesis on both synthetic and real data. Key contributions include the Animatable Gaussian Mouse Avatar (AGAM) for dense data, a high-resolution triplane representation, and geodesic priors that improve surface stability in dynamic regions like limbs and tail. This work advances practical, scalable 3D analysis of small animals and establishes a foundation for improved behavioral phenotyping in biomedical research, while acknowledging limitations in training diversity and global pose tracking that motivate future work.

Abstract

Laboratory mice, particularly the C57BL/6 strain, are essential animal models in biomedical research. However, accurate 3D surface motion reconstruction of mice remains a significant challenge due to their complex non-rigid deformations, textureless fur-covered surfaces, and the lack of realistic 3D mesh models. Moreover, existing visual datasets for mouse reconstruction contain only sparse viewpoints and no 3D geometry. To fill this gap, we introduce MoReMouse, the first monocular dense 3D reconstruction network specifically designed for C57BL/6 mice. To achieve high-fidelity 3D reconstructions, we present three key innovations. First, we create the first high-fidelity, dense-view synthetic dataset for C57BL/6 mice by rendering a realistic, anatomically accurate Gaussian mouse avatar. Second, MoReMouse leverages a transformer-based feedforward architecture combined with a triplane representation, enabling high-quality 3D surface generation from a single image, optimized for the intricacies of small animal morphology. Third, we propose geodesic-based continuous correspondence embeddings on the mouse surface, which serve as strong semantic priors, improving surface consistency and reconstruction stability, especially in highly dynamic regions like limbs and tail. Through extensive quantitative and qualitative evaluations, we demonstrate that MoReMouse significantly outperforms existing open-source methods in both accuracy and robustness.
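To make the third contribution more concrete, the following is a minimal sketch of how a geodesic-based continuous embedding could be built on a triangle mesh. It is an illustration under stated assumptions, not the authors' implementation: geodesic distances are approximated by shortest paths along mesh edges, the embedding is obtained with classical multidimensional scaling (MDS), and all function names (`edge_graph`, `geodesic_matrix`, `classical_mds`, `to_colors`) are hypothetical.

```python
import heapq
import numpy as np

def edge_graph(vertices, faces):
    """Adjacency list with Euclidean edge lengths for a triangle mesh."""
    adj = [dict() for _ in range(len(vertices))]
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            w = float(np.linalg.norm(vertices[i] - vertices[j]))
            adj[i][j] = w
            adj[j][i] = w
    return adj

def geodesic_matrix(adj):
    """All-pairs shortest paths along edges (Dijkstra from every vertex).

    Assumption: edge-path length stands in for the true surface geodesic.
    """
    n = len(adj)
    D = np.full((n, n), np.inf)
    for s in range(n):
        D[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > D[s, u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                nd = d + w
                if nd < D[s, v]:
                    D[s, v] = nd
                    heapq.heappush(heap, (nd, v))
    return D

def classical_mds(D, dim=3):
    """Embed points so Euclidean distances approximate the entries of D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]       # largest eigenvalues first
    top = np.clip(vals[idx], 0.0, None)      # drop negative eigenvalues
    return vecs[:, idx] * np.sqrt(top)

def to_colors(emb):
    """Rescale each embedding axis to [0, 1] so it can be shown as RGB."""
    lo, hi = emb.min(axis=0), emb.max(axis=0)
    return (emb - lo) / np.maximum(hi - lo, 1e-12)

# Toy mesh: a unit square split into two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
D = geodesic_matrix(edge_graph(vertices, faces))
colors = to_colors(classical_mds(D, dim=3))
```

Because nearby surface points receive nearby embedding values while distant parts (for example a paw and the tail tip) receive distinct ones, such an embedding can act as a per-point semantic label that varies smoothly over the surface.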

Paper Structure

This paper contains 39 sections, 8 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: We present MoReMouse, the first monocular dense 3D reconstruction framework for laboratory mice. Given a single-view input (top-left, captured with an iPhone 15 Pro), MoReMouse predicts high-fidelity surface geometry and appearance within 0.9 seconds using a transformer-based triplane architecture (middle). To assist model training, we render a dense-view synthetic dataset by building the first Gaussian mouse avatar from sparse-view real videos (bottom-left). Our method outputs RGB renderings, semantic embeddings, and normal maps from a single image (right). Corresponding video results are provided in the supplementary material.
  • Figure 2: Left: the articulated mouse mesh we used. Right: color-coded geodesic feature embedding, promoting surface separability and reconstruction consistency.
  • Figure 3: Pipeline of the Gaussian Mouse Avatar Generation. The process begins with Mesh Fitting, followed by Gaussian Avatar training. "CNN" here is StyleUNet.
  • Figure 4: Overview of the MoReMouse architecture. Given a single image, a DINOv2 encoder and transformer decoder generate triplane features, which are queried to produce color, density, and embeddings. Rendering is performed via NeRF or DMTet for volumetric or surface outputs.
  • Figure 5: Qualitative comparison of novel view synthesis results on synthetic data. MoReMouse produces coherent and anatomically plausible reconstructions, accurately capturing mouse posture and geometry, while baseline methods often exhibit structural distortions.
  • ...and 9 more figures
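The triplane query described in the Figure 4 caption can be illustrated with a short sketch. This is a simplified example rather than the paper's architecture: the plane resolution, the concatenation of the three sampled features, and the single linear head in `decode` are assumptions chosen for brevity, and every name here is hypothetical.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Sample a (C, H, W) feature plane at coords uv in [-1, 1]^2.

    uv has shape (N, 2), ordered (column axis, row axis). Returns (N, C).
    """
    C, H, W = plane.shape
    x = np.clip((uv[:, 0] + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((uv[:, 1] + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x1] * wx  # (C, N)
    bot = plane[:, y1, x0] * (1 - wx) + plane[:, y1, x1] * wx
    return (top * (1 - wy) + bot * wy).T

def query_triplane(planes, points):
    """Project 3D points onto the XY, XZ, and YZ planes and gather features.

    planes: dict with keys 'xy', 'xz', 'yz', each of shape (C, R, R).
    points: (N, 3) array in [-1, 1]^3. Returns (N, 3C).
    """
    f_xy = bilinear_sample(planes["xy"], points[:, [0, 1]])
    f_xz = bilinear_sample(planes["xz"], points[:, [0, 2]])
    f_yz = bilinear_sample(planes["yz"], points[:, [1, 2]])
    # Assumption: features are concatenated; summation is another common choice.
    return np.concatenate([f_xy, f_xz, f_yz], axis=1)

def decode(features, weights):
    """Hypothetical linear head: 3 color, 1 density, 3 embedding channels."""
    out = features @ weights                      # (N, 7)
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))       # sigmoid keeps color in (0, 1)
    density = np.logaddexp(0.0, out[:, 3:4])      # softplus keeps density >= 0
    embedding = out[:, 4:7]
    return rgb, density, embedding

rng = np.random.default_rng(0)
planes = {k: rng.normal(size=(4, 8, 8)) for k in ("xy", "xz", "yz")}
points = rng.uniform(-1.0, 1.0, size=(5, 3))
features = query_triplane(planes, points)
rgb, density, embedding = decode(features, rng.normal(size=(12, 7)))
```

A volumetric renderer would evaluate such a query at many samples along each camera ray and composite the colors by density, while a surface renderer would query it at mesh or grid vertices instead.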