MouseGPT: A Large-scale Vision-Language Model for Mouse Behavior Analysis

Teng Xu, Taotao Zhou, Youjia Wang, Peng Yang, Simin Tang, Kuixiang Shao, Zifeng Tang, Yifei Liu, Xinyuan Chen, Hongshuang Wang, Xiaohui Wang, Huoqing Luo, Jingya Wang, Ji Hu, Jingyi Yu

TL;DR

MouseGPT tackles the challenge of scalable, interpretable mouse behavior analysis by unifying multi-view visual data with natural language. It introduces a large open-vocabulary dataset (>42 million frames) and trains a vision-language foundation model to translate pose dynamics into contextually rich descriptions. The framework supports comprehensive behavior profiling, fine-grained analysis, novel behavior discovery, and behavioral-phenotype prediction, all accessible through a natural-language user interface. In comparisons with open-source and proprietary models, MouseGPT demonstrates superior precision, adaptability, and descriptive richness, enabling scalable ethological research and translational insights into neuropsychiatric conditions.

Abstract

Analyzing animal behavior is crucial in advancing neuroscience, yet quantifying and deciphering its intricate dynamics remains a significant challenge. Traditional machine vision approaches, despite their ability to detect spontaneous behaviors, fall short due to limited interpretability and reliance on manual labeling, which restricts the exploration of the full behavioral spectrum. Here, we introduce MouseGPT, a Vision-Language Model (VLM) that integrates visual cues with natural language to revolutionize mouse behavior analysis. Built upon our first-of-its-kind dataset, which incorporates pose dynamics and open-vocabulary behavioral annotations across over 42 million frames spanning diverse psychiatric conditions, MouseGPT provides a novel, context-rich method for comprehensive behavior interpretation. Our holistic analysis framework enables detailed behavior profiling, clustering, and novel behavior discovery, offering deep insights without the need for labor-intensive manual annotation. Evaluations reveal that MouseGPT surpasses existing models in precision, adaptability, and descriptive richness, positioning it as a transformative tool for ethology and for unraveling complex behavioral dynamics in animal models.

Paper Structure

This paper contains 46 sections, 4 equations, 12 figures, 1 table, 2 algorithms.

Figures (12)

  • Figure 1: We present MouseGPT, a comprehensive mouse behavior understanding model and analysis framework. By integrating multi-view video data, pose estimation, and kinematic embeddings, MouseGPT generates open-vocabulary ethological descriptions of mouse behaviors. It leverages a visual-guided prompt engine and a pioneering open-vocabulary behavioral dataset, featuring disease-specific behavior phenotypes (e.g., hallucination, depression, schizophrenia). The system supports behavior classification, temporal analysis, fine-grained behavior profiling, and novel behavior discovery, delivering transformative capabilities for studying and analyzing complex animal behaviors.
  • Figure 2: MouseGPT Behavior Understanding Model. (a) Data Preparation: A multi-view 3D capture system generates keypoints and limb velocities for each frame, with training data produced by GPT-4o and refined through automated selection processes. (b) Vision-Language Model Training and Inference Workflow: MouseGPT-Large (70.6B parameters) is optimized for detailed behavior analysis, while MouseGPT-Lite (7.84B parameters) provides a lightweight alternative for streamlined tasks. (c) Performance Evaluation: Expert assessments highlight MouseGPT's superior performance against open-source models of comparable size and GPT-4o in understanding mouse behavior. (d) Comparison with Other VLMs: Benchmarking against existing Vision-Language Models (VLMs) further demonstrates MouseGPT's advanced capabilities, as rated by domain experts. (e) Dataset Visualization: UMAP projection of the MouseGPT dataset in 2D, with color-coded clusters representing distinct drug treatments.
  • Figure 3: MouseGPT Behavior Analysis Framework. (a) Data Flow: MouseGPT converts open-vocabulary behavior descriptions into two computable features: high-dimensional embeddings for semantic analysis and keyword-based summaries for interpretability, powering various downstream applications. (b) Major Behavior Profiling: A two-stage LLM-enhanced clustering groups mouse actions into distinct behavioral categories, with colors representing clusters, enabling robust profiling. (c) Fine-Grained Analysis: MouseGPT extracts and quantifies detailed behavior patterns via keyword analysis and frequency calculations, offering precise insights. (d) Novel Behavior Discovery: Using Isolation Forest in embedding space, MouseGPT identifies anomalous behaviors, visualized in 2D UMAP to highlight rare actions. (e) Phenotype Prediction: MouseGPT combines behavioral analysis with expert knowledge to predict drug treatments, aiding in phenotype understanding and experimental outcomes.
  • Figure 4: MouseGPT Analyzes Distinct Phenotypes of Spontaneous Behaviors Induced by Hallucinogens. (a) Spatial Distribution of Major Behaviors: The spatial positions and behaviors of mice in a circular open field were mapped. Each circle represents all mice in a treatment group, with positions determined by spinal midpoints and behaviors color-coded as in (b). (b) Major Behavior Ethogram: Time-resolved behavior changes during the first 20 minutes post-treatment are depicted for each mouse in all groups, sampled consistently with (a). Rows represent individual mice, illustrating behavioral diversity. (c) Temporal Occurrence Patterns of Behaviors: Behavior proportions within 20-second time windows were visualized for different treatments. Dashed lines show group averages, and smoothed solid lines highlight trends, all starting from the same reference time T=0. Scales were adjusted to enhance the visualization of low-frequency behaviors. (d) Overall Behavioral Proportions: Mean behavior proportions during the entire recording session are presented with circles (mean) and error bars (95% confidence interval). Statistical significance was tested using two-way ANOVA with Tukey's multiple comparisons (**p < 0.01; ****p < 0.0001). (e) Fine-Grained Postural Subtypes in Walking Behavior: Stacked bar charts illustrate the proportions of walking subtypes categorized by head, limb, and torso descriptors. Each bar represents the average subtype proportion across walking segments within each treatment group. (f) Heatmap of Fine-Grained Behaviors: Differences in fine-grained behavior proportions between drug-treated and control groups are visualized. Statistical significance was assessed using two-way ANOVA with Dunnett's multiple comparisons (*p < 0.05; **p < 0.01; ****p < 0.0001). (g) Drug Prediction Matrix: MouseGPT predicted drug treatments based on behavioral phenotypes, using data from six treatment groups (n=23). Columns indicate actual treatment groups, and color intensities reflect the proportion of mice assigned to each predicted group.
  • Figure 5: MouseGPT Natural Language User Interface. (a) Interactive Interface: Users engage with the MouseGPT agent through natural language to perform clustering, behavior search, and automated code generation and execution, enabling intuitive interaction. (b) Agent Workflow: The agent integrates data access, MouseGPT model invocation, and analysis tasks via a unified workflow, streamlining complex behavior analyses. (c) Toolbox Overview: A comprehensive suite of tools supports major behavior profiling, fine-grained analysis, novel behavior detection, temporal analysis, and phenotype prediction. (d) Python REPL Integration: The agent dynamically generates and executes Python scripts to process data, exemplifying seamless automation. (e) Behavior Search: Embedding-based retrieval matches user-specified behaviors to the database, ranking results by similarity for precise behavior identification.
  • ...and 7 more figures
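
The Figure 4(c) caption mentions behavior proportions computed within 20-second time windows. The following is a minimal sketch of how such a computation might look, not the authors' implementation. The function name `windowed_proportions`, the per-frame label format, the frame rate, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def windowed_proportions(labels, fps, window_seconds=20):
    """Split per-frame behavior labels into fixed windows and return,
    for each window, the fraction of frames assigned to each behavior.

    Hypothetical helper: the paper does not specify this interface.
    """
    frames_per_window = int(fps * window_seconds)
    if frames_per_window <= 0:
        raise ValueError("fps * window_seconds must be at least 1 frame")

    windows = []
    for start in range(0, len(labels), frames_per_window):
        chunk = labels[start:start + frames_per_window]
        counts = Counter(chunk)
        # Normalize by the chunk length so a shorter final window
        # still yields proportions that sum to 1.
        windows.append({behavior: n / len(chunk) for behavior, n in counts.items()})
    return windows


# Toy data (invented): 60 frames at an assumed 2 frames per second,
# so each 20-second window covers 40 frames.
toy_labels = ["walk"] * 30 + ["groom"] * 10 + ["rear"] * 20
props = windowed_proportions(toy_labels, fps=2, window_seconds=20)
print(props)
```

Averaging these per-window dictionaries across the mice in a treatment group would give the kind of group-level curve the caption describes. The smoothing applied in the figure is not specified in the caption and is not reproduced here.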
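
The Figure 5(e) caption describes behavior search as embedding-based retrieval that ranks database entries by similarity to a user-specified behavior. The sketch below illustrates that idea with cosine similarity as the assumed metric, since the caption does not name one. The function names, the three-dimensional toy vectors, and the behavior descriptions are hypothetical; real embeddings would come from a text-embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_behaviors(query_embedding, database, top_k=3):
    """Rank (description, embedding) pairs by similarity to the query.

    Returns up to top_k (score, description) tuples, best match first.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), description)
        for description, embedding in database
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy database (invented): descriptions paired with made-up embeddings.
toy_database = [
    ("rearing against the wall", [0.9, 0.1, 0.0]),
    ("grooming the face", [0.1, 0.9, 0.1]),
    ("walking with lowered head", [0.0, 0.2, 0.9]),
]

# A query embedding that points mostly along the first axis.
results = search_behaviors([0.8, 0.2, 0.1], toy_database, top_k=2)
for score, description in results:
    print(f"{score:.3f}  {description}")
```

A linear scan like this is adequate for a toy example. At the scale of a dataset with tens of millions of frames, an approximate nearest-neighbor index would likely be needed, though the captions do not say which approach the authors use.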