Cross-Camera Cow Identification via Disentangled Representation Learning

Runcheng Wang, Yaru Chen, Guiguo Zhang, Honghua Jiang, Yongliang Qiao

TL;DR

The paper tackles cross-camera cow identification in uncontrolled farm environments by introducing a disentangled representation framework grounded in Subspace Identifiability Guarantee (SIG) theory. It models image formation with four latent subspaces, $z=\{z_1,z_2,z_3,z_4\}$, where $z_3$ captures intrinsic identity features, $z_1$ and $z_2$ encode camera and viewpoint interference, and $z_4$ covers universal appearance; a conditional decision mechanism combines $z_2$, $z_3$, and the camera index $u$ for robust identity inference. The approach uses trunk-focused extraction via YOLOv11n, a variational encoder trained with an ELBO objective and a camera-predictor constraint to enforce subspace disentanglement, and class-aware centroid alignment to mitigate label distribution shifts across cameras. On the multi-view CCCI60 dataset, the method achieves an average accuracy of 86.0%, compared with 51.9% for a Source-only baseline and 79.8% for a strong domain-adaptation baseline (iMSDA), demonstrating improved cross-camera generalization in realistic farming settings. These results suggest a principled, physics-informed route to reliable non-contact livestock monitoring, with practical implications for scalable, multi-node smart farming deployments.
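
The latent partition and the two downstream uses named above (the conditional decision input and the class-aware centroid alignment) can be sketched roughly as follows. This is a minimal illustration only: the subspace widths, the function names, the one-hot encoding of $u$, and the squared-distance form of the alignment loss are all assumptions made for the example, not details taken from the paper.

```python
# Hypothetical sketch: subspace widths, names, and the loss form are assumptions.
from typing import Dict, Hashable, List, Sequence

# Assumed widths of the four latent subspaces z1..z4.
SUBSPACE_DIMS = {"z1": 2, "z2": 2, "z3": 4, "z4": 2}


def split_latent(z: Sequence[float]) -> Dict[str, List[float]]:
    """Partition a flat latent vector into the named subspaces z1..z4."""
    expected = sum(SUBSPACE_DIMS.values())
    if len(z) != expected:
        raise ValueError(f"expected {expected} latent dims, got {len(z)}")
    parts, start = {}, 0
    for name, width in SUBSPACE_DIMS.items():
        parts[name] = list(z[start:start + width])
        start += width
    return parts


def one_hot(index: int, size: int) -> List[float]:
    """Encode the camera index u as a one-hot vector (an assumed encoding)."""
    if not 0 <= index < size:
        raise ValueError("camera index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def classifier_input(z: Sequence[float], camera_index: int,
                     num_cameras: int) -> List[float]:
    """Concatenate z2, z3, and the encoded camera index; z1 and z4 are left out."""
    parts = split_latent(z)
    return parts["z2"] + parts["z3"] + one_hot(camera_index, num_cameras)


def class_centroids(features: Sequence[Sequence[float]],
                    labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Mean feature vector per class label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def centroid_alignment_loss(source: Dict[Hashable, List[float]],
                            target: Dict[Hashable, List[float]]) -> float:
    """Mean squared distance between centroids of classes seen in both cameras."""
    shared = set(source) & set(target)
    if not shared:
        return 0.0
    total = sum(sum((a - b) ** 2 for a, b in zip(source[c], target[c]))
                for c in shared)
    return total / len(shared)
```

For a 10-dimensional latent vector and five cameras, `classifier_input` returns an 11-dimensional vector (two values from $z_2$, four from $z_3$, five from the one-hot camera code). In an unlabeled target camera, the labels passed to `class_centroids` would have to be pseudo-labels; that choice is likewise an assumption of this sketch.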

Abstract

Precise identification of individual cows is a fundamental prerequisite for comprehensive digital management in smart livestock farming. While existing animal identification methods excel in controlled, single-camera settings, they face severe challenges regarding cross-camera generalization. When models trained on source cameras are deployed to new monitoring nodes characterized by divergent illumination, backgrounds, viewpoints, and heterogeneous imaging properties, recognition performance often degrades dramatically. This limits the large-scale application of non-contact technologies in dynamic, real-world farming environments. To address this challenge, this study proposes a cross-camera cow identification framework based on disentangled representation learning. This framework leverages the Subspace Identifiability Guarantee (SIG) theory in the context of bovine visual recognition. By modeling the underlying physical data generation process, we designed a principle-driven feature disentanglement module that decomposes observed images into multiple orthogonal latent subspaces. This mechanism effectively isolates stable, identity-related biometric features that remain invariant across cameras, thereby substantially improving generalization to unseen cameras. We constructed a high-quality dataset spanning five distinct camera nodes, covering heterogeneous acquisition devices and complex variations in lighting and angles. Extensive experiments across seven cross-camera tasks demonstrate that the proposed method achieves an average accuracy of 86.0%, significantly outperforming the Source-only Baseline (51.9%) and the strongest cross-camera baseline method (79.8%). This work establishes a subspace-theoretic feature disentanglement framework for collaborative cross-camera cow identification, offering a new paradigm for precise animal monitoring in uncontrolled smart farming environments.
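
For reference, the margins implied by the reported averages are plain differences of the figures quoted above, expressed in percentage points rather than relative gains:

$$
86.0 - 51.9 = 34.1, \qquad 86.0 - 79.8 = 6.2
$$

That is, the reported improvement is 34.1 points over the Source-only baseline and 6.2 points over the strongest cross-camera baseline.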

Paper Structure (26 sections, 15 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Layout of the cross-camera data acquisition system and scene examples. The central schematic illustrates the complete route of cows traveling from Barn No. 6 to the milking parlor and returning, where numbered circles ① through ⑥ indicate the positions of the six cameras. The surrounding panels ① through ⑥ display real-world scene images captured by the cameras at the corresponding positions. The specific layout is as follows: ① Barn Exit; ② and ③ Walking Aisle; ④ and ⑤ Milking Parlor Entrance; and ⑥ Milking Parlor Exit.
  • Figure 2: Visualization of samples from the cross-camera cow identification dataset. The figure displays representative images of four different cows (ID: 001, 024, 048, 060).
  • Figure 3: Physical generative graph. Observed variables: camera index $u$, individual identity label $y$, and cow image $x$. Latent variables: $z = \{z_1, z_2, z_3, z_4\}$.
  • Figure 4: Schematic overview of the proposed framework
  • Figure 5: Architecture of the proposed model
  • ...and 1 more figure