Cross-Camera Cow Identification via Disentangled Representation Learning
Runcheng Wang, Yaru Chen, Guiguo Zhang, Honghua Jiang, Yongliang Qiao
TL;DR
The paper tackles cross-camera cow identification in uncontrolled farm environments by introducing a disentangled representation framework grounded in Subspace Identification Guarantee (SIG) theory. It models image formation with four latent subspaces, $z=\{z_1,z_2,z_3,z_4\}$, where $z_3$ captures intrinsic identity features, $z_1$ and $z_2$ encode camera and viewpoint interference, and $z_4$ covers universal appearance; a conditional decision mechanism combines $z_2$, $z_3$, and the camera index $u$ for robust identity inference. The approach uses trunk-focused extraction via YOLOv11n, a variational encoder trained with an evidence lower bound (ELBO) objective and a camera-predictor constraint to enforce subspace disentanglement, and class-aware centroid alignment to mitigate label distribution shifts across cameras. On the multi-view CCCI60 dataset, the method achieves an average accuracy of 86.0%, compared with 51.9% for a Source-only baseline and 79.8% for a strong domain-adaptation baseline (iMSDA), gains of 34.1 and 6.2 percentage points respectively. These results suggest a principled route, grounded in a model of the image generation process, to reliable non-contact livestock monitoring, with practical implications for scalable, multi-node smart farming deployments.
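The conditional decision mechanism and the class-aware centroid alignment summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The subspace widths in `SUBSPACE_DIMS`, the one-hot encoding of the camera index $u$, the use of squared Euclidean distance between class centroids, and all function names are assumptions made for illustration; the summary does not specify any of them.

```python
from typing import Dict, Hashable, List, Sequence

# Hypothetical subspace widths; the real dimensions are not given in the summary.
SUBSPACE_DIMS = {"z1": 2, "z2": 2, "z3": 3, "z4": 1}


def split_latent(z: Sequence[float]) -> Dict[str, List[float]]:
    """Slice a flat latent vector into the four named subspaces, in order."""
    expected = sum(SUBSPACE_DIMS.values())
    if len(z) != expected:
        raise ValueError(f"expected {expected} latent values, got {len(z)}")
    parts, start = {}, 0
    for name, width in SUBSPACE_DIMS.items():
        parts[name] = list(z[start:start + width])
        start += width
    return parts


def classifier_input(z: Sequence[float], camera_index: int, num_cameras: int) -> List[float]:
    """Concatenate z2, z3 and a one-hot camera index u; z1 and z4 are left out."""
    if not 0 <= camera_index < num_cameras:
        raise ValueError("camera_index out of range")
    parts = split_latent(z)
    u = [1.0 if i == camera_index else 0.0 for i in range(num_cameras)]
    return parts["z2"] + parts["z3"] + u


def class_centroids(features: Sequence[Sequence[float]],
                    labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Mean feature vector per class label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels) -> float:
    """Mean squared distance between source and target centroids of shared classes.

    Target labels would typically be pseudo-labels in an unlabeled target camera;
    that is an assumption here, not something the summary states.
    """
    src = class_centroids(src_feats, src_labels)
    tgt = class_centroids(tgt_feats, tgt_labels)
    shared = set(src) & set(tgt)
    if not shared:
        return 0.0
    total = sum(
        sum((a - b) ** 2 for a, b in zip(src[c], tgt[c])) for c in shared
    )
    return total / len(shared)
```

With these assumed widths and five cameras, `classifier_input` returns a vector of length 2 + 3 + 5 = 10. Averaging the centroid distance over shared classes only, rather than over all samples, is one way to keep a class that is rare or missing under one camera from dominating the alignment term.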
Abstract
Precise identification of individual cows is a fundamental prerequisite for comprehensive digital management in smart livestock farming. While existing animal identification methods excel in controlled, single-camera settings, they generalize poorly across cameras. When models trained on source cameras are deployed to new monitoring nodes with different illumination, backgrounds, viewpoints, and imaging properties, recognition performance often degrades dramatically. This limits the large-scale application of non-contact technologies in dynamic, real-world farming environments. To address this challenge, this study proposes a cross-camera cow identification framework based on disentangled representation learning. The framework applies Subspace Identification Guarantee (SIG) theory to bovine visual recognition. By modeling the underlying physical data generation process, we design a principle-driven feature disentanglement module that decomposes observed images into multiple orthogonal latent subspaces. This mechanism isolates stable, identity-related biometric features that remain invariant across cameras, thereby substantially improving generalization to unseen cameras. We construct a high-quality dataset spanning five distinct camera nodes, covering heterogeneous acquisition devices and complex variations in lighting and viewing angle. Extensive experiments across seven cross-camera tasks demonstrate that the proposed method achieves an average accuracy of 86.0%, significantly outperforming the Source-only baseline (51.9%) and the strongest cross-camera baseline method (79.8%). This work establishes a subspace-theoretic feature disentanglement framework for collaborative cross-camera cow identification, offering a new paradigm for precise animal monitoring in uncontrolled smart farming environments.
