FedMIL: Federated-Multiple Instance Learning for Video Analysis with Optimized DPP Scheduling
Ashish Bastola, Hao Wang, Xiwen Chen, Abolfazl Razi
TL;DR
This work addresses scalable video-based MIL in federated settings with non-IID data and privacy constraints. It introduces FedMIL, a federated MIL framework, and DPPQ, a client-selection method that uses a quality-diversity kernel to balance data diversity against loss gradients. Key formulations include local training with a local loss $L_p$, global aggregation $w_g^{t+1}=\sum_p \frac{n_p}{n} w_p^t$, and the selection rule $P_L(G) \propto \det(\mathbf{L}_G)$ with $\mathbf{L}=\mathbf{Q}\mathbf{S}^\mathrm{T}\mathbf{S}\mathbf{Q}$, enabling robust training under non-IID distributions. Experiments on the Car Crash Dataset (CCD) and MNIST show that DPPQ improves convergence and accuracy, particularly under low data utilization, making FedMIL practical for edge deployments in smart transportation and video surveillance.
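The aggregation rule and the determinant-based selection score above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`aggregate`, `quality_diversity_kernel`, `subset_score`), the use of flat weight vectors, and the choice of unit-normalized feature columns for $\mathbf{S}$ are assumptions made for illustration. How DPPQ actually computes the quality terms in $\mathbf{Q}$ is not specified here, so the sketch takes them as given inputs.

```python
import numpy as np

def aggregate(client_weights, client_sizes):
    """Weighted average of client models: w_g = sum_p (n_p / n) * w_p."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()            # n_p / n for each client
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return coeffs @ stacked                 # shape (d,)

def quality_diversity_kernel(quality, features):
    """Build L = Q S^T S Q.

    quality:  length-P vector of per-client quality scores (assumed given).
    features: (d, P) array with one feature column per client (assumed given).
    """
    features = np.asarray(features, dtype=float)
    # Normalize each column so S^T S holds cosine similarities.
    S = features / np.linalg.norm(features, axis=0, keepdims=True)
    Q = np.diag(np.asarray(quality, dtype=float))
    return Q @ S.T @ S @ Q

def subset_score(L, subset):
    """Unnormalized selection probability det(L_G) for a client subset G."""
    idx = np.asarray(subset)
    return np.linalg.det(L[np.ix_(idx, idx)])

# Toy example: clients 0 and 1 hold identical features, client 2 differs.
features = np.array([[1.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
L = quality_diversity_kernel([1.0, 1.0, 1.0], features)
redundant = subset_score(L, [0, 1])   # near-duplicate clients
diverse = subset_score(L, [0, 2])     # dissimilar clients

w_global = aggregate([[1.0, 1.0], [3.0, 3.0]], client_sizes=[1, 3])
```

In the toy example the subset of dissimilar clients receives the larger determinant, which is the property that makes a DPP favor diverse client sets. Raising a client's quality score scales its row and column of $\mathbf{L}$ and therefore raises the score of every subset containing it.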
Abstract
Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) to process decentralized sensor data for learning-based applications while preserving privacy and securing information transfer. Separately, applying supervised learning to large data samples, such as high-resolution images, requires intensive human labor to label different parts of each sample. Multiple Instance Learning (MIL) alleviates this burden by operating on labels assigned to a 'bag' of instances rather than to individual instances. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL), a framework that applies federated learning to improve training performance in video-based MIL tasks such as vehicle accident detection using distributed CCTV networks. However, data sources in decentralized settings are typically not Independently and Identically Distributed (IID), so client selection is essential for representing the entire dataset with a minimal number of clients. To address this challenge, we propose DPPQ, a framework based on the Determinantal Point Process (DPP) with a quality-based kernel that selects the clients with the most diverse datasets. In the majority of non-IID cases, DPPQ outperforms both random selection and existing DPP-based client selection methods, even with lower data utilization. This is a significant advantage for deployment on edge devices with limited computational resources and provides a reliable solution for training AI models in massive smart sensor networks.
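To make the 'bag' idea concrete, the sketch below applies the standard MIL assumption to a video treated as a bag of frame-level scores: the bag is positive if at least one instance is positive, so only a bag-level label is needed. This uses max pooling for illustration and is not necessarily the pooling used in FedMIL. The function name `bag_prediction`, the score values, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def bag_prediction(instance_scores, threshold=0.5):
    """Return (bag_score, bag_label) under the standard MIL assumption.

    instance_scores: per-instance scores in [0, 1], e.g. one per video frame.
    The bag score is the maximum instance score (max pooling).
    """
    scores = np.asarray(instance_scores, dtype=float)
    bag_score = float(scores.max())
    return bag_score, bag_score >= threshold

# A clip with one high-scoring frame is labeled positive as a whole.
crash_clip = bag_prediction([0.1, 0.9, 0.2])
normal_clip = bag_prediction([0.1, 0.2, 0.05])
```

Only the clip-level label is compared against the prediction during training, so annotators never need to mark which frames contain the event.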
