FedMIL: Federated-Multiple Instance Learning for Video Analysis with Optimized DPP Scheduling

Ashish Bastola; Hao Wang; Xiwen Chen; Abolfazl Razi

TL;DR

This work addresses scalable video MIL in federated settings with non-IID data and privacy constraints. It introduces FedMIL, a federated MIL framework, and DPPQ, a client-selection method that uses a quality-diversity kernel to balance data diversity against loss gradients. Key formulations include local training on each client's loss $L_p$, global aggregation $w_g^{t+1}=\sum_p \frac{n_p}{n} w_p^t$ (where $n_p$ is the number of samples at client $p$ and $n=\sum_p n_p$), and the selection rule $P_L(G) \propto \det(\mathbf{L}_G)$ with $\mathbf{L}=\mathbf{Q}\mathbf{S}^\mathrm{T}\mathbf{S}\mathbf{Q}$, where $\mathbf{L}_G$ is the submatrix of $\mathbf{L}$ indexed by the selected client set $G$. Together these enable robust training under non-IID distributions. Experiments on the Car Crash Dataset (CCD) and MNIST show that DPPQ improves convergence and accuracy, particularly under low data utilization, making FedMIL practical for edge deployments in smart transportation and video surveillance.
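The two formulas above can be illustrated with a minimal Python sketch. It assumes each client model is flattened into a plain list of floats and that a quality score per client (for example, derived from local loss gradients) and a feature vector per client are already available. It also swaps exact DPP sampling from $P_L(G) \propto \det(\mathbf{L}_G)$ for a common greedy MAP approximation that repeatedly adds the client giving the largest determinant. All function names (`aggregate`, `det`, `qd_kernel`, `greedy_dpp_select`) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def aggregate(client_weights: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> List[float]:
    """Weighted average w_g = sum_p (n_p / n) * w_p over flattened parameters."""
    n = sum(client_sizes)
    global_w = [0.0] * len(client_weights[0])
    for w_p, n_p in zip(client_weights, client_sizes):
        for i, value in enumerate(w_p):
            global_w[i] += (n_p / n) * value
    return global_w


def det(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    size = len(a)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular submatrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def qd_kernel(quality: Sequence[float], features: Sequence[Sequence[float]]) -> Matrix:
    """L = Q S^T S Q with Q = diag(quality) and unit-norm feature columns in S."""
    unit = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f))
        unit.append([x / norm for x in f])
    k = len(quality)
    return [[quality[i] * sum(a * b for a, b in zip(unit[i], unit[j])) * quality[j]
             for j in range(k)] for i in range(k)]


def greedy_dpp_select(L: Matrix, k: int) -> List[int]:
    """Greedy MAP approximation: add the client that maximizes det(L_G); needs k <= len(L)."""
    selected: List[int] = []
    for _ in range(k):
        best, best_det = -1, -math.inf
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[L[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return selected


# Clients 0 and 1 hold near-identical data; client 2 differs.
L = qd_kernel([1.0, 1.0, 1.0], [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(greedy_dpp_select(L, 2))                      # [0, 2]
print(aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Normalizing each feature vector makes $\mathbf{S}^\mathrm{T}\mathbf{S}$ a cosine-similarity matrix, so a redundant client drives the determinant toward zero while a higher quality score enlarges it.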

Abstract

Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) to process decentralized sensor data for learning-based applications while preserving privacy and ensuring secure information transfer. On the other hand, applying supervised learning to large data samples, such as high-resolution images, requires intensive human labor to label different parts of each sample. Multiple Instance Learning (MIL) alleviates this challenge by operating on labels assigned to bags of instances rather than to individual instances. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL), a framework that applies federated learning to boost training performance in video-based MIL tasks such as vehicle accident detection using distributed CCTV networks. However, data sources in decentralized settings are typically not Independently and Identically Distributed (IID), which makes client selection imperative for collectively representing the entire dataset with as few clients as possible. To address this challenge, we propose DPPQ, a framework based on the Determinantal Point Process (DPP) with a quality-based kernel that selects the clients with the most diverse datasets. In the majority of non-IID cases, DPPQ outperforms both random selection and current DPP-based client selection methods, even with less data utilization. This offers a significant advantage for deployment on edge devices with limited computational resources, providing a reliable solution for training AI models in massive smart sensor networks.
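To make bag-level supervision concrete, here is a minimal Python sketch of one common MIL formulation, not necessarily the model used in FedMIL. Each video is treated as a bag of snippet feature vectors, a hypothetical linear scorer assigns each snippet a probability, and max pooling yields the bag score, reflecting the standard assumption that a bag is positive if at least one of its instances is positive. Only the bag label (e.g., accident or no accident) enters the loss. The names `instance_scores`, `bag_score`, and `bag_bce` and the toy weights are illustrative assumptions.

```python
import math
from typing import List, Sequence

Bag = Sequence[Sequence[float]]  # one feature vector per video snippet


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def instance_scores(bag: Bag, weights: Sequence[float], bias: float) -> List[float]:
    """Per-snippet probabilities from a linear scorer (instance labels are never used)."""
    return [sigmoid(sum(w * x for w, x in zip(weights, inst)) + bias) for inst in bag]


def bag_score(bag: Bag, weights: Sequence[float], bias: float) -> float:
    """Max pooling: the bag is as positive as its most positive instance."""
    return max(instance_scores(bag, weights, bias))


def bag_bce(bag: Bag, label: int, weights: Sequence[float], bias: float,
            eps: float = 1e-7) -> float:
    """Binary cross-entropy computed only from the bag-level label."""
    p = min(max(bag_score(bag, weights, bias), eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy example: one snippet of the first video carries a strong "crash" feature.
crash_video = [[0.0], [5.0], [0.5]]
normal_video = [[0.0], [1.0], [0.5]]
w, b = [1.0], -2.5
print(round(bag_score(crash_video, w, b), 3))   # 0.924
print(round(bag_score(normal_video, w, b), 3))  # 0.182
```

Max pooling is only one aggregation choice; mean or attention-based pooling are also common, and the text here does not say which one FedMIL uses.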

Paper Structure (19 sections, 14 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Problem Formulation
Label-based Imbalance (Type I Non-IID)
Distribution-based Imbalance (Type II Non-IID)
Algorithm Design
Federated Learning Framework
Individual Client Model and Training
Global Aggregation
Model Design for FedMIL
Client Profiling
Client Selection via Quality-Diversity Decomposition
Experiments
Car Crash Dataset (CCD) for Traffic Accident Analysis
Platform and Training Process
Evaluation on MNIST Dataset
...and 4 more sections

Figures (6)

Figure 1: The architecture of FedMIL. Compared to classical centralized learning, the training process is distributed to the client level to avoid massive data exchange. In addition, using a pre-trained CNN model to extract video features and a lightweight MIL model for bag-level classification greatly reduces the computational cost on edge devices.
Figure 2: Simulating real-world data distributions: (a) the data distribution across clients is imbalanced directly at the class level; (b) data imbalance based on a Dirichlet distribution over underlying data clusters; and (c) K-means clustering on VGG features to identify the underlying clusters, where (i) represents videos recorded on snowy days, (ii) represents night snippets, (iii) represents urban recordings, and (iv) represents other scenarios.
Figure 3: Training comparison on the MNIST dataset under a Type I Non-IID distribution of strength $\alpha=0.5$ and 100% data utilization.
Figure 4: Model performance comparison in terms of data imbalance based on class labels.
Figure 5: Model performance comparison in terms of data imbalance based on Dirichlet distribution.
...and 1 more figure
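The Dirichlet-based imbalance in Figure 2(b) can be simulated roughly as follows. This minimal Python sketch assumes each sample already carries a cluster id (for instance, from the K-means clustering of VGG features in Figure 2(c)). For every cluster it draws client proportions from a symmetric Dirichlet distribution with concentration $\alpha$ and splits that cluster's samples accordingly, so a smaller $\alpha$ produces more skewed clients. The function names `sample_dirichlet` and `dirichlet_partition`, the symmetric prior, and the seeding are assumptions, not the paper's exact procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet(alpha) sample via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition(cluster_ids: Sequence[int], num_clients: int,
                        alpha: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients, cluster by cluster, using Dirichlet proportions."""
    rng = random.Random(seed)
    by_cluster: Dict[int, List[int]] = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        by_cluster[c].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for c in sorted(by_cluster):
        indices = by_cluster[c][:]
        rng.shuffle(indices)
        props = sample_dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into cut points over this cluster's samples.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(indices), round(acc * len(indices))))
        cuts.append(len(indices))
        start = 0
        for client, end in enumerate(cuts):
            clients[client].extend(indices[start:end])
            start = end
    return clients


parts = dirichlet_partition([i % 4 for i in range(200)], num_clients=5, alpha=0.5)
print([len(p) for p in parts])  # client sizes depend on alpha and the seed
```

Passing class labels instead of cluster ids gives a label-based split, which is one plausible way to approximate the class-level imbalance of Figure 2(a).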
