Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness

Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Jing Lin

TL;DR

The paper studies the robustness of deep learning models under adversarial attacks by extending adversarial training (AT) to ten activation functions in centralized settings and to federated learning (FL) under IID and non-IID data. It introduces an advanced centralized AT framework that incorporates architecture adjustments, soft labeling, simplified data augmentation, and learning-rate variation, then transfers this approach to FL with a data-sharing strategy to counter non-IID challenges. Empirical results show that ReLU generally delivers the best balance of natural and robust accuracy in centralized AT, while data sharing substantially improves FL robustness on non-IID data and outperforms CalFAT. Overall, the work offers a pathway to robust federated learning through data sharing and activation-function-aware AT, with implications for safety-critical and distributed AI deployments.

Abstract

Adversarial training is an effective method for improving the robustness of machine learning (ML) models. Most existing studies consider only the rectified linear unit (ReLU) activation function and centralized training environments. In this paper, we study ML model robustness using ten different activation functions through adversarial training in centralized environments and explore ML model robustness in federated learning environments. In the centralized environment, we first propose an advanced adversarial training approach that improves ML model robustness by incorporating model architecture change, soft labeling, simplified data augmentation, and varying learning rates. Then, we conduct extensive experiments on ten well-known activation functions in addition to ReLU to better understand how they impact ML model robustness. Furthermore, we extend the proposed adversarial training approach to the federated learning environment, where both independent and identically distributed (IID) and non-IID data settings are considered. Our proposed centralized adversarial training approach achieves natural and robust accuracies of 77.08% and 67.96%, respectively, on CIFAR-10 against fast gradient sign attacks. Experiments on the ten activation functions reveal that ReLU usually performs best. In the federated learning environment, however, the robust accuracy decreases significantly, especially on non-IID data. To address the significant performance drop in the non-IID case, we introduce data sharing and achieve natural and robust accuracies of 70.09% and 54.79%, respectively, surpassing the CalFAT algorithm, when 40% data sharing is used. That is, a proper percentage of data sharing can significantly improve ML model robustness, which is useful for some real-world applications.
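
As a rough illustration of the fast gradient sign attack that the robust-accuracy figures refer to, the sketch below perturbs one input of a toy logistic-regression model in the direction of the sign of the input gradient. It is a minimal sketch and not the authors' implementation: the model, its weights, the sample input, and all function names are hypothetical, and the perturbation budget of 8/255 is borrowed from the Figure 1 caption rather than from the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y = 1 | x) for a logistic-regression model (hypothetical toy model)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One-step FGSM: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    grad = input_gradient(w, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


# Hypothetical weights and input; feature values play the role of pixels in [0, 1].
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.6, 0.2, 0.5], 1
eps = 8 / 255  # assumed budget, taken from the Figure 1 caption

x_adv = fgsm(w, b, x, y, eps)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
print(p_clean, p_adv)  # confidence in the true class drops after the attack
```

In adversarial training, perturbed inputs such as `x_adv` are generated during training and the model is fitted on them, so that the loss is minimized under the attack rather than only on clean data.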

Paper Structure

This paper contains 33 sections, 21 equations, 8 figures, 12 tables, 1 algorithm.

Figures (8)

  • Figure 1: Illustration of $L_{\infty}$ PGD-based adversarial images using a perturbation magnitude $\epsilon=\frac{8}{255}$. First row: base images that were classified correctly by a ResNet-18 network [He2015ResNet]. Second row: adversarial images generated by the PGD attack that were misclassified by the network.
  • Figure 2: Model Training in FL
  • Figure 3: Federated AT procedure.
  • Figure 4: Augmenting training set for AT by each client. This figure details the data augmentation process performed by each client in the federated AT method.
  • Figure 5: An overview of the modified ResNet-18 architecture. The first convolutional layer was adjusted to use a $3\times3$ kernel in place of the original $7\times7$. The residual blocks [He2015ResNet] do not include the down-sampling operation (red) to maintain feature map dimensions.
  • ...and 3 more figures
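
For reference, the $L_{\infty}$ PGD attack named in the Figure 1 caption is commonly written as an iterated sign-gradient step followed by a projection back onto the $\epsilon$-ball around the clean image. The form below is the standard textbook one and may differ from the exact variant used in the paper. The step size $\alpha$, the iteration index $t$, the loss $\mathcal{L}$, and the model $f_\theta$ are notation assumed here; only $\epsilon=\frac{8}{255}$ comes from the caption.

```latex
% Initialization: the clean image x, optionally with a random start inside the epsilon-ball
x^{(0)} = x + \delta_0, \qquad \|\delta_0\|_\infty \le \epsilon

% One PGD iteration: sign-gradient ascent on the loss, then projection
x^{(t+1)} = \Pi_{\{x' \,:\, \|x' - x\|_\infty \le \epsilon\}}
  \Big( x^{(t)} + \alpha \,\operatorname{sign}\big( \nabla_{x^{(t)}} \mathcal{L}(f_\theta(x^{(t)}), y) \big) \Big)
```

For the $L_{\infty}$ norm, the projection $\Pi$ reduces to clipping each pixel of the iterate to the interval $[x_i - \epsilon,\, x_i + \epsilon]$, usually followed by clipping to the valid pixel range.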