Feature-to-Image Data Augmentation: Improving Model Feature Extraction with Cluster-Guided Synthetic Samples

Yasaman Haghbin; Hadi Moradi; Reshad Hosseini

TL;DR

This work tackles data scarcity in healthcare by introducing FICAug, a two-stage augmentation framework that operates in feature space and then reconstructs samples in the image domain for CNN training. It clusters latent features, generates class-pure synthetic samples via Gaussian sampling, and maps them back to realistic images using GANimation-based reconstruction, followed by fine-tuning on real data. On a Parkinson's disease facial-expression dataset, training a CNN in image space achieves a cross-validation accuracy of 88.63% and a test accuracy of 94.00%, outperforming several baselines. The method demonstrates that structured, cluster-aware augmentation combined with image-domain reconstruction can enhance representation learning in settings with limited labeled data, and it is potentially generalizable to other small-data domains.

Abstract

One of the growing trends in machine learning is the use of data generation techniques, since the performance of machine learning models depends heavily on the amount of training data. However, in many real-world applications, particularly in medical and low-resource domains, collecting large datasets is challenging due to resource constraints, and the resulting data scarcity leads to overfitting and poor generalization. This study introduces FICAug, a novel feature-to-image data augmentation framework designed to improve model generalization under limited data conditions by generating structured synthetic samples. FICAug first operates in the feature space, where the original data are clustered using the k-means algorithm. Within pure-label clusters, synthetic data are generated through Gaussian sampling to increase diversity while maintaining label consistency. These synthetic features are then projected back into the image domain using a generative neural network, and a convolutional neural network is trained on the reconstructed images to learn enhanced representations. Experimental results demonstrate that FICAug significantly improves classification accuracy. In feature space, it achieved a cross-validation accuracy of 84.09%, while training a ResNet-18 model on the reconstructed images further raised the cross-validation accuracy to 88.63%, illustrating the effectiveness of the proposed framework in extracting new and task-relevant features.
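The feature-space stage described in the abstract (k-means clustering, then Gaussian sampling inside pure-label clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `kmeans` and `augment_pure_clusters` are hypothetical, the k-means is a plain hand-written Lloyd's loop, and the sketch assumes an independent per-dimension Gaussian fitted to each pure cluster, which the abstract does not specify. The re-clustering step and the mapping back to images are not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)


def augment_pure_clusters(X, y, k=4, n_per_cluster=10, seed=0):
    """Sample synthetic feature vectors only from clusters with a single label.

    Assumption: each pure cluster is modelled as a diagonal Gaussian
    (per-dimension mean and standard deviation of its members).
    """
    rng = np.random.default_rng(seed)
    _, assign = kmeans(X, k, seed=seed)
    synth_X, synth_y = [], []
    for j in range(k):
        mask = assign == j
        labels = np.unique(y[mask])
        # Skip empty, singleton, or mixed-label clusters.
        if mask.sum() < 2 or len(labels) != 1:
            continue
        mean = X[mask].mean(axis=0)
        std = X[mask].std(axis=0)
        synth_X.append(rng.normal(mean, std, size=(n_per_cluster, X.shape[1])))
        synth_y.append(np.full(n_per_cluster, labels[0]))
    if not synth_X:
        return np.empty((0, X.shape[1])), np.empty((0,), dtype=y.dtype)
    return np.vstack(synth_X), np.concatenate(synth_y)


# Toy usage: two well-separated classes in a 3-dimensional feature space.
demo_rng = np.random.default_rng(42)
X = np.vstack([
    demo_rng.normal(-5.0, 0.5, size=(20, 3)),
    demo_rng.normal(5.0, 0.5, size=(20, 3)),
])
y = np.array([0] * 20 + [1] * 20)
X_syn, y_syn = augment_pure_clusters(X, y, k=2, n_per_cluster=5)
print(X_syn.shape, sorted(set(y_syn.tolist())))
```

Each synthetic vector inherits the single label of the cluster it was sampled from, which is how label consistency is maintained in this sketch. Mixed-label clusters are simply skipped here, whereas the paper's outline lists a separate cluster evaluation and re-clustering step for them.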

Paper Structure (18 sections, 8 equations, 6 figures, 6 tables)

Introduction
Related Work
Methodology
Feature Extraction and Clustering
Cluster Evaluation and Re-Clustering Process
Synthetic Data Generation in Feature Space
Mapping Synthetic Data to Original Space
CNN Model Training and Fine-Tuning
Dataset
Experiments
Feature Extraction and Feature Space
Data Splitting for Training, Validation, and Testing
Statistical evaluations
Evaluation of Feature Space Augmentation Strategies
GANimation and Synthetic Face Generation
...and 3 more sections

Figures (6)

Figure 1: Overview of the FICAug framework, which generates synthetic feature vectors through clustering and Gaussian sampling, reconstructs them into images via a generative model, and trains a CNN to extract enhanced task-specific features.
Figure 2: Facial expressions from left to right: anger, happiness, disgust, fear, and surprise.
Figure 3: AI-generated neutral face images used for applying Action Units.
Figure 4: Synthetic facial expressions generated for control individuals using GANimation.
Figure 5: Synthetic facial expressions generated for Parkinson's patients using GANimation.
...and 1 more figures
