Addressing Small and Imbalanced Medical Image Datasets Using Generative Models
DOI:
https://doi.org/10.47852/bonviewAIA52026661
Keywords:
medical image augmentation, generative models, Progressive Growing Generative Adversarial Networks (PGGANs), Denoising Diffusion Probabilistic Models (DDPMs), synthetic data integration
Abstract
Progress in accurate medical image classification is often hampered by data privacy concerns and by the scarcity of data for certain diseases, which lead to sparse and imbalanced datasets. To address these challenges, this study uses two generative models, Denoising Diffusion Probabilistic Models (DDPMs) and Progressive Growing Generative Adversarial Networks (PGGANs), for dataset augmentation. We propose a framework for evaluating how the synthetic images generated by DDPMs and PGGANs affect the performance of four classifiers: a custom Convolutional Neural Network (CNN), an untrained VGG16, a pretrained VGG16, and a pretrained ResNet50. To model practical constraints in real applications, the experiments used Random Sampling and Greedy K Sampling to construct small, imbalanced datasets. Synthetic image quality was measured with the Fréchet Inception Distance (FID), and the impact of the synthetic images was assessed by comparing classification results against those obtained with the original datasets. The experiments show that DDPMs consistently produced more realistic images, as indicated by lower FID scores, and outperformed PGGANs in improving classification results across all investigated models and datasets. Adding DDPM-generated images to the original datasets improved accuracy by about 6%, enhancing the robustness and reliability of the models, particularly when the datasets were imbalanced. Random Sampling gave more consistent results, whereas Greedy K Sampling offered greater variability at the cost of higher FID scores. Overall, this research highlights the potential of DDPMs to augment and balance sparse medical image datasets and thereby improve model training and predictive outcomes.
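For readers unfamiliar with the FID metric mentioned above, the following is a minimal sketch of the Fréchet distance between two sets of image features, where lower values mean the two sets are statistically closer. It is not the authors' implementation. In a real FID computation the features are Inception-v3 activations; here the function name `frechet_distance` and the random arrays standing in for features are illustrative assumptions only.

```python
import numpy as np


def frechet_distance(feats_real, feats_synth):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per image and one column per
    feature dimension (hypothetical stand-ins for Inception-v3 features).
    """
    mu_r = feats_real.mean(axis=0)
    mu_s = feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)

    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # covariance matrices; tiny negative values from rounding are clipped.
    eigvals = np.linalg.eigvals(cov_r @ cov_s).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))   # placeholder "real image" features
    synth = real + 2.0                 # placeholder "synthetic" features, shifted mean
    print(frechet_distance(real, real))
    print(frechet_distance(real, synth))
```

In this toy setup the two feature sets share the same covariance, so the distance reduces to the squared distance between their means.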
Received: 1 July 2025 | Revised: 3 September 2025 | Accepted: 15 September 2025
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in GitHub at https://github.com/imankhazrak/DDPM_X-Ray.
Author Contribution Statement
Iman Khazrak: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Shakhnoza Takhirova: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Mostafa M. Rezaee: Validation, Investigation, Resources, Writing – review & editing. Mehrdad Yadollahi: Investigation, Resources, Writing – review & editing. Robert C. Green II: Methodology, Writing – review & editing, Supervision. Shuteng Niu: Methodology, Writing – review & editing, Supervision.
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.