Adaptive Gaussian-Based Kernel K-Means: Scalable Adaptive Kernel-Based Clustering for Big Data

Authors

  • Bakhshali Bakhtiyarov, Department of Instrumentation Engineering, Azerbaijan State Oil and Industry University, Azerbaijan. ORCID: https://orcid.org/0009-0006-2172-4632
  • Aynur Jabiyeva, Department of Instrumentation Engineering, Azerbaijan State Oil and Industry University, Azerbaijan. ORCID: https://orcid.org/0000-0002-0336-8586
  • Anakhanim Mutallimova, Department of Instrumentation Engineering, Azerbaijan State Oil and Industry University, Azerbaijan. ORCID: https://orcid.org/0000-0001-8327-192X
  • Rukhsara Novruzova, Department of Instrumentation Engineering, Azerbaijan State Oil and Industry University, Azerbaijan. ORCID: https://orcid.org/0000-0002-0959-6520
  • Mahabbat Khudaverdiyeva, Department of Instrumentation Engineering, Azerbaijan State Oil and Industry University, Azerbaijan. ORCID: https://orcid.org/0000-0002-5090-4628

DOI:

https://doi.org/10.47852/bonviewJCCE52026511

Keywords:

big data clustering, Apache Spark, K-Means, DBSCAN, Fuzzy C-Means, GBK-Means, industrial sensor data

Abstract

In the field of big data, clustering remains a fundamental component of data mining and knowledge discovery, particularly in high-volume, heterogeneous environments. Although traditional clustering algorithms have been widely used, they often exhibit poor scalability, limited robustness, and sensitivity to noise when applied to large industrial-scale datasets. To address these limitations, this paper proposes a Spark-based clustering framework that incorporates an Adaptive Gaussian-Based Kernel K-Means (A-GBK-Means) algorithm, designed to mitigate performance degradation in the presence of noisy and nonhomogeneous data. The A-GBK-Means algorithm combines fuzzy logic with kernel widths that are determined adaptively from local density estimates, thereby enhancing both clustering accuracy and flexibility. Extensive experiments were conducted on real-world industrial sensor data as well as synthetic benchmarks, and performance was evaluated using multiple metrics, including execution time, Silhouette Score, Davies–Bouldin Index, and noise resilience. The comparative analysis demonstrates that the A-GBK-Means algorithm consistently outperforms classical approaches, namely K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Fuzzy C-Means, and standard GBK-Means, in terms of clustering quality and computational efficiency. Furthermore, the proposed framework exhibits superior scalability, owing to its distributed architecture built on Apache Spark. This study provides practical insights into scalable and interpretable clustering methods for intelligent manufacturing and sensor-based systems and highlights the effectiveness of adaptive kernel learning in modern big data applications.
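
The abstract does not state the update equations, so the following is only an illustrative sketch of the general idea it describes: Gaussian kernel similarities whose widths are set per point from a local density estimate, then normalised into fuzzy memberships. The k-nearest-neighbour width heuristic, the fuzzifier m, the farthest-point initialisation, and every function name below are assumptions made for illustration. This is not the authors' A-GBK-Means implementation, and it runs on a single machine rather than on Apache Spark.

```python
import math


def knn_widths(points, k=3):
    """Assumed heuristic: kernel width of a point = distance to its k-th nearest neighbour.

    Points in dense regions get small widths, points in sparse regions get large ones.
    """
    widths = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        widths.append(max(dists[k - 1], 1e-12))  # guard against zero width
    return widths


def farthest_point_init(points, n_clusters):
    """Deterministic initialisation (assumption): greedily pick mutually distant points."""
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def memberships(point, sigma, centroids):
    """Fuzzy memberships: Gaussian similarities to each centroid, normalised to sum to 1."""
    sims = [math.exp(-math.dist(point, c) ** 2 / (2.0 * sigma ** 2)) for c in centroids]
    total = sum(sims)
    if total == 0.0:  # numerical underflow: the point is far from every centroid
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]


def adaptive_kernel_fuzzy_kmeans(points, n_clusters, k=3, m=2.0, n_iter=20):
    """Alternate membership and centroid updates for a fixed number of iterations."""
    sigmas = knn_widths(points, k)
    centroids = farthest_point_init(points, n_clusters)
    dim = len(points[0])
    for _ in range(n_iter):
        u = [memberships(p, s, centroids) for p, s in zip(points, sigmas)]
        for c in range(n_clusters):
            weights = [row[c] ** m for row in u]
            total = sum(weights)
            if total > 0.0:
                # Centroid = membership-weighted mean of all points.
                centroids[c] = [
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dim)
                ]
    u = [memberships(p, s, centroids) for p, s in zip(points, sigmas)]
    labels = [max(range(n_clusters), key=lambda c: row[c]) for row in u]
    return centroids, labels, sigmas
```

In this sketch a point in a sparse region receives a wider kernel and therefore softer memberships, which is one plausible reading of "adaptively determined kernel widths"; the published method may weight or update the widths differently.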

Received: 18 June 2025 | Revised: 8 August 2025 | Accepted: 16 September 2025

Conflicts of Interest

The authors declare that they have no conflicts of interest related to this work.

Data Availability Statement

The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/datasets/dnkumars/industrial-equipment-monitoring-dataset and https://www.kaggle.com/datasets/mohammedarfathr/sensor-data-from-an-industrial-machine.

Author Contribution Statement

Bakhshali Bakhtiyarov: Investigation, Data curation, Writing – original draft, Project administration. Aynur Jabiyeva: Conceptualization, Validation, Formal analysis. Anakhanim Mutallimova: Software, Resources. Rukhsara Novruzova: Methodology, Writing – review & editing, Supervision. Mahabbat Khudaverdiyeva: Visualization.


Published

2025-10-16

Section

Research Articles

How to Cite

Bakhtiyarov, B., Jabiyeva, A., Mutallimova, A., Novruzova, R., & Khudaverdiyeva, M. (2025). Adaptive Gaussian-Based Kernel K-Means: Scalable Adaptive Kernel-Based Clustering for Big Data. Journal of Computational and Cognitive Engineering. https://doi.org/10.47852/bonviewJCCE52026511