Leveraging a Hybrid Fuzzy C-Means-PCA Model for Identifying Speech Therapy Needs in Children

Authors

  • Muhammad Rizal Haris Department of Informatics, Universitas Muhammadiyah Makassar, Indonesia https://orcid.org/0009-0004-2748-8509
  • Muhammad Faisal Department of Informatics, Universitas Muhammadiyah Makassar, Indonesia https://orcid.org/0000-0003-1469-9468
  • Rizki Yusliana Bakti Department of Informatics, Universitas Muhammadiyah Makassar, Indonesia
  • Muhyiddin A. M Hayat Department of Informatics, Universitas Muhammadiyah Makassar, Indonesia
  • Titik Khawa Abd Rahman School of Science and Technology, Asia e University, Malaysia
  • Muhammad Syafaat S. Kuba Department of Water Resources Engineering, Universitas Muhammadiyah Makassar, Indonesia
  • Titin Wahyuni Department of Informatics, Universitas Muhammadiyah Makassar, Indonesia

DOI:

https://doi.org/10.47852/bonviewJDSIS62028311

Keywords:

Fuzzy C-Means, principal component analysis, acoustic–prosodic feature, speech therapy, developmental language disorder

Abstract

This study proposes a hybrid unsupervised learning framework that integrates principal component analysis (PCA) and Fuzzy C-Means (FCM) to support early identification of speech therapy needs in children. The model uses secondary structured datasets containing acoustic–prosodic features such as Mel-frequency cepstral coefficients (MFCCs), jitter, shimmer, harmonics-to-noise ratio, pitch, duration, fluency, and temporal indicators. PCA was applied to reduce dimensional redundancy and address the high-dimensional, low-sample-size characteristics of pediatric speech data, producing four principal components that retained 83.27% of the total variance. These components were subsequently clustered using FCM to capture partial membership patterns that reflect the continuous nature of children's speech deviations. The proposed PCA-FCM model achieved the best cluster compactness and separation, with a 25.6% improvement in the Xie–Beni (XB) index compared to the baseline FCM model (XB = 0.421). Three interpretable clusters (Normal, Mild Deviation, and Severe Deviation) were identified, each associated with a distinct acoustic–prosodic profile. These findings demonstrate the potential of hybrid unsupervised learning to provide an objective, interpretable, and efficient early-screening mechanism for guiding personalized speech therapy interventions in children.
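
The pipeline summarized in the abstract (standardize, reduce with PCA, cluster with FCM, score with the Xie–Beni index) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_reduce`, `fuzzy_c_means`, and `xie_beni` are hypothetical, the fuzzifier m = 2, the convergence tolerance, and the random initialization are assumptions the abstract does not state, and the data are synthetic stand-ins rather than the study's dataset. Only the choice of four components and three clusters comes from the abstract.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project z-scored features onto the top principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each feature
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)   # PCA via SVD
    explained = (S ** 2) / np.sum(S ** 2)              # variance ratio per component
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components].sum()


def _centers(X, U, m):
    """Membership-weighted centroids."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard FCM: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                  # each row sums to 1
    for _ in range(max_iter):
        centers = _centers(X, U, m)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        inv = np.fmax(d2, 1e-12) ** (-1.0 / (m - 1.0)) # guard against zero distance
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return _centers(X, U, m), U


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni index: fuzzy compactness over (n * min centroid separation)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)                      # ignore self-distances
    return compactness / (X.shape[0] * sep.min())


# Synthetic stand-in data (assumption): 3 groups x 50 samples x 12 features.
rng = np.random.default_rng(42)
X = np.vstack([mu + rng.normal(size=(50, 12)) for mu in (0.0, 2.0, 4.0)])

scores, retained = pca_reduce(X, n_components=4)       # 4 components, as in the abstract
centers, U = fuzzy_c_means(scores, c=3)                # 3 clusters, as in the abstract
xb_pca = xie_beni(scores, centers, U)
labels = U.argmax(axis=1)                              # hard label = highest membership
print(f"variance retained: {retained:.2%}, XB: {xb_pca:.3f}")
```

A lower Xie–Beni value indicates tighter and better-separated clusters, so a comparison of the kind reported in the abstract would run `fuzzy_c_means` once on the standardized raw features and once on the PCA scores and compare the two values. Each row of `U` holds one sample's partial memberships, which is what allows a child to sit between, say, the Normal and Mild Deviation profiles rather than being forced into a single class.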

 

Received: 19 November 2025 | Revised: 16 April 2026 | Accepted: 11 May 2026

 

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

 

Data Availability Statement

The analytical dataset supporting this study is publicly available in Figshare at https://doi.org/10.6084/M9.FIGSHARE.31410669.

 

Author Contribution Statement

Muhammad Rizal Haris: Conceptualization, Methodology, Validation, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Muhammad Faisal: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Data curation, Writing – review & editing, Supervision. Rizki Yusliana Bakti: Methodology, Software, Validation, Data curation, Project administration. Muhyiddin A. M Hayat: Validation, Formal analysis, Investigation, Resources, Project administration. Titik Khawa Abd Rahman: Conceptualization, Methodology, Software, Validation, Writing – review & editing, Project administration. Muhammad Syafaat S. Kuba: Validation, Writing – review & editing, Supervision. Titin Wahyuni: Data curation, Software, Project administration.

Published

2026-06-05

Section

Research Articles

How to Cite

Haris, M. R., Faisal, M., Bakti, R. Y., Hayat, M. A. M., Rahman, T. K. A., Syafaat S. Kuba, M., & Wahyuni, T. (2026). Leveraging a Hybrid Fuzzy C-Means-PCA Model for Identifying Speech Therapy Needs in Children. Journal of Data Science and Intelligent Systems. https://doi.org/10.47852/bonviewJDSIS62028311