Towards Predicting Recurrence Risk of Differentiated Thyroid Cancer with a Hybrid Machine Learning Model
DOI: https://doi.org/10.47852/bonviewMEDIN42024441
Keywords: differentiated thyroid cancer, agglomerative clustering algorithm, multiple linear regression, recurrence prediction, hybrid machine learning model
Abstract
Thyroid cancers (TC) often recur. This paper tests a novel hybrid machine learning model, combining unsupervised and supervised learning, to predict the recurrence risk (RR) and its score (RRS) as a prognostic measure in a population of "differentiated thyroid cancer" (DTC) cases. The DTC data (383 × 13) are collected from the UCI Machine Learning Repository. As a preprocessing step, the dataset is first log-normalized column-wise to [0,1]. The population is then grouped into "high risk of recurrence" (HRR) and "no high risk of recurrence" (NHRR) using the agglomerative clustering algorithm (ACA). The log-normalized predictor values, their corresponding coefficients, and the constant (intercept) are used to construct a multiple linear regression that computes the RRS. The RRS values are further normalized to [0,1] using a log-sigmoidal function, and the result is termed "RRS_norm". Each case is assigned to the group (HRR or NHRR) whose average RRS_norm is closer to its own RRS_norm. The model's performance is measured with a confusion matrix, and the RRS_norm results are matched against the RR labels in the dataset. The results show that ACA correctly clusters 63.4% of the dataset into HRR and NHRR. Based on the coefficient values, the predictors "Age", "Gender", "Smoking", "History of smoking", "History of Radiotherapy", "Adenopathy", and "Tumor staging", which together comprise 53.84% of all predictors, show a positive correlation with recurrence. However, matching the RRS_norms with the actual RRs reveals a 21.68% mismatch, which warrants further investigation with other DTC datasets.
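The abstract describes the pipeline only at a high level, so the following is a minimal, dependency-free Python sketch of one possible reading of it, not the author's implementation. Several details are assumptions: the exact "log-normalization" formula (here log(1 + x − min) / log(1 + max − min)), average linkage with Euclidean distance for the ACA, fitting the regression against the 0/1 recurrence label, reading "log-sigmoidal" as the logistic function 1 / (1 + e^(−z)), and labeling the cluster with the higher mean RRS_norm as HRR. The abstract also does not say exactly how the clusters feed the regression step; here they only define the HRR/NHRR reference means. All function names are hypothetical.

```python
import math

def log_normalize(column):
    """Assumed log min-max scaling into [0, 1]: log(1 + x - min) / log(1 + max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # a constant column carries no information
    denom = math.log1p(hi - lo)
    return [math.log1p(v - lo) / denom for v in column]

def normalize_columns(rows):
    """Apply log_normalize column-wise and return the rows."""
    cols = [log_normalize(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def agglomerative_two_clusters(rows):
    """Naive average-linkage agglomerative clustering, stopped at two clusters."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        total = sum(math.dist(rows[a], rows[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 2:
        # Merge the pair of clusters with the smallest average pairwise distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    labels = [0] * len(rows)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels

def solve(matrix, vector):
    """Gauss-Jordan elimination with partial pivoting (assumes a non-singular system)."""
    n = len(vector)
    aug = [row[:] + [val] for row, val in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, b1, ..., bp]."""
    A = [[1.0] + list(row) for row in X]  # prepend the constant (intercept) column
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)

def risk_score(coefs, row):
    """RRS = intercept + sum of coefficient * log-normalized predictor value."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

def log_sigmoid(z):
    """Logistic squashing of a raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def assign_groups(scores, cluster_labels):
    """Predict 1 (HRR) when a score is closer to the HRR mean than to the NHRR mean."""
    means = {}
    for k in set(cluster_labels):
        members = [s for s, c in zip(scores, cluster_labels) if c == k]
        means[k] = sum(members) / len(members)
    hrr = max(means, key=means.get)  # assumption: the higher-mean cluster is HRR
    nhrr = min(means, key=means.get)
    return [1 if abs(s - means[hrr]) < abs(s - means[nhrr]) else 0 for s in scores]

def confusion_counts(predicted, actual):
    """Return (TP, FP, FN, TN), treating 1 as recurrence / HRR."""
    pairs = list(zip(predicted, actual))
    return tuple(pairs.count(cell) for cell in [(1, 1), (1, 0), (0, 1), (0, 0)])

def run_pipeline(X, y):
    """X: raw numeric predictors; y: 0/1 recurrence labels (assumed regression target)."""
    Xn = normalize_columns(X)
    clusters = agglomerative_two_clusters(Xn)
    coefs = fit_linear_regression(Xn, y)
    rrs_norm = [log_sigmoid(risk_score(coefs, row)) for row in Xn]
    predicted = assign_groups(rrs_norm, clusters)
    mismatch = sum(p != t for p, t in zip(predicted, y)) / len(y)
    return coefs, rrs_norm, predicted, confusion_counts(predicted, y), mismatch
```

Calling `run_pipeline(X, y)` with a list of numeric rows and matching 0/1 labels returns the coefficients, the RRS_norm values, the predicted groups, the (TP, FP, FN, TN) counts, and a mismatch rate analogous to the one the abstract reports. The hand-written clustering is roughly cubic in the number of cases and is meant for small illustrative inputs. For the full 383-row dataset, a library implementation such as scikit-learn's `AgglomerativeClustering` and `LinearRegression` would be the practical choice.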
Received: 26 September 2024 | Revised: 18 November 2024 | Accepted: 29 November 2024
Conflicts of Interest
The author declares that he has no conflicts of interest with respect to this work.
Data Availability Statement
The data that support the findings of this study are openly available on Kaggle at https://www.kaggle.com/datasets/joebeachcapital/differentiated-thyroid-cancer-recurrence, reference number [14].
Author Contribution Statement
Subhagata Chattopadhyay: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration.
License
Copyright (c) 2024 Author
This work is licensed under a Creative Commons Attribution 4.0 International License.