Integrating Data Analysis Methods with Machine Learning Algorithms for Mixed Data Types: Does This Combination Improve Predictive Models' Accuracy?
DOI: https://doi.org/10.47852/bonviewJDSIS52023906
Keywords:
multivariate data, principal component analysis, categorical principal component analysis, machine learning algorithms, SVC, random forest classifier, multinomial logistic regression
Abstract
In this study, we examined whether multivariate data analysis methods, applied as a preliminary stage, can improve the predictive power of machine learning techniques. These methods comprise principal component analysis, multiple correspondence analysis, and non-linear categorical principal component analysis with optimal scaling. The machine learning approaches evaluated include Support Vector Machines, Stochastic Gradient Descent, Naïve Bayes, K-Nearest Neighbor, Decision Trees, Random Forests, Adaptive Boosting, and Multinomial Logistic Regression. We conducted experiments using data from a nationwide survey of 42,593 adolescents who answered more than 155 questions about their eating habits. The dependent variable, body mass index (BMI), was measured and used in the analysis as both a quantitative and a qualitative variable. The index values were initially classified according to the World Health Organization’s recommendations. The results indicated that predictions are more reliable when BMI is treated as a qualitative variable with four classes. Applying a multivariate data analysis strategy before the machine learning algorithms not only saves time but also facilitates the selection of the most effective predictive model. Although dimensionality reduction may not consistently enhance the models’ predictive ability, it contributes to the "interpretability" of the results.
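As an illustration of the general idea of placing a dimensionality-reduction step before a classifier, the following is a minimal sketch assuming scikit-learn and synthetic data (the survey data are available only on request). PCA stands in for the multivariate methods; multiple correspondence analysis and categorical PCA with optimal scaling are not part of scikit-learn and are not shown. The function name `compare_with_and_without_pca`, the chosen classifiers, and all parameter values are hypothetical and do not reproduce the authors' setup.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_with_and_without_pca(X, y, n_components=10, cv=5, random_state=0):
    """Hypothetical helper: mean cross-validated accuracy per model,
    fitted on the raw features and on PCA-reduced features."""
    models = {
        "SVC": SVC(),
        "RandomForestClassifier": RandomForestClassifier(random_state=random_state),
        # With the default lbfgs solver, multiclass targets get a multinomial model.
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in models.items():
        # Baseline: standardize, then classify using all original features.
        baseline = make_pipeline(StandardScaler(), clone(model))
        # Variant: standardize, project onto the leading principal components, then classify.
        reduced = make_pipeline(
            StandardScaler(), PCA(n_components=n_components), clone(model)
        )
        results[name] = {
            "raw": cross_val_score(baseline, X, y, cv=cv).mean(),
            "pca": cross_val_score(reduced, X, y, cv=cv).mean(),
        }
    return results


if __name__ == "__main__":
    # Synthetic four-class problem (an assumption standing in for the BMI classes).
    X, y = make_classification(
        n_samples=600, n_features=40, n_informative=8, n_classes=4, random_state=0
    )
    for name, scores in compare_with_and_without_pca(X, y).items():
        print(f"{name}: raw={scores['raw']:.3f}, pca={scores['pca']:.3f}")
```

Keeping PCA inside the pipeline means the components are refitted on each training fold, so the held-out fold does not leak into the reduction step. Whether the "pca" score beats the "raw" score depends on the data, which is consistent with the observation that dimensionality reduction does not always improve accuracy.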
Received: 22 July 2024 | Revised: 23 September 2024 | Accepted: 2 January 2025
Conflicts of Interest
The authors declare that they have no conflicts of interest in relation to this work.
Data Availability Statement
Data are available from the corresponding author upon reasonable request.
Author Contribution Statement
Nikolaos Papafilippou: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Zacharenia Kyrana: Resources. Emmanouil Pratsinakis: Conceptualization. Efstratios Kiranas: Conceptualization, Resources. Alexandra-Maria Michailidou: Resources. Angelos Markos: Conceptualization, Methodology, Software, Formal analysis. George Menexes: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – review & editing, Supervision, Project administration.
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.