Enhancing Data Lake Management Systems with LDA Approach
DOI:
https://doi.org/10.47852/bonviewJDSIS42022312
Keywords:
data lake management, Latent Dirichlet Allocation, data analytics, big data, topic modeling
Abstract
In today's fiercely competitive business landscape, data has emerged as a precious asset crucial to any company's growth. It is a genuine catalyst for economic and strategic advantage, distinguishing industry leaders from the rest. Leading organizations recognize the importance not only of collecting data from diverse sources but also of harnessing data analytics to inform their decision-making processes. In this setting, the data lake is a robust framework for handling vast data sources and enabling the data exploration that supports decision-making. This paper examines intelligent data lake management systems designed to overcome the limitations of traditional business intelligence, which struggles to meet the demands of data-driven decision-making. Data lakes are well suited to analyzing data from many heterogeneous sources, particularly when cleaning the data up front would be time-consuming. Nevertheless, managing diverse datasets that lack a predefined schema is a significant challenge and can cause a data lake to devolve into a data swamp. In this article, we adopt the Latent Dirichlet Allocation (LDA) model to support the handling, processing, analysis, and visualization of large datasets in the data lake environment. To evaluate the efficacy of the proposed approach, we conducted comprehensive assessments using the topic coherence metric. Our experiments indicate that the approach achieves superior performance, as measured by topic coherence, on the tested datasets.
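The abstract does not state how the LDA model is fitted or which topic coherence variant is used. As a minimal sketch only, assuming collapsed Gibbs sampling with symmetric Dirichlet priors `alpha` and `beta` and the UMass coherence measure, the Python below fits LDA to a hypothetical toy corpus of dataset descriptions (as might be harvested from data lake metadata) and scores each topic's top words. The names `lda_gibbs`, `top_words`, and `umass_coherence`, the hyperparameter values, and the corpus are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed method); return count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # n_{d,k}
    topic_word = [Counter() for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                       # n_k
    z = []                                               # topic of every token

    # Randomly assign each token to an initial topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Most frequent words per topic, skipping words whose count fell to zero."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


def umass_coherence(top, docs):
    """UMass coherence: sum over pairs of log((D(w_i, w_j) + 1) / D(w_j)), j < i."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    return sum(
        math.log((doc_freq(top[i], top[j]) + 1) / doc_freq(top[j]))
        for i in range(1, len(top))
        for j in range(i)
    )


if __name__ == "__main__":
    # Hypothetical dataset descriptions from a data lake's metadata catalog.
    corpus = [
        "sales invoice customer revenue".split(),
        "customer payment invoice revenue".split(),
        "sensor temperature humidity reading".split(),
        "device sensor reading temperature".split(),
    ]
    topic_word, doc_topic = lda_gibbs(corpus, num_topics=2)
    for k, words in enumerate(top_words(topic_word)):
        print(k, words, round(umass_coherence(words, corpus), 3))
```

Higher UMass scores indicate that a topic's top words tend to appear in the same documents. A production pipeline would more likely rely on an established library; for example, gensim's `CoherenceModel` supports `u_mass` and `c_v` coherence among others.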
Received: 16 December 2023 | Revised: 21 March 2024 | Accepted: 20 May 2024
Conflicts of Interest
The authors declare that they have no conflicts of interest in relation to this work.
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Author Contribution Statement
Mohamed Cherradi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Project administration. Anass El Haddadi: Supervision.
License
Copyright (c) 2024 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.