Enhancing Data Lake Management Systems with LDA Approach

Authors

  • Mohamed Cherradi, Data Science and Competitive Intelligence Team, Abdelmalek Essaadi University, Morocco
  • Anass El Haddadi, Data Science and Competitive Intelligence Team, Abdelmalek Essaadi University, Morocco

DOI:

https://doi.org/10.47852/bonviewJDSIS42022312

Keywords:

data lake management, Latent Dirichlet Allocation, data analytics, big data, topic modeling

Abstract

In today's fiercely competitive business landscape, data has emerged as a valuable asset crucial to any company's growth. It is a genuine catalyst for economic and strategic advantage, distinguishing industry leaders from the rest. Prominent organizations recognize the importance of not just amassing data from diverse sources but also harnessing data analytics for informed decision-making. In this setting, the data lake stands as a robust framework for handling vast data sources and enabling data exploration in support of decision-making tasks. This paper examines intelligent data lake management systems designed to overcome the limitations of traditional business intelligence, which struggles to meet the demands of data-driven decision-making. Data lakes excel at analyzing data from many sources, particularly when data cleaning would be time-consuming. Still, managing diverse datasets that lack a predefined data structure presents a significant challenge, and a data lake can devolve into a data swamp. In this article, we adopt the Latent Dirichlet Allocation (LDA) model to manage the handling, processing, analysis, and display of large datasets in the data lake environment. To evaluate the efficacy of the proposed approach, we conducted assessments using the topic coherence metric. Our experiments indicate that the approach achieves superior accuracy on the tested datasets.
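
The abstract does not say which topic coherence variant was used. As an illustrative sketch only, assuming the UMass variant (an assumption, not confirmed by the paper), coherence for one topic can be computed from document co-occurrence counts of its top words. The function name `umass_coherence`, the smoothing parameter `eps`, and the toy corpus are all hypothetical and chosen for illustration.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence for a single topic (illustrative sketch).

    top_words: the topic's top words, ordered from most to least probable.
    documents: a list of tokenized documents (lists of strings).
    eps: smoothing constant added to co-occurrence counts to avoid log(0).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            d_l = doc_freq(w_l)
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(w_m, w_l) + eps) / d_l)
    return score


# Hypothetical toy corpus: two documents about data lakes, two about topic models.
docs = [
    ["data", "lake", "metadata", "storage"],
    ["data", "lake", "schema", "storage"],
    ["topic", "model", "word", "document"],
    ["topic", "model", "coherence", "word"],
]

coherent_topic = ["data", "lake", "storage"]  # words that co-occur
mixed_topic = ["data", "topic", "storage"]    # words drawn from both themes

print(umass_coherence(coherent_topic, docs))
print(umass_coherence(mixed_topic, docs))
```

A topic whose top words appear together in the same documents scores higher than one that mixes unrelated words, which is the property a coherence-based evaluation relies on.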

 

Received: 16 December 2023 | Revised: 21 March 2024 | Accepted: 20 May 2024

 

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

 

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

 

Author Contribution Statement

Mohamed Cherradi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Project administration. Anass El Haddadi: Supervision.


Published

2024-05-27

Section

Research Articles

How to Cite

Cherradi, M., & El Haddadi, A. (2024). Enhancing Data Lake Management Systems with LDA Approach. Journal of Data Science and Intelligent Systems. https://doi.org/10.47852/bonviewJDSIS42022312