Random Numbers for Machine Learning: A Comparative Study of Reproducibility and Energy Consumption

Authors

Benjamin Antunes, David R. C. Hill

DOI:

https://doi.org/10.47852/bonviewJDSIS42024012

Keywords:

reproducible research, machine learning, pseudo random numbers, energy consumption

Abstract

Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine learning (ML) technologies because they underpin numerous methods. In the context of ML, multiple stochastic streams, produced inside black boxes for methods such as stochastic gradient descent or dropout, can lead to a lack of repeatability, impairing the ability to debug and explain results. Machine learning holds the potential for substantial advances across various domains, yet despite the growing interest, persistent concerns remain about reproducibility and energy consumption. Reproducibility is crucial for robust scientific inquiry and explainability, while energy efficiency underscores the imperative to conserve finite global resources. This study investigates whether the leading PRNGs employed in machine learning languages, libraries, and frameworks preserve statistical quality and numerical reproducibility when compared with the original C implementations of the respective PRNG algorithms. We also evaluate the time efficiency and energy consumption of the various implementations. Our experiments cover Python, NumPy, TensorFlow, and PyTorch, using the Mersenne Twister, Permuted Congruential Generator (PCG), and Philox algorithms. We found that the temporal performance of machine learning technologies closely aligns with that of the C-based implementations, in some cases even surpassing it, and that ML technologies consumed only 10% more energy than their C-implementation counterparts. However, while statistical quality was comparable, numerical reproducibility across different platforms for identical seeds and algorithms was not achieved.
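
The reproducibility issue described in the abstract can be made concrete with a small experiment. The following Python sketch is illustrative only and is not taken from the paper; the seed value, sample size, and choice of generator constructors are assumptions made for demonstration. It seeds Mersenne Twister and Philox generators in Python, NumPy, and PyTorch with the same integer and prints the first few draws so that the resulting streams can be compared.

    # Minimal illustrative sketch (not the authors' protocol): draw a few numbers
    # from the same PRNG families in Python, NumPy, and PyTorch with a common seed.
    # The seed, sample size, and generator choices are assumptions for demonstration.
    import random

    import numpy as np
    import torch

    SEED = 12345
    N = 5

    # Python's built-in `random` module uses the Mersenne Twister (MT19937).
    rng_py = random.Random(SEED)
    py_stream = [rng_py.random() for _ in range(N)]

    # NumPy's legacy RandomState is also MT19937-based, but each library may
    # expand the user-supplied seed into its internal state in its own way.
    np_mt_stream = np.random.RandomState(SEED).random_sample(N).tolist()

    # NumPy exposes the counter-based Philox generator through its newer API.
    np_philox_stream = np.random.Generator(np.random.Philox(SEED)).random(N).tolist()

    # PyTorch's default CPU generator is documented as Mersenne Twister-based.
    torch.manual_seed(SEED)
    torch_stream = torch.rand(N, dtype=torch.float64).tolist()

    for name, stream in [
        ("python (MT19937)", py_stream),
        ("numpy (MT19937)", np_mt_stream),
        ("numpy (Philox)", np_philox_stream),
        ("pytorch (default)", torch_stream),
    ]:
        print(f"{name:18s}", ["%.6f" % x for x in stream])

Even generators from the same algorithm family typically emit different streams in such a comparison, because each implementation maps the user-supplied seed to its internal state differently; this kind of cross-platform divergence for identical seeds and algorithms is what the study quantifies.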


Received: 2 August 2024 | Revised: 9 October 2024 | Accepted: 11 November 2024 


Conflicts of Interest

The authors declare that they have no conflicts of interest related to this work.


Data Availability Statement

The data that support the findings of this study are openly available on GitLab at https://gitlab.isima.fr/beantunes/random-numbers-in-machine-learning/


Author Contribution Statement

Benjamin Antunes: Conceptualization, Software, Investigation, Writing - original draft, Writing - review & editing. David R. C. Hill: Validation, Writing - review & editing, Visualization, Supervision, Project administration, Funding acquisition.

Author Biography

  • Benjamin Antunes, Polytechnic Institute of Clermont-Auvergne, Clermont Auvergne University, France

    Benjamin Antunes is a third-year PhD student at Clermont Auvergne University (UCA) in the LIMOS laboratory (UMR CNRS 6158). He holds a Master's degree in Computer Science (ranked first in his class). His thesis concerns the reproducibility of numerical results in the context of high-performance computing, with a particular focus on reproducibility issues in parallel stochastic computing. His email address is benjamin.antunes@uca.fr and his homepage is https://perso.isima.fr/~beantunes/.
    His PhD will be defended in December 2024.


Published

2024-11-18

Issue

Section

Research Articles

How to Cite

Antunes, B., & Hill, D. R. C. (2024). Random Numbers for Machine Learning: A Comparative Study of Reproducibility and Energy Consumption. Journal of Data Science and Intelligent Systems. https://doi.org/10.47852/bonviewJDSIS42024012