Adaptive Fault-Tolerant Scheduling in Dynamic Clouds

Authors

Chetankumar Kalaskar, Thangam Somasundaram

DOI:

https://doi.org/10.47852/bonviewJCCE52026266

Keywords:

fault tolerance, real-time cloud computing, hybrid reinforcement learning, Double Deep Q-Learning (DDQL), Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC)

Abstract

The shift from product-centric to service-driven paradigms in cloud computing has enabled on-demand provisioning of scalable resources. However, the operational complexity and interdependent infrastructure of cloud environments make them highly susceptible to failures, which makes robust fault tolerance difficult to ensure in dynamic, distributed systems. This study introduces a novel Hybrid Value-Policy Double Deep Q-Learning (DDQL) Scheduling approach that integrates value-based DDQL with two policy-based reinforcement learning techniques, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), to adapt fault-tolerant scheduling dynamically. By combining the advantages of both learning paradigms, the proposed method improves decision-making efficiency, strengthens system resilience against failures, and optimizes scheduling performance in real-time cloud environments. The Hybrid DDQL scheduler dynamically adjusts its scheduling and resource allocation algorithms, which improves mean time between failures, mean time to recovery, failure recovery time, and fault tolerance success rate. It also reduces the task deadline miss ratio, energy consumption, and response time while maximizing throughput. These results confirm the flexibility and robustness of the model in dynamic and unreliable real-time cloud computing systems.
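
For readers unfamiliar with the value-based half of the hybrid, the snippet below sketches the standard Double Q-learning target that DDQL builds on: the online network selects the next action and the target network evaluates it. It is a minimal illustration only, not the authors' implementation. The function name, the parameters, and the sample Q-values are assumptions made for this example, and the PPO and SAC components are omitted.

```python
from typing import Sequence


def double_q_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the Double Q-learning bootstrap target for one transition.

    next_q_online: Q-values of the next state from the online network
                   (hypothetical inputs, one entry per scheduling action).
    next_q_target: Q-values of the same state from the target network.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Illustrative numbers only: three candidate actions for the next state.
target = double_q_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],
    next_q_target=[0.4, 0.3, 0.8],
    gamma=0.5,
)
```

Decoupling selection from evaluation in this way is what distinguishes Double Q-learning from plain Q-learning, where a single set of estimates both picks and scores the next action and tends to overestimate values.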

Received: 27 May 2025 | Revised: 30 September 2025 | Accepted: 16 October 2025

Conflicts of Interest

The authors declare that they have no conflicts of interest related to this work.

Data Availability Statement

Data are available from the corresponding author upon reasonable request.

Author Contribution Statement

Chetankumar Kalaskar: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration. Thangam Somasundaram: Methodology, Validation, Formal analysis, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration, Funding acquisition.


Published

2025-12-12

Section

Research Articles

How to Cite

Kalaskar, C., & Somasundaram, T. (2025). Adaptive Fault-Tolerant Scheduling in Dynamic Clouds. Journal of Computational and Cognitive Engineering. https://doi.org/10.47852/bonviewJCCE52026266