A Model-Based Reinforcement Learning Method with Conditional Variational Auto-Encoder

Authors

  • Ting Zhu (School of Mathematics, Southwest Jiaotong University, China) https://orcid.org/0009-0005-3587-1302
  • Ruibin Ren (School of Mathematics, Southwest Jiaotong University, China)
  • Yukai Li (School of Mathematics, Southwest Jiaotong University, China)
  • Wenbin Liu (School of Mathematics, Southwest Jiaotong University, and the 30th Research Institute of China Electronics Technology Group Corporation, China)

DOI:

https://doi.org/10.47852/bonviewJDSIS42022432

Keywords:

model-based reinforcement learning, conditional variational auto-encoder, task-relevant representations

Abstract

Model-based reinforcement learning can effectively improve the sample efficiency of reinforcement learning, but the learned environment model inevitably contains errors. These model errors can mislead policy optimization and lead to suboptimal policies. To improve the generalization ability of the environment model, existing methods often build it as an ensemble model or a Bayesian model; however, such methods are computationally intensive and complex to update. Since a generative model can capture the stochastic nature of the environment, this paper proposes a model-based reinforcement learning method based on a conditional variational auto-encoder (CVAE). We use a CVAE to learn task-relevant representations and apply the generative model to predict environmental changes. To address the accumulation of errors over multi-step predictions, model adaptation is used to minimize the difference between the simulated and real data distributions. Experiments verify that the proposed method learns task-relevant representations and accelerates policy learning.
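As an illustrative aside, the sketch below shows one minimal way a CVAE dynamics model of the kind the abstract describes could be set up in PyTorch. The class name, network sizes, standard-normal latent prior, and mean-squared reconstruction loss are our assumptions for exposition, not the authors' implementation.

```python
# Minimal illustrative sketch (not the paper's code): a CVAE that models
# environment transitions p(s' | s, a), with encoder q(z | s, a, s').
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAEDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        # Encoder q(z | s, a, s'): infers a latent code from a full transition.
        self.encoder = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # outputs mean and log-variance
        )
        # Decoder p(s' | s, a, z): predicts the next state from the
        # condition (s, a) and the latent code z.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, s, a, s_next):
        mu, log_var = self.encoder(torch.cat([s, a, s_next], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization trick
        s_next_hat = self.decoder(torch.cat([s, a, z], dim=-1))
        return s_next_hat, mu, log_var

    def loss(self, s, a, s_next):
        # Standard ELBO: reconstruction term plus KL divergence to N(0, I).
        s_next_hat, mu, log_var = self(s, a, s_next)
        recon = F.mse_loss(s_next_hat, s_next)
        kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
        return recon + kl

    @torch.no_grad()
    def predict(self, s, a):
        # At rollout time, sample z from the prior so the model generates
        # stochastic next-state predictions.
        z = torch.randn(s.shape[0], self.latent_dim, device=s.device)
        return self.decoder(torch.cat([s, a, z], dim=-1))
```

Sampling z from the prior at prediction time is what makes the learned model generative: repeated rollouts from the same (s, a) yield a distribution over next states rather than a single point estimate, which is the stochastic behavior the abstract relies on.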

 

Received: 5 January 2024 | Revised: 28 February 2024 | Accepted: 9 March 2024 

 

Conflicts of Interest

The authors declare that they have no conflicts of interest in this work.

 

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

 

Author Contribution Statement

Ting Zhu: Conceptualization, Methodology, Software, Investigation, Writing - original draft, Writing - review & editing. Ruibin Ren: Conceptualization, Methodology, Formal analysis, Investigation, Resources, Writing - review & editing, Visualization, Supervision, Project administration. Yukai Li: Software, Validation, Formal analysis, Data curation, Writing - review & editing. Wenbin Liu: Software, Validation, Resources, Writing - review & editing, Project administration.


Published

2024-03-13

Section

Research Articles

How to Cite

Zhu, T., Ren, R., Li, Y., & Liu, W. (2024). A Model-Based Reinforcement Learning Method with Conditional Variational Auto-Encoder. Journal of Data Science and Intelligent Systems. https://doi.org/10.47852/bonviewJDSIS42022432