A Model-Based Reinforcement Learning Method with Conditional Variational Auto-Encoder

Authors

  • Ting Zhu School of Mathematics, Southwest Jiaotong University, China https://orcid.org/0009-0005-3587-1302
  • Ruibin Ren School of Mathematics, Southwest Jiaotong University, China
  • Yukai Li School of Mathematics, Southwest Jiaotong University, China
  • Wenbin Liu School of Mathematics, Southwest Jiaotong University, and the 30th Research Institute of China Electronics Technology Group Corporation, China

DOI:

https://doi.org/10.47852/bonviewJDSIS42022432

Keywords:

model-based reinforcement learning, conditional variational auto-encoder, task-relevant representations

Abstract

Model-based reinforcement learning can markedly improve sample efficiency, but the learned environment model inevitably contains errors. These model errors can mislead policy optimization and lead to suboptimal policies. To improve the generalization ability of the environment model, existing methods often build it from model ensembles or Bayesian models; however, such approaches are computationally intensive and complex to update. Because a generative model can capture the stochastic nature of the environment, this paper proposes a model-based reinforcement learning method based on a conditional variational auto-encoder (CVAE). The CVAE is used to learn task-relevant representations, and its generative component predicts environmental changes. To address the accumulation of errors over multi-step rollouts, model adaptation is employed to minimize the difference between the simulated and real data distributions. Experiments verify that the proposed method learns task-relevant representations and accelerates policy learning.
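
The page gives no implementation details beyond the abstract, but the core component, a CVAE that models environment transitions, can be sketched as follows. This is a minimal illustrative PyTorch sketch rather than the authors' implementation: the class name CVAEDynamicsModel, the network sizes, the standard-normal prior used at rollout time, the loss weighting, and the example dimensions are all assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): a CVAE that models the
# transition p(s' | s, a) through a latent variable z, trained with the ELBO.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAEDynamicsModel(nn.Module):
    """Hypothetical CVAE dynamics model: encodes (s, a, s') into a latent z
    and decodes (s, a, z) back into a predicted next state."""

    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        # Recognition network q(z | s, a, s')
        self.encoder = nn.Sequential(
            nn.Linear(state_dim * 2 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),          # outputs mean and log-variance
        )
        # Generative network p(s' | s, a, z)
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, s, a, s_next):
        mu, log_var = self.encoder(torch.cat([s, a, s_next], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterization trick
        s_next_hat = self.decoder(torch.cat([s, a, z], dim=-1))
        return s_next_hat, mu, log_var

    def loss(self, s, a, s_next, beta=1.0):
        s_next_hat, mu, log_var = self(s, a, s_next)
        recon = F.mse_loss(s_next_hat, s_next)                  # reconstruction term
        kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
        return recon + beta * kl                                # negative ELBO

    @torch.no_grad()
    def predict(self, s, a):
        # At rollout time, sample z from a standard-normal prior (an assumption here)
        # so the model produces stochastic next-state predictions.
        z = torch.randn(s.shape[0], self.latent_dim, device=s.device)
        return self.decoder(torch.cat([s, a, z], dim=-1))

# Usage sketch with arbitrary dimensions: fit on real transitions, then use
# predict() to generate imagined transitions for policy optimization.
model = CVAEDynamicsModel(state_dim=11, action_dim=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
s, a, s_next = torch.randn(64, 11), torch.randn(64, 3), torch.randn(64, 11)
optimizer.zero_grad()
model.loss(s, a, s_next).backward()
optimizer.step()
```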


Received: 5 January 2024 | Revised: 28 February 2024 | Accepted: 9 March 2024 


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.


Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.


Published

2024-03-13

How to Cite

Zhu, T., Ren, R., Li, Y., & Liu, W. (2024). A Model-Based Reinforcement Learning Method with Conditional Variational Auto-Encoder. Journal of Data Science and Intelligent Systems. https://doi.org/10.47852/bonviewJDSIS42022432

Section

Research Articles