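Personal Information
Lab Research Projects Participated In
Key Technologies for Human-Machine Intelligent Collaboration and Their Application in Intelligent Manufacturing
Reliable Intelligent Manufacturing Driven by Untrusted Intelligence
Academic Output
Patents written or co-written: 0. Papers accepted or published: 1. Papers submitted and awaiting acceptance: 0.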
Conference Articles
- Swap Softmax Twin Delayed Deep Deterministic Policy Gradient
  Chaohu Liu and Yunbo Zhao
  In 2023 6th Int. Symp. Auton. Syst. ISAS, 2023
Reinforcement learning algorithms have achieved remarkable success in continuous control. Among the widely used algorithms, Deep Deterministic Policy Gradient (DDPG) is a classic continuous control algorithm, but it is prone to overestimation. The Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3) was subsequently proposed; it adopts the idea of Double DQN, taking the minimum value between a pair of critics to limit overestimation. However, TD3 may introduce an underestimation bias. To reduce the effect of these errors, we introduce a new method that incorporates Swap Softmax into TD3, which can offset the maximum and minimum values. We evaluate our method on continuous control tasks from OpenAI Gym simulated by MuJoCo, and the results show improved performance and robustness.
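For readers unfamiliar with the background, the following is a minimal sketch of the two bootstrap targets the abstract contrasts: a single-critic DDPG-style target and the TD3 target that takes the minimum over a pair of critics. It does not implement the Swap Softmax method, whose details are not given here. The function names, the scalar inputs, and the discount factor `gamma=0.99` are illustrative assumptions.

```python
def ddpg_target(reward, done, q_next, gamma=0.99):
    """Single-critic target: r + gamma * (1 - done) * Q'(s', a').

    Hypothetical helper for illustration; q_next is the target critic's
    estimate for the next state-action pair.
    """
    return reward + gamma * (1.0 - done) * q_next


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2').

    Taking the minimum of the two target critics limits overestimation,
    but it can push the estimate too low (underestimation bias).
    """
    min_q = min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * min_q


# Two critics disagree about the next state's value (10.0 vs 8.0).
y_ddpg = ddpg_target(reward=1.0, done=0.0, q_next=10.0)
y_td3 = td3_target(reward=1.0, done=0.0, q1_next=10.0, q2_next=8.0)
print(y_ddpg, y_td3)
```

When the critics disagree, the TD3 target is never larger than the single-critic target built from either estimate, which is the source of the underestimation bias the abstract mentions.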
Blog Posts
Thesis
Post-Graduation Destination
University of Science and Technology of China, PhD student