Personal Information
Lab Research Projects Participated In
Research on Human-Machine Autonomy and Control Authority Switching Strategies
Research on Intelligent Foundation Models for Incomplete-Information Game Decision-Making in Complex Environments
Research Topic
Using reinforcement learning to realize shared control in human-machine hybrid intelligence systems, including the allocation and switching of control authority
Academic Achievements
Authored/co-authored 1 patent, with 4 papers accepted or published and 0 submitted papers awaiting acceptance. Jointly trained students may have other papers/patents not shown here.
Patents
-
A Shared Autonomy Method Based on Deep Reinforcement Learning (一种基于深度强化学习的共享自主方法)
康宇,
游诗艺,
赵云波,
and 吕文君
This invention discloses a shared autonomy method based on deep reinforcement learning, in the field of human-machine hybrid intelligence systems. The method comprises the following steps: infer the human's intent with a long short-term memory (LSTM) network; use deep reinforcement learning to train an end-to-end mapping from the system state and the human action to an action reward value, and compute the reward value of each action, representing the benefit that action brings to the current task; judge a human action invalid once its reward value has dropped by a certain degree, and when human actions are invalid for several consecutive steps, let the machine control the system alone to complete the goal inferred from the valid actions, preventing the harm invalid actions would cause; if the human action is valid, use an arbitration function to perform shared control based on the human action and the machine's computed result; from all the above steps, establish a deep-reinforcement-learning-based shared autonomy algorithm subject to invalid human actions.
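To make the pipeline above concrete, here is a minimal Python sketch of the take-over logic only, assuming a generic learned reward model in place of the patent's LSTM intent model and deep-RL value network; the thresholds (INVALID_DROP, MAX_INVALID), the toy goal-distance reward, and the fixed arbitration weight are illustrative assumptions, not the filed implementation.

```python
import numpy as np

# Toy stand-in for the learned components: in the patent these would be an LSTM
# intent model plus a deep-RL value network; here a goal-distance reward suffices.
GOAL = np.array([1.0, 0.0])

def reward_model(state, action):
    """Assumed reward: how much the (continuous) action moves the state toward GOAL."""
    return -float(np.linalg.norm((state + action) - GOAL))

INVALID_DROP = 0.3   # assumed relative drop that marks a human action invalid
MAX_INVALID = 5      # assumed number of consecutive invalid actions before takeover

def shared_step(state, human_action, machine_action, candidates, invalid_count):
    """Return the executed command and the updated invalid-action counter."""
    r_h = reward_model(state, human_action)
    r_best = max(reward_model(state, a) for a in candidates)
    # Judge the human action invalid once its reward falls well below the best option.
    invalid_count = invalid_count + 1 if r_h < r_best - INVALID_DROP * abs(r_best) else 0
    if invalid_count >= MAX_INVALID:
        return machine_action, invalid_count        # machine controls alone
    alpha = 0.5                                     # fixed arbitration weight (assumed)
    return (1 - alpha) * human_action + alpha * machine_action, invalid_count

if __name__ == "__main__":
    state = np.zeros(2)
    candidates = [np.array([0.1, 0.0]), np.array([-0.1, 0.0]), np.array([0.0, 0.1])]
    cmd, n_bad = shared_step(state, np.array([-0.1, 0.0]), np.array([0.1, 0.0]), candidates, 0)
    print(cmd, n_bad)
```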
Conference Articles
-
Shared Autonomy Based on Human-in-the-loop Reinforcement Learning with Policy Constraints
Ming Li,
Yu Kang,
Yun-Bo Zhao,
Jin Zhu,
and Shiyi You
In 2022 41st Chin. Control Conf. CCC
2022
In shared autonomous systems, humans and agents cooperate to complete tasks. Since reinforcement learning enables agents to learn good policies through trial and error without knowing the dynamic model of the environment, it has been widely applied in shared autonomous systems. After inferring the target from human inputs, agents trained by RL can act accordingly with high accuracy. However, existing methods of this kind have significant problems: training reinforcement learning algorithms requires extensive exploration, which is time-consuming, lacks safety guarantees, and can cause great losses during the training process. Moreover, most shared control methods are human-oriented and do not consider the possibility that humans may take wrong actions. In view of these problems, this paper proposes human-in-the-loop reinforcement learning with policy constraints. During training, human prior knowledge is used to constrain the agents' exploration to achieve fast and efficient learning. During testing, we incorporate policy constraints into the arbitration to avoid serious consequences caused by human mistakes.
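A minimal sketch of the constrained-exploration and constrained-arbitration idea, assuming a discrete action space and a human_prior(state, action) predicate standing in for the paper's human prior knowledge; all names and the epsilon-greedy setup are illustrative, not taken from the paper.

```python
import random

# human_prior(state, action) -> bool stands in for the human prior knowledge;
# q_values is a dict keyed by (state, action).

def allowed_actions(state, all_actions, human_prior):
    return [a for a in all_actions if human_prior(state, a)]

def constrained_epsilon_greedy(q_values, state, all_actions, human_prior, eps=0.1):
    """Training-time action selection: explore only within the prior-knowledge constraint."""
    acts = allowed_actions(state, all_actions, human_prior) or all_actions
    if random.random() < eps:
        return random.choice(acts)
    return max(acts, key=lambda a: q_values.get((state, a), 0.0))

def constrained_arbitration(state, human_action, agent_action, human_prior):
    """Test-time arbitration: a human action violating the constraint is overridden."""
    return human_action if human_prior(state, human_action) else agent_action

if __name__ == "__main__":
    actions = ["left", "right", "stay"]
    prior = lambda s, a: a != "left"            # toy constraint: never go left
    q = {("s0", "right"): 1.0, ("s0", "stay"): 0.2}
    print(constrained_epsilon_greedy(q, "s0", actions, prior))
    print(constrained_arbitration("s0", "left", "right", prior))
```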
-
Adaptive Arbitration for Minimal Intervention Shared Control via Deep Reinforcement Learning
Shiyi You,
Yu Kang,
Yun-Bo Zhao,
and Qianqian Zhang
In 2021 China Autom. Congr. CAC
2021
In shared control, humans and intelligent robots jointly complete real-time control tasks with their complementary capabilities, achieving improved performance unattainable by either side on its own, which has attracted increasing attention in recent years. Arbitration, as an indispensable part of shared control, determines how control authority is allocated between the human and the robot, and the definition of that policy has always been one of the fundamental problems. In this paper, we propose an adaptive arbitration method for shared control systems, which minimizes the deviation from the human inputs while ensuring system performance based on deep reinforcement learning. We provide humans maximum assistance with minimal intervention, in order to balance the human's need for control authority against the need for performance. We apply our method to real-time control tasks, and the results show that it achieves a high task success rate and shorter task completion times with less human workload, while maintaining higher human satisfaction.
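A rough sketch of the minimal-intervention principle, assuming a generic value estimate value_fn(state, command) and a performance threshold are available; the paper learns the arbitration with deep reinforcement learning, whereas here a simple line search over the blending weight illustrates the idea of deviating from the human input as little as possible.

```python
import numpy as np

def minimal_intervention_blend(state, u_human, u_robot, value_fn, perf_threshold,
                               alphas=np.linspace(0.0, 1.0, 11)):
    """Pick the smallest robot weight alpha whose blended command meets the threshold."""
    for alpha in alphas:                                # alphas are sorted ascending, so the
        u = (1 - alpha) * u_human + alpha * u_robot     # first hit deviates least from the human
        if value_fn(state, u) >= perf_threshold:
            return u, float(alpha)
    return u_robot, 1.0                                 # fall back to full assistance

if __name__ == "__main__":
    value_fn = lambda s, u: -abs(u - 1.0)               # toy task: the "good" command is 1.0
    u, alpha = minimal_intervention_blend(0.0, u_human=0.2, u_robot=1.0,
                                          value_fn=value_fn, perf_threshold=-0.4)
    print(u, alpha)                                     # smallest alpha meeting the threshold
```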
Journal Articles
-
A Human-Machine Shared Autonomy Method under Not-Always-Effective Human Decisions (非全时有效人类决策下的人机共享自主方法)
游诗艺,
康宇,
赵云波,
and 张倩倩
中国科学:信息科学
2022
In human-machine shared autonomy, a human and an intelligent machine jointly complete real-time control tasks with complementary capabilities, achieving performance neither could reach by controlling alone. Many existing shared autonomy methods tend to assume that human decisions are always "effective", i.e., that they advance the task and faithfully reflect the human's true intent. In reality, however, due to fatigue, distraction, and many other reasons, human decisions can be "ineffective" to some degree, violating the basic assumption of these methods and causing them, and hence the task, to fail. In this paper, we propose a new deep-reinforcement-learning-based human-machine shared autonomy method that enables the system to complete the correct goal even when human decisions are ineffective for long periods. Specifically, we use deep reinforcement learning to train an end-to-end mapping from the system state and the human decision to a decision value, so as to explicitly judge whether the human decision is ineffective. If it is, the machine takes over the system to obtain better performance. We apply the method to real-time control tasks, and the results show that it judges the effectiveness of human decisions promptly and accurately, allocates the corresponding control authority, and ultimately improves system performance.
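As an illustration only (not the paper's architecture), the end-to-end mapping from system state and human decision to a decision value could be realised by a small PyTorch network, with the effectiveness judgement as a thresholded comparison against the machine's own decision; the layer sizes and the margin are assumptions.

```python
import torch
import torch.nn as nn

class DecisionValueNet(nn.Module):
    """Assumed MLP: concatenated (state, decision) -> scalar decision value."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def human_decision_is_effective(value_net, state, human_action, machine_action, margin=0.2):
    """Assumed rule: effective if the human decision's value is within `margin` of the
    value of the machine's decision; otherwise the machine takes over."""
    with torch.no_grad():
        v_h = value_net(state, human_action)
        v_m = value_net(state, machine_action)
    return bool(v_h >= v_m - margin)

if __name__ == "__main__":
    net = DecisionValueNet(state_dim=4, action_dim=2)
    s, a_h, a_m = torch.randn(4), torch.randn(2), torch.randn(2)
    print(human_decision_is_effective(net, s, a_h, a_m))
```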
-
Traded Control of Human–Machine Systems for Sequential Decision-Making Based on Reinforcement Learning
Qianqian Zhang,
Yu Kang,
Yun-Bo Zhao,
Pengfei Li,
and Shiyi You
IEEE Trans. Artif. Intell.
2022
Sequential decision-making (SDM) is a common type of decision-making problem with sequential and multistage characteristics. Among its challenges, the learning and updating of the policy are the main difficulties in solving SDM problems. Unlike previous machine autonomy driven by artificial intelligence alone, we improve the control performance of SDM tasks by combining human intelligence and machine intelligence. Specifically, this article presents a paradigm of human–machine traded control systems based on reinforcement learning methods to optimize the solution of sequential decision problems. By introducing the ideas of autonomous boundary and credibility assessment, we enable humans and machines at the decision-making level of the system to collaborate more effectively. The arbitration in the human–machine traded control system introduces a Bayesian neural network and the dropout mechanism to account for uncertainty and security constraints. Finally, experiments involving machine traded control and human traded control were implemented. The preliminary experimental results show that our traded control method improves decision-making performance and verify its effectiveness for SDM problems.
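A hedged sketch of the uncertainty-aware arbitration idea: Monte-Carlo dropout approximates the Bayesian neural network mentioned in the abstract, and control is traded to the human whenever the machine is too uncertain about its preferred action; the network shape, sample count, and variance threshold are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DropoutQNet(nn.Module):
    """Assumed Q-network with dropout layers for MC-dropout uncertainty estimation."""
    def __init__(self, state_dim, n_actions, hidden=64, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def traded_control(qnet, state, human_action, n_samples=20, var_threshold=0.05):
    """Return the machine's greedy action if it is confident, otherwise the human's action."""
    qnet.train()                                   # keep dropout active for MC sampling
    with torch.no_grad():
        samples = torch.stack([qnet(state) for _ in range(n_samples)])
    mean_q, var_q = samples.mean(0), samples.var(0)
    machine_action = int(mean_q.argmax())
    if float(var_q[machine_action]) > var_threshold:
        return human_action                        # machine is uncertain: trade to the human
    return machine_action

if __name__ == "__main__":
    qnet = DropoutQNet(state_dim=4, n_actions=3)
    print(traded_control(qnet, torch.randn(4), human_action=1))
```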
Blog Posts
Theses
-
Research on Human-Machine Hybrid Decision-Making Methods Based on the Effectiveness of Human Decisions (基于人类决策有效性的人机混合决策方法研究)
游诗艺
University of Science and Technology of China, Hefei
2022
With the development of artificial intelligence, machines' autonomous capabilities keep improving, and intelligent machines are applied ever more deeply across industries. In this process, situations inevitably arise in which intelligent machines cannot cope with the complexity and unpredictability of real tasks, and many systems will still require continuous, close human-machine interaction for supervision, goal setting, emergency response, and so on. Studying how to combine human and machine decisions in such scenarios to achieve better decision performance is therefore particularly important and meaningful.

In human-machine hybrid decision-making, whether human decisions are effective, i.e., whether they advance the task and faithfully reflect the human's true intent, affects the final decision performance in two ways. On the one hand, the failure of either party's decisions degrades the hybrid performance. On the other hand, the intelligent machine often cannot directly know the human's intent: it must first infer the intent from the human's decisions and then act to assist in completing it, so ineffective human decisions may invalidate the intent inference and in turn cause the hybrid decision-making method, and the task, to fail. This thesis therefore takes human-machine hybrid decision-making methods as its research object and, based on the effectiveness of human decisions, studies two settings, human decisions that are always effective and human decisions that are not always effective, proposing reinforcement-learning-based hybrid decision-making methods to improve decision performance. The main work of this thesis includes the following two aspects:

(1) For the case where human decisions are always effective, a human-machine hybrid decision-making method based on the minimal intervention principle is proposed, which further considers indicators of human satisfaction with the human-machine system on top of optimizing overall system performance. By introducing the minimal intervention principle into human-machine hybrid decision-making and setting an adaptive threshold for fusing human and machine decisions, the method provides the human with maximum assistance under minimal intervention, stays optimal in a real-time changing environment, and improves both system performance and human satisfaction, providing a basic method for subsequent optimization designs.

(2) For the case where human decisions are not always effective, a human-machine hybrid decision-making method based on an effectiveness-evaluation mechanism for human decisions is proposed, so as to prevent ineffective human decisions from harming system performance. By using a reinforcement learning algorithm to judge the effectiveness of human decisions and to identify whether the human's intent has changed, the method lets the machine complete the task alone when human decisions are ineffective, so that the system can still achieve the correct task goal even when human decisions are not always effective, effectively improving the quality of human-machine hybrid decision-making and system performance.
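Schematically, the two contributions compose into one decision loop, sketched below with hypothetical is_effective and blend callables standing in for the thesis's effectiveness-evaluation mechanism (2) and its minimal-intervention arbitration (1); the toy test and fusion rule are assumptions for illustration only.

```python
def hybrid_decision(state, human_action, machine_action, is_effective, blend):
    """If the human decision is judged ineffective, the machine completes the task alone;
    otherwise the two decisions are fused with minimal intervention."""
    if not is_effective(state, human_action):
        return machine_action
    return blend(state, human_action, machine_action)

if __name__ == "__main__":
    is_eff = lambda s, a: abs(a - s) < 1.0             # toy effectiveness test
    blend = lambda s, a_h, a_m: 0.7 * a_h + 0.3 * a_m  # toy minimal-intervention fusion
    print(hybrid_decision(0.0, 0.5, 1.0, is_eff, blend))  # effective human input -> fused
    print(hybrid_decision(0.0, 5.0, 1.0, is_eff, blend))  # ineffective -> machine alone
```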
Post-Graduation Destination
Huatai Securities Co., Ltd., Algorithm Engineer