Department of Statistics and Data Science, Academic Seminar Series No. 378

 

Time: April 26 (Tuesday), 14:00-15:30

Venue: Tencent Meeting ID 739 629 841, password 123456

Host: Assoc. Prof. Juan Shen (沈娟)

Title: Non-crossing quantile regression for deep reinforcement learning

Speaker: Prof. Xingdong Feng (冯兴东), Shanghai University of Finance and Economics

Biography: Xingdong Feng is a professor, doctoral advisor, and Dean of the School of Statistics and Management at Shanghai University of Finance and Economics. He received his Ph.D. from the University of Illinois at Urbana-Champaign. His research interests include quantile regression theory and its applications, distributed statistical computing, dimension-reduction theory and algorithms for matrix-valued data, and reinforcement learning. He has published widely in leading statistics journals, including JASA, AoS, JRSSB, and Biometrika, as well as at the top AI conference NeurIPS. Prof. Feng is an Elected Member of the International Statistical Institute, Vice President of the National Association of Young Statisticians, a member of the National Statistics Textbook Review Committee, and a member of the Discipline Evaluation Group of the State Council.

 

Abstract: Distributional reinforcement learning (DRL) estimates the distribution over future returns, rather than only the mean, to capture the intrinsic uncertainty of MDPs more efficiently. However, batch-based DRL algorithms cannot guarantee the non-decreasing property of learned quantile curves, especially at the early training stage, leading to abnormal distribution estimates and reduced model interpretability. To address these issues, we introduce a general DRL framework that uses non-crossing quantile regression to ensure the monotonicity constraint within each sampled batch, and that can be incorporated into well-known DRL algorithms. We demonstrate the validity of our method from both theoretical and implementation perspectives. Experiments on Atari 2600 games show that state-of-the-art DRL algorithms with the non-crossing modification can significantly outperform their baselines, with faster convergence and better testing performance. In particular, our method can effectively recover the distribution information and thus dramatically increase exploration efficiency when the reward space is extremely sparse.
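To make the monotonicity constraint concrete, below is a minimal illustrative sketch of one common way to obtain non-crossing quantile estimates: predict the lowest quantile freely, then add non-negative increments so the quantile curve is monotone by construction. The function names and this particular parameterization are assumptions for illustration only, not the speaker's actual algorithm.

```python
import numpy as np

def softplus(x):
    # Smooth, always-positive transform; keeps each quantile gap >= 0.
    return np.log1p(np.exp(x))

def non_crossing_quantiles(raw_outputs):
    """Map unconstrained outputs of shape (batch, N) to monotone
    quantile estimates q_1 <= q_2 <= ... <= q_N per row."""
    base = raw_outputs[:, :1]                  # unconstrained lowest quantile
    gaps = softplus(raw_outputs[:, 1:])        # strictly positive increments
    upper = base + np.cumsum(gaps, axis=1)     # running sum => monotone
    return np.concatenate([base, upper], axis=1)

# Example: 4 states, 8 quantile atoms each.
raw = np.random.randn(4, 8)
q = non_crossing_quantiles(raw)
assert np.all(np.diff(q, axis=1) >= 0)         # no crossing in any row
```

Because monotonicity holds by construction for every sampled batch, no post-hoc sorting or penalty term is needed, which is the kind of guarantee the abstract refers to.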

 

 

Department of Statistics and Data Science

April 16, 2022