Talk Information

Seminar (108/9/20): Dr. 魏廷翰

Title: Reinforcement Learning for Parameterized Action Spaces

Speaker: Dr. 魏廷翰

Abstract: Within the field of machine learning, reinforcement learning is often listed alongside supervised learning and unsupervised learning as one of the three main learning paradigms. It differs from the other two in that the AI agent collects the data samples required for training by interacting directly with the environment. More specifically, the agent tries to maximize the expected total reward it can obtain through a series of actions in the environment.

A parameterized action space is one in which the agent must first choose a discrete action type and then supply continuous parameters for that action. For example, when playing soccer, the agent may need to decide whether to kick the ball or run. Kicking requires continuous values for the direction angle and the kick force; running requires a direction and a speed.
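The soccer example can be made concrete with a small sketch. This is only an illustration of the data structure, not the speaker's implementation: the names `PARAM_BOUNDS`, `ParameterizedAction`, and `sample_action`, as well as the parameter ranges, are assumptions chosen for the example, and a uniform random choice stands in for a learned policy.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict

# Hypothetical parameter ranges for each discrete action type (assumed values).
PARAM_BOUNDS = {
    "kick": {"angle": (-math.pi, math.pi), "force": (0.0, 1.0)},
    "run": {"angle": (-math.pi, math.pi), "speed": (0.0, 1.0)},
}


@dataclass
class ParameterizedAction:
    kind: str                 # discrete part: which action type
    params: Dict[str, float]  # continuous part: parameters for that type


def sample_action(rng: random.Random) -> ParameterizedAction:
    """Pick a discrete action type, then sample its continuous parameters."""
    # Step 1: discrete choice (uniform here; a trained policy would go here).
    kind = rng.choice(sorted(PARAM_BOUNDS))
    # Step 2: continuous parameters that belong to the chosen type only.
    params = {
        name: rng.uniform(low, high)
        for name, (low, high) in PARAM_BOUNDS[kind].items()
    }
    return ParameterizedAction(kind, params)


if __name__ == "__main__":
    action = sample_action(random.Random(0))
    print(action.kind, action.params)
```

Note that the set of continuous parameters depends on the discrete choice: a kick carries a force, while a run carries a speed. This dependency is what distinguishes a parameterized action space from a purely discrete or purely continuous one.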

In this talk, I will give a brief overview of reinforcement learning, the policy gradient algorithm, and my recent work on applying reinforcement learning to robotic grasping and pushing tasks, which are parameterized action problems. The proposed algorithm, Parameterized Proximal Policy Optimization (P3O), is a work in progress that was presented at the Infer2Control Workshop at NeurIPS 2018. With P3O, using only visual inputs, we were able to successfully complete pushing tasks in 99.50% of the trials and grasping tasks in 98.25%.

Time: September 20, ROC year 108 (2019), Monday, 14:30 - 14:50

Venue: Room 電101, College of Electrical Engineering and Computer Science, Sanxia Campus, Taipei University (臺北大學)