Learning Humanoid Robot Push Recovery using Deep Reinforcement Learning
Dicksiano Carvalho Melo
We apply the Proximal Policy Optimization algorithm to learn a Push Recovery strategy in a simulated humanoid robot environment, with the aim of improving the robot's walking stability. Because this is a complex task, we use the Intel DevCloud to collect experience in parallel.
Project status: Under Development
Intel Technologies
DevCloud
Overview / Usage
We use the Proximal Policy Optimization (PPO) algorithm to learn a Push Recovery strategy for a simulated humanoid robot while it is walking. The walking pattern was designed using control theory in previous work and is not subject to learning. In the simulation, the robot is pushed with a soccer ball and must recover from these pushes; it is rewarded for keeping its balance as long as possible. Since this is a very complex task, we use the Intel DevCloud to collect a large amount of experience from many agents running in parallel. Gradient estimation is one of the main challenges in Policy Gradient Methods, so running many agents in parallel is imperative to the success of the project.
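To illustrate why parallel experience collection helps with gradient estimation, the toy sketch below averages noisy gradient estimates from several independent agents and measures how the spread of the averaged estimate shrinks as the number of agents grows. It is not the project's training code: the scalar "gradient", the Gaussian noise model, and the names `noisy_gradient`, `averaged_gradient`, and `estimate_spread` are all assumptions made for illustration.

```python
import random
import statistics


def noisy_gradient(rng, true_gradient=1.0, noise_std=2.0):
    """One agent's gradient estimate: the true value plus zero-mean noise (toy model)."""
    return true_gradient + rng.gauss(0.0, noise_std)


def averaged_gradient(rng, num_agents):
    """Average the independent estimates produced by `num_agents` parallel agents."""
    return sum(noisy_gradient(rng) for _ in range(num_agents)) / num_agents


def estimate_spread(num_agents, trials=2000, seed=0):
    """Standard deviation of the averaged estimate over many repeated trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = [averaged_gradient(rng, num_agents) for _ in range(trials)]
    return statistics.stdev(samples)


# Hypothetical agent counts, chosen only to show the trend.
spread_by_agents = {n: estimate_spread(n) for n in (1, 4, 16, 64)}
for n, spread in spread_by_agents.items():
    print(f"{n:3d} agents -> spread of averaged estimate ~ {spread:.3f}")
```

Under this independent-noise assumption, the spread falls roughly with the square root of the number of agents, which is the usual argument for collecting experience from many agents at once.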
Methodology / Approach
The main goal of this work was to learn a human-inspired behavior for a simulated humanoid robot. To solve this problem we chose a Model-free Deep Reinforcement Learning algorithm called Proximal Policy Optimization. The choice was motivated by the complexity of the humanoid robot's dynamics: with a Model-free algorithm, we avoid having to model those dynamics. It is also an interesting choice because human beings perform Push Recovery strategies without knowing their own dynamics a priori.
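For readers unfamiliar with Proximal Policy Optimization, the sketch below computes its clipped surrogate objective in plain Python. It shows the standard clipped form of the objective only; the function name `ppo_clip_objective`, the clipping value of 0.2, and the sample numbers are assumptions for illustration, not the settings or code used in this project (which relies on OpenAI Baselines).

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    For each sample, ratio = pi_new(a|s) / pi_old(a|s), and the objective term is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to move the policy
        # far away from the one that collected the data.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical batch of three samples with probability ratios 1.5, 0.5 and 1.0.
old_log_probs = [0.0, 0.0, 0.0]
new_log_probs = [math.log(1.5), math.log(0.5), 0.0]
advantages = [1.0, -1.0, 2.0]

objective = ppo_clip_objective(new_log_probs, old_log_probs, advantages)
print(f"clipped surrogate objective: {objective:.3f}")
```

Note that the objective depends only on action probabilities and advantage estimates gathered from experience, so no model of the robot's dynamics is required.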
In the present work, the Intel AI DevCloud was used not only to accelerate the training process; it was also fundamental in the search for a good set of hyperparameters.
Technologies Used
RoboCup Soccer 3D Simulation League, TensorFlow, OpenAI Baselines.