Deep RL on Google Football
Vishal Bidawatka
Hyderabad, Telangana
- 0 Collaborators
We set up the Google Football environment to test the A3C RL algorithm. For the implementation of the RL algorithms, we used the ChainerRL library, since it contains an optimized version of A3C. A single file, a3c.py, contains the code for training the agent.
Project status: Concept
Intel Technologies
AI DevCloud / Xeon
Overview / Usage
A3C (Asynchronous Advantage Actor-Critic) runs multiple workers, each interacting with its own copy of the environment and choosing actions independently of the others. The following steps summarize how it works in plain terms and highlight its advantages over other RL algorithms:
- Each worker interacts independently with its own copy of the environment.
- Because the workers run in parallel and see different trajectories, more exploration takes place.
- Each worker collects a short rollout (a fixed number of steps, or until the episode ends) of transitions of the form [current state, action taken, reward obtained, next state, done (a Boolean indicating whether the episode has ended)].
- From its rollout, each worker computes gradients of the actor (policy) and critic (value) losses on its local copy of the network.
- The workers apply these gradients asynchronously to the shared global network; there is no shared replay buffer.
- Each worker then copies the latest weights of the global network into its local network.
- The workers take their next actions using these updated weights.
- The same steps repeat until the global network converges.
- Training is faster because the workers run in parallel.
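In the original A3C formulation, workers do not pool transitions in a global buffer; each computes gradients from its own experience and applies them asynchronously to the shared parameters, then re-syncs its local copy. The sketch below illustrates that loop under heavy simplifying assumptions: a hypothetical one-step, 3-armed bandit (`ARM_REWARDS`) stands in for Google Football, the actor is a softmax over three logits, the critic is a single scalar V(s), and Python threads stand in for the separate worker processes a real implementation would typically use. All names (`GlobalParams`, `worker`, `train`) are illustrative and are not ChainerRL's API.

```python
import math
import random
import threading

# Hypothetical toy environment (not Google Football): a one-step,
# 3-armed bandit with deterministic rewards. Arm 2 is the best action.
ARM_REWARDS = [0.0, 0.5, 1.0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class GlobalParams:
    """Shared actor (policy logits) and critic (scalar state value)."""

    def __init__(self, n_actions):
        self.logits = [0.0] * n_actions
        self.value = 0.0
        # The lock keeps the Python lists consistent; the original A3C
        # applies updates without locking (Hogwild-style).
        self.lock = threading.Lock()

    def snapshot(self):
        with self.lock:
            return list(self.logits), self.value

    def apply_gradients(self, logit_grads, value_step, lr):
        with self.lock:
            for i, g in enumerate(logit_grads):
                self.logits[i] += lr * g  # ascent on the policy objective
            self.value += lr * value_step  # descent on 0.5 * (r - V)^2


def worker(worker_id, shared, n_updates, lr):
    rng = random.Random(worker_id)  # each worker explores with its own RNG
    for _ in range(n_updates):
        # 1. Sync: copy the latest global weights into a local copy.
        logits, value = shared.snapshot()
        probs = softmax(logits)
        # 2. Act in this worker's own copy of the environment.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = ARM_REWARDS[action]
        # 3. One-step episode, so the advantage estimate is r - V(s).
        advantage = reward - value
        # 4. Actor gradient: A * d log pi(a) / d logits = A * (onehot(a) - pi).
        logit_grads = [advantage * ((1.0 if i == action else 0.0) - p)
                       for i, p in enumerate(probs)]
        # Critic: negative gradient of 0.5 * (r - V)^2 with respect to V.
        value_step = advantage
        # 5. Push the local gradients to the shared global parameters.
        shared.apply_gradients(logit_grads, value_step, lr)


def train(n_workers=4, n_updates=1000, lr=0.1):
    shared = GlobalParams(len(ARM_REWARDS))
    threads = [threading.Thread(target=worker, args=(i, shared, n_updates, lr))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    shared = train()
    print("policy:", [round(p, 3) for p in softmax(shared.logits)])
```

Running it should leave the shared policy strongly preferring the third arm; the exact probabilities vary between runs because thread scheduling changes the order in which gradients reach the global parameters.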
Advantage function
The Q value can be decomposed into two parts:
- The state-value function V(s)
- The advantage function A(s, a)
The advantage function can then be derived as follows:
Q(s, a) = V(s) + A(s, a)
A(s, a) = Q(s, a) - V(s)
A(s, a) ≈ r + γV(s') - V(s), where r is the reward, γ is the discount factor, and s' is the next state
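The last step replaces Q(s, a) with a sampled one-step target. A short derivation of why this is reasonable, assuming V is the state-value function of the current policy π and s' denotes the next state: the one-step TD error δ is, in expectation, exactly the advantage. The second line is the n-step estimate used in the original A3C paper, where each worker bootstraps from V after k steps of its rollout.

```latex
\begin{aligned}
Q^{\pi}(s,a) &= \mathbb{E}\left[\, r + \gamma V^{\pi}(s') \mid s, a \,\right],
\qquad \delta = r + \gamma V^{\pi}(s') - V^{\pi}(s) \\
\mathbb{E}\left[\delta \mid s, a\right] &= Q^{\pi}(s,a) - V^{\pi}(s) = A^{\pi}(s,a) \\
\hat{A}_t &= \sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k}) - V(s_t)
\end{aligned}
```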
The advantage function measures how much better or worse an action is than the policy's average action at a given state, while the value function captures how good it is to be in that state.
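To make the advantage estimate concrete, here is a small sketch that computes it for one rollout in two ways: the one-step estimate r + γV(s') - V(s), and the discounted n-step return minus V(s) that A3C-style workers typically use. The function names and the sample numbers are hypothetical; V(s') is treated as 0 once an episode has ended.

```python
from typing import List, Sequence


def one_step_advantages(rewards: Sequence[float], values: Sequence[float],
                        next_values: Sequence[float], dones: Sequence[bool],
                        gamma: float) -> List[float]:
    """A(s, a) ~= r + gamma * V(s') - V(s); V(s') counts as 0 at episode end."""
    return [r + gamma * (0.0 if d else nv) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]


def n_step_advantages(rewards: Sequence[float], values: Sequence[float],
                      bootstrap_value: float, gamma: float) -> List[float]:
    """Advantages from discounted n-step returns over a single rollout.

    bootstrap_value is V of the state after the rollout's last step
    (pass 0.0 if the episode ended there).
    """
    ret = bootstrap_value
    advantages = [0.0] * len(rewards)
    # Walk backwards through the rollout: R_t = r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]
    values = [0.5, 0.4, 0.6]  # critic's V(s_t) for each step
    print(one_step_advantages(rewards, values, [0.4, 0.6, 0.0],
                              [False, False, True], gamma=0.9))
    print(n_step_advantages(rewards, values, bootstrap_value=0.0, gamma=0.9))
```

With these numbers the one-step advantages are approximately [0.86, 0.14, 0.4] and the n-step advantages approximately [1.31, 0.5, 0.4]. The two agree on the final, terminal step but differ earlier, because the n-step version uses the rewards actually observed later in the rollout instead of the critic's guess V(s').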
Methodology / Approach
For the implementation of the RL algorithms, we used the ChainerRL library, since it contains an optimized version of A3C. A single file, a3c.py, contains the code for training the agent. We trained the agent on Intel AI DevCloud resources.
Technologies Used
ChainerRL
Intel AI DevCloud
Python
Repository
https://github.com/Ujwal2910/Deep-RL-on-Gfootabll-Google-football-OpenAI-style-environment