Activity Feed

Moloti N. created project Intelligent Home Security: Africa Motion Content encoder decoder using Deep Neural Networks

Intelligent Home Security: Africa Motion Content encoder decoder using Deep Neural Networks

We propose the use of drones to help communities enhance their security initiatives and to identify criminals during the day and at night. We use multiple sensors and computer-vision algorithms to recognize and detect motion and content in real time, then automatically send messages to community members' cell phones about the criminal activity. Hence, community members may be able to stop housebreakings before they even occur.

Machine Intelligence Algorithm Design Methodology

AMCnet: https://github.com/AfricaMachineIntelligence/AMCnet https://devmesh.intel.com/projects/africa-motion-content-network-amcnet

We propose a deep neural network for the prediction of future frames in natural video sequences that runs on the CPU. To effectively handle the complex evolution of pixels in videos, we propose to decompose motion and content, the two key components generating dynamics in videos. The model is built upon an Encoder-Decoder Convolutional Neural Network and a Convolutional LSTM for pixel-level prediction, which independently capture the spatial layout of an image and the corresponding temporal dynamics. By modeling motion and content independently, predicting the next frame reduces to converting the extracted content features into the next-frame content via the identified motion features, which simplifies the task of prediction. The model we aim to build should be end-to-end trainable over multiple time steps and should naturally learn to decompose motion and content without separate training. We evaluate the proposed network architecture on the AVA and UCF-101 human-action datasets and show state-of-the-art performance in comparison to recent approaches. The result is an end-to-end trainable network architecture, running on the CPU, with motion and content separation to model the spatio-temporal dynamics for pixel-level future prediction in natural videos.
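
To make the decomposition concrete, here is a minimal PyTorch sketch of the idea: a content encoder captures the spatial layout of the last observed frame, a convolutional-LSTM motion pathway consumes frame differences, and a decoder fuses the two feature maps into a prediction of the next frame. The module names, channel counts, and the 64x64 frame size below are illustrative assumptions, not the released AMCnet code.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # Minimal convolutional LSTM cell used for the motion pathway.
    def __init__(self, channels, hidden, k=3):
        super().__init__()
        self.gates = nn.Conv2d(channels + hidden, 4 * hidden, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class AMCNetSketch(nn.Module):
    # Content encoder sees the last frame; motion encoder sees frame differences;
    # the decoder fuses both feature maps to predict the next frame.
    def __init__(self, hidden=64):
        super().__init__()
        self.hidden = hidden
        self.content = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, hidden, 4, 2, 1), nn.ReLU())
        self.motion_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, hidden, 4, 2, 1), nn.ReLU())
        self.motion_rnn = ConvLSTMCell(hidden, hidden)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * hidden, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, frames):
        # frames: (batch, time, 3, H, W) with time >= 2; returns the predicted next frame.
        b, t, _, h, w = frames.shape
        state = (frames.new_zeros(b, self.hidden, h // 4, w // 4),
                 frames.new_zeros(b, self.hidden, h // 4, w // 4))
        for step in range(1, t):
            diff = frames[:, step] - frames[:, step - 1]          # motion cue
            m, state = self.motion_rnn(self.motion_enc(diff), state)
        content = self.content(frames[:, -1])                     # layout of the last frame
        return self.decoder(torch.cat([content, m], dim=1))

model = AMCNetSketch()
clip = torch.rand(2, 4, 3, 64, 64)          # two clips of four 64x64 RGB frames
next_frame = model(clip)                    # (2, 3, 64, 64)

Training such a model end to end simply compares next_frame against the true frame that follows the clip (for example with an L2 loss), which is what allows motion and content to separate without any extra supervision.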

We then use this pretrained AMCnet model on the video feed from the DJI Spark drone, integrated with the Intel Movidius Neural Compute Stick (NCS) to accelerate real-time object-detection neural networks.
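
As a hedged sketch of how such a pretrained model could be wired to a live feed, the loop below reads frames with OpenCV, keeps a short history, and asks the model for the next frame; a large gap between prediction and reality can then be treated as unusual motion. The stream URL, frame size, and checkpoint file name are placeholders, and the DJI Spark / Movidius NCS specifics (which require the vendor SDKs) are not shown here.

import collections
import cv2
import torch

# AMCNetSketch is the toy model from the sketch above; the checkpoint name is hypothetical.
model = AMCNetSketch()
model.load_state_dict(torch.load("amcnet_sketch.pt", map_location="cpu"))
model.eval()

cap = cv2.VideoCapture("udp://192.168.2.1:11111")     # placeholder stream address
history = collections.deque(maxlen=4)                 # last four frames seen

with torch.no_grad():
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (64, 64))
        tensor = torch.from_numpy(frame).float().permute(2, 0, 1) / 255.0
        history.append(tensor)
        if len(history) == history.maxlen:
            clip = torch.stack(list(history)).unsqueeze(0)    # (1, 4, 3, 64, 64)
            predicted = model(clip)                           # model's guess at the next frame
            # ...compare `predicted` with the next captured frame to flag unexpected motion...
cap.release()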

Moloti N. added photos to project SuperAgent.ai

SuperAgent.ai

This is a simple Q-learning game made with Unity 3D; it is a crucial part of implementing machine learning algorithms in Unity.

The Algorithms we use are as follows:

Q-Learning is an Off-Policy algorithm for Temporal Difference learning. It can be proven that, given sufficient training under any ε-soft policy, the algorithm converges with probability 1 to a close approximation of the action-value function for an arbitrary target policy.
The Q-Learning agent learns the optimal policy even when actions are selected according to a more exploratory or even random policy. The iterative SARSA algorithm is also used in this project; the SARSA algorithm is a stochastic approximation to the Bellman equations for Markov Decision Processes.
TD learning, including SARSA and Q-Learning, uses the ideas of Dynamic Programming in a sample-based environment where the equalities are true in expectation.

But essentially you can see how the update

qπ(s,a) = ∑s′,r p(s′,r|s,a) ( r + γ ∑a′ π(a′|s′) qπ(s′,a′) )

has turned into SARSA's update:

The weighted sum over state-transition and reward probabilities happens in expectation as you take many samples, so

Q(S,A) = E[ Sampled(R) + γ ∑a′ π(a′|S′) qπ(S′,a′) ]

(technically you have to sample R and S′ together). Likewise the weighting by the current policy happens in expectation, so

Q(S,A) = E[ Sampled(R + γ Q(S′,A′)) ]

To change this expectation into an incremental update, allowing for non-stationarity as the policy improves over time, we add a learning rate and move each estimate towards the latest sampled value:

Q(S,A) ← Q(S,A) + α [ R + γ Q(S′,A′) − Q(S,A) ]
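
A minimal tabular SARSA loop makes the update above concrete. The tiny environment here (three rooms, two chests, a made-up reward rule) is an invented toy for illustration, not the actual Unity environment of the project.

import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount, exploration rate
actions = [0, 1]                            # which chest to open
Q = defaultdict(float)                      # Q[(state, action)] -> value estimate

def step(state, action):
    # Toy dynamics: one chest per room pays off, then the agent moves to a random room.
    reward = 1.0 if action == state % 2 else 0.0
    return reward, random.randrange(3)

def choose(state):
    # epsilon-greedy behaviour policy
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

state, action = 0, choose(0)
for _ in range(10000):
    reward, next_state = step(state, action)
    next_action = choose(next_state)        # SARSA uses the action actually taken next
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    state, action = next_state, next_action

# The Q-Learning (off-policy) variant would instead use the greedy target:
#   reward + gamma * max(Q[(next_state, a)] for a in actions)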

The goal when doing Reinforcement Learning is to train an agent which can learn to act in ways that maximize future expected rewards within a given environment. In the last post in this series, that environment was relatively static: the state of the environment was simply which of the three possible rooms the agent was in, and the actions were choosing which chest within that room to open. Our algorithm learned the Q-function for each of these state-action pairs, Q(s, a). This Q-function corresponded to the expected future reward that would be acquired by taking that action within that state over time.

Moloti N. created project MAB.ai

MAB.ai

Reinforcement learning is learning what to do - how to map situations to actions - so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation and, through that, all subsequent rewards.

The multi-armed bandit problem

Maximize the reward obtained by successively playing gambling machines (the 'arms' of the bandits). The problem was introduced in the early 1950s by Robbins to model decision making under uncertainty when the environment is unknown; the lotteries are unknown ahead of time.

Assumptions:
- Each machine i has a different (unknown) reward distribution with (unknown) expectation μi.
- Successive plays of the same machine yield rewards that are independent and identically distributed.
- Independence also holds for rewards across machines.
- The reward is a random variable Xi,n, 1 ≤ i ≤ K, n ≥ 1, where i is the index of the gambling machine, n is the number of plays, and μi is the expected reward of machine i.

A policy, or allocation strategy, A is an algorithm that chooses the next machine to play based on the sequence of past plays and obtained rewards. Many applications have been studied:

- Clinical trials
- Adaptive routing in networks
- Advertising: what ad to put on a web page?
- Economy: auctions
- Computation of Nash equilibria
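
As a small illustration of this setup, the sketch below simulates K machines with unknown Bernoulli reward means μi and plays them with an ε-greedy allocation strategy, one common policy for this problem; the project description above does not commit to a particular algorithm, so treat that choice as an assumption.

import random

K = 5
mu = [random.random() for _ in range(K)]     # unknown expected reward of each machine
counts = [0] * K                             # how often each machine has been played
means = [0.0] * K                            # running estimates of the mu_i
epsilon = 0.1

def pull(i):
    # X_{i,n}: i.i.d. Bernoulli reward with expectation mu[i]
    return 1.0 if random.random() < mu[i] else 0.0

for n in range(1, 10001):
    if n <= K:
        arm = n - 1                          # play every machine once to initialise
    elif random.random() < epsilon:
        arm = random.randrange(K)            # explore
    else:
        arm = max(range(K), key=lambda i: means[i])   # exploit the best estimate
    reward = pull(arm)
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update

print("best true arm:", mu.index(max(mu)), "most played arm:", counts.index(max(counts)))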

Moloti N. created project SuperAgent.ai

SuperAgent.ai
