Auxiliary combat system based on the detection of human skeleton key points

Dong Li

Beijing Shi

We estimate the 3D coordinates of selected key points of the opponent (such as the fist) by detecting human skeleton key points, then calculate the speed and movement direction of those key points to guide the user in deciding whether to dodge or block.

Project status: Under Development

Artificial Intelligence

Intel Technologies
Movidius NCS, OpenVINO, Intel FPGA

Overview / Usage

In competitive sports with intense physical confrontation, such as basketball, football and mixed martial arts (MMA), judging the opponent's action and responding quickly and appropriately has a crucial influence on the result. For example, in a boxing match the speed and direction of the opponent's punches are analyzed to decide whether the athlete should block, dodge or counterattack. In football, the goalkeeper decides how to make a save by judging the movement direction of the shooter's leg and the position of the ball. Usually, these decisions rely on the athlete's personal experience. However, limited by the athlete's own skill level and affected by factors such as fatigue and the opponent's feints, it is difficult for the athlete to make optimal decisions.

Thanks to growing computing power, the accumulation of massive data sets, and progress in algorithms, deep learning and computer vision have flourished. Their recognition accuracy and real-time performance have led to wide use in commerce, transportation, medicine and other fields. Although relatively little research addresses sports, especially competitive sports with intense confrontation, computer vision has broad room for development in digital sports.

Taking MMA as an example, computer vision could be used to analyze the speed and strength of the opponent's fist and determine whether an attack is a feint. If it is a feint, the user can ignore it and continue attacking according to plan. If it is judged to be a real attack, the position and probability of being hit can be predicted from the opponent's punching position. If the opponent punches slowly, the user can dodge; if the punch is too fast to dodge, the user can block.

Computer vision can also be used in football, basketball and other sports. The ultimate goal of our project is to apply it to combat robots that could defeat an opponent by analyzing the opponent's attack patterns.

Methodology / Approach

1 Detection of key points of human skeleton

We recognize and analyze the posture and movement of the human body by detecting the key points of the human skeleton. The detector mainly locates the face, neck, shoulder joints, elbow joints, wrist joints, knee joints and other key points. The coordinates of the key points are determined by analyzing the two-dimensional image.

Most traditional algorithms for detecting human skeleton key points rely on geometric priors and template matching. The classic example is the pictorial structure, a spring deformation model. It consists of part templates and the template relations between them. The parts are connected by spring-like flexible links that model the relative position between each part and the whole, so that each part can determine its own position from these relative positions while keeping some flexibility in where it is placed.

There are two approaches to detecting skeleton key points for multiple people: top-down and bottom-up. The top-down method first detects each individual person and then detects the skeleton key points of each one. A typical model is AlphaPose from Shanghai Jiao Tong University. The bottom-up method first detects all key points in the image and then groups them into individuals. A typical model is OpenPose from Carnegie Mellon University. Because the top-down method must detect each person before locating the key points, it takes a long time. Competitive sports require athletes to respond quickly and accurately, so we choose OpenPose as our skeleton key point detection model.

The OpenPose model extracts image features with a VGG-19 convolutional neural network backbone, then feeds these features into two branches that are refined iteratively over successive stages. The first branch outputs a set of confidence maps for the human skeleton key points; the second branch outputs a set of affinity vector fields that link the key points. The key point recognition model is deployed on Intel FPGA and Intel Movidius NCS, with Intel OpenVINO as the platform.
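To illustrate how key point coordinates can be read from the first branch's output, here is a minimal sketch for the single-person case. It assumes the confidence maps arrive as a NumPy array of shape (key points, height, width); the function name `extract_keypoints` and the threshold value are hypothetical, and the affinity-vector grouping needed for multiple people is deliberately left out.

```python
import numpy as np


def extract_keypoints(heatmaps, threshold=0.1):
    """Pick the strongest peak in each confidence map.

    heatmaps: array of shape (num_keypoints, height, width).
    Returns a list with one (x, y, score) tuple per key point,
    or None where the best score is below the threshold.
    This is a single-person simplification, not OpenPose's full parsing.
    """
    keypoints = []
    for channel in heatmaps:
        # Locate the pixel with the highest confidence in this channel.
        y, x = np.unravel_index(int(np.argmax(channel)), channel.shape)
        score = float(channel[y, x])
        if score >= threshold:
            keypoints.append((int(x), int(y), score))
        else:
            keypoints.append(None)
    return keypoints
```

Coordinates are returned in heat map pixels, so they would still need to be scaled back to the resolution of the input image.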

2 Key point speed calculation

Once the key points of the human skeleton are obtained, we also need the three-dimensional coordinates of the relevant key points to calculate their speed and trajectory. However, because the image captured by the camera is two-dimensional, the position of a key point in the three-dimensional world cannot be determined from the image alone. We therefore propose an algorithm that uses an infrared distance sensor to measure the distance between the human body and the camera. Combining this distance with the captured two-dimensional image, we calculate the three-dimensional coordinates of the key points through the projection relationship.
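The projection relationship is not spelled out here, so the following is only a sketch under a standard pinhole camera assumption. It treats the infrared distance reading as the depth along the optical axis; the function name `pixel_to_camera` and the parameters `focal_px`, `cx` and `cy` (focal length and principal point, in pixels) are illustrative and would come from camera calibration.

```python
def pixel_to_camera(u, v, depth_m, focal_px, cx, cy):
    """Back-project a pixel (u, v) to camera coordinates in metres.

    Assumes a pinhole camera with equal focal length on both axes
    and no lens distortion. depth_m is the measured distance to the
    body plane, taken here as the Z coordinate.
    """
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)
```

For example, with a 500 pixel focal length and a principal point at (320, 240), a key point seen 100 pixels right of centre at a depth of 2 m lies about 0.4 m to the right of the optical axis.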

First, we use the distance data and a two-dimensional image in which the whole body lies in one plane to calculate the real arm length and other body measurements. Next, when the opponent throws a punch, the arm leaves the plane of the body. Through the projection relationship, we calculate the Z coordinate of the key point and thus obtain its three-dimensional coordinates. The speed and direction of the key point are calculated from the displacement of its coordinates between two camera frames.
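One way to read these two steps is sketched below. It assumes the arm segment is rigid, so that its apparent shortening in the image reveals how far it has left the body plane, and that projected lengths have already been converted to metres at the body plane. The function names are hypothetical, and the sign of the offset (toward or away from the camera) is not resolved by this calculation.

```python
import math


def out_of_plane_offset(real_length_m, projected_length_m):
    """Depth offset of a rigid segment from its foreshortening.

    Uses real^2 = projected^2 + offset^2. The projected length is
    clamped to the real length to tolerate measurement noise.
    """
    projected = min(projected_length_m, real_length_m)
    return math.sqrt(real_length_m ** 2 - projected ** 2)


def velocity(p_prev, p_curr, dt_s):
    """Velocity vector and speed of a key point between two frames.

    p_prev and p_curr are (x, y, z) positions in metres,
    dt_s is the time between the frames in seconds.
    """
    vx, vy, vz = ((c - p) / dt_s for p, c in zip(p_prev, p_curr))
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    return (vx, vy, vz), speed
```

For a 0.5 m arm segment that appears 0.3 m long, the offset is about 0.4 m. A fist that moves 0.3 m closer to the camera between frames 0.1 s apart has a speed of about 3 m/s.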

3 Decision

With the three-dimensional coordinates and speed of the key points, we can make decisions based on the opponent's behavior. For example, by judging the speed of the opponent's fist, we decide whether to dodge or fight back. Based on the shooter's shooting angle, a goalkeeper chooses to defend the left or right side of the goal. The specific conditions for a decision are complex and depend on the application scenario. We therefore hope to do more in-depth research on opponent behavior analysis and on optimizing the decision-making process.
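As a toy illustration of the MMA rule described in the overview (ignore a feint, dodge a slow punch, block a fast one), a decision function might look like the sketch below. The function name, the `is_feint` flag and the 6 m/s threshold are all placeholders chosen for the example; feint detection and realistic thresholds are not specified by the project.

```python
def decide(speed_mps, is_feint, dodge_speed_limit_mps=6.0):
    """Choose a response to an incoming punch.

    speed_mps: estimated fist speed in metres per second.
    is_feint: result of a separate feint classifier (not shown here).
    dodge_speed_limit_mps: hypothetical speed above which a dodge
    is assumed to be too slow, so the user should block instead.
    """
    if is_feint:
        return "ignore"
    if speed_mps <= dodge_speed_limit_mps:
        return "dodge"
    return "block"
```

A real system would probably replace the fixed threshold with something that accounts for the remaining distance to impact and the user's reaction time.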