Detecting vehicles in a video stream is an object detection problem, which can be approached as either a classification problem or a regression problem. In the classification approach, the image is divided into small patches, each of which is run through a classifier to determine whether it contains an object; bounding boxes are then assigned to the patches with positive classification results. In the regression approach, the whole image is run through a convolutional neural network that directly generates one or more bounding boxes for the objects in the image.
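To make the classification approach concrete, here is a minimal sketch of the patch-scanning step. The function names, the fixed patch size and stride, and the `classify` callable are all hypothetical and chosen for illustration; a real pipeline would crop the pixels of each patch and pass them to a trained classifier.

```python
def sliding_windows(width, height, patch, stride):
    """Yield (x0, y0, x1, y1) corners of square patches covering the image."""
    for y0 in range(0, height - patch + 1, stride):
        for x0 in range(0, width - patch + 1, stride):
            yield (x0, y0, x0 + patch, y0 + patch)


def detect_by_classification(width, height, patch, stride, classify):
    """Return the patches that the (hypothetical) classifier marks as positive.

    `classify` is a stand-in callable that takes the patch corners and returns
    True or False; each positive patch becomes a bounding box.
    """
    return [
        box
        for box in sliding_windows(width, height, patch, stride)
        if classify(box)
    ]
```

The number of classifier calls grows with the number of patches, which is one reason the regression approach, with a single pass over the whole image, tends to be faster.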
The goal of this project is to detect vehicles in a dash camera video stream using the You Only Look Once (YOLO) algorithm. This capability matters for self-driving cars, since the same model can also be trained to recognize birds, people, stop signs, traffic signals and much more.
In this project, we will implement version 1 of tiny-YOLO in Keras, since it is easy to implement and reasonably fast.
The YOLO approach to object detection consists of two parts: the neural network, which predicts a vector from an image, and the postprocessing, which interprets that vector as box coordinates and class probabilities. The tiny-YOLO v1 network consists of 9 convolution layers and 3 fully connected layers. The convolution layers are built from convolution, leaky ReLU and max pooling operations. The output of the network is a 1470-element vector, which contains the information for the predicted bounding boxes. This vector is divided into three parts, giving the class probabilities, the confidences and the box coordinates.
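The size of 1470 follows from the standard YOLO v1 configuration, assumed here: a 7 x 7 grid, 2 boxes per cell and 20 classes, giving 7·7·20 = 980 class probabilities, 7·7·2 = 98 confidences and 7·7·2·4 = 392 box coordinates. The sketch below splits a flat output vector into those three parts. The constant names, the function name and the assumption that the parts are stored contiguously in this order are illustrative; check the layout against the weights you actually load.

```python
# Assumed YOLO v1 configuration (consistent with a 1470-element output).
GRID = 7       # cells per image side
BOXES = 2      # predicted boxes per cell
CLASSES = 20   # object classes

N_PROBS = GRID * GRID * CLASSES      # 980 class probabilities
N_CONFS = GRID * GRID * BOXES        # 98 box confidences
N_COORDS = GRID * GRID * BOXES * 4   # 392 box coordinates (x, y, w, h)


def split_output(vector):
    """Split the flat network output into (probs, confs, coords).

    Assumes the three parts are laid out contiguously in that order.
    """
    expected = N_PROBS + N_CONFS + N_COORDS
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    probs = vector[:N_PROBS]
    confs = vector[N_PROBS:N_PROBS + N_CONFS]
    coords = vector[N_PROBS + N_CONFS:]
    return probs, confs, coords
```

The slicing works the same way on a Python list or a one-dimensional NumPy array, so the function can be applied directly to the flattened prediction for a single image.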