Click2Know - Object Recognition through Smartphone using Deep Learning Techniques

Saurabh Sukhatankar

Bengaluru, Karnataka


An Android application that recognizes the objects in an image captured by the smartphone camera.

Project status: Published/In Market

Mobile, Artificial Intelligence

Groups
DeepLearning, Artificial Intelligence India

Intel Technologies
Intel GPA


Overview / Usage

Object recognition technology has matured to a point at which exciting applications have become possible. Indeed, industry has created a variety of computer vision products and services, from the traditional area of machine inspection to more recent applications such as object detection, video surveillance, and face recognition.

This project aims to achieve object recognition through advanced techniques like deep learning on handy devices such as smartphones and tablets. Deep learning algorithms, specifically convolutional neural networks (CNNs), are used for the primary task of object recognition. During experimentation, images are captured with the smartphone camera and fed to the CNN. The top four predictions of the network are presented on the smartphone in both audio and visual form, i.e. the predicted object name and the probability that the predicted object is the one actually photographed, in decreasing order of confidence. The application achieves an object recognition accuracy of about 93%.

Methodology / Approach

The main aim of the proposed system is to recognize objects through a smartphone for visually challenged people, using deep learning on the Android platform.

  1. Initially, the user (visually challenged) creates an account simply by speaking "create an account". With the account, the user can capture a photo of the object to be recognized; within a fraction of a second the result is both spoken and displayed on the screen. Firebase, being a real-time database, is used so that results can be computed in real time. The image captured by the user's camera is uploaded to the real-time database as a byte image. The database schema for users comprises a unique UID (user ID), generated from the individual user's Firebase account, along with the images uploaded through concurrent requests from different users at the same instant. The byte-format image is converted to a regular .PNG format so that further processing can be performed on it, and the image is resized to 299x299 pixels. Before feeding the image to the Inception V3 convolutional neural network, it must be converted to a linear structure such as an array, i.e. the vectored form of the image in RGB format. Feeding raw pixel values directly into a network may lead to numerical overflow; it also turns out that some choices of activation and objective functions are not compatible with all kinds of input, and the wrong combination results in a network that learns poorly. This is addressed by the pre-processing technique of dimensionality reduction, which transforms the vectored image data into a compressed space with fewer dimensions, helps control the amount of information loss, and produces the input to the CNN. After performing dimensionality reduction, the image is adapted to the format the model requires and fed to the CNN for prediction.
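The resize-and-vectorize stage described above can be sketched in NumPy. This is an illustrative sketch, not the project's actual code: the function names are made up, and a real implementation would likely use OpenCV's `cv2.resize` rather than the hand-rolled nearest-neighbour resize shown here.

```python
import numpy as np

TARGET = 299  # Inception V3 expects 299x299 RGB input

def resize_nearest(img, size=TARGET):
    """Nearest-neighbour resize of an HxWx3 image (illustrative;
    a real app would likely use OpenCV's cv2.resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]

def preprocess(img):
    """Resize, scale pixels to [-1, 1] (Inception-style scaling),
    and flatten into the vectored form fed to the network."""
    img = resize_nearest(img).astype(np.float32)
    img = img / 127.5 - 1.0   # avoid feeding raw 0-255 values directly
    return img.reshape(-1)    # 1-D vector of length 299*299*3

# Example: a dummy 600x400 RGB capture
dummy = np.random.randint(0, 256, (400, 600, 3), dtype=np.uint8)
vec = preprocess(dummy)
print(vec.shape)  # (268203,)
```

Scaling to [-1, 1] before inference addresses the numerical issues mentioned above; the flattened vector is then reshaped to whatever input format the model requires.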

  2. The convolutional neural network (CNN) is the current state-of-the-art model architecture for image classification tasks. A CNN applies a sequence of filters to the raw pixel data of an image to extract and learn high-level features, which are then used for classification. CNNs contain three components:

  • Convolutional layers: This layer applies a specified number of convolution filters to the image. For each subregion, the layer performs a set of mathematical operations to produce a single value in the output feature map. Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearities into the model.
  • Pooling layers: Pooling layers downsample the image data extracted by the convolutional layers to shrink the dimensionality of the feature map and reduce processing time. A frequently used pooling algorithm is max pooling, which extracts subregions of the feature map (e.g., 2x2-pixel tiles), keeps their maximum value, and discards all other values.
  • Dense (fully connected) layers: Dense layers perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node is connected to every node in the previous layer. The classification results are probabilistic: for example, an output of 0.78 means the probability of a laptop being in the image is 0.78, and similarly 0.56 means the probability of a screen being in the image is 0.56, and so on.
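The max-pooling operation described above can be illustrated with a minimal NumPy sketch (didactic only, not the project's implementation): each non-overlapping 2x2 tile of the feature map is reduced to its maximum value, halving both spatial dimensions.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling: keep the maximum of each non-overlapping
    2x2 tile, halving each spatial dimension of the feature map."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [7, 8, 3, 1],
                 [0, 6, 2, 9]], dtype=np.float32)

pooled = max_pool_2x2(fmap)
print(pooled)
# [[4. 5.]
#  [8. 9.]]
```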
  3. Among the computed results, the predictions with the top four probabilities are pushed onto the real-time database within a fraction of a second for the specified user, and the Android application then fetches them instantly. To keep the real-time database fast and efficient, the user entry and its result are deleted afterwards. The Android application presents the object recognition result in both audio and text form in real time.
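The top-four selection in the step above can be sketched as follows. The class labels and probability values here are invented for illustration (echoing the laptop/screen example earlier); they are not the model's real output.

```python
import numpy as np

# Hypothetical softmax output and labels, for illustration only
labels = ["laptop", "screen", "keyboard", "mouse", "desk", "mug"]
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.07, 0.03])

# Indices of the four highest probabilities, in decreasing order
top4 = np.argsort(probs)[::-1][:4]
results = [(labels[i], float(probs[i])) for i in top4]
print(results)
# [('laptop', 0.4), ('screen', 0.25), ('keyboard', 0.15), ('mouse', 0.1)]
```

These (label, probability) pairs are what would be pushed to the real-time database and then spoken and displayed by the Android application.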

Technologies Used

  • Technology Stack
  1. Hardware - Smartphone with Internet Connectivity
  2. ML Algorithms - Deep Feed-Forward Neural Network or Inception v3 (transfer learning)
  3. Datasets - Object datasets: CIFAR-10, CIFAR-100, COCO, ImageNet (transfer learning)
  4. Database - Firebase
  5. Python Libraries/Frameworks - OpenCV, NumPy, Pyrebase

Repository

https://github.com/SukhatankarSV/Click2KnowM
