TrueSight

Sumit Saha


TrueSight is a potential solution for assisting the visually impaired in a world of complex ambient activities and interactions. It leverages the power of AI in Computer Vision to analyze real-time events and generate audio feedback to keep the user as aware as possible.

Project status: Under Development

Internet of Things, Artificial Intelligence

Groups
Student Developers for AI

Intel Technologies
Intel Python


Overview / Usage

TrueSight, in a nutshell, is a potential solution for assisting the visually impaired in a world of complex ambient activities and interactions. Our framework leverages the power of Deep Learning in Computer Vision to analyse events around the user and generate audio feedback, keeping the user as aware as possible. The project is by no means a replacement for vision; rather, it aims to complement the person's stronger senses.

Current Features
  • Scene Classification
  • Real-Time Object Detection
  • Scene Captioning
  • Text Recognition
  • Text to Speech

Methodology / Approach

The main goal of the framework is to minimise the number of manual interactions between the user and the software. The system is therefore designed to capture frames in real time and process the captured information for object detection, scene captioning, text recognition, text-to-speech, and scene classification, as sketched below.
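As a rough illustration, a minimal capture-and-dispatch loop might look like the following sketch. The module functions are placeholders standing in for the actual TrueSight components; their names and signatures are assumptions, not taken from the project source.

import cv2

# Placeholder hooks for the real TrueSight modules; the names are
# illustrative, not from the project repository.
def classify_scene(frame):
    return "unknown scene"

def detect_objects(frame):
    return []

def speak(message):
    print("[audio]", message)  # stand-in for the text-to-speech module

def run_pipeline(camera_index=0):
    """Capture frames in real time and hand each one to the analysis modules."""
    cap = cv2.VideoCapture(camera_index)  # e.g. a USB webcam or the Pi camera
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            scene = classify_scene(frame)
            objects = detect_objects(frame)
            speak(f"{scene}; {len(objects)} objects detected")
    finally:
        cap.release()

if __name__ == "__main__":
    run_pipeline()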

Object classification is quite simple in terms of its methodology. An environment can become too complex when there is a lot of activity going on; in such cases, we have noticed that our framework's classification confidence drops below 85%. This usually means the object is either out of frame or unfamiliar to the model. The scope for improvement here is to train the classifier on as many object classes as possible. For the proof of concept, the confidence threshold for accepting a detection as a critical piece of information during real-time capture has been set to around 90%, as illustrated below.
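The thresholding step itself is straightforward. The sketch below shows one way to gate detections on classifier confidence; the detection format and the exact cut-off are assumptions based on the description above.

CONFIDENCE_THRESHOLD = 0.90  # mirrors the ~90% threshold used in the proof of concept

def filter_detections(detections, threshold=CONFIDENCE_THRESHOLD):
    """Keep only (label, confidence) pairs the classifier is confident about."""
    return [(label, conf) for label, conf in detections if conf >= threshold]

# Low-confidence results (object out of frame, or an unseen class) are
# dropped rather than reported to the user.
detections = [("chair", 0.97), ("dog", 0.62), ("door", 0.91)]
print(filter_detections(detections))  # [('chair', 0.97), ('door', 0.91)]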

The scene classification module can recognise around 200 scenes of variable complexity. Once a scene is classified, text is generated depending on the type of scene and the activity within it. This conversion module is partly automated and partly rule-based. The feature is an on-demand addition: once activated, the text is converted into speech and played to the user, as in the sketch below.
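A minimal sketch of the rule-based text generation and speech step follows. The scene labels, the templates, and the choice of pyttsx3 as an offline text-to-speech engine are all assumptions; the project does not name its actual TTS backend here.

import pyttsx3

# Illustrative scene-to-text rules; the real module reportedly combines
# automated generation with rules like these.
SCENE_TEMPLATES = {
    "crosswalk": "You are at a pedestrian crossing.",
    "staircase": "There is a staircase ahead.",
}

def describe_and_speak(scene_label):
    """Turn a scene label into text and read it aloud on demand."""
    text = SCENE_TEMPLATES.get(scene_label, f"You appear to be near a {scene_label}.")
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)  # slightly slower speech for clarity
    engine.say(text)
    engine.runAndWait()

describe_and_speak("crosswalk")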

The scene captioning module is a compelling application of scene classification, object segmentation, activity recognition, and Natural Language Processing. The frame used in scene classification is segmented into various parts, and the activities within it are reported in a human-understandable form. For example, if an image of a football field with players is fed as input, the output is something of the form "Players are playing with a ball on the football field".
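To make the pattern concrete, here is an architectural sketch of a CNN-encoder/LSTM-decoder captioner in PyTorch. The weights are untrained and the layer sizes are arbitrary, so this only illustrates the encoder-decoder idea; it is not the project's actual captioning model.

import torch
import torch.nn as nn
import torchvision.models as models

class CaptionNet(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)           # image encoder backbone
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])
        self.project = nn.Linear(512, embed_dim)      # CNN features -> embedding
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)       # (B, 512) image features
        feats = self.project(feats).unsqueeze(1)      # image feature seeds the LSTM
        words = self.embed(captions)                  # embedded caption tokens
        seq = torch.cat([feats, words], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                       # per-step vocabulary logits

model = CaptionNet()
images = torch.randn(1, 3, 224, 224)                  # one dummy frame
captions = torch.randint(0, 1000, (1, 5))             # five dummy word ids
print(model(images, captions).shape)                  # torch.Size([1, 6, 1000])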

There are many ways in which this project can be improved, innovated upon, and ultimately made a reality. With enough support from collaborators, it will be not only cost-effective but also easy to use. It is my belief that this project, if implemented correctly, can be a boon to our society.

Technologies Used

  • Raspberry Pi 3
  • OpenCV - Python
  • Keras
  • Flask
  • PyTorch

Repository

https://github.com/ss-is-master-chief/TrueSight
