Visual AID using Image Captioning on Intel OneAPI

Vivek Muskan

Bengaluru, Karnataka

This visual assistant app helps blind users by generating image captions with Intel oneDNN and oneDAL and reading them aloud with the Google text-to-speech library. Built with Streamlit and the oneAPI toolkits, it extracts visual features from an image and turns them into a description, giving users greater awareness of their surroundings.

Project status: Published/In Market

oneAPI, Artificial Intelligence, Cloud

Intel Technologies
oneAPI, AI DevCloud / Xeon, Intel Python

Overview / Usage

Image captioning is the task of generating a natural language description of an image, and it sits at the intersection of computer vision and natural language processing. The goal is a coherent, fluent sentence that accurately describes the image content.

Methodology / Approach

This image captioning model takes an image as input and generates a textual description of the contents of the image.

The model uses a convolutional neural network (CNN) architecture to analyze the visual aspects of the input image. The CNN encodes the image into a dense feature vector capturing information about the objects, scenes, and relationships depicted.
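To make the encoding step concrete, here is a toy sketch in plain Python rather than the project's TensorFlow code. It applies one hand-written 3x3 filter per feature, a ReLU, and global average pooling that collapses each feature map to a single number. The image, the filters, and the function names `conv2d_valid` and `encode_image` are illustrative assumptions; a real CNN learns many filters across many layers.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def encode_image(image, kernels):
    """Return one pooled activation per kernel: a tiny 'feature vector'."""
    features = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        # ReLU: keep positive responses, zero out the rest.
        activated = [max(0.0, v) for row in feature_map for v in row]
        # Global average pooling: one number per feature map.
        features.append(sum(activated) / len(activated))
    return features


# Hypothetical 4x4 grayscale image containing a vertical edge.
TOY_IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filters (assumptions); a trained CNN would learn these.
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

feature_vector = encode_image(TOY_IMAGE, [VERTICAL_EDGE, HORIZONTAL_EDGE])
```

The vertical-edge filter responds strongly to this image and the horizontal-edge filter does not, so the resulting vector summarizes what the image contains, not where each pixel is.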

This feature vector is passed to a recurrent neural network (RNN), which generates the text caption one word at a time. The RNN uses the feature vector from the CNN as context while it decodes the image features into a natural language sentence describing the image content.
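The word-by-word loop can be sketched as follows. This is a minimal illustration that assumes greedy decoding (always pick the highest-scoring next word); the page does not say which selection strategy the project uses. The lookup table `TOY_NEXT_WORD_SCORES` and the function `toy_decoder_step` are hypothetical stand-ins for a trained RNN step, which would also carry a hidden state and condition on the image features.

```python
START, END = "<start>", "<end>"

# Hypothetical scores standing in for a trained RNN's output distribution.
TOY_NEXT_WORD_SCORES = {
    "<start>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.7, "ball": 0.3},
    "dog": {"runs": 0.6, "<end>": 0.4},
    "runs": {"<end>": 1.0},
}


def toy_decoder_step(image_features, previous_word):
    """Return next-word scores. This toy ignores image_features;
    a real decoder would use them as context."""
    return TOY_NEXT_WORD_SCORES[previous_word]


def greedy_caption(image_features, step_fn, max_length=10):
    """Generate a caption one word at a time until <end> or max_length."""
    words = []
    previous = START
    for _ in range(max_length):
        scores = step_fn(image_features, previous)
        previous = max(scores, key=scores.get)  # greedy choice
        if previous == END:
            break
        words.append(previous)
    return " ".join(words)


caption = greedy_caption([3.0, 0.0], toy_decoder_step)
```

The `max_length` cap matters in practice: without it, a decoder that never emits the end token would loop forever.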

The model is trained end-to-end on a dataset of images labeled with human-written captions. This allows the model to learn the correlations between image contents and textual descriptions.
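One common way to turn an image and its caption into training examples is to pair every caption prefix with the word that follows it, so the decoder learns to predict the next word given the image and the words so far. The sketch below assumes that scheme; the page does not describe how the project prepares its data, and `make_training_pairs`, the `<start>`/`<end>` tokens, and the image identifier are illustrative.

```python
def make_training_pairs(image_id, caption, start="<start>", end="<end>"):
    """Expand one (image, caption) example into (image, prefix, next_word) triples."""
    tokens = [start] + caption.lower().split() + [end]
    return [(image_id, tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Hypothetical image identifier and caption.
pairs = make_training_pairs("img_001", "A dog runs")
for image_id, prefix, target in pairs:
    print(image_id, prefix, "->", target)
```

A three-word caption yields four triples, the last of which teaches the model when to stop.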

After training, the model can generate new captions for images it hasn't seen before. The quality of the generated captions depends on the size and diversity of the training dataset.

This project demonstrates how CNN and RNN architectures can be combined to perform image-to-text translation. The model generates basic descriptions of image contents, though there is still room for improvement in caption quality and diversity.

Technologies Used

Intel oneDNN

Intel oneDAL

TensorFlow

Streamlit

Repository

https://github.com/viveklistenus/VisualAid_intelOneAPI
