Dynamic Lexicon Generation for Natural Scene Images
Vishal Bidawatka
Hyderabad, Telangana
In this project we propose a method that generates contextualized lexicons for scene images using only visual information. For this, we exploit the correlation between visual and textual information in a dataset consisting of images and textual content associated with them. (Topic modelling + CNN)
Project status: Under Development
Overview / Usage
Many scene text understanding methods approach the end-to-end
recognition problem from a word-spotting perspective and take
huge benefit from using small per-image lexicons. Such customized lexicons
are normally assumed as given and their source is rarely discussed.
In this project we propose a method that generates contextualized lexicons
for scene images using only visual information. For this, we exploit
the correlation between visual and textual information in a dataset consisting
of images and textual content associated with them. Using the
topic modeling framework to discover a set of latent topics in such a
dataset allows us to re-rank a fixed dictionary in a way that prioritizes
the words that are more likely to appear in a given image. Moreover,
we train a CNN that is able to reproduce those word rankings using
only the raw image pixels as input. We demonstrate that the quality
of the automatically obtained custom lexicons is superior to a generic
frequency-based baseline.
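For reference, a generic frequency-based baseline can be sketched as follows. This is only an illustration of the idea, not the project's code: the function name `frequency_lexicon`, the toy corpus and the toy dictionary are all assumptions made for the example. The point it shows is that such a baseline ignores the image, so every image receives the same lexicon.

```python
from collections import Counter


def frequency_lexicon(corpus_tokens, dictionary, size):
    """Rank a fixed dictionary by raw corpus frequency and keep the top `size` words.

    Hypothetical helper for illustration; the image plays no role here.
    """
    counts = Counter(corpus_tokens)
    # Sort by descending count; break ties alphabetically so the order is deterministic.
    ranked = sorted(dictionary, key=lambda w: (-counts[w], w))
    return ranked[:size]


# Toy data (assumed, not taken from the project).
corpus = "the beach the sea the hotel sea sand".split()
dictionary = ["beach", "hotel", "sand", "sea", "taxi"]

baseline = frequency_lexicon(corpus, dictionary, size=3)
print(baseline)
```

Because the ranking depends only on corpus counts, a word such as "taxi" that never occurs in the corpus is always ranked last, whatever the image shows.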
Methodology / Approach
The underlying idea of our lexicon generation method is that the topic modeling
statistical framework can be used to predict a ranking of the most probable words
that may appear in a given image. For this we propose a three-fold method: First,
we learn an LDA topic model on a text corpus associated with the image dataset.
Second, we train a deep CNN model to generate LDA's topic-probabilities directly
from the image pixels. Third, we use the generated topic-probabilities,
either from the LDA model (using textual information) or from the CNN (using
image pixels), along with the word-probabilities from the learned LDA model to
re-rank the words of a given dictionary.
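The third step can be illustrated with a minimal sketch. It assumes the usual topic-model mixture, where a word's score for an image is the sum over topics of P(word | topic) times P(topic | image); the text above does not spell out the exact formula, so treat this as one plausible reading. The function name `rerank` and all probabilities below are made up for the example. In practice the topic probabilities would come from the LDA model or the CNN, and the word probabilities from the learned LDA model.

```python
def rerank(dictionary, topic_probs, word_given_topic):
    """Order `dictionary` by sum_k P(word | topic k) * P(topic k | image).

    topic_probs:      list of P(topic k | image), one entry per topic.
    word_given_topic: list of dicts, one per topic, mapping word -> P(word | topic k).
    Hypothetical helper for illustration only.
    """
    scores = {}
    for word in dictionary:
        scores[word] = sum(
            p_topic * word_given_topic[k].get(word, 0.0)
            for k, p_topic in enumerate(topic_probs)
        )
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(dictionary, key=lambda w: (-scores[w], w))
    return ranked, scores


# Toy two-topic model (assumed values): topic 0 is "seaside", topic 1 is "food".
word_given_topic = [
    {"beach": 0.5, "sea": 0.4, "menu": 0.1},
    {"menu": 0.6, "pizza": 0.3, "sea": 0.1},
]
dictionary = ["beach", "menu", "pizza", "sea"]

seaside_ranking, _ = rerank(dictionary, [0.9, 0.1], word_given_topic)
restaurant_ranking, _ = rerank(dictionary, [0.2, 0.8], word_given_topic)
print(seaside_ranking)
print(restaurant_ranking)
```

The same dictionary is ordered differently for the two topic distributions, which is what makes the resulting lexicon contextual. Truncating each ranking to its first N entries would give a small per-image lexicon.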
Technologies Used
- gensim library
- keras
- Topic modelling
- CNN
Repository
https://github.com/vishalbidawatka/Dynamic-Lexicon-Generation-for-Natural-Scene-Images