This project develops a library that detects the font of text in an image. The library will be accessed through a companion Android application that lets a designer take pictures with the smartphone camera. The input to the library must contain text and can be a photograph taken with a camera, a scanned image or an existing JPEG file.
Text and font recognition will combine Optical Character Recognition (OCR) with convolutional neural networks (CNNs), a family of deep learning models.
First, I will need to build an OCR system, which essentially means training a character recognition model. The MNIST dataset will come in handy at this point, although it only covers handwritten digits. A large font database will also be needed, which I intend to source from one of the largest openly licensed font collections: Google Fonts (fonts.google.com). The imported font catalogue will be compared against the font shapes extracted from the image, and it will also supply the training and testing sets used to build the model.
The quality of any text recognition system depends on the effectiveness of its algorithms. TensorFlow is my choice of deep learning framework because it is easy to work with and has a large community behind it. Attention-based models built with TensorFlow are widely used in OCR. For locating text, the approach I have in mind combines morphological operations that are sensitive to specific shapes in the input image with a well-chosen threshold value.
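To make the idea of combining a threshold with a shape-sensitive morphological operation concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not the final algorithm: the function names, the fixed threshold of 128 and the horizontal structuring element are all hypothetical choices, and a real implementation would more likely use an image processing library.

```python
def threshold(gray, t):
    """Return a binary image: 1 where the pixel is darker than t (ink), else 0."""
    return [[1 if px < t else 0 for px in row] for row in gray]


def dilate_horizontal(binary, radius):
    """Binary dilation with a 1 x (2*radius + 1) horizontal structuring element.

    A wide, flat element merges neighbouring strokes on the same line,
    which is why it responds to text-like shapes (assumed choice).
    """
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo, hi = max(0, x - radius), min(width, x + radius + 1)
            if any(binary[y][lo:hi]):
                out[y][x] = 1
    return out


def horizontal_runs(row):
    """Return (start, end) column pairs, end exclusive, for each run of 1s."""
    runs, start = [], None
    for x, value in enumerate(row):
        if value and start is None:
            start = x
        elif not value and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


# One scan line: two strokes close together, then a gap, then a third stroke.
gray = [[255, 0, 255, 0, 255, 255, 255, 255, 0, 255]]
binary = threshold(gray, 128)          # 128 is an assumed threshold value
merged = dilate_horizontal(binary, 1)  # radius 1 is an assumed element size
print(horizontal_runs(binary[0]))
print(horizontal_runs(merged[0]))
```

After dilation the two neighbouring strokes fuse into a single candidate text region while the distant stroke stays separate. Both the threshold and the element width would have to be tuned on real images.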
The Intel Deep Learning SDK will be essential for training the model with the TensorFlow OCR pipeline and then optimizing the trained model to run efficiently on the edge device (the smartphone). The goal is to achieve more than 90% accuracy with the help of the SDK.
When an image is taken, the user can crop it to the area containing the typeface of interest. I intend to use a classifier to decide whether the cropped area contains text. The system will then extract characters by binarizing the image, adjusting its contrast and brightness, and performing character segmentation. The segmentation algorithm will be informed by the structure of the script. K-means clustering looks like a promising algorithm for the segmentation.
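The binarization and segmentation steps can be sketched as follows. This is a rough illustration under my own assumptions rather than the planned implementation: it runs k-means with k = 2 on pixel intensities to separate ink from background, then splits characters at empty columns, which is a simple projection-based technique I have swapped in for the demonstration. All function names are hypothetical, and touching or overlapping characters would need something more elaborate.

```python
def kmeans_two_levels(values, iterations=20):
    """1-D k-means with k = 2; returns (dark_center, light_center)."""
    dark, light = float(min(values)), float(max(values))
    for _ in range(iterations):
        dark_group = [v for v in values if abs(v - dark) <= abs(v - light)]
        light_group = [v for v in values if abs(v - dark) > abs(v - light)]
        if not light_group:  # uniform image: nothing to separate
            break
        new_dark = sum(dark_group) / len(dark_group)
        new_light = sum(light_group) / len(light_group)
        if (new_dark, new_light) == (dark, light):  # converged
            break
        dark, light = new_dark, new_light
    return dark, light


def binarize(gray):
    """Mark a pixel as ink (1) when it is closer to the dark cluster center."""
    flat = [px for row in gray for px in row]
    dark, light = kmeans_two_levels(flat)
    return [[1 if abs(px - dark) < abs(px - light) else 0 for px in row]
            for row in gray]


def segment_columns(binary):
    """Split at ink-free columns; returns (start, end) ranges, end exclusive."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# A tiny 2 x 7 grayscale crop with two dark "characters" on a light background.
gray = [
    [250, 20, 30, 240, 245, 10, 250],
    [245, 25, 240, 250, 240, 15, 20],
]
centers = kmeans_two_levels([px for row in gray for px in row])
binary = binarize(gray)
print(centers)
print(segment_columns(binary))
```

Clustering the intensities avoids hard-coding a threshold, which matters because lighting varies a lot between phone photos. Contrast and brightness adjustment is left out of this sketch.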
The boundaries of the glyphs will need to be recognized in order to determine the shape of the font, its skew and other properties. Hence, I also intend to develop a font-outline algorithm to improve comparison results, one that treats text areas as positive samples and the other areas within the cropped image boundary as negative samples.
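As a starting point for the outline step, the sketch below extracts the boundary of a binarized glyph by keeping every ink pixel that touches the background or the image border. It is a simple stand-in under my own assumptions (4-connected neighbourhood, hypothetical function name), not the font-outline algorithm itself, and it does not estimate skew.

```python
def outline(binary):
    """Keep ink pixels that have a background or out-of-image 4-neighbour."""
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not binary[ny][nx]:
                    out[y][x] = 1  # boundary pixel
                    break
    return out


# A solid 3 x 3 blob inside a 5 x 5 image: only its interior pixel is removed.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edge = outline(glyph)
for row in edge:
    print(row)
```

The resulting outline pixels could then feed a shape descriptor for comparison against the font catalogue. Which descriptor to use is still an open choice.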
The images captured by the smartphone camera will be uploaded to Intel servers, where the model runs. Inference will take place on the server infrastructure and the result will be sent back to the mobile phone as font details. I chose the Intel server architecture for its reduced training time and fast inference.
The model will be retrained continually over time so that search results become faster and more accurate.
This project covers a lot of ground, so I invite any willing collaborators to jump in and help with any part of it. A successful system will help many designers (and developers) improve the quality of their design work (graphic, web and product design alike).