Multimodal Representations from other Unimodal Representations: a Survey

Fernando Tadao Ito

This project is my master's thesis: a series of experiments with unimodal and multimodal representations and how they are formed.

Artificial Intelligence

Groups
Student Developers for AI

Overview / Usage

In this project, we explore different unimodal representations for text and images, and their combination into multimodal representations, evaluating them empirically through classification tests on multimodal datasets. The objective is to determine whether multimodal representations yield a substantial gain in classification F-scores, and whether the choice of underlying unimodal representations affects that performance.
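As an illustration of the evaluation metric, here is a minimal sketch of a macro-averaged F1 score in plain Python. The project description only says "F-scores", so the macro averaging is an assumption, and the function name `macro_f1` and the toy product labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy product-category labels, made up for illustration only.
y_true = ["shoes", "shoes", "books", "toys"]
y_pred = ["shoes", "books", "books", "toys"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Computing the same score for classifiers trained on each unimodal representation and on the multimodal one gives the comparison described above.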

We are working with e-commerce and news datasets that pair images with text, and combining different representation techniques. For images, we use visual words built from SIFT, SURF and ORB descriptors; for text, we use LSI and LDA topic representations, along with GloVe and Word2Vec word representations. To generate multimodal representations, we use a simple Deep Multimodal Autoencoder trained for reconstruction.
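To make the fusion step concrete, the following is a rough NumPy sketch of an autoencoder trained to reconstruct concatenated image and text features. It is not the project's architecture: it uses a single hidden layer rather than a deep network, and the function name, hyperparameters, and synthetic input data (stand-ins for visual-word histograms and topic vectors) are all assumptions made for illustration.

```python
import numpy as np


def train_multimodal_autoencoder(x_img, x_txt, code_dim=4, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder on concatenated features.

    Simplified stand-in for a deep multimodal autoencoder; all
    hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.hstack([x_img, x_txt])  # fuse modalities by concatenation
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.1, (d, code_dim))
    b_enc = np.zeros(code_dim)
    w_dec = rng.normal(0.0, 0.1, (code_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        code = np.tanh(x @ w_enc + b_enc)   # shared multimodal code
        recon = code @ w_dec + b_dec        # linear reconstruction
        err = recon - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_recon = 2.0 * err / err.size
        g_w_dec = code.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ w_dec.T) * (1.0 - code ** 2)  # tanh derivative
        g_w_enc = x.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        # Plain full-batch gradient descent step.
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec

    def encode(a_img, a_txt):
        """Map a pair of unimodal feature matrices to the shared code."""
        return np.tanh(np.hstack([a_img, a_txt]) @ w_enc + b_enc)

    return encode, losses


# Synthetic placeholder features sharing a 3-dimensional latent cause.
rng = np.random.default_rng(1)
latent = rng.normal(size=(64, 3))
img_feats = latent @ rng.normal(size=(3, 10)) / np.sqrt(3)
txt_feats = latent @ rng.normal(size=(3, 6)) / np.sqrt(3)

encode, losses = train_multimodal_autoencoder(img_feats, txt_feats)
codes = encode(img_feats, txt_feats)
print(codes.shape, losses[0], losses[-1])
```

The `codes` matrix plays the role of the multimodal representation: it would be passed to a classifier in place of the raw unimodal features.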

Early results on product category classification suggest that complexity is not always key: LSI performs remarkably well and beats all other representations on this task, unimodal and multimodal alike. As more data and results come in, I'll update this project. A paper submitted to LREC will be attached here later (if it goes through, fingers crossed).
