I managed to build a working MVP trained on 4,000 examples split between tweets and Amazon reviews. It seems to have very high accuracy, but beyond the test and validation sets I have no real way of knowing without more data. The current architecture is a single-layer LSTM fed with word2vec-encoded text with a maximum length of 30 words.

Many issues are still unresolved: for example, which words to use to select the tweets that express an opinion on a candidate, or how to weight a favourable tweet against an unfavourable one. These questions are crucial to completing the project but, as with many data science endeavours, the most pressing problem is actually finding labeled data.

For now, the only concrete thing I have to show is a preliminary result from the recent Sicilian elections: the RNN clearly detected that the right-wing party had more favourable sentiment, but it did not show M5S having much better sentiment than the left-leaning PD. Again, these are preliminary results, but I think they show the potential of this kind of approach despite the limited training set.
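To make the input side of the architecture concrete, here is a minimal sketch of how a text could be turned into the fixed-size sequence of word vectors that an LSTM expects. It is not the project's actual code. The embedding table `TOY_VECTORS`, the 4-dimensional vectors, the whitespace tokenisation, and the choice to map unknown words and padding to zero vectors are all assumptions made for illustration. Only the 30-word maximum length comes from the description above.

```python
from typing import Dict, List

MAX_LEN = 30   # maximum number of words per text, as described above
EMBED_DIM = 4  # toy dimension (assumption); real word2vec vectors are much larger

# Hypothetical toy embedding table standing in for a trained word2vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0, 0.2],
    "terrible": [-0.8, 0.3, 0.1, 0.0],
    "candidate": [0.0, 0.5, 0.5, 0.1],
}


def encode(text: str,
           vectors: Dict[str, List[float]] = TOY_VECTORS,
           max_len: int = MAX_LEN,
           dim: int = EMBED_DIM) -> List[List[float]]:
    """Turn a text into a max_len x dim list of word vectors."""
    zero = [0.0] * dim
    # Naive lowercase whitespace tokenisation, truncated to max_len words.
    tokens = text.lower().split()[:max_len]
    # Unknown words map to the zero vector (an assumption for this sketch).
    rows = [list(vectors.get(tok, zero)) for tok in tokens]
    # Right-pad short texts with zero vectors so every input has the same shape.
    rows.extend(list(zero) for _ in range(max_len - len(rows)))
    return rows


if __name__ == "__main__":
    matrix = encode("Great candidate")
    print(len(matrix), len(matrix[0]))
```

Each encoded text is then a 30-step sequence that can be fed to the recurrent layer one word vector at a time. Fixing the length up front keeps every training example the same shape, at the cost of discarding anything past the 30th word.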