Language modelling


Universal Language Modelling (ULMFiT): Transfer learning was largely limited to computer vision until recently, but new research shows its impact extends almost everywhere, including natural language processing (NLP) and reinforcement learning (RL). Several recent papers demonstrate that transfer learning and fine-tuning work well in NLP, with strong results. Earlier work on incremental learning in computer vision focused on bringing generalization into models, since generalization is one of the most important components of robust learning in neural networks. One paper that builds on this is Universal Language Model Fine-tuning for Text Classification (ULMFiT). The project also covers state-of-the-art language models built with different recurrent architectures (RNNs/LSTMs/GRUs).

Project status: Published/In Market

Artificial Intelligence

Groups
Student Developers for AI

Intel Technologies
AI DevCloud / Xeon, Intel Opt ML/DL Framework


Overview / Usage

This project aims to create universal language models and builds on ULMFiT, which has shown tremendous improvements in accuracy through transfer learning with language models in NLP. The goal is to make language models easier to use and more accessible, similar to transfer learning in computer vision. The project also contains different language-model architectures built with RNNs, GRUs, and LSTMs.
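As a rough illustration of what such a recurrent language model looks like, here is a minimal, hypothetical PyTorch sketch of a word-level LSTM that predicts the next token at every position. The class name and dimensions (loosely following AWD-LSTM defaults) are illustrative assumptions, not the project's exact code.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Hypothetical word-level LSTM language model (names and sizes are illustrative)."""
    def __init__(self, vocab_size, emb_dim=400, hidden_dim=1150, num_layers=3, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) integer word indices
        emb = self.embedding(tokens)
        output, hidden = self.lstm(emb, hidden)   # output: (batch, seq_len, hidden_dim)
        return self.decoder(output), hidden       # logits over the vocabulary

# Usage: train with cross-entropy so position t predicts the token at position t + 1.
model = LSTMLanguageModel(vocab_size=10000)
x = torch.randint(0, 10000, (8, 35))              # a batch of token-id sequences
logits, _ = model(x)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)), x[:, 1:].reshape(-1))
```

The same skeleton applies to the RNN and GRU variants by swapping the recurrent layer.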
The project made use of a Jupyter* notebook on the Intel AI DevCloud (using Intel Xeon Scalable processors) to write the code and for visualization. Information from the Intel® AI Academy forum was also used to optimize for Intel Xeon processors. The code can be found in this GitHub* repository or in the original fast.ai implementation by Jeremy Howard. Some adjustments for optimization on this architecture can be found here.

Methodology / Approach

The approach uses an AWD-LSTM pre-trained on WikiText-103 as the base model and applies novel techniques such as:

  • Classifier fine-tuning for task-specific weights
  • Discriminative fine-tuning (sketched after this list)
  • Concat pooling (sketched after this list)
  • Training the classifier with gradual unfreezing (sketched after this list)
  • Backpropagation Through Time (BPTT) for text classification (sketched after this list)

This project includes implementations of these research papers to demonstrate better and more efficient ways to perform language modelling. More details can be found in the GitHub repository.
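Discriminative fine-tuning and gradual unfreezing can be expressed directly with PyTorch parameter groups. The sketch below is a simplified, hypothetical illustration: the module names, the group order, and the per-group decay factor of 2.6 (the value suggested in the ULMFiT paper) are assumptions, not the repository's API.

```python
import torch
import torch.nn as nn

# Hypothetical classifier with three layer groups (names are illustrative only).
model = nn.ModuleDict({
    "embedding": nn.Embedding(10000, 400),
    "encoder":   nn.LSTM(400, 1150, num_layers=3, batch_first=True),
    "head":      nn.Linear(1150, 2),
})

# Discriminative fine-tuning: each layer group gets its own learning rate,
# lowered by a factor of 2.6 per group as we move deeper into the network.
base_lr = 1e-3
optimizer = torch.optim.Adam([
    {"params": model["embedding"].parameters(), "lr": base_lr / (2.6 ** 2)},
    {"params": model["encoder"].parameters(),   "lr": base_lr / 2.6},
    {"params": model["head"].parameters(),      "lr": base_lr},
])

# Gradual unfreezing: start with everything frozen except the head, then
# unfreeze one more group before each subsequent epoch.
groups = [model["embedding"], model["encoder"], model["head"]]
for p in model.parameters():
    p.requires_grad = False
for epoch, first_unfrozen in enumerate(range(len(groups) - 1, -1, -1)):
    for group in groups[first_unfrozen:]:
        for p in group.parameters():
            p.requires_grad = True
    # ... run one epoch of fine-tuning here ...
```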
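Concat pooling, as described in the ULMFiT paper, concatenates the hidden state at the last time step with a max pool and a mean pool over all time steps before the classifier layers. A minimal sketch with assumed tensor shapes:

```python
import torch

def concat_pooling(outputs):
    """Concat pooling over LSTM hidden states.
    outputs: (batch, seq_len, hidden_dim) tensor of encoder hidden states."""
    last = outputs[:, -1, :]               # hidden state at the final time step
    max_pool = outputs.max(dim=1).values   # element-wise max over time
    avg_pool = outputs.mean(dim=1)         # element-wise mean over time
    return torch.cat([last, max_pool, avg_pool], dim=1)   # (batch, 3 * hidden_dim)

# The concatenated vector then feeds the classifier's linear layers.
h = torch.randn(8, 35, 1150)
pooled = concat_pooling(h)    # shape: (8, 3450)
```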
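For BPTT over long documents in classification, the document is split into fixed-length chunks and the LSTM hidden state is carried across chunks. The sketch below is a simplified, assumed version of that idea (gradients are truncated at chunk boundaries with `detach`; the paper's BPT3C variant is more involved), and `encoder` is a placeholder module, not the repository's.

```python
import torch

encoder = torch.nn.LSTM(400, 1150, batch_first=True)   # placeholder encoder
bptt = 70
doc = torch.randn(1, 500, 400)          # one embedded document of 500 tokens
hidden = None
outputs = []
for start in range(0, doc.size(1), bptt):
    chunk = doc[:, start:start + bptt]
    out, hidden = encoder(chunk, hidden)
    hidden = tuple(h.detach() for h in hidden)   # truncate gradients between chunks
    outputs.append(out)
states = torch.cat(outputs, dim=1)      # hidden states for the whole document
# These states would then go through concat pooling and the classifier head.
```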

Problems solved by ULMFiT
This method is called universal because it is not dataset-specific: it works across documents and datasets of various lengths. It uses a single architecture (in this case AWD-LSTM, much like ResNets in computer vision). No custom feature engineering is required to make it compatible with other tasks, and it does not need additional in-domain documents to work across certain domains.

This model can be further improved by using attention and adding skip connections where necessary, as sketched below.
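As an example of how attention could slot in, here is a hypothetical additive-attention pooling layer over the encoder's hidden states; it is not part of the current repository and could replace or complement concat pooling in the classifier head.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Additive attention over encoder hidden states (illustrative sketch)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, outputs):
        # outputs: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.score(outputs), dim=1)   # (batch, seq_len, 1)
        return (weights * outputs).sum(dim=1)                 # (batch, hidden_dim)

pool = AttentionPooling(1150)
context = pool(torch.randn(8, 35, 1150))   # (8, 1150)
```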

Technologies Used

PyTorch, fastai, Intel AI DevCloud, TensorFlow

Repository

https://github.com/prajjwal1/language-modelling
