The importance of Relational reasoning in simple story-based Q&A

Andrea Cossu

Pisa, Tuscany

In this project I assess the performance of Relation networks on the bAbI dataset, following the model and the training procedure outlined in the paper "A simple neural network module for relational reasoning" by Santoro et al. (2017).

Project status: Under Development

Artificial Intelligence

Groups
Student Developers for AI

Intel Technologies
Intel Python

Code Samples [1]

Overview / Usage

Traditional Machine Learning and Deep Learning architectures struggle when dealing with relational reasoning tasks. The featured image (from Santoro et al., 2017) highlights the difference between a non-relational question about a given image and a relational one. While the former focuses on a specific property of a specific object in the image, the latter requires knowledge about multiple objects and the relations between them.

Relation networks are capable of dealing with relations by design. They combine two feedforward neural networks: the first learns to evaluate the relations between all pairs of input objects, while the second combines the acquired knowledge into a final prediction. The model is often part of a larger architecture which includes Convolutional Neural Networks and/or Recurrent Neural Networks that provide the embeddings subsequently used as objects by the Relation network.
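A minimal PyTorch sketch of this two-network structure follows. Layer sizes, names and the aggregation details are illustrative assumptions, not the exact configuration from the paper (which, for example, also concatenates the question embedding to each object pair):

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Minimal Relation network: g scores every pair of objects, f aggregates."""
    def __init__(self, object_dim, hidden_dim, output_dim):
        super().__init__()
        # g: processes each (object_i, object_j) pair
        self.g = nn.Sequential(
            nn.Linear(2 * object_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # f: maps the aggregated pair representations to the final prediction
        self.f = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, objects):
        # objects: (batch, n_objects, object_dim), e.g. LSTM embeddings of the facts
        b, n, d = objects.shape
        # build all ordered pairs (o_i, o_j)
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([o_i, o_j], dim=-1).reshape(b, n * n, 2 * d)
        relations = self.g(pairs).sum(dim=1)  # sum over all pairs
        return self.f(relations)
```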

Relation networks achieve impressive results on story-based Q&A tasks (e.g. the bAbI dataset) and on image-based Q&A tasks (e.g. the CLEVR dataset), surpassing the previous state of the art by a large margin.

Relation networks can be exploited in real-world applications to better understand complex relations between data and, hopefully, to provide commonsense reasoning capabilities to neural network models.

Methodology / Approach

The approach followed in the experiment design is very similar to the one published in the paper by Santoro et al. which introduces Relation networks. However, the paper does not report all of the experiment settings and hyperparameter configurations used to obtain the results.

An important part of this project focuses on testing various hyperparameter configurations, different activation functions and training tricks. As an example, in order to keep track of the original relative order of a set of facts related to a particular question, the authors concatenate an additional piece of information to the input vectors, but do not specify its implementation clearly. In this project, I try two different options: a one-of-k vector configuration and a Transformer-like configuration. The former assigns an increasing time-related number to each fact and then converts it to a one-hot encoding. The latter uses a combination of sine and cosine wave functions, as proposed in the paper "Attention Is All You Need", which introduces the popular Transformer architecture.
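A minimal sketch of the two positional tagging options, assuming the tags are concatenated to the fact embeddings before they enter the Relation network (dimensions and function names are illustrative, not necessarily the settings used in the experiments):

```python
import math
import torch

def one_hot_positions(n_facts, max_facts):
    """One-of-k option: the fact at position t gets the t-th standard basis vector."""
    return torch.eye(max_facts)[:n_facts]  # (n_facts, max_facts)

def sinusoidal_positions(n_facts, dim):
    """Transformer-like option: sine/cosine waves at different frequencies (dim assumed even)."""
    positions = torch.arange(n_facts, dtype=torch.float).unsqueeze(1)          # (n_facts, 1)
    freqs = torch.exp(torch.arange(0, dim, 2, dtype=torch.float)
                      * (-math.log(10000.0) / dim))                            # (dim / 2,)
    enc = torch.zeros(n_facts, dim)
    enc[:, 0::2] = torch.sin(positions * freqs)
    enc[:, 1::2] = torch.cos(positions * freqs)
    return enc

# Either tag is concatenated to the fact embeddings, e.g.:
# facts: (n_facts, embed_dim) -> torch.cat([facts, one_hot_positions(n_facts, MAX_FACTS)], dim=-1)
```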

This methodology allows comparing different experiment settings and design choices in order to stress the capabilities of Relation networks.

Technologies Used

The project is developed using the PyTorch framework (v. 1.3.0) and Python3. I use the Weights & Biases online platform to monitor the experiments.
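A small sketch of the monitoring setup, using a hypothetical project name and placeholder metrics only to illustrate how runs are logged to Weights & Biases:

```python
import wandb

# hypothetical project name and config values; metrics are placeholders
run = wandb.init(project="relation-network-babi", config={"lr": 2e-4, "batch_size": 32})
for epoch in range(3):
    # in the real training loop these values come from training and validation
    wandb.log({"epoch": epoch, "train_loss": 1.0 / (epoch + 1), "val_accuracy": 0.5})
run.finish()
```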

The experiments are executed on both a laptop configuration and a server configuration, using Intel processors and NVIDIA GPUs.

I plan to include a comparison between standard Python3 and Intel Python3 on specific architectures in order to assess performance gains in the training and inference phases.

Repository

https://github.com/AndreaCossu/Relation-Network-PyTorch
