Knowledge Extraction in Text Documents using Memory Augmented Neural Networks

Cristian Villalobos

Rio de Janeiro, Rio de Janeiro

Extracting knowledge from text is one of the most interesting topics in Deep Learning. Being able to structure information that was previously understandable only by humans allows the development of more complex processes, such as question answering systems, comparison between documents, or increasing the structure of global knowledge. Some recent Deep Learning models, such as the Differentiable Neural Computer (DNC), are able to control an external memory matrix. This memory is considered external because its size does not affect the behavior of the DNC, as long as it is not entirely filled. If the memory is regarded as the DNC’s RAM, then the network is its controller, like a differentiable CPU whose operations are learned with gradient descent. The project proposes a Deep Learning model with external memory, based on an information retrieval system, capable of understanding and integrating information from various documents within the same memory. The model will show similarities or contradictions among these documents, creating a universal knowledge graph of the data set. The core of the model would be connected to a multi-task learning process, changing only its input and output interfaces. This allows the model to specialize in the creation of the graph.
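
To make the idea of a controller reading from an external memory matrix more concrete, the following is a minimal sketch of a content-based read in plain Python. It is not the project's model: the function names (cosine, content_read), the strength parameter and the toy memory contents are all assumptions chosen for illustration. The controller emits a key, each memory row is scored by cosine similarity to that key, and a softmax over the scores gives soft read weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero


def content_read(memory, key, strength=1.0):
    """Soft read from `memory` (list of rows) using content-based addressing.

    `strength` (an illustrative parameter) sharpens the softmax as it grows.
    Returns the read weights and the weighted sum of memory rows.
    """
    scores = [strength * cosine(row, key) for row in memory]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    read_vector = [
        sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)
    ]
    return weights, read_vector


# Toy memory with three slots (made-up values, for illustration only).
memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights, read_vector = content_read(memory, key=[0.9, 0.1, 0.0], strength=10.0)
```

Because every step is a smooth function of the key and the memory contents, a read of this kind can in principle be trained with gradient descent, which is what the RAM and differentiable CPU analogy refers to.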

Project status: Under Development

Artificial Intelligence

Intel Technologies
AI DevCloud / Xeon, MKL, Intel Opt ML/DL Framework, BigDL

Overview / Usage

Nowadays, many companies hold terabytes of data in repositories called ‘data lakes’, which include
unstructured data. Being able to structure this data automatically, such as e-mails, PDFs and text
documents, allows companies to use this information and increase their own expertise, or know-how.
Using this new information would help considerably with, for example, decision making.
For many years, knowledge graphs have been created from text by hand or with heavy supervision. These
graphs are defined by entity types, relations and their rules. Recently, some weakly supervised
algorithms have been proposed; these models attempt to extract the knowledge graph automatically,
identifying its entities, relations and connections.
Taking advantage of these technologies to create a weakly supervised system capable of extracting,
sharing and comparing knowledge between documents can improve the productivity of any
organization.
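
As a rough illustration of what comparing knowledge between documents could look like, the sketch below represents each document's graph as a list of (entity, relation, entity) triples and reports shared facts and candidate contradictions. The function name compare_graphs, the relation names and the example facts are all hypothetical. It also assumes each relation takes a single value per entity, which will not hold for every relation.

```python
from collections import defaultdict


def index_by_subject_relation(graph):
    """Group the objects of a triple list by (subject, relation)."""
    index = defaultdict(set)
    for subject, relation, obj in graph:
        index[(subject, relation)].add(obj)
    return index


def compare_graphs(graph_a, graph_b):
    """Return (shared triples, candidate contradictions) for two triple lists.

    A candidate contradiction is a (subject, relation) pair present in both
    graphs whose object sets do not overlap. This assumes single-valued
    relations, which is an illustrative simplification.
    """
    shared = sorted(set(graph_a) & set(graph_b))
    objects_a = index_by_subject_relation(graph_a)
    objects_b = index_by_subject_relation(graph_b)
    conflicts = []
    for key in sorted(objects_a.keys() & objects_b.keys()):
        if objects_a[key].isdisjoint(objects_b[key]):
            conflicts.append((key, sorted(objects_a[key]), sorted(objects_b[key])))
    return shared, conflicts


# Made-up triples standing in for graphs extracted from two documents.
doc_a = [
    ("Acme", "headquartered_in", "Rio de Janeiro"),
    ("Acme", "founded_in", "1998"),
]
doc_b = [
    ("Acme", "headquartered_in", "Rio de Janeiro"),
    ("Acme", "founded_in", "2001"),
]
shared, conflicts = compare_graphs(doc_a, doc_b)
```

Here the two documents agree on the headquarters and disagree on the founding year, so the first shows up in shared and the second in conflicts.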

Methodology / Approach

Data Collection: From raw text

  • Create synthetic annotation datasets.
  • Define the new model using program induction with deep learning.
  • Compare the DL architecture under different parameterization methods.
  • Integrate the DL architecture with a knowledge graph model.
  • Analyze and improve the DL architecture with real documents.
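
The first step above, creating synthetic annotation datasets, could be approached in many ways. The sketch below shows one simple possibility, filling sentence templates with entity names and recording the character spans and relation label. The templates, entity lists and field names are assumptions made for illustration, not the project's actual data format.

```python
import random

# Hypothetical templates: (sentence pattern, relation label).
TEMPLATES = [
    ("{head} works for {tail}.", "works_for"),
    ("{head} is based in {tail}.", "based_in"),
]
# Hypothetical entity names allowed in each slot, keyed by relation.
HEADS = {"works_for": ["Ana", "Bruno"], "based_in": ["Acme", "Globex"]}
TAILS = {"works_for": ["Acme", "Globex"], "based_in": ["Rio de Janeiro", "Lima"]}


def make_example(rng):
    """Build one annotated sentence with entity spans and a relation label."""
    template, relation = rng.choice(TEMPLATES)
    head = rng.choice(HEADS[relation])
    tail = rng.choice(TAILS[relation])
    text = template.format(head=head, tail=tail)
    head_start = text.index(head)
    # Search for the tail after the head so the two spans never overlap.
    tail_start = text.index(tail, head_start + len(head))
    return {
        "text": text,
        "relation": relation,
        "head_span": (head_start, head_start + len(head)),
        "tail_span": (tail_start, tail_start + len(tail)),
    }


def make_dataset(size, seed=0):
    """Generate `size` examples; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(size)]


dataset = make_dataset(5)
```

Annotations generated this way are exact by construction, which makes them convenient for checking a model before moving on to real documents, where labels are noisier.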

Technologies Used

Intel® Xeon® Scalable Processor
Intel® AI DevCloud
BigDL
Intel® Optimization for TensorFlow*
Intel® Optimization for Keras*
Intel® Distribution for Python*
Intel® Math Kernel Library (Intel® MKL)
Compute Library for Deep Neural Networks (clDNN)
Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)
