African Motion Content Network (AMCnet)

Motion-Content Encoder-Decoder

Artificial Intelligence, Modern Code

Description

We propose a deep neural network for predicting future frames in natural video sequences that runs on CPU. To handle the complex evolution of pixels in videos effectively, we decompose motion and content, the two key components generating dynamics in videos.

The model is built upon an Encoder-Decoder Convolutional Neural Network and a Convolutional LSTM for pixel-level prediction, which independently capture the spatial layout of an image and the corresponding temporal dynamics. By modeling motion and content independently, predicting the next frame reduces to transforming the extracted content features into the next-frame content using the identified motion features, which simplifies the prediction task.
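The separation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes PyTorch, and all layer sizes, module names (ConvLSTMCell, MotionContentNet), and the use of frame differences as the motion signal are illustrative choices, not details confirmed by the project.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single ConvLSTM cell: all four gates computed with one convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class MotionContentNet(nn.Module):
    """Content encoder sees the last frame (spatial layout); motion encoder
    feeds frame differences through a ConvLSTM (temporal dynamics); the
    decoder fuses both streams to predict the next frame."""
    def __init__(self, ch=1, feat=16):
        super().__init__()
        self.feat = feat
        self.content_enc = nn.Sequential(
            nn.Conv2d(ch, feat, 3, padding=1), nn.ReLU())
        self.motion_enc = nn.Sequential(
            nn.Conv2d(ch, feat, 3, padding=1), nn.ReLU())
        self.motion_lstm = ConvLSTMCell(feat, feat)
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, frames):           # frames: (B, T, C, H, W)
        B, T, C, H, W = frames.shape
        h = frames.new_zeros(B, self.feat, H, W)
        c = frames.new_zeros(B, self.feat, H, W)
        for t in range(1, T):            # motion = per-step frame difference
            diff = frames[:, t] - frames[:, t - 1]
            h, c = self.motion_lstm(self.motion_enc(diff), (h, c))
        content = self.content_enc(frames[:, -1])   # layout of the last frame
        return self.decoder(torch.cat([content, h], dim=1))

frames = torch.rand(2, 4, 1, 32, 32)     # toy batch of two 4-frame clips
pred = MotionContentNet()(frames)
print(pred.shape)                        # one predicted frame per clip
```

Because the two encoders never share parameters, the motion stream can only explain what changes between frames while the content stream explains what is there, which is the division of labor the description relies on. The forward pass runs on CPU with no changes.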

The model we aim to build is end-to-end trainable over multiple time steps and naturally learns to decompose motion and content without separate training. We evaluate the proposed network architecture on the AVA and UCF-101 human action datasets and show state-of-the-art performance compared with recent approaches.
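"End-to-end trainable over multiple time steps" means the network is rolled out autoregressively and a single pixel loss, summed over every predicted step, is backpropagated through the whole rollout, so the motion/content split is learned jointly rather than in separate stages. A hedged sketch of that training step, assuming PyTorch; the stand-in LastFrameConv predictor, the clip sizes, and the MSE loss are illustrative assumptions, and any next-frame model mapping a clip (B, T, C, H, W) to a frame (B, C, H, W) would slot in the same way:

```python
import torch
import torch.nn as nn

class LastFrameConv(nn.Module):
    """Toy stand-in predictor: one conv over the most recent frame."""
    def __init__(self, ch=1):
        super().__init__()
        self.net = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, clip):             # clip: (B, T, C, H, W)
        return torch.sigmoid(self.net(clip[:, -1]))

model = LastFrameConv()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clip = torch.rand(2, 8, 1, 32, 32)       # toy clips: 4 context + 4 target frames
context, targets = clip[:, :4], clip[:, 4:]

loss = torch.zeros(())
window = context
for t in range(targets.shape[1]):        # autoregressive multi-step rollout
    pred = model(window)
    loss = loss + nn.functional.mse_loss(pred, targets[:, t])
    # feed the prediction back in so gradients flow through every step
    window = torch.cat([window[:, 1:], pred.unsqueeze(1)], dim=1)

opt.zero_grad()
loss.backward()                          # one backward pass through the rollout
opt.step()
```

Feeding predictions back as inputs is what makes the multi-step objective end-to-end: the gradient of the loss at step t reaches the parameters through every earlier predicted frame.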

This is an end-to-end trainable network architecture that runs on CPU and separates motion from content to model the spatio-temporal dynamics for pixel-level future prediction in natural videos.

Gallery

Links

Moloti N. created project African Motion Content Network (AMCnet)

Thabo Koee

A futurist who is enthusiastic about implementing AI at broader scales, using robust computational processing power.

Namibia
