Srk2Cage using DeepFake

Pranab Sarkar

Jalpaiguri, West Bengal

Deepfake is a technique for human image synthesis based on artificial intelligence. It combines and superimposes existing images and videos onto source images or videos using a deep neural network.

Project status: Under Development

Artificial Intelligence

Groups
DeepLearning, Artificial Intelligence Europe, Artificial Intelligence West Coast, Artificial Intelligence India, Early Innovation for PC Skills

Intel Technologies
AI DevCloud / Xeon

Code Samples [1]

Overview / Usage

Data Collection:

  1. Shahrukh Khan: video footage from https://www.youtube.com/watch?v=zRSjxp67Yzk&t=447s
  2. Nicolas Cage: images collected from Google Images using a web crawler.
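The crawled results need cleaning before they are useful for training. A minimal sketch of that post-crawl step, assuming a hypothetical helper `keep_image_urls` (the actual crawler used in the project is not shown here):

```python
def keep_image_urls(urls):
    """Filter a crawled URL list down to unique image links.

    Drops non-image links (HTML pages, scripts) and duplicates, keeping
    the first occurrence of each image URL in crawl order.
    """
    seen = set()
    kept = []
    for url in urls:
        base = url.split("?", 1)[0].lower()  # ignore query strings when comparing
        if base.endswith((".jpg", ".jpeg", ".png")) and base not in seen:
            seen.add(base)
            kept.append(url)
    return kept
```

Deduplication matters here because image search results often surface the same picture under several URLs, and near-identical training images add no new information for the autoencoder.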

Methodology / Approach

  • **Extraction:** Without hundreds (if not thousands!) of face pictures, we will not be able to create a deepfake video. A way around this is to collect a number of video clips that feature the people we want to face-swap. Extraction refers to pulling all frames out of these clips, identifying the faces, and aligning them. The alignment is critical, since the neural network that performs the face swap requires all faces to have the same size (usually 256×256 pixels) and aligned features. Detecting and aligning faces is considered a mostly solved problem, and most applications do it very efficiently.
  • **Training:** It is important to note that if we train two autoencoders separately, they will be incompatible with each other: the latent faces are based on the specific features each network has deemed meaningful during training, so two autoencoders trained separately on different faces end up with latent spaces that represent different features. During the training phase the two networks are therefore treated separately: Decoder A is trained only with faces of A, and Decoder B only with faces of B. However, all latent faces are produced by the same encoder, which means the encoder itself has to identify common features in both faces. Because all faces share a similar structure, it is not unreasonable to expect the encoder to learn the concept of a “face” itself.
  • **Inference:** When training is complete, we can pass a latent face generated from Subject A to Decoder B, which will try to reconstruct Subject B from information relative to Subject A. If the network has generalised well enough what makes a face, the latent space will represent facial expressions and orientations, so the result is a face of Subject B with the same expression and orientation as Subject A.
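The crop-and-align part of the extraction step can be sketched in plain NumPy. This assumes the face bounding box has already been found by a detector (real pipelines use one such as dlib or MTCNN, plus proper interpolation); `align_face` is a hypothetical helper, not code from the project:

```python
import numpy as np

def align_face(frame: np.ndarray, box: tuple, size: int = 256) -> np.ndarray:
    """Crop a detected face box from a frame and rescale it to size x size.

    `box` is (top, left, height, width), as a face detector would return.
    Rescaling uses nearest-neighbour sampling to keep the sketch
    dependency-free; real extractors use smoother interpolation.
    """
    top, left, h, w = box
    face = frame[top:top + h, left:left + w]
    # Map each output pixel back to its nearest source pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return face[rows][:, cols]

# Usage: a dummy 480x640 grayscale "frame" with a face box at (100, 200, 128, 128).
frame = np.zeros((480, 640), dtype=np.uint8)
aligned = align_face(frame, (100, 200, 128, 128))  # shape (256, 256)
```

Resizing every face to the same 256×256 canvas is what lets a single network batch-process faces extracted from very different source resolutions.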
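The shared-encoder/twin-decoder training regime and the decoder swap at inference can be illustrated with a toy linear autoencoder in plain NumPy. The real project trains deep convolutional networks in TensorFlow on 256×256 faces; the 16-dimensional "faces", layer sizes, and learning rate below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for aligned face images: 16-dim vectors for identities A and B.
faces_a = rng.normal(0.0, 1.0, (200, 16))
faces_b = rng.normal(0.0, 1.0, (200, 16)) * np.linspace(0.5, 1.5, 16)

dim, latent, lr = 16, 4, 0.01
enc = rng.normal(0, 0.1, (dim, latent))    # one encoder shared by both identities
dec_a = rng.normal(0, 0.1, (latent, dim))  # decoder trained only on faces of A
dec_b = rng.normal(0, 0.1, (latent, dim))  # decoder trained only on faces of B

mse_before = np.mean((faces_a @ enc @ dec_a - faces_a) ** 2)

for _ in range(500):
    for faces, dec in ((faces_a, dec_a), (faces_b, dec_b)):
        z = faces @ enc          # latent faces from the shared encoder
        err = z @ dec - faces    # reconstruction error for this identity
        # Gradient steps on mean squared error (in-place array updates).
        dec -= lr * z.T @ err / len(faces)
        enc -= lr * faces.T @ (err @ dec.T) / len(faces)

mse_after = np.mean((faces_a @ enc @ dec_a - faces_a) ** 2)

# Inference / face swap: encode A's faces, then decode with B's decoder.
swapped = faces_a @ enc @ dec_b
```

Because the encoder sees both identities while each decoder sees only one, the latent space is forced toward identity-agnostic features (expression, orientation), which is exactly what makes the decoder swap at the end meaningful.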

Technologies Used

Tensorflow

Python

Repository

https://github.com/pranabsarkar/deep-fake-srk-cage/blob/master/faceswap.py
