Hiding Audio in Images: A Novel Award-Winning Deep Learning Approach
Rohit Gandikota
Guntur, Andhra Pradesh
In this work, we propose an end-to-end trainable model of Generative Adversarial Networks (GAN) that is engineered to hide audio data in images. This model, named VoI-GAN (Voice over Image GAN), can be used for both watermarking and steganography purposes.
Project status: Published/In Market
Intel Technologies
DevCloud
Intel CPU
Overview / Usage
In this work, we propose an end-to-end trainable model of Generative Adversarial Networks (GAN) that is engineered to hide audio data in images. Because audio signals are non-stationary and suitable tools have been lacking, hiding audio in images has not been well explored. We devised a deep generative model in which an auto-encoder serves as the generator and is trained with one discriminator to embed the message, while an exclusive extractor network with an audio discriminator is trained to extract the hidden message from the encoded host signal. The encoded image is subjected to several common attacks, and we establish that the message signal can still be recovered, making the proposed method robust to blurring, rotation, noise, and cropping. A remarkable feature of our method is that it can be trained to recover from various attacks and can therefore also be used for watermarking.
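The attacks mentioned above can be simulated on an encoded image with standard NumPy/SciPy operations. The sketch below is illustrative only: the function names, attack strengths, and the corner-patch cropping scheme are assumptions for demonstration, not the exact attack pipeline used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

rng = np.random.default_rng(0)

def attack_blur(img, sigma=1.0):
    # Gaussian blur over spatial axes only (leave channels untouched).
    return gaussian_filter(img, sigma=(sigma, sigma, 0))

def attack_noise(img, std=0.02):
    # Additive Gaussian noise, clipped back to the valid [0, 1] range.
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)

def attack_rotate(img, angle=5.0):
    # Small in-plane rotation; reshape=False keeps the original shape.
    return rotate(img, angle, axes=(0, 1), reshape=False, mode="nearest")

def attack_crop(img, frac=0.1):
    # Zero out a corner patch to mimic a cropping/occlusion attack.
    out = img.copy()
    h = int(img.shape[0] * frac)
    w = int(img.shape[1] * frac)
    out[:h, :w] = 0.0
    return out
```

Running the extractor on the attacked images and comparing the recovered audio against the original is then a direct way to measure robustness.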
Methodology / Approach
Data hiding embeds a message (MSG) into a cover image (CI) so that the message can later be extracted from the encoded image (EI). Successful secret communication therefore requires two processes. In this paper, we propose a novel method for both of these tasks. An autoencoder first reduces the messages to latent representations, which eases the training of the embedder, a multi-objective GAN. This base learning of a simple latent representation makes the GAN training efficient. An exclusive extractor network with adversarial training is also required to extract the embedded messages.
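The message autoencoder described above can be sketched as a small 1-D convolutional network in Keras. This is a minimal illustration under assumed sizes (a 4096-sample audio window and a 256-dimensional latent code); the paper's actual layer counts and dimensions are not specified here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_audio_autoencoder(msg_len=4096, latent_dim=256):
    """Compress a raw audio window to a latent code and reconstruct it."""
    inp = layers.Input(shape=(msg_len, 1))
    # Encoder: two strided 1-D convolutions (4096 -> 1024 -> 256 steps).
    x = layers.Conv1D(16, 9, strides=4, padding="same", activation="relu")(inp)
    x = layers.Conv1D(32, 9, strides=4, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(latent_dim, name="latent")(x)
    # Decoder: mirror the encoder back up to the original length.
    x = layers.Dense((msg_len // 16) * 32, activation="relu")(latent)
    x = layers.Reshape((msg_len // 16, 32))(x)
    x = layers.Conv1DTranspose(16, 9, strides=4, padding="same",
                               activation="relu")(x)
    out = layers.Conv1DTranspose(1, 9, strides=4, padding="same",
                                 activation="tanh")(x)
    return Model(inp, out, name="audio_ae")
```

Once trained on reconstruction loss, the `latent` layer's output is the compact representation the embedder hides in the cover image.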
VoI-GAN consists of five networks: an embedder *Emb*, a discriminator *D1* for the embedder, an extractor *Ex*, a secondary discriminator *D2* for the extractor, and an autoencoder *AE* that maps the actual message to the latent domain. It is important to note that GANs are unstable during training and can produce distortions in the output images. We propose a custom multi-objective framework with base training for VoI-GAN, which resolves the issue of mode collapse in GAN training.
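A skeletal wiring of three of these networks (*Emb*, *D1*, and *Ex*) might look like the following. All shapes, layer widths, and the choice to tile the latent code into an extra image channel are assumptions made for this sketch, not the paper's published architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG = (64, 64, 3)   # assumed cover-image size
LATENT = 256        # assumed latent message size

def build_embedder():
    # Emb: fuse the cover image with the latent message as a 4th channel.
    cover = layers.Input(shape=IMG)
    z = layers.Input(shape=(LATENT,))
    zmap = layers.Dense(IMG[0] * IMG[1], activation="relu")(z)
    zmap = layers.Reshape((IMG[0], IMG[1], 1))(zmap)
    x = layers.Concatenate()([cover, zmap])
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    encoded = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return Model([cover, z], encoded, name="Emb")

def build_discriminator():
    # D1: judge whether an image is a clean cover or carries a message.
    img = layers.Input(shape=IMG)
    x = layers.Conv2D(32, 3, strides=2, activation="relu")(img)
    x = layers.Conv2D(64, 3, strides=2, activation="relu")(x)
    x = layers.Flatten()(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model(img, out, name="D1")

def build_extractor():
    # Ex: recover the latent message from an encoded image.
    img = layers.Input(shape=IMG)
    x = layers.Conv2D(32, 3, strides=2, activation="relu")(img)
    x = layers.Flatten()(x)
    z_hat = layers.Dense(LATENT)(x)
    return Model(img, z_hat, name="Ex")
```

In training, *Emb* would be optimized against *D1* plus a cover-similarity loss, while *Ex* (paired with *D2* and decoded back through *AE*) would be optimized to recover the hidden audio.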
We trained the model on a variety of datasets using a high-end GPU; to test the real-time performance of VoI-GAN, we then imported the trained model into Intel's DevCloud from our local machine. We observed that DevCloud's Xeon Gold 6128 processor is roughly 10x faster in computing time than the i7 processor on our local system.
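A simple way to obtain such a per-platform timing comparison is to average wall-clock time over repeated inference calls after a warm-up run. The helper below is a generic sketch (not the timing harness used in the project); `fn` would be the trained model's predict call on each machine.

```python
import time

def benchmark(fn, x, runs=20):
    """Return the mean wall-clock seconds per call of fn(x)."""
    fn(x)  # warm-up call so one-time setup cost is excluded
    t0 = time.perf_counter()
    for _ in range(runs):
        fn(x)
    return (time.perf_counter() - t0) / runs
```

Running the same script on the local i7 and on a DevCloud node, then taking the ratio of the two means, yields the kind of speed-up factor reported above.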
Technologies Used
Keras
TensorFlow
DevCloud
NumPy
SciPy
scikit-learn