Automatic Face Tracking & Gaze Direction Estimation for Animated, Interactive AI Conference Concierge

We're defining the technology needed to realize Lana, an animated AI conference concierge. This includes an Intel RealSense D415 along with Intel OpenVINO & NCS2 to run face detection & head pose estimation algorithms so that Lana maintains 'virtual' eye contact while interacting with a user.

Project status: Published/In Market

Robotics, RealSense™, Networking, Internet of Things, Artificial Intelligence, Graphics and Media

Intel Technologies
Movidius NCS, OpenVINO, Other, Intel NUC

Overview / Usage

This solution arose from our entry into the IBM Watson AI XPRIZE competition, which challenges teams from around the world to solve 'grand challenges' that benefit society by incorporating artificial intelligence technology. Our original solution was intended to connect extended families, including children and older adults, with each other and to enable their safe access to the Internet. Because building that solution has proven much more complex than originally expected, we've temporarily reduced its scope to an animated AI conference concierge and check-in agent. We plan to offer this as a packaged service for conferences and events globally while we continue R&D on the protective solution.

For this project, we've already implemented a functional prototype of an expressive, interactive animated character powered by AI. It incorporates audio and video sensors for user localization, and animated motion control to maintain virtual eye contact as a user moves around the device.

Video-based face detection and gaze direction estimation algorithms, implemented using Intel OpenVINO and running on an Intel Movidius NCS2 deep learning accelerator, are used to control the gaze direction of our character so as to approximate looking in the direction of the user's face. This simulates maintaining eye contact while facilitating natural voice and visual interaction between the user and Lana, our displayed, animated avatar generated in real time using the Unity3D graphics engine.
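To illustrate the gaze control idea, here is a minimal sketch of turning a user's face position into yaw and pitch angles for the character. It is not our production code: the function name gaze_angles, the coordinate convention (+x right, +y down, +z forward from the camera, in metres), and the assumption that the camera sits at the character's eye position are all illustrative.

```python
import math


def gaze_angles(x, y, z):
    """Return (yaw, pitch) in degrees pointing from the camera origin at (x, y, z).

    Assumed frame (hypothetical): +x right, +y down, +z forward, in metres.
    Positive yaw turns toward +x; positive pitch looks up.
    """
    # Horizontal angle: rotation about the vertical axis toward the face.
    yaw = math.degrees(math.atan2(x, z))
    # Vertical angle: elevation above the horizontal plane (y points down, so negate).
    pitch = math.degrees(math.atan2(-y, math.hypot(x, z)))
    return yaw, pitch


if __name__ == "__main__":
    # A face 0.5 m to the right, 0.2 m above the camera, 1.5 m away.
    yaw, pitch = gaze_angles(0.5, -0.2, 1.5)
    print(f"yaw={yaw:.1f} deg, pitch={pitch:.1f} deg")
```

Because the character faces the user, the yaw sign may need to be flipped depending on how the avatar rig defines left and right, and in practice the angles would likely be smoothed over time to avoid jitter.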

IBM Cloud services, including Watson Assistant and Watson Speech To Text, are used for voice interaction, and Lana's voice is itself generated using Google Cloud Platform Text To Speech services. For recognizing event attendees, Microsoft Azure Cognitive Services are used to create biometric signatures from detected user faces and maintain the biometric signature database.

The solution runs on an Intel NUC and interfaces with conference check-in and live badging software provided by our partner, FieldDrive.

Our current work is focused on enabling all the backend services, including Watson and Azure, to run in an edge cloud hosted on Intel OpenNESS. We are also working on integrating the animated AI character onto a mobile robot.

Methodology / Approach

Our 3D front end is built using the Unity3D High Definition Render Pipeline (HDRP). The deep neural network models that power our face detection and head pose estimation run on an Intel Neural Compute Stick 2 via the Intel Deep Learning Deployment Toolkit. RGB+D data is obtained from an Intel RealSense D415 camera.
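As a rough sketch of how RGB+D data can locate a detected face in 3D, the snippet below applies the standard pinhole camera model to the centre of a face bounding box and its depth reading. The Intrinsics values, the function names, and the bounding box are hypothetical placeholders, and lens distortion is ignored. The RealSense SDK offers its own deprojection helper (rs2_deproject_pixel_to_point) that would normally be used with the camera's real calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera parameters in pixels (values used below are made up)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def face_center(box):
    """Centre pixel of a (left, top, right, bottom) bounding box."""
    left, top, right, bottom = box
    return (left + right) / 2.0, (top + bottom) / 2.0


def deproject(intr, u, v, depth_m):
    """Map pixel (u, v) with depth in metres to a 3D point in the camera frame.

    Uses the undistorted pinhole model: +x right, +y down, +z forward.
    """
    x = (u - intr.cx) * depth_m / intr.fx
    y = (v - intr.cy) * depth_m / intr.fy
    return x, y, depth_m


if __name__ == "__main__":
    intr = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # hypothetical
    u, v = face_center((380, 150, 460, 250))  # hypothetical detection
    print(deproject(intr, u, v, depth_m=1.2))
```

The resulting 3D point is the kind of input a gaze controller needs in order to aim the character's eyes at the user.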

Technologies Used

Intel NUC

Intel Neural Compute Stick 2

Intel RealSense D415 RGB+D camera

Intel OpenVINO

Intel Deep Learning Deployment Toolkit

Microsoft Windows 10 IoT Enterprise

Unity3D
