The Future of Immersive Filmmaking: Behind the Scenes at Intel Studios

Ilke Demir


Los Angeles, California

In this talk, we will visit the Intel Studios volumetric capture stage, which is pioneering a new dimension in filmmaking. We'll explore the technical challenges of 10X scale, AI methods to process 2D/3D visual data, and the stage's capabilities beyond traditional filmmaking, such as 3D editorial tools and remote production.

Project status: Published/In Market

Virtual Reality, Artificial Intelligence, Graphics and Media

Groups
SIGGRAPH 2020

Intel Technologies
Other


Overview / Usage

As the entertainment industry enters the AI & Immersive Media era, it seeks novel technology to enrich and revolutionize storytelling. Looking beyond traditional cinema, volumetric capture is projected to become a new dimension in how content is produced and consumed. Materializing this projection, Intel Studios built the world's largest volumetric stage. Creating this new medium has brought unique challenges and opportunities to filmmaking, which are set to define the next 5-10 years of production. In this talk, we will visit the 10,000 sq ft Intel Studios stage to peek behind the scenes of its productions, explore the technical challenges of 10X scale in time and space when filming with 100 8K cameras, and discuss the stage's capabilities over traditional filmmaking. We will shine a spotlight on data-oriented tasks such as conversion and storage of 16.5 TB/min of raw footage, high-quality 3D reconstruction from long range imagery, and delivering our experiences to consumer devices, such as mobile and XR headsets using real-time game engines. We at Intel are thinking beyond with exciting AI solutions for segmentation, tracking, retargeting, frame interpolation, and 2D/3D processing, pushing the bounds of compute and graphics. The talk will conclude with some previews of productions from our studio.

Methodology / Approach

Our studio is a 10,000 sq ft real-life production set for volumetric capture, recording up to 30 people simultaneously. With 100 8K-resolution cameras, the capture system produces 270 GB/sec of raw RGB input travelling through 5 miles of fiber optic cables.
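As a rough sanity check on these figures, the bandwidth can be reproduced with back-of-envelope arithmetic. The frame size (7680×4320 for 8K) and the effective bytes per pixel here are our assumptions for the sketch, not published specs:

```python
# Back-of-envelope check of the capture bandwidth.
# Assumptions (not from the source): 8K = 7680 x 4320 pixels, 30 fps,
# and an effective payload of ~2.7 bytes per pixel including overhead.
CAMERAS = 100
WIDTH, HEIGHT = 7680, 4320
FPS = 30
BYTES_PER_PIXEL = 2.7  # assumed effective payload

pixels_per_second = CAMERAS * WIDTH * HEIGHT * FPS
gb_per_second = pixels_per_second * BYTES_PER_PIXEL / 1e9

print(f"{pixels_per_second:,} pixels/s")  # ~99.5 billion pixels/s
print(f"{gb_per_second:.0f} GB/s")        # close to the quoted 270 GB/sec
```

Under these assumptions the pixel rate alone is about 100 billion pixels per second, which is consistent with the quoted 270 GB/sec stream.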

Contrary to traditional filming, volumetric capture reduces the dependency on the physical presence of objects and layouts. We can capture real actors in imaginary scenes with imaginary objects; however, the story, scripting, props, costumes, and layout all need to satisfy the requirements discussed during preproduction.

During shooting, we transmit 270 GB of data per second, so utilizing even a subset of it to provide live feedback for directors and actors is a challenging task in itself. The capture system writes raw 8K camera frames at 30 fps to 50 dedicated capture nodes, each with 22 TB of RAIDed solid-state drives, enabling a throughput of 17 TB per minute. We render some exemplars in real time to validate the capture, and provide a GUI with real-time feedback for deletion and editing of sequences. We also give directors and actors 360° feedback and close-up cameras.
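A quick, illustrative budget for the capture nodes follows from these numbers. The even split of the stream across nodes is our assumption for the sketch:

```python
# Illustrative capture-node budget, assuming the 270 GB/s stream is
# spread evenly across the 50 capture nodes (an assumption, not a spec).
STREAM_GB_PER_SEC = 270
NODES = 50
NODE_CAPACITY_TB = 22

per_node_gb_per_sec = STREAM_GB_PER_SEC / NODES     # 5.4 GB/s per node
total_tb_per_min = STREAM_GB_PER_SEC * 60 / 1000    # ~16.2 TB/min, near the quoted 17
minutes_until_full = NODE_CAPACITY_TB * 1000 / (per_node_gb_per_sec * 60)

print(per_node_gb_per_sec, total_tb_per_min, round(minutes_until_full))
```

In other words, each node must sustain roughly 5.4 GB/s of sequential writes, and a full 22 TB node fills in about an hour of continuous shooting.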

One challenge of having over 100 cameras is keeping them consistent, which makes calibration a crucial part of our pipeline. We combine state-of-the-art calibration approaches with our own setup of calibration cubes and presets in the studio. We calibrate and correct for lens distortion, intrinsics, extrinsics, color, lighting, and time synchronization. We first do pixel-wise denoising with a nearest-neighbor search to replace bad pixels. Then we correct for vignetting using cosine-fourth attenuation, and for gain by multiplying against the empty stage. As the fourth step, we apply edge-aware debayering in order to preserve details without introducing aliasing artifacts. Then we white-balance and apply the color correction matrix obtained from the earlier color calibration. Finally, we detect and discard frame tears using the deformation of the vignette. Now all frames are ready to be reconstructed.

Having very high resolution images that span our capture volume enables us to perform per-voxel operations quickly and accurately, coherent with memory utilization. Combining this capability with hierarchical subdivision of the known volume creates high-resolution point clouds at the adaptive resolution we require. Afterwards, we can compute per-point normals and colors using traditional geometry queries and point-normal-camera associations.
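A minimal sketch of the hierarchical-subdivision idea, in pure NumPy with made-up points (the production pipeline is far more elaborate): points are hashed into the occupied cells of a grid whose resolution doubles per level, and each occupied cell is summarized by its centroid.

```python
import numpy as np

def subdivide(points, bounds_min, bounds_max, depth):
    """One level of hierarchical subdivision: hash points into the occupied
    cells of a (2**depth)^3 grid over the known volume, returning one
    centroid per occupied cell. Deeper levels give finer resolution."""
    cells = 2 ** depth
    scale = (bounds_max - bounds_min) / cells
    idx = np.clip(((points - bounds_min) / scale).astype(int), 0, cells - 1)
    key = (idx[:, 0] * cells + idx[:, 1]) * cells + idx[:, 2]  # flat cell id
    uniq, inverse = np.unique(key, return_inverse=True)
    centroids = np.zeros((uniq.size, 3))
    np.add.at(centroids, inverse, points)          # sum points per cell
    counts = np.bincount(inverse).astype(float)
    return centroids / counts[:, None]             # average -> centroid

rng = np.random.default_rng(0)
pts = rng.random((10_000, 3))  # toy stand-in for a reconstructed cloud
coarse = subdivide(pts, np.zeros(3), np.ones(3), depth=2)  # up to 64 cells
fine = subdivide(pts, np.zeros(3), np.ones(3), depth=4)    # up to 4096 cells
print(len(coarse), len(fine))  # finer depth -> more, smaller cells
```

Running the same routine at increasing depths yields the adaptive-resolution behaviour described above: sparse regions stay coarse while dense regions can be refined further.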

Of course, reconstructed proxies are never complete, as in most real-world applications. Thus, we build point cloud processing tools, some of which are demonstrated here. For example, we build deep learning models to clean, denoise, and segment point clouds for various uses. We implement automatic tracking, deformation, and morphing tools in 3D to replace and edit assets. We also develop retargeting approaches for expression and pose transfer in 3D, in order to animate our old captures.
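As one concrete example of such cleaning tools, a classical statistical outlier filter (a common denoising baseline, not our learning-based production models) can be sketched in a few lines:

```python
import numpy as np

def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors is more
    than `std_ratio` standard deviations above the cloud-wide mean.
    Brute-force O(n^2) distances -- fine for a sketch, not for 8K captures."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1]   # skip self-distance (always 0)
    mean_knn = knn.mean(axis=1)
    keep = mean_knn <= mean_knn.mean() + std_ratio * mean_knn.std()
    return points[keep]

rng = np.random.default_rng(1)
cloud = rng.normal(size=(500, 3))           # dense toy cluster
noise = rng.uniform(-20, 20, size=(20, 3))  # far-away stray points
cleaned = remove_outliers(np.vstack([cloud, noise]))
print(len(cleaned))  # most of the stray points should be filtered out
```

Libraries such as Open3D ship an equivalent filter out of the box; the point of the sketch is only to show the statistic being thresholded.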

Parallel to the automatic point cloud processing tools, postproduction includes various pipelines to enable artists’ creative power, bridging the gap between physical and digital worlds. These tools include color blending to produce consistent colors without losing high frequency details, frame interpolation to synthesize frames both spatially and temporally, asset management to fit and morph digital objects, and character separation for individual modeling of actors.
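For temporally tracked points, the simplest form of temporal frame interpolation is a linear blend between corresponding points in adjacent captured frames. This is a toy sketch under the strong assumption of known one-to-one correspondence; real interpolation must also handle correspondence and occlusion:

```python
import numpy as np

def interpolate_frames(frame_a, frame_b, t):
    """Linearly blend two point clouds with known one-to-one correspondence.
    t=0 returns frame_a, t=1 returns frame_b; works for positions or colors."""
    return (1.0 - t) * frame_a + t * frame_b

frame_a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
frame_b = np.array([[1.0, 0.0, 0.0], [1.0, 2.0, 1.0]])
mid = interpolate_frames(frame_a, frame_b, 0.5)
print(mid)  # [[0.5 0.  0. ] [1.  1.5 1. ]]
```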

We have several outlets and partnerships with different requirements. For example, mobile and AR/VR platforms have specific size constraints, whereas movie producers have artistic concerns. To address all of these, we invented a temporal point cloud compression tool that reduces file sizes by 5X to 10X using adaptive thresholding in time and space. We also developed integrations with the Houdini, Unreal, and Unity engines to enable creative developers to easily work with assets created in our studio. For use cases that require 2D renders rather than 3D output, we implemented a view-dependent rendering approach for seamless navigation within the virtual dome.
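The thresholding idea behind temporal compression can be illustrated with a tiny delta scheme: store a full keyframe, then for each later frame store only the points that moved more than a threshold. This is a hypothetical simplification for illustration, not our actual codec:

```python
import numpy as np

def temporal_deltas(frames, threshold=0.01):
    """Keep frame 0 whole; for each later frame store (indices, positions)
    of only the points that moved more than `threshold` since the last
    stored state -- a minimal temporal-threshold compression sketch."""
    state = frames[0].copy()
    deltas = []
    for frame in frames[1:]:
        moved = np.linalg.norm(frame - state, axis=1) > threshold
        deltas.append((np.flatnonzero(moved), frame[moved]))
        state[moved] = frame[moved]
    return frames[0], deltas

rng = np.random.default_rng(2)
frames = [rng.random((1000, 3))]
for _ in range(9):                  # 9 more frames; 5% of points move each
    nxt = frames[-1].copy()
    idx = rng.choice(1000, size=50, replace=False)
    nxt[idx] += 0.1
    frames.append(nxt)

key, deltas = temporal_deltas(frames)
stored = key.size + sum(ix.size + pos.size for ix, pos in deltas)
raw = sum(f.size for f in frames)
print(raw / stored)  # ~6x here, within the 5X-10X range quoted above
```

The compression ratio depends directly on how much of the scene moves per frame, which is why adaptive thresholds over time and space matter for real captures.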

Technologies Used

The world's largest volumetric capture studio, a server farm of over 100 Intel-powered servers, Open3D, and a mix of over 100 Intel® Xeon®, Core™ i7, and i9 processors.

Documents and Presentations

Collaborators

1 Result
