Integration of Arhat into Intel oneAPI deep learning ecosystem

Introduction

Arhat is a cross-platform deep learning framework that converts neural network descriptions into lean standalone executable code. Arhat has been designed for deployment of deep learning inference workflows in the cloud and on the edge.

Arhat is a vendor-agnostic tool supporting multiple target platforms. Unlike conventional deep learning frameworks, Arhat translates neural network descriptions directly into platform-specific executable code. This code interacts directly with the platform library of deep learning primitives and has no other external dependencies. Furthermore, the generated code includes only the functionality essential for handling the given model on the selected platform. This approach facilitates generation of lean code and substantially streamlines the deployment process.

Interoperability with oneAPI

Arhat interoperates with two components of the oneAPI ecosystem: oneDNN and OpenVINO.

Arhat relies on the oneDNN library for efficient cross-platform implementation of deep learning operations on Intel hardware. The Arhat back-end for Intel generates code that directly calls oneDNN deep learning primitives and can be used on any modern Intel computing hardware, including Xeon CPUs and Xe GPUs.
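
To illustrate what calling oneDNN primitives directly looks like, here is a minimal hand-written sketch (for illustration only, not actual Arhat output) that applies a ReLU primitive to a feature map on a CPU engine. It uses the oneDNN v2.x C++ API; the tensor shape is arbitrary.

    #include <algorithm>
    #include <vector>
    #include <dnnl.hpp>

    int main() {
        using namespace dnnl;

        engine eng(engine::kind::cpu, 0);  // CPU engine; a GPU engine additionally needs explicit data transfers
        stream strm(eng);

        // 1x8x16x16 fp32 feature map in NCHW layout.
        memory::dims dims = {1, 8, 16, 16};
        memory::desc md(dims, memory::data_type::f32, memory::format_tag::nchw);
        memory src(md, eng), dst(md, eng);

        // Fill the source tensor (a CPU engine keeps its data in host memory).
        std::vector<float> host(1 * 8 * 16 * 16, -1.0f);
        std::copy(host.begin(), host.end(), static_cast<float *>(src.get_data_handle()));

        // Describe, create, and execute the ReLU primitive.
        eltwise_forward::desc relu_d(prop_kind::forward_inference,
                                     algorithm::eltwise_relu, md, 0.0f);
        eltwise_forward::primitive_desc relu_pd(relu_d, eng);
        eltwise_forward(relu_pd).execute(strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
        strm.wait();
        return 0;
    }

Code generated for a full model chains such primitives together, keeping oneDNN as its only external dependency.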

Arhat includes the OpenVINO interoperability layer that imports models produced by the OpenVINO Model Optimizer. The imported models can be natively deployed on any target platform supported by Arhat.

Objectives and added value

Integration of Arhat with oneAPI aims at extending the capabilities of the oneAPI deep learning components in three areas:

  • native deployment of OpenVINO models on non-Intel platforms
  • handling complex models not directly supported by OpenVINO
  • reliable benchmarking of OpenVINO models on various target platforms.

The resulting benefits are highlighted in the following sections.

Native model deployment on non-Intel platforms

Given the rapidly evolving technological landscape, end users prefer vendor- and platform-agnostic software tools. OpenVINO primarily targets Intel computing hardware. With Arhat, OpenVINO models can be natively deployed on any other platform supported by Arhat back-ends. This opens a way to deliver the best performance of OpenVINO models on any hardware.

Example: The Arhat back-ends for NVIDIA support both the cuDNN and TensorRT inference libraries. We have successfully used Arhat to deploy a representative set of object detection models from the Intel Open Model Zoo (8 models of the SSD, Faster R-CNN, and YOLO families) on various NVIDIA GPUs, ranging from an embedded Jetson Xavier NX to a powerful RTX 3090.
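
For reference, the sketch below (illustration only, not actual Arhat output) shows the kind of engine-building path a TensorRT deployment follows, written against the TensorRT 8 C++ API; the ONNX file name is a placeholder and error handling is omitted.

    #include <iostream>
    #include <NvInfer.h>
    #include <NvOnnxParser.h>

    // TensorRT requires a logger implementation.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char *msg) noexcept override {
            if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
        }
    } gLogger;

    int main() {
        auto builder = nvinfer1::createInferBuilder(gLogger);
        const auto flags = 1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
        auto network = builder->createNetworkV2(flags);

        // Parse the ONNX graph, e.g. one of the exported object detection models.
        auto parser = nvonnxparser::createParser(*network, gLogger);
        parser->parseFromFile("model.onnx",
                              static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

        auto config = builder->createBuilderConfig();
        config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB of scratch space

        // Build a platform-tuned inference engine and an execution context.
        auto engine = builder->buildEngineWithConfig(*network, *config);
        auto context = engine->createExecutionContext();
        // ... bind input/output device buffers and call context->executeV2(bindings) ...
        return 0;
    }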

Handling complex models and libraries

Some popular deep learning models and libraries are not directly supported by OpenVINO due to their complexity. This is frequently the case for models trained with PyTorch, as this framework does not provide a native exchange format covering the complete set of supported layer types.

OpenVINO provides an extensibility mechanism for implementing custom layers in both the Model Optimizer and the Inference Engine. However, this approach requires deployment and further maintenance of the entire extended Inference Engine on the target platform, which is cumbersome and not cost-efficient.

Arhat takes a different approach to extensibility: it is an off-line engine that generates lean deployable code. Most extensions implementing custom layers are added to the code generator, which itself is not deployed; therefore the deployable code remains lean and easy to maintain.

Example: Detectron2 is a popular library of advanced object detection models developed and trained in PyTorch. Detectron2 models can be exported to the vendor-agnostic ONNX format; however, this ONNX representation uses around 10 custom layer types natively available only in Caffe2. These layers are not accepted by the OpenVINO Model Optimizer. We designed simple Model Optimizer extensions for these layers and could thus produce the OpenVINO intermediate representation for selected Detectron2 models. The bulk of the functionality supporting the custom layers was then implemented in the off-line Arhat engine, and the automatically generated deployable code required only moderate extensions of the Arhat runtime libraries.

Reliable benchmarking on various target platforms

Comparing the computational performance of deep learning models on various target platforms is of high interest to hardware vendors and end users. However, due to the high fragmentation of the deep learning software landscape, reliable performance comparison is challenging. To get meaningful and fair results, it is critical to use (1) the same original model for all target platforms and (2) the best available library of deep learning primitives on each platform. Furthermore, the benchmarking methodology must support a wide range of deep learning models and be accessible to software engineers without expert knowledge of the various deep learning frameworks. Arhat satisfies these requirements by providing a simple unified framework for generating highly efficient and easily deployable code on all supported platforms.
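
As a sketch of what such a unified methodology boils down to, the snippet below shows a minimal latency-measurement harness that can wrap the generated code identically on every platform; run_inference() is a hypothetical placeholder standing in for the entry point of the generated model code.

    #include <chrono>
    #include <cstdio>

    // Hypothetical placeholder: one forward pass of the generated model code.
    void run_inference() { /* generated, platform-specific inference goes here */ }

    int main() {
        const int warmup = 10, iters = 100;

        // Warm-up runs let drivers and primitive caches settle before timing.
        for (int i = 0; i < warmup; ++i) run_inference();

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i) run_inference();
        auto stop = std::chrono::steady_clock::now();

        double ms = std::chrono::duration<double, std::milli>(stop - start).count() / iters;
        std::printf("average latency: %.2f ms per inference\n", ms);
        return 0;
    }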

Example: We conducted a case study comparing the GPU inference performance of Intel Tiger Lake i7-1185G7E and NVIDIA Jetson Xavier NX systems on embedded object detection tasks. For this purpose, we selected a representative set of object detection models from the OpenVINO Model Zoo (8 models of the SSD, Faster R-CNN, and YOLO families) and used Arhat to generate executable code for the most efficient vendor libraries on both systems (oneDNN and the TensorRT inference library, respectively). This approach facilitated performance comparison on a fair basis. The benchmarking results revealed that Tiger Lake is a strong competitor in the given problem domain.