LIBXSMM

Hans Pabst

Zürich

Library for small dense or sparse matrix multiplications in today's HPC applications, and for small convolutions as used in machine learning.

Modern Code, Artificial Intelligence

  • 0 Collaborators

  • 1 Follower

Description

LIBXSMM is a library for small dense and small sparse matrix-matrix multiplications as well as for deep learning primitives such as small convolutions, targeting Intel Architecture. The highly optimized code leverages innovative just-in-time (JIT) in-memory code generation, which is based on a machine model rather than blind auto-tuning.

SMM API: Small dense or sparse matrix multiplications are an important building block of many applications that not only rely on arithmetically intense phases but also need to utilize the available memory bandwidth effectively. LIBXSMM's small dense or sparse matrix multiplication domain belongs to the mature part of its API. Applications and references:

[1] https://cp2k.org/: Open-source molecular dynamics code whose DBCSR component processes batches of small matrix multiplications ("matrix stacks") drawn from a problem-specific, distributed block-sparse matrix. Starting with CP2K 3.0, LIBXSMM can be used to substitute CP2K's 'libsmm' library. Prior to CP2K 3.0, only the Intel branch of CP2K integrated LIBXSMM (see https://github.com/hfp/libxsmm/raw/master/documentation/cp2k.pdf).

[2] https://github.com/SeisSol/SeisSol/: SeisSol is one of the leading codes for simulating earthquake scenarios, in particular dynamic rupture processes. LIBXSMM provides highly optimized assembly kernels which form the computational backbone of SeisSol (see https://github.com/TUM-I5/seissol_kernels/).

[3] https://github.com/Nek5000/NekBox: NekBox is a version of the highly scalable and portable spectral element code Nek5000 which is specialized for box geometries, and intended for prototyping new methods as well as leveraging Fortran beyond the Fortran 77 standard. LIBXSMM provides optimized kernels intended as a convenient substitute for the MXM_STD code.

[4] https://github.com/Nek5000/Nek5000: Nek5000 is the open-source, highly-scalable, always-portable spectral element code from https://nek5000.mcs.anl.gov/. The development branch of the Nek5000 code now incorporates LIBXSMM.

DNN API: In recent years, new workloads such as deep learning, and more specifically Convolutional Neural Networks (CNNs), have emerged and are pushing the limits of today's hardware. One of the expensive kernels is small convolutions with certain kernel sizes (e.g., 3, 5, or 7). LIBXSMM provides a set of low-level primitives to accelerate machine learning (ML). A prominent framework using LIBXSMM is Google's TensorFlow™. Prominent applications include, but are not limited to, image processing, speech recognition, and face detection. LIBXSMM's DNN API aims for an easy-to-use yet comprehensive set of low-level primitives common to CNN training and classification. Applications and references:

[1] TensorFlow: TensorFlow™ is an open source software library for numerical computation using data flow graphs. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team for the purposes of conducting machine learning and deep neural networks research. LIBXSMM can be used to increase the performance of TensorFlow on Intel hardware.

[2] https://software.intel.com/en-us/articles/intel-xeon-phi-delivers-competitive-performance-for-deep-learning-and-getting-better-fast: "Intel Xeon Phi Delivers Competitive Performance For Deep Learning - And Getting Better Fast", an article discussing LIBXSMM's convolution-kernel performance with DeepBench. Intel Corporation, 2016.

Links

LIBXSMM: Library targeting Intel Architecture (x86) for small, dense or sparse matrix multiplications, and small convolutions.

