NVIDIA, Intel, Arm Release High-Performance FP8 Format for Interoperable Deep Learning Work

New formats, designed to make it easier to move networks between hardware platforms, promise significant speed-ups at accuracy comparable to 16-bit precision.

Gareth Halfacree
2 years ago • Machine Learning & AI

NVIDIA, Intel, and Arm have jointly announced the release of FP8, an eight-bit floating point format specification designed to ease the sharing of deep learning networks between hardware platforms.

"The industry has moved from 32-bit precisions to 16-bit, and now even 8-bit precision formats," claims NVIDIA's Shar Narasimhan of the move. "Transformer networks, which are one of the most important innovations in AI, benefit from an 8-bit floating point precision in particular. We believe that having a common interchange format will enable rapid advancements and the interoperability of both hardware and software platforms to advance computing."

As a result, NVIDIA has teamed up with Intel and Arm on FP8, a specification that details two eight-bit floating point formats for interchange between hardware platforms: E5M2 and E4M3. "FP8 minimizes deviations from existing IEEE 754 floating point formats," Narasimhan claims, "with a good balance between hardware and software to leverage existing implementations, accelerate adoption, and improve developer productivity."

E5M2 is a truncated IEEE FP16 format, which uses five bits for the exponent and two bits for the mantissa; E4M3 makes what Narasimhan describes as "a few adjustments" and uses a four-bit exponent and a three-bit mantissa. In either case, they're eight-bit formats suitable for training and inference, and ones that, their creators promise, can reduce computational workloads over higher-precision alternatives.
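The trade-off between the two variants is easiest to see at the bit level. The sketch below is a minimal, illustrative Python decoder based on the definitions in the FP8 paper: E5M2 keeps IEEE-style semantics with an exponent bias of 15, including infinities and NaNs, while E4M3 uses a bias of 7, drops infinities, reserves a single mantissa pattern for NaN, and reclaims the rest of the top exponent for normal values. The function names here are this article's own illustration, not from any library.

def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode an 8-bit FP8 pattern (sign + exp_bits + man_bits) into a float."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1

    if exp == 0:
        # Subnormal: no implicit leading one, fixed exponent of 1 - bias
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    if exp == max_exp:
        if exp_bits == 5:
            # E5M2 keeps IEEE-style special values
            return sign * float("inf") if man == 0 else float("nan")
        if man == (1 << man_bits) - 1:
            # E4M3 reserves only S.1111.111 for NaN; there are no infinities
            return float("nan")
        # E4M3 reuses the remaining top-exponent codes for normal values
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15)

def e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7)

print(e5m2(0b0_11110_11))  # largest E5M2 normal: 57344.0
print(e4m3(0b0_1111_110))  # largest E4M3 value: 448.0

The practical upshot is that E4M3 trades dynamic range for an extra bit of mantissa precision, topping out at 448 rather than E5M2's 57,344; the FP8 paper accordingly suggests E4M3 for weights and activations and the wider-range E5M2 for gradients.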

"Testing the proposed FP8 format shows comparable accuracy to 16-bit precisions across a wide array of use cases, architectures, and networks," claims Narasimhan. "Results on transformers, computer vision, and GAN networks all show that FP8 training accuracy is similar to 16-bit precisions while delivering significant speed-ups."

The companies involved have opted to release FP8 under open, license-free terms, and will, in the near future, submit the specification to IEEE for consideration as a formal standard. The hope is for wide industry adoption: NVIDIA already includes FP8 support in its Hopper GPU architecture, and claims that in internal testing it delivers a 4.5x speed-up on the BERT high-accuracy model, as measured by the MLPerf Inference v2.1 benchmark.

Full details can be found in the FP8 paper, published on Cornell's arXiv preprint server under open-access terms.
