NVIDIA, Intel, Arm Release High-Performance FP8 Format for Interoperable Deep Learning Work

New formats, designed to make it easier to move networks between hardware, offer big performance gains with equivalent accuracy.

Gareth Halfacree
18 days ago • Machine Learning & AI

NVIDIA, Intel, and Arm have jointly announced the release of FP8, an eight-bit floating point format specification designed to ease the sharing of deep learning networks between hardware platforms.

"The industry has moved from 32-bit precisions to 16-bit, and now even 8-bit precision formats," claims NVIDIA's Shar Narasimhan of the move. "Transformer networks, which are one of the most important innovations in AI, benefit from an 8-bit floating point precision in particular. We believe that having a common interchange format will enable rapid advancements and the interoperability of both hardware and software platforms to advance computing."

As a result, NVIDIA has teamed up with Intel and Arm on FP8, a specification that details two eight-bit floating point formats for interchange between hardware platforms: E5M2 and E4M3. "FP8 minimizes deviations from existing IEEE 754 floating point formats," Narasimhan claims, "with a good balance between hardware and software to leverage existing implementations, accelerate adoption, and improve developer productivity."

E5M2 is a truncated IEEE FP16 format, using five bits for the exponent and two for the mantissa; E4M3 makes what Narasimhan describes as "a few adjustments," using a four-bit exponent and a three-bit mantissa. Both are eight-bit formats suitable for training and inference, and their creators promise they can reduce computational workloads compared with higher-precision alternatives.
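As a rough sketch of how the two layouts differ, the short Python decoder below maps an FP8 byte to a Python float. The exponent biases (15 for E5M2, 7 for E4M3) and E4M3's convention of reserving only the all-ones bit pattern for NaN, with no infinities, follow the published FP8 specification; the function names and structure here are illustrative, not part of the spec.

```python
import math

def decode_fp8(byte, exp_bits, mant_bits, ieee_specials=True):
    """Decode one FP8 byte laid out as sign | exponent | mantissa.

    ieee_specials=True gives IEEE-style behavior (E5M2: an all-ones
    exponent encodes inf/NaN); False follows the E4M3 convention,
    where only the all-ones bit pattern is NaN and there are no infs.
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp_mask = (1 << exp_bits) - 1
    mant_mask = (1 << mant_bits) - 1
    exp = (byte >> mant_bits) & exp_mask
    mant = byte & mant_mask
    bias = (1 << (exp_bits - 1)) - 1        # 15 for E5M2, 7 for E4M3

    if exp == exp_mask:
        if ieee_specials:                    # E5M2: inf or NaN
            return sign * math.inf if mant == 0 else math.nan
        if mant == mant_mask:                # E4M3: only all-ones is NaN
            return math.nan
        # otherwise E4M3 reuses the top exponent for normal values
    if exp == 0:                             # subnormal range
        return sign * 2.0 ** (1 - bias) * (mant / (1 << mant_bits))
    return sign * 2.0 ** (exp - bias) * (1.0 + mant / (1 << mant_bits))

def e5m2(b): return decode_fp8(b, 5, 2, ieee_specials=True)
def e4m3(b): return decode_fp8(b, 4, 3, ieee_specials=False)
```

Decoding a few bit patterns shows the trade-off the two formats make: `e5m2(0b01111100)` yields infinity, while E4M3 spends that exponent range on finite values, so its largest magnitude, `e4m3(0b01111110)`, is 448.0 even though its maximum exponent is smaller than E5M2's.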

"Testing the proposed FP8 format shows comparable accuracy to 16-bit precisions across a wide array of use cases, architectures, and networks," claims Narasimhan. "Results on transformers, computer vision, and GAN networks all show that FP8 training accuracy is similar to 16-bit precisions while delivering significant speed-ups."

The companies involved have opted to release FP8 under open, license-free terms, and will in the near future submit the specification to IEEE for consideration as a formal standard. The hope is for wide industry adoption: NVIDIA already includes FP8 support in its Hopper GPU architecture, claiming that in internal testing it delivered a 4.5x speed-up on the BERT high-accuracy model as measured by the MLPerf Inference v2.1 benchmark.

Full details are available in the FP8 paper, available on Cornell's arXiv preprint server under open-access terms.
