IBM has officially unveiled an artificial intelligence (AI) accelerator, built on a 7nm process, which it says is "at the vanguard of low-precision training and inference", giving it "unparalleled energy efficiency."
"In a new paper presented at the 2021 International Solid-State Circuits Virtual Conference (ISSCC), our team details the world’s first energy efficient AI chip at the vanguard of low precision training and inference built with 7nm technology," say researchers Ankur Agrawal and Kailash Gopalakrishnan. "Through its novel design, the AI hardware accelerator chip supports a variety of model types while achieving leading edge power efficiency on all of them."
Figures shared by the team bear the claim out: Compared to NVIDIA's A100 processor, which achieves 0.78 TFLOPS per Watt at FP16 and 3.12 TOPS per Watt at INT4, IBM's prototype processor offers up to 3.5 TFLOPS/W and 16.5 TOPS/W, though the floating-point performance figure is taken at FP8 precision, rather than the FP16 of NVIDIA's figures.
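Taking the reported numbers at face value, the relative gains work out as below. Note the caveat above: the floating-point comparison mixes FP8 and FP16 precisions, so that ratio is not apples-to-apples.

```python
# Reported power-efficiency figures from the article (per-Watt throughput).
a100_fp16_tflops_per_w = 0.78   # NVIDIA A100, FP16
a100_int4_tops_per_w = 3.12     # NVIDIA A100, INT4
ibm_fp8_tflops_per_w = 3.5      # IBM prototype, FP8 (lower precision than FP16)
ibm_int4_tops_per_w = 16.5      # IBM prototype, INT4

# Ratios; the floating-point one compares FP8 against FP16 throughput.
fp_ratio = ibm_fp8_tflops_per_w / a100_fp16_tflops_per_w    # ~4.49x
int4_ratio = ibm_int4_tops_per_w / a100_int4_tops_per_w     # ~5.29x

print(f"Floating-point: {fp_ratio:.2f}x, INT4: {int4_ratio:.2f}x")
```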
Lower precision, in fact, is one of the keys to IBM's breakthrough, achieved without a loss of accuracy in the resulting model. "It’s the first silicon chip ever to incorporate ultra-low precision hybrid FP8 (HFP8) formats for training deep learning models in a state-of-the-art silicon technology node (7nm EUV [Extreme Ultraviolet lithography]-based chip)," the researchers explain. "Also, the raw power efficiency numbers are state of the art across all different precisions."
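IBM's earlier published HFP8 work pairs two 8-bit floating-point formats: a 4-bit-exponent/3-bit-mantissa format for the forward pass and a wider-range 5-bit-exponent/2-bit-mantissa format for gradients. The toy quantiser below illustrates the general idea of sign/exponent/mantissa rounding; it is an illustrative sketch only, ignoring subnormals and saturation details, and is not the chip's actual implementation.

```python
import math

def quantize_fp8(x, exp_bits, man_bits, bias=None):
    """Round x to a toy sign/exponent/mantissa FP8-style value.
    Illustrative only; ignores subnormals and special values."""
    if x == 0.0:
        return 0.0
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1          # conventional exponent bias
    sign = -1.0 if x < 0 else 1.0
    e = math.floor(math.log2(abs(x)))           # unbiased exponent of x
    e = max(min(e, (2 ** exp_bits - 1) - bias), 1 - bias)  # clamp to range
    m = abs(x) / (2 ** e)                       # mantissa in [1, 2) when in range
    m = round(m * 2 ** man_bits) / 2 ** man_bits  # keep man_bits fraction bits
    return sign * m * 2 ** e

# Hybrid use: more mantissa bits for weights/activations (precision),
# more exponent bits for gradients (dynamic range).
w = quantize_fp8(0.7123, exp_bits=4, man_bits=3)   # forward-style 1-4-3
g = quantize_fp8(3.2e-4, exp_bits=5, man_bits=2)   # gradient-style 1-5-2
```

The split reflects why a hybrid scheme helps: gradients span many orders of magnitude and need dynamic range, while forward activations benefit more from extra mantissa precision.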
"It’s one of the first chips to incorporate power management in AI hardware accelerators. In this research, we show that we can maximize the performance of the chip within its total power budget, by slowing it down during computation phases with high power consumption."
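The throttling idea the researchers describe, slowing the chip during power-hungry computation phases so the whole workload stays within a fixed budget, can be sketched as a simple control rule. The names and figures below are hypothetical, not IBM's implementation.

```python
POWER_BUDGET_W = 50.0  # hypothetical total chip power budget

def throttled_frequency(phase_power_w, base_freq_mhz=1000.0):
    """Scale the clock down when a compute phase would exceed the power
    budget; dynamic power scales roughly with clock frequency."""
    if phase_power_w <= POWER_BUDGET_W:
        return base_freq_mhz
    return base_freq_mhz * (POWER_BUDGET_W / phase_power_w)

# A low-power phase runs at full clock; a high-power phase is slowed.
print(throttled_frequency(40.0))   # full speed
print(throttled_frequency(80.0))   # throttled to fit the budget
```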
The part also offers leading utilisation figures, the researchers claim: the prototype chips showed over 80 percent utilisation during training and over 60 percent during inference, a considerable gain over the 30 percent typical of GPU-based accelerators.
"Our new AI core and chip can be used for many new cloud to edge applications across multiple industries," the researchers claim. "For instance, they can be used for cloud training of large-scale deep learning models in vision, speech and natural language processing using 8-bit formats (vs. the 16- and 32-bit formats currently used in the industry). They can also be used for cloud inference applications, such as for speech to text AI services, text to speech AI services, NLP services, financial transaction fraud detection and broader deployment of AI models in financial services."
Thus far, however, IBM has not offered a timescale for commercialisation of the research. The team's paper, "A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling," was presented at the 2021 International Solid-State Circuits Conference (ISSCC '21), but has not yet been made publicly available.
More information on the team's work is available on the IBM website.