Novel BLADE Architecture Could Boost IoT Devices' Compute Performance, Energy Efficiency Sixfold

Designed to integrate into cache memory and perform computation locally, BLADE offers some impressive gains for edge devices.

Scientists at the Swiss Federal Institute of Technology have released details of a new computing architecture, dubbed BLADE, which they say can offer up to a sixfold increase in performance and up to a sixfold reduction in energy use compared with current NEON-accelerated Arm processors in edge devices.

Edge computing is the hot new technology: by performing computation on the device itself, it does away with the time it takes to shuffle data from a remote device to a central system, process it, and return the result. For low-power IoT systems, though, there's a balancing act between the computational needs of the task and keeping the system small, cool, and low-energy.

That's where BLADE, the BitLine Accelerator for Devices on the Edge, comes in. "BLADE is an in-SRAM computing architecture that utilizes local wordline groups to perform computations at a frequency 2.8x higher than state-of-the-art in-SRAM computing architectures," the team explains of its work. "BLADE is integrated into the cache hierarchy of low-voltage edge devices, and simulated and benchmarked at the transistor, architecture, and software abstraction levels."

"BLADE is an arithmetic iSC architecture whose utilization of industry standard 6T bitcell arrays enables easy integration into current SRAM fabrication flows, and its low power digital design makes it appropriate for accelerating emerging applications on edge devices. We validated BLADE’s functionality from the system level down to the electrical level."

"At the system level, we integrated BLADE into the cache hierarchy of an in-order CPU, accounting for system level interactions such as coherency and load/store consistency. Then, at the electrical level, we laid out our enhanced cache design, demonstrating how the use of local bitlines provides the best voltage/frequency ratio (0.6V/415MHz-1V/2.2GHz) of any 6T iSC architecture while maintaining a low area overhead of 8%."
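For readers unfamiliar with in-SRAM computing, the general bitline trick can be modelled in a few lines of Python. This is a minimal sketch of the textbook idea, not BLADE's actual circuit: activating two wordlines of a 6T array at once lets the bitline sense the AND of the two stored rows and the complementary bitline sense their NOR. The 8-bit width, the function names, and the software carry loop are all assumptions made for illustration; the paper's own adder and local wordline grouping are not reproduced here.

```python
WIDTH = 8                  # assumed word width, for illustration only
MASK = (1 << WIDTH) - 1


def bitline_read(row_a, row_b):
    """Model activating two wordlines at once on a 6T SRAM array.

    The bitline stays high only where both cells hold 1 (AND); the
    complementary bitline stays high only where both hold 0 (NOR).
    """
    bl = row_a & row_b
    blb = ~(row_a | row_b) & MASK
    return bl, blb


def bitline_xor(row_a, row_b):
    """XOR is whatever is neither AND nor NOR: NOR(bl, blb)."""
    bl, blb = bitline_read(row_a, row_b)
    return ~(bl | blb) & MASK


def bitline_add(row_a, row_b):
    """Add two words using only bitline reads plus a shift.

    Hypothetical software loop: hardware designs resolve the carry in
    logic next to the sense amplifiers rather than iterating like this.
    The result wraps modulo 2**WIDTH.
    """
    a, b = row_a & MASK, row_b & MASK
    while b:
        bl, blb = bitline_read(a, b)
        partial = ~(bl | blb) & MASK   # sum bits without carries (XOR)
        carry = (bl << 1) & MASK       # carries move one bit to the left
        a, b = partial, carry
    return a


if __name__ == "__main__":
    print(bin(bitline_xor(0b1100, 0b1010)))
    print(bitline_add(23, 42))
```

The point of the sketch is that the operands never leave the memory array: every step is a read of two rows plus a little logic at the edge of the array, which is where the claimed time and energy savings over moving data to the CPU come from.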

The results claimed in the paper are impressive: across three common edge device workloads, the team found that BLADE offered a fourfold performance gain and sixfold energy reduction for cryptographic operations, a sixfold performance gain and twofold energy reduction for video encoding, and a threefold performance gain and 1.5-fold energy reduction for convolutional neural network (CNN) workloads.

The team has not yet disclosed plans to commercialise the technology, and the paper itself has been published under closed-access terms in the journal IEEE Transactions on Computers.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: