Stacked for Success
Stacking processing units on top of layers of memory in three dimensions increases data transfer speeds while reducing energy use.
The bottleneck in transferring data between processing units, such as CPUs and GPUs, and memory is a crucial challenge in modern computer architectures. It limits the speed and capacity of data transfer between the processing units and the memory subsystem, which can impede overall system performance and efficiency.
One of the primary problems caused by this bottleneck is memory latency. CPUs and GPUs can execute instructions at high speeds, but retrieving data from memory is comparatively slow. This latency leaves the processing units idle for cycles at a time, as they have to wait for data to be fetched from memory before they can continue executing instructions. The result can be decreased throughput and slower overall system performance.
This is especially impactful for applications that require large amounts of data processing, such as artificial intelligence (AI). These applications often involve complex algorithms that manipulate vast datasets, necessitating frequent data transfers between memory and processing units. The limited bandwidth between the two can significantly hinder the performance of AI algorithms, leading to slower training times and less efficient inferences.
But improving bandwidth between processing and memory units is challenging: the primary options involve either adding more wires or increasing data transfer rates. Because components are usually laid out in two dimensions, adding more wires quickly becomes impractical. Increasing transfer rates, on the other hand, increases energy use, which is already a significant concern in large computing systems.
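The trade-off between the two options can be made concrete with a rough sketch. The aggregate bandwidth of a parallel link is the number of data lines multiplied by the rate of each line. The line counts and per-line rates below are illustrative assumptions, not figures from the research, and the function name is hypothetical.

```python
def link_bandwidth_gb_per_s(num_lines: int, gbit_per_s_per_line: float) -> float:
    """Aggregate bandwidth of a parallel link, in gigabytes per second."""
    # Each line carries gbit_per_s_per_line gigabits per second; 8 bits per byte.
    return num_lines * gbit_per_s_per_line / 8


# Illustrative numbers only (assumed, not taken from the article).
baseline = link_bandwidth_gb_per_s(64, 4.0)      # 64 lines at 4 Gbit/s each
more_wires = link_bandwidth_gb_per_s(128, 4.0)   # option 1: double the wires
faster_lines = link_bandwidth_gb_per_s(64, 8.0)  # option 2: double the rate

print(baseline, more_wires, faster_lines)
```

Both options double the bandwidth in this toy model. They differ in cost: the first needs room for twice as many wires, the second drives each wire harder.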
Another option is on the horizon, however, thanks to the recent work done by a team at the Tokyo Institute of Technology. They have developed a hardware platform called BBCube 3D that consists of a three-dimensional stack of processing and memory units. Not only has this technology been shown to be capable of faster data transfers than any existing systems, but it is also highly energy efficient.
BBCube 3D, short for Bumpless Build Cube 3D, is built around a novel architecture in which a processing unit sits on top of multiple layers of DRAM. Wires run between the processing unit and memory to make the connections, passing between layers with the help of through-silicon vias. Adding a third dimension makes the wires between units shorter, which reduces transfer times and also lowers resistance and parasitic capacitance.
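Why shorter wires help can be seen from a first-order textbook model. This is a simplification assuming a uniform wire of length $L$, cross-sectional area $A$, resistivity $\rho$, and capacitance per unit length $c$; it is not the researchers' own analysis, and real through-silicon vias are more complicated.

```latex
R = \frac{\rho L}{A}, \qquad
C = c\,L, \qquad
\tau \propto R\,C = \frac{\rho\, c}{A}\, L^{2}, \qquad
E_{\text{switch}} \propto C\,V^{2} \propto L
```

Under these assumptions, halving the wire length roughly halves the resistance, the capacitance, and the energy needed to switch the line at a given voltage $V$, and cuts the RC delay $\tau$ to about a quarter.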
To further improve the chip's performance, the researchers developed a scheme that ensures nearby data lines never change values simultaneously. Keeping them out of phase in this way reduces crosstalk noise and makes BBCube 3D more robust in general.
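The idea of keeping neighboring lines out of phase can be illustrated with a toy model. The sketch below is not the researchers' circuit: the number of phases (`NUM_PHASES`) and all function names are assumptions made for illustration. Each line is assigned a phase by its index, and a line may switch only on time steps that match its phase.

```python
from typing import List

NUM_PHASES = 4  # assumed for illustration; the article does not give a number


def phase_of(line: int, num_phases: int = NUM_PHASES) -> int:
    """Phase slot assigned to a data line, based on its position."""
    return line % num_phases


def lines_allowed_to_switch(time_step: int, num_lines: int,
                            num_phases: int = NUM_PHASES) -> List[int]:
    """Indices of the lines permitted to change value at this time step."""
    active_phase = time_step % num_phases
    return [line for line in range(num_lines)
            if phase_of(line, num_phases) == active_phase]


def has_adjacent_pair(lines: List[int]) -> bool:
    """True if any two physically neighboring lines appear in the sorted list."""
    return any(b - a == 1 for a, b in zip(lines, lines[1:]))


for t in range(NUM_PHASES):
    switching = lines_allowed_to_switch(t, num_lines=8)
    print(t, switching, has_adjacent_pair(switching))
```

In this model, two adjacent lines always have different phases, so they never switch on the same time step, while every line still gets a turn once per cycle of phases.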
The technology was evaluated to see how well it stacked up against state-of-the-art memory technologies like DDR5 and HBM2E. BBCube 3D achieved a bandwidth of 1.6 terabytes per second, up to a thirtyfold improvement over existing memory technologies. The new technology also saves considerable energy, with the experiments showing energy use falling to between one-fifth and one-twentieth of that of DDR5 and HBM2E.
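A quick back-of-the-envelope calculation puts these ratios in more familiar terms. It takes the figures quoted above at face value and assumes decimal units (1 terabyte = 1,000 gigabytes); the variable names are illustrative.

```python
bbcube_tb_per_s = 1.6   # bandwidth reported for BBCube 3D
speedup = 30            # the thirtyfold improvement quoted above

# Bandwidth of the baseline implied by a thirtyfold improvement.
implied_baseline_gb_per_s = bbcube_tb_per_s * 1000 / speedup
print(round(implied_baseline_gb_per_s, 1))  # 53.3

# Energy falling to 1/5 or 1/20 of the baseline, expressed as a percentage saved.
saving_at_one_fifth = 100 - 100 / 5
saving_at_one_twentieth = 100 - 100 / 20
print(saving_at_one_fifth, saving_at_one_twentieth)  # 80.0 95.0
```

In other words, the thirtyfold figure implies a baseline of roughly 53 gigabytes per second, and the energy figures correspond to savings of 80 to 95 percent.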
Should the BBCube 3D technology be developed to maturity, it could have a profound impact on applications ranging from machine learning and molecular simulations to climate prediction and biological research.