CRAMming a CPU Into RAM
CRAM enables in-memory processing to reduce data transfer bottlenecks and energy consumption while speeding up machine learning workloads.
While computing technologies have advanced tremendously over the past several decades, the same basic architecture — called the von Neumann architecture — that was in place near the beginning of the digital computer revolution is still used in most computer systems today. Computers that implement a von Neumann architecture have separate hardware units that handle processing and memory functions, with a shared bus that connects them. The longevity of this design is a testament to how well it has served us over the years, but as we push further into the bleeding edge of machine learning, it is beginning to show its age.
The problem is that as processors and memory units get faster and faster, and ever more data needs to be shuttled between them, the connecting data bus becomes a major bottleneck. This significantly slows data processing, and it also contributes heavily to the massive power consumption associated with training large machine learning models. A group led by researchers at the University of Minnesota realized that the best way to eliminate this bottleneck would be to combine processing and memory into the same hardware unit, such that data does not need to be continually transferred between them.
This is not a new idea, but it has taken quite some time for technology to mature to the point that a practical implementation of this architecture could be built. Members of this team pioneered the development of Magnetic Tunnel Junction (MTJ) technologies, which are used in some storage and sensing hardware today. MTJs can operate at higher speeds than the transistors that conventionally power these devices, and they also consume far less energy.
In their latest research, the team leveraged this past work to develop what they call computational random-access memory (CRAM). As the name suggests, CRAM is capable of acting as both memory and a processor, all in one unit.
In order for MTJs to be used for more than just data storage, the CRAM architecture required some additional components. To support in-memory logic operations, a typical CRAM cell uses a 2T1M (two-transistor, one-MTJ) configuration: it adds a second transistor, a logic line, and a logic bit line to the standard 1T1M (one-transistor, one-MTJ) configuration used for memory operations.
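As a rough mental model of the cell wiring (not the team's actual circuit netlist — the field names below are illustrative assumptions), the difference between the two cell types can be sketched as plain data structures:

```python
from dataclasses import dataclass

# Illustrative sketch only: names and fields are assumptions, not taken from the paper.

@dataclass
class Cell1T1M:
    """Conventional MRAM memory cell: one access transistor plus one MTJ."""
    word_line: str   # gates the access transistor for reads and writes
    bit_line: str    # carries the read/write current
    mtj_state: int   # 0 = antiparallel (high resistance), 1 = parallel (low resistance)

@dataclass
class CramCell2T1M(Cell1T1M):
    """CRAM cell: adds a second transistor and logic wiring for in-memory compute."""
    logic_line: str      # shared line that temporarily connects several MTJs
    logic_bit_line: str  # gates the second (logic-mode) transistor

# A CRAM cell carries both the memory-mode and logic-mode connections.
cell = CramCell2T1M(word_line="WL0", bit_line="BL0", mtj_state=1,
                    logic_line="LL0", logic_bit_line="LBL0")
```

The point of the sketch is simply that every CRAM cell remains a complete memory cell; the logic-mode wiring is added alongside, not in place of, the memory-mode wiring.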
During logic operations, specific transistors and lines are manipulated so that multiple MTJs can temporarily connect to a shared logic line. Voltage pulses are applied to the lines connecting the input MTJs, while the output MTJ is grounded. The resistance of the input MTJs affects the current flowing through the output MTJ, determining its state change. This process utilizes the voltage-controlled logic principle, where the logic operation is performed based on the thresholding effect and tunneling magnetoresistance effect of the MTJs. This arrangement makes it possible to reconfigure logical operations as needed.
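To make the thresholding idea concrete, here is a toy numerical sketch of a two-input CRAM gate. The resistance values, pulse voltage, and switching thresholds are round illustrative numbers chosen for this sketch, not measured device parameters:

```python
# Toy model of voltage-controlled CRAM logic (illustrative values, not device data).
R_P = 1_000.0   # parallel (low-resistance) MTJ state, encoding logic 1 here
R_AP = 3_000.0  # antiparallel (high-resistance) MTJ state, encoding logic 0
V_PULSE = 1.0   # voltage pulse applied to the input MTJs' lines (volts)

def cram_gate(a: int, b: int, i_switch: float) -> int:
    """Two input MTJs drive a shared logic line; the grounded output MTJ
    (preset antiparallel) flips to logic 1 only if the resulting current
    exceeds the switching threshold i_switch."""
    r_a = R_P if a else R_AP
    r_b = R_P if b else R_AP
    r_inputs = 1.0 / (1.0 / r_a + 1.0 / r_b)  # input MTJs conduct in parallel
    i_out = V_PULSE / (r_inputs + R_AP)       # current through the output MTJ
    return 1 if i_out > i_switch else 0

# Choosing a different threshold reconfigures the gate, which is the
# reconfigurability the text describes: with these toy values,
# ~275 uA yields AND and ~240 uA yields OR.
AND_THRESHOLD = 2.75e-4
OR_THRESHOLD = 2.40e-4
```

With both inputs at logic 1, the combined input resistance is lowest and the current (about 286 µA in this model) exceeds both thresholds; with only one input at 1, the current (about 267 µA) clears only the OR threshold, so the output MTJ switches for OR but not for AND.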
Experiments showed that CRAM was capable of reducing energy consumption by a factor of more than 1,000 when compared with existing technologies. That could be good news for neural networks, image processing, edge computing, bioinformatics, signal processing, and other applications for which this in-memory computation system is especially well suited.