When it comes to getting the best performance from our programmable logic designs, the closer we store our data to the logic that uses it, the better the performance we achieve in the overall system.
This is one of the reasons Xilinx programmable logic devices contain a diverse range of memory structures, providing Distributed RAM, Block RAM and UltraRAM.
Of course, you cannot get much closer to the implementation than Distributed RAM. Distributed RAM sits right within the SliceM and enables up to 256 bits of storage per SliceM (64 bits per LUT) in the UltraScale+ architecture.
When it comes to selecting Distributed or Block RAM, a few guidelines help steer the decision:
- When storing 64 or fewer bits, use Distributed RAM.
- When storing between 64 and 128 bits with a data width of 16 or fewer bits, use Distributed RAM.
- When storing more than 128 bits, use Block RAM.
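The guidelines above can be captured in a few lines of Python, which is handy as a quick sanity check when sizing memories. This is only a sketch: the function name `suggest_ram_type` and its parameters are made up for illustration, and the case the list does not cover (between 64 and 128 bits with a data width above 16) is assumed here to fall through to Block RAM.

```python
def suggest_ram_type(total_bits: int, data_width: int) -> str:
    """Return 'distributed' or 'block' following the rules of thumb above.

    total_bits and data_width are hypothetical parameter names chosen for
    this sketch; they are not part of any Xilinx tool or API.
    """
    if total_bits <= 0 or data_width <= 0:
        raise ValueError("total_bits and data_width must be positive")

    # Rule 1: 64 or fewer bits -> Distributed RAM.
    if total_bits <= 64:
        return "distributed"

    # Rule 2: between 64 and 128 bits, with a data width of 16 or fewer.
    if total_bits <= 128 and data_width <= 16:
        return "distributed"

    # Rule 3: more than 128 bits -> Block RAM.
    # Assumption: the uncovered case (65 to 128 bits, width above 16)
    # is also sent to Block RAM.
    return "block"
```

In a real design the implementation tool makes this choice for us when the memory is inferred, so treat the function as a way of reasoning about the guidelines rather than a replacement for the tool.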
The best way to ensure the optimal memory structure is used is to infer the memory within our HDL. This enables the implementation tool to select the most appropriate RAM structure.
For most applications, however, we want to store larger amounts of data, which is where Block RAM comes in.
In UltraScale+ devices, Block RAMs (BRAM) are dedicated 36Kb blocks that are extremely flexible. Each BRAM provides two read and write ports and can be implemented as either one 36Kb memory or two 18Kb memories.
BRAMs can also be cascaded to make larger memories, and they form the cornerstone of much of the data storage within our designs. This storage may be temporary, as part of an algorithm implementation, or it may be held longer term, for example as configuration parameters of the system or algorithm.
If the data size required is too large to store internally within BRAM, we use an external memory device such as DDR or QDR.
However, in addition to Distributed and Block RAM, UltraScale+ devices also provide UltraRAM (URAM).
UltraRAM is intended to allow the replacement of off-board memories enabling better overall performance.
The additional memory provided by UltraRAM ranges from 13.5Mb in the Zynq MPSoC ZU4EG all the way up to 360Mb in the Virtex UltraScale+ VU13P, although in the VU13P the maximum configurable individual memory size is 100Mb.
Unlike BRAM, each URAM has a fixed size of 4K by 72 bits. It is dual port, with a single synchronous clock shared by both ports, although the two ports are otherwise completely independent.
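To put the fixed 4K by 72 geometry in context, the short Python sketch below works out the capacity of one URAM block and how many blocks the device totals quoted earlier correspond to. It assumes that "Mb" means 2^20 bits and "Kb" means 2^10 bits; the helper name `uram_blocks_for_capacity` is made up for this example.

```python
import math

URAM_DEPTH = 4096                       # 4K words per URAM
URAM_WIDTH = 72                         # bits per word
URAM_BITS = URAM_DEPTH * URAM_WIDTH     # 294,912 bits, i.e. 288Kb per block


def uram_blocks_for_capacity(megabits: float) -> int:
    """Number of URAM blocks needed to hold `megabits` of storage.

    Assumption: 1Mb = 1024 * 1024 bits.
    """
    total_bits = megabits * 1024 * 1024
    return math.ceil(total_bits / URAM_BITS)


print(URAM_BITS // 1024)                # Kb per URAM block
print(uram_blocks_for_capacity(13.5))   # blocks behind the 13.5Mb figure
print(uram_blocks_for_capacity(360))    # blocks behind the 360Mb figure
```

Under those assumptions each block holds 288Kb, so 13.5Mb corresponds to 48 blocks and 360Mb to 1280 blocks.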
As both ports are independent, address collisions are possible, and the behavior of the UltraRAM depends on which port performs the read or write access. One key concept to remember with UltraRAM is that Port A always completes its operation first. This means:
- If both Port A and Port B write to the same address, Port B overwrites the data Port A writes.
- If Port A reads the address that Port B writes, Port A outputs the old data from before the Port B write.
- If Port A writes the address that Port B reads, Port B outputs the new data written by Port A.
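The three collision cases above all follow from the single rule that Port A completes first. A small behavioral model in Python makes this easy to see. It is a sketch only: the class `UltraRamModel` and its tuple-based operation format are invented for illustration, and the model ignores clocking, read latency and pipeline registers entirely.

```python
class UltraRamModel:
    """Behavioral sketch of URAM collision ordering (not a timing model)."""

    def __init__(self, depth: int = 4096):
        self.mem = [0] * depth

    def _apply(self, op):
        # op is None (port idle), ("read", addr) or ("write", addr, data).
        if op is None:
            return None
        kind, addr = op[0], op[1]
        if kind == "read":
            return self.mem[addr]
        if kind == "write":
            self.mem[addr] = op[2]
            return None
        raise ValueError(f"unknown operation: {kind!r}")

    def cycle(self, port_a, port_b):
        """Apply one operation per port and return (a_out, b_out).

        Port A is always applied before Port B, which is what produces
        the three collision behaviors listed above.
        """
        a_out = self._apply(port_a)
        b_out = self._apply(port_b)
        return a_out, b_out


ram = UltraRamModel()

# Both ports write the same address: Port B's data is what remains.
ram.cycle(("write", 5, 0xAA), ("write", 5, 0xBB))

# Port A reads while Port B writes: Port A sees the old data.
a_old, _ = ram.cycle(("read", 5), ("write", 5, 0xCC))

# Port A writes while Port B reads: Port B sees Port A's new data.
_, b_new = ram.cycle(("write", 5, 0xDD), ("read", 5))
```

Because the ordering is fixed, it can be worth deciding up front which port acts as the writer and which as the reader when both may touch the same address.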
In UltraScale devices, resources are organized in columns. As such, deeper memories can be built from URAM without using any fabric resources, provided the memory fits within one column.
If the memory requires more than one column worth of URAM, fabric resources are used.
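As a rough way of reasoning about the column limit, the Python sketch below counts how many URAM blocks a deeper, 72-bit-wide memory needs and compares that against the number of URAMs in one column. The column height differs between devices, so `column_height` is a hypothetical parameter and the value 16 is purely a placeholder; check the device documentation for the real figure.

```python
import math

URAM_DEPTH = 4096  # words per URAM block (4K by 72 bits)


def urams_for_depth(depth_words: int) -> int:
    """Number of URAM blocks needed for a memory of the given depth."""
    if depth_words <= 0:
        raise ValueError("depth_words must be positive")
    return math.ceil(depth_words / URAM_DEPTH)


def needs_fabric(depth_words: int, column_height: int) -> bool:
    """True if the memory spills beyond one column of URAMs.

    column_height is an assumed, device-specific number of URAM blocks
    available in a single column.
    """
    return urams_for_depth(depth_words) > column_height


EXAMPLE_COLUMN_HEIGHT = 16  # placeholder value, not taken from a data sheet

print(urams_for_depth(32768))                      # blocks for a 32K-deep memory
print(needs_fabric(65536, EXAMPLE_COLUMN_HEIGHT))  # does 64K deep fit in one column?
print(needs_fabric(65537, EXAMPLE_COLUMN_HEIGHT))  # one word more than that
```

The sketch only considers depth. Memories wider than 72 bits need additional URAMs in parallel, which is not modeled here.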
When it comes to implementing URAM memories within our programmable logic designs, we can use the Block Memory Generator and select URAM.
Alternatively, we can use the template provided in Vivado's editor to instantiate the URAM modules as required.
Needless to say, not all of our devices contain URAM; however, where it is available, it is a resource we should try to make the most of!
See My FPGA / SoC Projects: Adam Taylor on Hackster.io
Get the Code: ATaylorCEngFIET (Adam Taylor)
Access the MicroZed Chronicles Archives with over 300 articles on the FPGA / Zynq / Zynq MPSoC, updated weekly, at MicroZed Chronicles.