To benchmark the processors of the two Cora Z7 variants, only one of which is dual-core, I decided to boot each of them with a Linux image from an SD card and run a pre-compiled Go executable. This Go app estimates the value of Pi by generating a large number of random points within a square, then testing whether each one falls within an inscribed quarter circle. Because the quarter circle covers Pi/4 of the square's area, Pi is estimated as four times the number of points within the circle, divided by the total number of sample points. This approach, referred to as the Monte Carlo method, is ideal for parallelization, as the result of the test on any particular sample point does not rely on the result of the test on any other sample. The Go app first estimates Pi using a single-threaded method, then does it again using a multi-threaded method.
Digilent provides PetaLinux images for some of our boards, including the two variants of the Cora board, the CoraZ7-10 and the CoraZ7-07S. Instructions on how to use these projects are available in the links above. I needed to use a VirtualBox VM running Ubuntu 16.04.3, with Vivado 2017.4, the PetaLinux Tools, and Golang installed. Golang can be installed through apt-get, but the other tools need to be downloaded from Xilinx's website.
I only needed to connect the Cora and manually mount the microSD card in the PetaLinux console in order to transfer files from my computer (running the root filesystem in the Cora's RAM). The speed of the UART connection to my computer from the Cora was also not a concern, so I did not need to use an Ethernet connection. However, the READMEs of the PetaLinux repos (Z7-10, Z7-07S) provide further information on how to install the PetaLinux tools, connect a PetaLinux-programmed board over Ethernet, and run the root filesystem from microSD.
Using Ubuntu's built-in "Disks" tool, I formatted a microSD card such that the first partition was >500MB FAT and the second was >20MB EXT4 (actually 3.5GB, the remainder of the SD card's storage space).
I created a Go source file, called "main.go" (source code is provided below), and built an app that could run on either Cora variant using the "GOARCH=arm go build" command in the Linux terminal. I then copied the output executable file to the second partition of my microSD card.
I then downloaded the release BSPs for each of the Coras, available through their GitHub repositories (linked above). For each Cora, I created a PetaLinux project using the instructions in its repo's README. I then copied the pre-built image.ub and BOOT.BIN files for the Cora Z7-07S to the first partition of my microSD card. I plugged the SD card into the Cora Z7-07S and set the Cora's jumpers so that it received power over USB and booted from the SD card. I plugged in the Cora and let PetaLinux boot, observing from a Tera Term serial terminal connected to the Cora's USB-UART port. Once the Cora had booted, I mounted the second partition of the microSD card (which was located at "/dev/mmcblk0p2" in the PetaLinux filesystem) and ran the Go app, passing the command line parameter "100000" for the number of samples to test. I repeated the same process for the Cora Z7-10. The images below are screenshots of the Tera Term terminal after the Go app was run on each Cora.
In the multi-threaded trial, I expected the dual-core CPU of the Cora Z7-10 to require approximately half of the time it takes to process samples in the single-threaded trial.
I had expected the single-core CPU of the Cora Z7-07S to require about the same amount of time across both of its trials, perhaps slightly more for the multi-threaded trial, given that managing several threads adds some overhead.
Both the Cora Z7-10 and Z7-07S were expected to take the same amount of time on the single-threaded trial, as the CPUs of both boards' Zynq chips have the same specs.
The table below shows how long each of the Cora variants took to process 100,000 Monte Carlo samples.
This worked pretty well, and I was surprised by how easy PetaLinux was to set up.
A coworker of mine, Arvin Tang, created another project where he used the same algorithm in a bare-metal FPGA design, which can process 9 of the Monte Carlo samples per clock cycle (with one clock cycle every 8 nanoseconds). As expected, his results were substantially better than mine, processing 500,000,000 samples in less than a second. (His project is yet to be published.)
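As a quick sanity check on those figures, the sketch below works out the ideal run time of a design that tests 9 samples on every 8 nanosecond clock cycle. The helper name fpgaRunTime is hypothetical, and the calculation assumes a perfectly full pipeline with no start-up latency or I/O cost, so it is a lower bound rather than a measurement of Arvin's design.

```go
package main

import (
	"fmt"
	"time"
)

// fpgaRunTime returns the ideal time needed to test `samples` points when
// samplesPerCycle points are tested on every clock cycle of length
// clockPeriod. Pipeline fill time and I/O are deliberately ignored.
func fpgaRunTime(samples, samplesPerCycle int64, clockPeriod time.Duration) time.Duration {
	cycles := (samples + samplesPerCycle - 1) / samplesPerCycle // round up
	return time.Duration(cycles) * clockPeriod
}

func main() {
	d := fpgaRunTime(500000000, 9, 8*time.Nanosecond)
	fmt.Println("ideal run time:", d)
	fmt.Println("under one second:", d < time.Second)
}
```

At 9 samples per 8 nanoseconds the design handles a little over a billion samples per second, so 500,000,000 samples come out to roughly 0.44 seconds, consistent with the "less than a second" result.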