PYNQ-Z2 is a development board built around the Xilinx Zynq-7000 SoC, which combines a dual-core ARM Cortex-A9 processing system (PS) with Artix-7-based programmable logic (PL). Each part is suited to different kinds of tasks: the programmable logic excels at parallel and repetitive operations, while the processor is more general-purpose and flexible.
In the final semester of my bachelor's degree at National Taipei University of Technology, I was glad to have the opportunity to work with the PYNQ-Z2 board, since I had no prior experience with any Xilinx product, let alone one with a heterogeneous architecture. So, I decided to launch a cool project to get to know the ecosystem.
The first question I asked myself was: which topic could take advantage of both the PL and the PS? After a while, I eventually came up with the idea of building a Music Visualizer. The plan was to utilize the PL to compute the Fast Fourier Transform and drive the addressable LEDs, while the PS would handle audio streaming over Ethernet and compute the visual effects. With this concept in mind, I then started the design work.
System Overview
The image shown above is the final architecture of the system, which can be divided into two parts: the PYNQ-Z2 and the PC. The PYNQ-Z2 handles most of the computation and drives the LED panel. In contrast, the PC only captures audio samples and streams them via Ethernet.
We further split the PYNQ-Z2 into two sections: the PS and the PL. The PS receives audio samples over Ethernet, analyzes the audio spectrum, and determines how the LED panel looks. Meanwhile, the PL accelerates the FFT computation, converts the LED panel's raw RGB data into the sequence of signals that addressable LEDs accept, and handles the I/O components the user can interact with. These two sections communicate via general-purpose and high-performance AXI buses (AXI GP and AXI HP).
Preparation
Flashing and Booting the OS
Please follow the instructions in the official documentation to flash and boot the PYNQ-Z2 OS. After that, opening a Jupyter Notebook in the web browser shouldn't be a problem.
Finding the Address
If you're not sure whether the PYNQ-Z2 is connected to the local network, Angry IP Scanner is a handy tool for discovering devices. Keep in mind that the PYNQ-Z2 can take some time to boot up!
When the PYNQ-Z2 is connected directly to your PC, its default IP address is 192.168.2.99. In my case, since I connected it to a router, the assigned IP address is 192.168.1.37.
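If you just want a quick reachability check before opening Jupyter, a plain ping works as well (substitute whichever address applies to your setup):
$ ping 192.168.2.99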
Accessing PYNQ-Z2 through MobaXterm
To interact with the PYNQ-Z2 remotely, we use MobaXterm to establish the SSH connection, access the board's terminal, configure settings, and run scripts. It also supports SFTP, allowing easy file transfer through drag-and-drop, which is the feature we rely on most.
Accessing PYNQ-Z2 through Google Chrome
Another way to access the board's terminal is by using the Secure Shell Extension, provided by Google for Google Chrome. This method is highly recommended when file transfer is not required.
Type ssh in the URL bar and press Tab to activate the extension. Then, input the username and IP address to establish the connection.
Enabling Root Login
Normally, we log into the PYNQ-Z2 board as the default user xilinx. However, at the end of the project, we needed to execute a .py script that downloads the PL bitstream. Without root privileges, the script fails to download the bitstream.
By default, PYNQ-Z2 does not allow root login over SSH, so it must be enabled manually.
1. Set a root password
$ sudo passwd root
2. Edit SSH config
$ sudo nano /etc/ssh/sshd_config
Find this line:
#PermitRootLogin prohibit-password
Uncomment and change it to:
PermitRootLogin yes
Save and exit.
3. Restart SSH service
$ sudo service ssh restart
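To confirm that root login now works, open a new SSH session as root, replacing the IP address with your board's:
$ ssh root@192.168.1.37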
Installing Virtual Cable
To capture the audio output from the laptop, one traditional method is to use a 3.5mm audio cable to connect the audio output jack back to the input jack. However, that's not very practical. That's where Virtual Cable comes in. As the name indicates, it replaces the physical 3.5mm audio cable with a virtual one, allowing us to capture the PC's audio output more conveniently.
Download and install the latest Virtual Cable from the official website. After installation, two new audio output devices will appear in the system: CABLE Input and CABLE In 16ch. Select CABLE Input when you need to capture the laptop's audio.
You might notice that sound from your laptop’s speakers is missing after selecting CABLE Input. Don't worry, this is normal. Follow the steps below to ensure the speakers keep playing while CABLE Input is selected.
1. Open Sound Settings
Press Windows + R and enter mmsys.cpl to open the Sound settings.
2. Enable "Listen to This Device"
Switch to the “Recording” tab, double-click “CABLE Output,” and then open the “Listen” tab.
Check the box labeled “Listen to this device,” and make sure to select your desired playback device (e.g., Speakers) from the dropdown menu.
Other Environment Setup
1. For editing and running scripts, Visual Studio Code is recommended, but any editor or terminal will work.
2. Make sure you have Python installed on your PC and it’s added to your system’s PATH. Otherwise, you won’t be able to run scripts from the terminal.
3. To modify or rebuild the PL design, you’ll need to install the Vivado Design Suite. In addition, you’ll also need to learn how to create a block diagram, generate a design wrapper, and build the bitstream. Installing the Vivado Design Suite is not required if you won’t make any changes to the PL design.
Core Tasks
When building a large system, taking one step at a time is important. Thus, we divided the project into multiple tasks and tackled them individually. Each task was verified independently before being integrated into the final system.
The core tasks include:
1. Implement Fast Fourier Transform in the PL
2. Assemble addressable RGB LED panel
3. Build a custom cable with level shifter
4. Implement addressable RGB LED driver in the PL
5. Stream audio from PC to PYNQ-Z2
Implement Fast Fourier Transform in the PL
The Fast Fourier Transform is used to analyze the spectrum of the audio samples. Although we could perform the FFT with the NumPy library in the PS, it is time-consuming, especially for high-resolution FFTs, which further impacts the refresh rate of the LED panel. Therefore, we let the PL handle the FFT computation to fully leverage the power of parallel processing.
To implement FFT in the PL, we referred to a YouTube tutorial by FPGAPS, which demonstrates the basic integration of the FFT IP with AXI DMA. Based on the block design shown in the video, we extended it by integrating a magnitude calculation stage, since we're only interested in the amplitude of the FFT results.
The image below shows our FFT architecture.
The image below is a closer look at the FFT section of our Vivado block design, where you can see how the FFT IP connects with other components. The highlighted orange wires are the main data processing path.
Originally, the PS would receive both the real and imaginary components from the FFT IP. With our modification, the magnitude is computed in the PL before being passed to the PS, which significantly reduces the processing load and improves performance.
Controlling FFT IP using Python
After completing the block design and generating the bitstream, we can create a new Jupyter notebook to control the hardware FFT using Python by following the steps below.
1. Load Overlay and initialize IP
We first import the necessary libraries and load the bitstream file into the PL.
from pynq import Overlay, allocate
import matplotlib.pyplot as plt
import numpy as np
overlay = Overlay("design_1_wrapper.bit", download=True)
dma = overlay.axi_dma_0
print("Done!")
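If the DMA instance is named differently in your block design, you can list the IP blocks that the overlay exposes; axi_dma_0 should appear among them:
# List the IP cores detected in the loaded overlay
print(list(overlay.ip_dict.keys()))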
2. Generate Test Signal
fft_length = 4096
fs = 4096
x = np.arange(fft_length) / fs
y = np.zeros(fft_length, dtype = np.int16)
y[fft_length // 2 - 200: fft_length // 2 + 200] = 1000
3. Reserve Memory Space for DMA
We allocate TX and RX buffers for the DMA transfer. The buffer length must match the FFT size defined in the Vivado block design, which is 4096 here. Since we already perform the magnitude calculation in the PL, the RX buffer should use float32. The TX buffer, on the other hand, uses uint32 to match the input format required by the FFT IP.
dma_tx_buffer = allocate(shape = (fft_length,), dtype = np.uint32)
dma_rx_buffer = allocate(shape = (fft_length,), dtype = np.float32)
print(f'DMA TX Buffer Address:{hex(dma_tx_buffer.physical_address)}')
print(f'DMA RX Buffer Address:{hex(dma_rx_buffer.physical_address)}')
4. Pack Data into Compatible Format
The FFT IP expects each input sample to be a 32-bit word, where the lower 16 bits represent the real part and the upper 16 bits represent the imaginary part. Since our input signal is purely real, we set the imaginary part to zero.
# Reinterpret the int16 samples as uint16 so the bit pattern is preserved
real_part = y.view(np.uint16)
imag_part = np.zeros(shape = (fft_length,), dtype = np.uint16)
for i in range(fft_length):
    # Widen to 32 bits before shifting so the imaginary part is not truncated
    r = np.uint32(real_part[i])
    j = np.uint32(imag_part[i])
    dma_tx_buffer[i] = (j << 16) | r
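The per-element loop is easy to follow but slow in Python for large FFT sizes. A vectorized version of the same packing (a minimal sketch, assuming real_part and imag_part are the uint16 arrays defined above) looks like this:
# Same packing as the loop above, done with vectorized NumPy operations
packed = (imag_part.astype(np.uint32) << 16) | real_part.astype(np.uint32)
dma_tx_buffer[:] = packed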
5. Perform FFT Computation
Once the input data is packed into the DMA TX buffer, we can trigger the hardware FFT computation by performing a DMA transfer. After the transfer is complete, the result is copied into a NumPy array for further analysis.
xb_hardware = np.fft.fftfreq(fft_length, d = 1 / fs)
yb_hardware = np.zeros(fft_length, dtype=np.float32)
dma.recvchannel.transfer(dma_rx_buffer)
dma.sendchannel.transfer(dma_tx_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()
yb_hardware[:] = dma_rx_buffer
6. Plot FFT Result
We can plot the FFT result using the Matplotlib library.
plt.close()
fig, axs = plt.subplots(1, 1, figsize=(15, 4))
axs.stem(np.fft.fftshift(xb_hardware), np.fft.fftshift(yb_hardware),
linefmt='orange', markerfmt='o', basefmt='gray',
label='Hardware FFT')
axs.set_title("Hardware FFT of Original Signal")
axs.set_ylabel("Magnitude")
axs.set_xlabel("Frequency(Hz)")
axs.set_xlim(-50, 50)
plt.show()
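As a quick sanity check, you can compute the same spectrum in software with NumPy and compare where the energy lands. Note that this is only an illustrative check: depending on the scaling option configured in the FFT IP, the hardware magnitudes may differ from NumPy's by a constant factor.
# Software FFT of the same test signal, magnitude only
yb_software = np.abs(np.fft.fft(y.astype(np.float32)))
# Compare the locations of the strongest bins rather than absolute values
hw_peak = np.argmax(yb_hardware)
sw_peak = np.argmax(yb_software)
print(f"Hardware peak bin: {hw_peak}, software peak bin: {sw_peak}")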
Assemble Addressable RGB LED Panel
Another reason this project came together so smoothly was that I already had WS2812B addressable RGB LED strips and a white translucent acrylic sheet in stock from a project a couple of years ago. No need to buy additional parts! 😀
To build the LED panel, the following materials are needed:
1. WS2812B Addressable RGB LED Strip (144 LEDs / meter) — 2 meters
2. White Translucent Acrylic Sheet (14cm × 14cm × 0.3cm) — 2 pcs
3. Hook-Up Wires — some
4. MR30 Male Connector — 1 pc
5. Devcon 5 minute Epoxy — 1 set
6. Masking Tape — some
7. 3D Print Filament — some
First, cut the long LED strip into multiple shorter segments that fit within the acrylic sheet. Then, remove the adhesive backing and stick the strips onto the first acrylic sheet. After positioning, solder the hook-up wires and the MR30 male connector as shown in the image below.
Be careful! At this point, the MR30 male connector is attached to the panel only through three solder pads on the LED strip. Any mechanical stress could potentially damage the solder pads.
After finishing the main part of the LED panel, we designed a middle frame in SolidWorks and 3D printed it. The main part and the second acrylic sheet are stacked and bonded together, forming a sandwich structure.
Build a Custom Cable with Level Shifter
To drive the LED panel properly, a 5V logic-level signal is required. However, the PYNQ-Z2 only supports up to 3.3V logic-level output, so a level shifter is necessary.
We built a custom cable to connect the PYNQ-Z2 and the LED panel, placing a TXS0108E level shifter module in between. The following materials are needed:
1. TXS0108E Level Shifter Module — 1 pc
2. Hook-Up Wires — some
3. MR30 Female Connector — 1 pc
4. DuPont 2x3 Header Housing — 1 pc
5. DuPont Female Crimp Terminals — 6 pcs
6. Heat Shrink Tubing — some
7. Solder Wire — some
The image below shows the wiring of the module. Don't forget to connect the +3V3 line to the OE pin to enable the module! We also used heat shrink tubing and hot glue for extra mechanical protection.
The result is shown below — clean and neat.
Looking closely at the PYNQ-Z2 board, you’ll notice that we’re using the pins near the top-right HDMI connector (highlighted in yellow) to drive the LED panel. These pins provide all the necessary power and signal lines. To simplify wiring and reduce connection errors, we also grouped the four wires using a single 2×3 DuPont connector.
Implement Addressable RGB LED Driver in the PL
Background Knowledge
Addressable RGB LEDs have become increasingly popular worldwide due to their convenience — they only require three wires to use: power, ground, and signal. Even when using a large number of LEDs, they can simply be cascaded together. The image below shows a typical connection diagram of WS2812B addressable RGB LEDs.
To control the colors of addressable RGB LEDs through a single wire, a special NZR communication protocol is used. Each bit is represented by a 1.25 µs pulse whose high/low ratio indicates either a 1 or a 0.
Each addressable RGB LED needs 24 bits to set its color: the first 8 bits control the green channel, the next 8 bits the red channel, and the last 8 bits the blue channel.
When N addressable RGB LEDs are connected in series, N × 24 bits of data must be sent to control all of them. The first LED captures the first 24 bits and passes the remaining bits to the next LED, and so on. After all data is transmitted, a Reset Code (a low signal lasting more than 50 µs) must be sent to latch the color values.
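Since the rest of this project writes colors as a single 0xGGRRBB word (green in the highest byte), a small helper like the one below, which is purely illustrative and not part of the driver, shows how an ordinary (R, G, B) value maps onto that 24-bit word:
def rgb_to_grb_word(r, g, b):
    # WS2812B expects green first, then red, then blue (8 bits each)
    return (g << 16) | (r << 8) | b

# Pure red as an (R, G, B) triple becomes 0x00ff00 in GRB order
print(hex(rgb_to_grb_word(255, 0, 0)))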
Implementation
Controlling GPIO directly through the PS is possible using MIO (Multiplexed I/O) pins. However, MIO is not designed for high-speed or timing-critical applications, so it cannot generate the precise pulses required by addressable RGB LEDs. Therefore, just like the FFT, we hand this task over to the PL.
Big thanks to Adam Taylor, who published the article "PYNQ Controlled NeoPixel LED Cube." The images above show the architecture and the Vivado block design of our project, which is based on his work.
To set the colors of addressable RGB LEDs, we simply write the data into dual-port BRAM via MMIO from the PS. The Neopixel IP then automatically reads from the BRAM and generates the corresponding pulse stream.
The VHDL snippet below shows the Neopixel IP, also designed by Adam Taylor but slightly modified for our application. We increased the reset duration from 50 µs to 100 µs to ensure stable data transmission. Without this adjustment, the LEDs would blink unexpectedly or display incorrect colors.
-- SPDX-License-Identifier: GPL-3.0-or-later
-- File: neo_pixel.vhd
-- Based on: https://www.hackster.io/adam-taylor/pynq-controlled-neopixel-led-cube-92a1c1
-- Original Author: Adam Taylor
-- Modification by: William Lin
--
-- This file is part of a project that is licensed under the terms of the
-- GNU General Public License v3.0 or later. You should have received a copy
-- of the license with this file. If not, see <https://www.gnu.org/licenses/>.
LIBRARY IEEE;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
ENTITY neo_pixel IS PORT(
clk : IN std_logic;
dout : OUT std_logic;
rstb : OUT STD_LOGIC;
enb : OUT STD_LOGIC;
web : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
addrb : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
dinb : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
doutb : IN STD_LOGIC_VECTOR(31 DOWNTO 0)
);
END ENTITY;
ARCHITECTURE rtl OF neo_pixel IS
TYPE FSM IS (idle,wait1,led,count,reset,addr_out,wait2,grab,wait_done,done_addr);
CONSTANT done : std_logic_vector(25 DOWNTO 0) := "00000000000000000000000001";
CONSTANT zero : std_logic_vector(24 DOWNTO 0) := "1111111000000000000000000";
CONSTANT one : std_logic_vector(24 DOWNTO 0) := "1111111111111100000000000";
CONSTANT numb_pixels : integer := 24;
CONSTANT reset_duration : integer := 2000; --number of clocks in the reset period, originally set to 1000
SIGNAL shift_reg : std_logic_vector(24 DOWNTO 0) := (OTHERS=>'0');
SIGNAL shift_dne : std_logic_vector(25 DOWNTO 0) := (OTHERS=>'0');
SIGNAL current_state : fsm := idle;
SIGNAL prev_state : fsm := idle;
SIGNAL load_shr : std_logic :='0';
SIGNAL pix_cnt : integer RANGE 0 TO 31 := 0;
SIGNAL rst_cnt : integer RANGE 0 TO 1023 := 0;
SIGNAL led_numb : integer RANGE 0 TO 1023;
SIGNAL ram_addr : integer RANGE 0 TO 1023:=0;
SIGNAL led_cnt : integer RANGE 0 TO 1023;
SIGNAL pixel : std_logic_vector(23 DOWNTO 0);
-- The remaining lines of code are omitted here.
Controlling Addressable RGB LEDs using Python
After completing the block design and generating the bitstream, we can create a new Jupyter notebook to control the addressable RGB LEDs using Python by following the steps below.
1. Load Overlay and initialize IP
First, we import the necessary libraries and load the bitstream file into the PL.
from pynq import Overlay
from pynq import MMIO
overlay = Overlay("design_1_wrapper.bit", download = True)
# Base address can be found in the Vivado's Address Editor.
base_address = 0x40000000
# Memory size is determined by the BRAM IP's setting.
mem_size = 1024
mmio = MMIO(base_address, mem_size)
2. Set the Number of Addressable RGB LEDs to Drive
We then tell the Neopixel IP how many addressable RGB LEDs to drive by writing the number to address 0x0 of the BRAM. Once a non-zero value is written, the Neopixel IP will begin generating the corresponding number of pulses. Otherwise, it won't output any signal.
led_num = 209
address_offset = 0x0
mmio.write(address_offset, led_num)
3. Set the colors of Addressable RGB LEDs
After providing the addressable RGB LED count to the Neopixel IP, we can assign the colors to each LED by writing data into specific memory locations. For example, writing to address 0x04 sets the color of the first LED, 0x08 sets the second LED, and so on.
# Color format: 0xGGRRBB
color = 0xff0000 # Pure green
# Memory address that controls the first addressable RGB LED
address_offset = 0x4
mmio.write(address_offset, color)
color = 0x00ff00 # Pure red
# Memory address that controls the second addressable RGB LED
address_offset = 0x8
mmio.write(address_offset, color)
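Writing LEDs one at a time gets tedious, so a small helper can fill the whole panel. This is just a convenience sketch built on the same MMIO interface; the 0x4 stride and the count at offset 0x0 follow the memory layout described above:
def fill_panel(mmio, led_num, color):
    # LED i is controlled by the word at offset 0x4 * (i + 1);
    # offset 0x0 holds the LED count and is left untouched here
    for i in range(led_num):
        mmio.write(0x4 * (i + 1), color)

# Turn the whole panel blue (0xGGRRBB format)
fill_panel(mmio, led_num, 0x0000ff)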
Waveform Validation
The images below show the output waveform captured by a logic analyzer. They confirm that the PYNQ-Z2 generates 5016 pulses before the 100 µs reset period begins. Since 209 addressable LEDs are connected in series and each addressable RGB LED requires 24 pulses, the expected number of pulses is 209 × 24 = 5016, which matches the measurement. The waveform also shows that refreshing the entire LED panel takes approximately 6.6 ms, meaning the maximum refresh rate is around 150 Hz under this configuration.
The images below show additional details of the waveform.
Stream Audio from PC to PYNQ-Z2
At the beginning of the project, capturing audio data posed a big question: nobody wants to connect an extra cable! To provide a seamless user experience, streaming audio over Ethernet stood out among the possible solutions. The image below shows the audio streaming architecture.
First, we captured loopback audio in PCM format using Virtual Cable. Then, we created a UDP client that periodically transfers chunks of audio samples to the PYNQ-Z2 over Ethernet. We chose the UDP protocol due to its low latency: the higher the transmission latency, the more noticeable the LED panel's visual lag becomes, making the audio and visuals go out of sync. The following code runs on the PC.
import sounddevice as sd
import numpy as np
import socket
from datetime import datetime
SAMPLERATE = 48000
CHUNK = 960
DEVICE = 21
UDP_IP = "192.168.137.100"
UDP_PORT = 5005
print("Device List:")
print(sd.query_devices())
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
def callback(indata, frames, time_info, status):
    # Take the first (and only) channel, compute its RMS level for logging,
    # then send the raw int16 samples to the PYNQ-Z2 in a single UDP packet
    audio = indata[:, 0]
    rms = np.sqrt(np.mean(audio.astype(np.float32)**2))
    sock.sendto(audio.tobytes(), (UDP_IP, UDP_PORT))
    print(f"[{datetime.now().strftime('%H:%M:%S')}] Sent {audio.size} samples to {UDP_IP}:{UDP_PORT}, RMS: {rms:.2f}")

try:
    with sd.InputStream(
        device = DEVICE,
        channels = 1,
        samplerate = SAMPLERATE,
        dtype = 'int16',
        blocksize = CHUNK,
        callback = callback,
        latency = 'low'
    ):
        while True:
            sd.sleep(1000)
except Exception as e:
    print(e)
You may need to adjust the DEVICE index and IP address based on your environment.
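Rather than reading the index off the printed device list every time, you can look it up by name. The sketch below assumes the Virtual Cable's recording side shows up with "CABLE Output" in its name, which is how it appeared on my machine:
import sounddevice as sd

def find_cable_device(keyword = "CABLE Output"):
    # Return the index of the first input device whose name contains the keyword
    for index, device in enumerate(sd.query_devices()):
        if keyword in device["name"] and device["max_input_channels"] > 0:
            return index
    return None

print(f"Virtual Cable device index: {find_cable_device()}")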
The following code runs on the PYNQ-Z2 and creates a UDP server to receive packets from the PC. Each chunk of 960 int16 samples is 1,920 bytes, so it fits comfortably within the 4096-byte receive buffer.
import socket
import numpy as np
from datetime import datetime
UDP_PORT = 5005
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", UDP_PORT))
print("Start audio streaming...")
while True:
    # Receive one chunk of PCM samples and log its RMS level
    data, addr = sock.recvfrom(4096)
    pcm = np.frombuffer(data, dtype=np.int16)
    rms = np.sqrt(np.mean(pcm.astype(np.float32)**2))
    print(f"[{datetime.now().strftime('%H:%M:%S')}] Received {pcm.size} samples from {addr[0]}:{addr[1]}, Audio Strength (RMS): {rms:.2f}")
Integrating the System
After completing the core tasks individually, we moved on to integrate the final system. This included band analysis, bar graph calculation, visual effects mapping, and some filtering. Piece by piece, the system came together, like solving a puzzle, and ultimately formed a piece of artwork.
At this stage, all components operate together seamlessly. The system captures live audio from the PC, performs frequency analysis, calculates the corresponding display pattern, and updates the LED panel in real time.
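The exact mapping lives in the final scripts, but the sketch below illustrates the general idea on the PYNQ side: group the FFT magnitude bins into a handful of bars and smooth each bar with a first-order IIR filter. The bar count, grouping, and smoothing factor here are placeholder values rather than the ones used in the project:
import numpy as np

NUM_BARS = 16        # placeholder bar count
ALPHA = 0.3          # IIR smoothing factor (closer to 1 = less smoothing)
smoothed = np.zeros(NUM_BARS)

def spectrum_to_bars(magnitude):
    # Split the positive-frequency bins into NUM_BARS groups
    # and take the mean magnitude of each group
    groups = np.array_split(magnitude[:len(magnitude) // 2], NUM_BARS)
    return np.array([group.mean() for group in groups])

def update_bars(magnitude):
    # First-order IIR filter to reduce frame-to-frame jitter
    global smoothed
    smoothed = ALPHA * spectrum_to_bars(magnitude) + (1 - ALPHA) * smoothed
    return smoothed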
Running the Final Program
PC Side
1. Open a terminal in the .\Program\PC\ folder.
2. Run the following command:
$ python main.py
PYNQ Side
1. Use MobaXterm to SSH into the PYNQ-Z2 as user xilinx.
2. Transfer the .\Program\PYNQ\ folder to the board using drag-and-drop in MobaXterm's SFTP panel.
3. Switch to the root user:
$ sudo su - root
4. Navigate to the folder where you placed the transferred files (e.g., /home/xilinx/Program/PYNQ/) and run the script:
$ python main.py
System Performance
FFT Performance Comparison
The graph below shows the execution time comparison between software FFT and hardware FFT across various input sizes. After integrating the magnitude calculation stage into PL, the hardware FFT consistently outperforms the software FFT.
The original test results can be found in the ./FFT Experiment Result folder.
The performance gap becomes significant at large FFT sizes. For example, at 65536 points, the hardware FFT reduces the execution time by nearly 95%, demonstrating the efficiency of offloading heavy computation to the FPGA.
Frequency Response Test
When performing FFT analysis, the data is multiplied by a Hann Window to suppress sidelobes, ensuring that frequency analysis remains accurate across the entire spectrum. The following video demonstrates the system’s accurate frequency tracking, showing the visualizer’s response sweeping smoothly from 20 Hz up to 20 kHz.
Visualizer Dynamic Response
To provide an eye-catching, beautiful, and smooth visual effect, we apply a few signal processing techniques.
First, we analyze 960 samples of data padded with zeros, using a 4096-point FFT, which prevents the visuals from going out of sync with the audio. Second, we overlap newer data with older data to ensure precise frequency analysis. Last, a simple IIR filter helps minimize visual jitter on the LED panel.
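A minimal sketch of that frame preparation step is shown below. The chunk size and FFT length match the numbers above, while the overlap is simplified to a single previous chunk, so treat it as an illustration rather than the exact code used in the project:
import numpy as np

CHUNK = 960          # samples per UDP packet
FFT_LENGTH = 4096    # FFT size used by the PL
previous = np.zeros(CHUNK, dtype = np.float32)

def prepare_frame(new_chunk):
    # Overlap the newest chunk with the previous one, apply a Hann window,
    # then zero-pad the result up to the FFT length
    global previous
    combined = np.concatenate((previous, new_chunk.astype(np.float32)))
    previous = new_chunk.astype(np.float32)
    frame = np.zeros(FFT_LENGTH, dtype = np.float32)
    frame[:combined.size] = combined * np.hanning(combined.size)
    return frame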
The video below shows that our system has a great dynamic response to energetic music.
References