Is the RPi5 suitable for pairing with an FPGA?

You may have noticed that the latest generation Raspberry Pi, the RPi5, uses an ASIC,
referred to as a Southbridge, for IO. All of the IO, Ethernet, USB connectivity etc. are
implemented in the RP1 Southbridge, which is connected to the BCM2712 processor through a
4-lane PCIe Gen2 interface. This makes it substantially different from the RPi4 or RPi3
boards. So you might be wondering if the new version can be connected to an FPGA. That
is a question that I decided to investigate for myself.

The RPi5 has a single-lane PCIe Gen2 (perhaps even Gen3) header on it that someone
will eventually create an FPGA add-on board for. Most people will want to use that interface
as a higher performance alternative to the SD card. But the RPi3 and RPi4 could always
connect to an FPGA board using a USB 2.0 bridge device like the FT2232H supporting synchronous
245 mode operation. I've done this, and performance for the RPi3 and RPi4 is OK for moving
small blocks of data between the processor and the FPGA. As the amount of data being transported
increases, the performance drops off considerably, which is not the case on an x86_64 host.

For my experiments I'm using a Genesys2 board with a FMC_UMFT601BX mezzanine board. This
allows the HDL application in the FPGA to act as a USB 3.0 endpoint with a peak data rate
of 400 MiB/s. The design of the Genesys2 application used for my tests is pretty straightforward.
All data uploaded to the FPGA from the USB Host gets stored in DDR3 that functions as a very
deep FIFO. The USB Host can read the uploaded data back once the upload has completed. The HDL
expects n sectors (4096 bytes/sector) to be uploaded and then downloaded. The DDR3 interface in
combination with sufficient FIFO storage can accept up to 1 GB of upload data and return it
without delays.

In addition to the simple up/down data scheme, the HDL has performance timers to timestamp the
important events in the tests. These are the time that the first 32-bit word is uploaded, the
time that the last 32-bit word is uploaded, and the same two events for download. Since it's a
free-running counter, I can calculate the total time that has elapsed between reading the
first 32-bit upload word from the FT601 FIFO and writing the last 32-bit word to the FT601 FIFO.
This provides a much more accurate picture of the USB Host OS/software behavior and performance
than typical software timing methods provide. It must be noted that, from the perspective of the
USB Host, data rate performance is more complex than just the time spent in the driver filling or
emptying the FT601 FIFOs. I have my own software application that is mostly identical for Windows
and Linux platforms. The D3XX drivers for these platforms are not the same, however.
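
As a rough illustration of how timestamps from a free-running counter turn into an elapsed time and a data rate, here is a minimal sketch. The 32-bit counter width, the 100 MHz counter clock, and all function names are assumptions made for the example; none of them come from the actual HDL.

```cpp
#include <cstdint>

// Assumed (hypothetical) parameters: a 32-bit free-running counter and
// a 100 MHz counter clock. Neither value comes from the original HDL.
constexpr double kCounterHz = 100.0e6;

// Ticks between two timestamps of a free-running 32-bit counter.
// Unsigned subtraction is modulo 2^32, so a single wraparound between
// the two samples still yields the correct difference.
constexpr std::uint32_t elapsed_ticks(std::uint32_t first, std::uint32_t last) {
    return last - first;
}

// Convert a tick count to seconds.
constexpr double ticks_to_seconds(std::uint32_t ticks,
                                  double counter_hz = kCounterHz) {
    return static_cast<double>(ticks) / counter_hz;
}

// Average data rate in bytes per second over the measured interval.
// Returns 0 when the interval is empty, to avoid dividing by zero.
inline double bytes_per_second(std::uint64_t bytes, std::uint32_t ticks,
                               double counter_hz = kCounterHz) {
    if (ticks == 0) return 0.0;
    return static_cast<double>(bytes) / ticks_to_seconds(ticks, counter_hz);
}
```

With the assumed 100 MHz clock, 4209 ticks would correspond to 42.09 us, and `bytes_per_second(16384, 4209)` gives roughly 389e6 bytes per second. The wraparound handling only covers intervals shorter than one full counter period (about 42.9 s at the assumed width and clock).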

The FT601 is not the only way to connect an FPGA to the RPi5 via USB 3.0. I also tested the
XEM7320 with the Infineon FX3 bridge.

For the FT601 Test I used this setup:
- Genesys2 FPGA board
- RPi5 8 GB w/ heatsink/fan
- Raspios Bookworm 64-bit
- libftd3xx-linux-arm-v8-1.0.5
- FT601_245.cpp Host Application
- G2_FT601_TESTER.vhd

In FT601_245.cpp I calculate elapsed time in software. The test runs in this manner:
    // Timestamp taken before the upload.
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    ftStatus = FT_WritePipe(ftHandle, 0x02, pBufOut, SectorSize*up_sectors, &BytesWritten, NULL);
    // start is overwritten here, so stop - start covers only the download.
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    ftStatus = FT_ReadPipe(ftHandle, 0x82, pBufIn, RxBytes, &BytesReceived, NULL);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop);

So, there is some processing between the upload and download calls to the D3XX driver.

Without further ado, here is a sampling of results for the Genesys2/FT601/RPi5 testing:

Test Length   Upload        Average Upload   Download      Average Download
(Bytes)       Time          Data Rate        Time          Data Rate
-----------   -----------   --------------   -----------   ----------------
16384         42.09 us      389 MiB/s        44.68 us      367 MiB/s
65536         170.66 us     384 MiB/s        196.35 us     334 MiB/s
262144        682.51 us     384 MiB/s        848.43 us     309 MiB/s
1048576       2.73021 ms    384 MiB/s        3.37639 ms    311 MiB/s
4194304       10.92282 ms   384 MiB/s        12.13967 ms   346 MiB/s
8388608       21.84317 ms   384 MiB/s        24.26676 ms   346 MiB/s

It is rather surprising that the upload data rate is consistently around 380+ MiB/s
for blocks of data ranging from 16 KB to 8 MB. Download rates were less consistent from
run to run but still near 350 MiB/s. For USB 3.0, the RP1 Southbridge performance is
outstanding. I believe that the low rates in the 310 MiB/s range were outliers, but within
the range that one should expect. BTW, the performance with the same setup, but using my
Ubuntu 22.04 i7-13700K box, was dismal: about 44 MiB/s up and down for a 1 MB test.

I also tested the RPi5 using this setup:
- XEM7320
- RPi5 8 GB w/ heatsink/fan
- Raspios Bookworm 32-bit
- FrontPanel-Raspbian10-armv7l-5.3.0
- EthAppliance.cpp Host Application
- EthAppliance_1.vhd
- Genesys2 configured with Genesys2_Eth_DUT.vhd ( Ethernet echo application )

A few months ago Opal Kelly posted a 32-bit beta ARM driver for FrontPanel; it has
since disappeared from their download website.

I didn't try to do a performance test with this setup. All I wanted to know was
whether I could run the application on the RPi5 and whether it worked as well as on my
x86_64 Windows and Linux platforms. The application streams TX Ethernet 1 GbE packets
through a SYZYGY Ethernet pod. It simultaneously stores RX packets into a DDR3 buffer
that can be read later. I was able to run the HDL and software applications with
performance equal to that on my x86_64 Win10 and Ubuntu 22.04 platforms; that is,
sustained 120+ MiB/s full-duplex Ethernet.

So, do I think that an RPi5-USB 3.0 FPGA could be interesting? Absolutely I do! The RPi5
is an impressive bit of gear with some interesting possibilities.