
Hello!


oliviersohn


Hello!

I'm an FPGA newbie. I just purchased an Arty A7 from Amazon (I ordered the 35T version and received the 100T version - lucky me!) and will be trying to run an audio convolution reverb on it... Well, at least that's the goal; first I'll try to power it on, I guess :)

I had a question regarding the sticker that's on top of the FPGA chip: I assume I should remove it in case the chip gets hot, but I didn't see any instruction to do that anywhere. Is there documentation I've missed?

Cheers,

Olivier

 


Hi,

I'm not aware of any such instructions. It's a valid concern if you fully use the MAC capabilities of a 100T, but burning an FPGA would be fairly low on my list of things to worry about.
If in doubt, you can enable automatic thermal shutdown by instantiating an XADC (see page 68 of https://www.xilinx.com/support/documentation/user_guides/ug480_7Series_XADC.pdf).

A convolution reverb is a pretty good fit for the chip - not necessarily commercially viable, but easy to implement. You'll have no difficulty muxing e.g. 1000 audio-sample MACs at 96 kHz on a single hardware multiplier (of which there are 160, though the block RAMs will most likely be the bottleneck). A 25-bit data path and 18-bit coefficients seem the logical choice, especially since block RAM comes in multiples of 9 bits, not 8 (the "parity bit").
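As a sanity check on that time-multiplexing figure, here is a rough back-of-envelope in Python; the 100 MHz fabric clock is my assumption, while the 160-multiplier count comes from the post above:

```python
# Back-of-envelope: FIR taps one hardware multiplier can service per audio
# sample when time-multiplexed. 100 MHz fabric clock is an assumption;
# 160 multipliers is the count quoted in the post.
fpga_clock_hz = 100_000_000
sample_rate_hz = 96_000
multipliers = 160

taps_per_multiplier = fpga_clock_hz // sample_rate_hz   # ~1041 MACs per sample
total_taps = taps_per_multiplier * multipliers
ir_length_s = total_taps / sample_rate_hz               # reverb tail this supports

print(taps_per_multiplier, total_taps, ir_length_s)
```

So one multiplier comfortably covers the "1000 taps" figure, and all of them together support an impulse response of well over a second - if the coefficients and delay line can be fed fast enough.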

My guess is that the supporting infrastructure (e.g. how to upload coefficients into RAM unless you want one impulse response "hard-coded", interfacing with the codec) is much more work than the brute-force convolution algorithm itself. One possibility is to assign one port of the dual-port RAM to the convolution algorithm and the other to configuration.


...speaking of which: a minimum-phase transform is easily done in e.g. Octave - simply roots(), then mirror selected zeros to the other side of the unit circle (mag := 1/mag).

Just mentioning this because it may be a necessary pre-processing step to get the job done without a PC-sized power supply (it gives an identical magnitude response with the shortest possible FIR length). Whether or not it sounds the same depends on the application (a cabinet - yes; a cathedral - probably not).
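The same zero-mirroring trick, sketched in Python/NumPy rather than Octave (a toy illustration on a 3-tap FIR, not production DSP; the rescaling by the mirrored zeros' magnitudes is what keeps the magnitude response identical):

```python
import numpy as np

def minimum_phase(h):
    """Mirror FIR zeros from outside the unit circle to their
    conjugate-reciprocal positions (mag := 1/mag). Rescaling by the
    mirrored magnitudes preserves the magnitude response exactly."""
    z = np.roots(h)                              # zeros of the FIR polynomial
    outside = np.abs(z) > 1.0
    scale = h[0] * np.prod(np.abs(z[outside]))   # compensates the mirroring
    z[outside] = 1.0 / np.conj(z[outside])       # reflect inside the circle
    return np.real(np.poly(z)) * scale

h = np.array([1.0, 2.5, 1.0])    # toy FIR: zeros at -2 (outside) and -0.5
h_mp = minimum_phase(h)          # same |H(f)|, all zeros inside the circle
```

For a real impulse response the energy gets concentrated at the start, which is what lets you truncate the FIR earlier for the same magnitude response.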


Hi @xc6lx45, thanks for your replies!

My concern was about burning the sticker due to FPGA heat, not burning the FPGA itself. But thanks for the link! I finally just removed the sticker.

I asked a question about brute-force convolution on Stack Exchange a few days ago (just before buying the board, actually): https://electronics.stackexchange.com/questions/406295/brute-force-convolution-reverb-in-fpga. Someone answered with a back-of-the-envelope calculation where external RAM is used instead of block RAM, but the calculation didn't take the speed of the RAM bus into account. I see that on the Arty A7 the 16-bit bus runs at 667 MHz, which is roughly the same speed as the FPGA clock, but at each FPGA clock I would need to read a few hundred coefficients from RAM, so it's not going to work - as you said, the RAM is the bottleneck here?
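Putting that mismatch in numbers (assumed figures: 16-bit coefficients, a ~1 s impulse response streamed from external RAM every sample; the peak rate ignores refresh, latency, and bus turnaround):

```python
# Required coefficient bandwidth vs. the external bus's theoretical peak.
bus_peak_bytes_s = 667e6 * 16 / 8          # 16-bit DDR bus at 667 MT/s: ~1.33 GB/s

sample_rate = 96_000
taps = 100_000                              # ~1 s of reverb tail (assumption)
bytes_per_coeff = 2                         # 16-bit coefficients (assumption)
needed_bytes_s = sample_rate * taps * bytes_per_coeff   # 19.2 GB/s

shortfall = needed_bytes_s / bus_peak_bytes_s           # ~14x over budget
print(shortfall)   # coefficients have to live on-chip (block RAM)
```

Even at the theoretical pin rate the bus is an order of magnitude short, which is why the coefficients (or at least the hot part of them) have to sit in block RAM.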

Anyway, it'll be fun to see how far I can push this!

I didn't understand what you meant by the need for a preprocessing step - are you thinking of truncating the response?


@oliviersohn,

Let me start out by disappointing you: the Arty's memory chips will run faster than the interface will, so it's the interface speed that matters. Xilinx's MIG will then limit your design speed to about 82 MHz or so. In each 12 ns clock, the memory controller will allow you to read 16*8 = 128 bits. That's the good throughput number. The bad number is that it will take about 20 clocks from request to response. Yes, I was disappointed by the SDRAM when I first got it working.
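Those figures translate into a streaming-vs-random-access picture like this (a sketch using the numbers quoted above):

```python
# Streaming throughput vs. random-access latency of the MIG interface,
# using the figures quoted above (82 MHz user clock, 128-bit reads,
# ~20 clocks request-to-response).
mig_clock_hz = 82e6
bits_per_read = 16 * 8                       # 128 bits per user clock

streaming_mb_s = mig_clock_hz * bits_per_read / 8 / 1e6   # ~1312 MB/s best case
latency_ns = 20 / mig_clock_hz * 1e9                      # ~244 ns per random read

print(streaming_mb_s, latency_ns)
```

The gap between the two numbers is the usual SDRAM story: sequential bursts are fast, scattered single reads are not, so the access pattern matters more than the headline bandwidth.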

My OpenArty project includes instructions for setting up the SDRAM interface if you want to use it from logic (i.e. without the MicroBlaze).  (I'm still working on fixing the flash controller, since Digilent swapped flash chips on the newer Artys ... but at this point the needed change works in simulation, needs to be tested on actual hardware, and then re-integrated with the SDRAM ... but that'll be working again soon.)

You may find this blog post discussing how to perform a convolution on "slow" data (like audio) valuable to your needs.

Dan


@D@n I guess the memory controller needs to read 8 times from the memory bus before delivering the content (hence the 20 cycles from request to response); maybe I could do some prefetching to hide this latency ...

If the MIG runs at 82 MHz (which makes sense, since that is roughly the memory bus frequency divided by 8), is it possible to have a slower clock for the MIG and a faster one for the rest of the design?


@oliviersohn,

Is it possible to have a faster clock for the design?  Yes.  However, it can be so painful to do in practice that you won't likely do so.

  1. You'll need special circuitry within your design every time you cross from one clock "domain" into another.  Single bits can cross clock domains; multi-word data requires an asynchronous FIFO.
  2. This circuitry costs time (two clocks from the new domain).
  3. Hence you'll lose two slow clocks going from your faster clock speed to the slower one, and two fast clocks going in the other direction.
  4. There be dragons here.  It's doable, don't get me wrong, but ... there are some very incomprehensible bugs along the way.
  5. What speed are you hoping to run at?  When I first picked up FPGAs, I was surprised to discover that the "posted" speed from the vendor had little to no relationship with the speeds I could actually accomplish.  For example, despite the 500MHz+ vendor claim, a 200MHz design is really pushing things, and 100MHz tends to be "comfortable".  You may also find that the difference between 100MHz and 82MHz isn't all that sizable.

Dan


@D@n I don't have an exact expectation of speed, since I'm merely doing this to learn and to get a sense of what developing for an FPGA looks like. Your answer definitely helps in that respect!

Doing the brute-force convolution is an intermediate goal, but the end goal of this first project is FFT-based convolution, since it uses fewer operations (I implemented a zero-latency convolution using FFTs on a CPU, in C++; now I'm trying to see what it takes to do it in hardware).
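As a software reference point for that end goal, here is a minimal overlap-add FFT convolution in Python, checked against direct convolution (a sketch of the basic idea only - not the zero-latency partitioned scheme a real-time reverb would need, and all sizes here are arbitrary):

```python
import numpy as np

def fft_convolve(x, h, block=1024):
    """Overlap-add convolution: convolve fixed-size blocks of x with h
    in the frequency domain and sum the overlapping tails."""
    n = 1
    while n < block + len(h) - 1:
        n *= 2                                # FFT size: next power of two
    H = np.fft.rfft(h, n)                     # filter spectrum, computed once
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        tail = np.fft.irfft(np.fft.rfft(seg, n) * H, n)
        tail = tail[:len(seg) + len(h) - 1]   # valid part of this block
        y[start:start + len(tail)] += tail    # overlap-add into the output
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = rng.standard_normal(300)     # toy 300-tap impulse response
y = fft_convolve(x, h)           # matches np.convolve(x, h)
```

Per output sample this costs O(log n) multiplies instead of O(len(h)), which is where the savings over the brute-force MAC approach come from.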

I guess I'll start with very simple things first, like making LEDs blink :)

Olivier


@oliviersohn,

If you want to start simple, you might wish to try the tutorial I've been working on.  I have several lessons yet to write, but you may find the first five valuable.  They go over what it takes to make blinky, to make an LED "walk" back and forth, and then what it takes to get the LED to walk back and forth on request.  The final lesson (currently) is a serial port lesson.  My thought is to discuss how to get information out of the FPGA in a next lesson.

Dan


2 hours ago, oliviersohn said:

Is it possible to have a slower clock for the MIG and a faster one for the rest of the design

For the external memory controller using the hard IP in the Spartan 6, the recommendation is to have the DDR logic communicating with it run at at least 1/2 the DDR interface clock rate. This might be different for soft external memory controller implementations. Your other logic can run at any clock rate you want, as long as you pay attention to clock domain crossing issues properly; this usually means dual-clock FIFOs for the data paths. Latency in high-speed data is always an issue, which is why cache memory becomes important.

Part of the design effort, and fun, is figuring out a system design that makes implementation reasonably simple but supports the needs of the overall project goals. I have a pretty old text by Niklaus Wirth titled "Algorithms + Data Structures = Programs"; the title routinely pops up in my mind when starting a new project.

MIG IP projects are a real pain to deal with, so I don't tend to create too many, preferring to reuse one generally useful but high-enough-performance IP when possible. My last Spartan 6 based project reuses an old MIG design that runs the DDR at 333 MHz, though in theory 400 MHz is possible. My controller logic that connects to the hard controller runs at 100 MHz (not 166.67 MHz), and I get a paltry 148 MB/s or so transfer rate out of it. Latency is dealt with at the system design level. This is SOP.

You might think that a MIG interface that earns a 'slacker' rep would be undesirable. What I get is a much lower thermal byproduct to deal with, and that can be an important consideration for the overall project objectives. If I need 800 MB/s then I need to do another MIG IP project... but you'd be surprised at how useful a 'low performance' interface can be for most projects. Once in a while I have a board with a wide (32-bit or 72-bit) DDR external interface, and then it's worth the time to squeeze as much performance out of it as I think I can get.

BTW, that Spartan 6 DDR controller has 4 32-bit read/write ports (the DDR is a 16-bit device) and I've used them all concurrently without issues.


1 hour ago, oliviersohn said:

I implemented a 0-latency convolution using ffts on a CPU, in C++, now I'm trying to see what it takes to do it in hardware

My initial reaction is that this seems like a big first step. I certainly like the idea of prototyping a concept in software. I've done FFTs in ECL logic and have an idea of what is involved, so I'm guessing that you will be using IP. IP and high-level MATLAB or C++ function calls are quick and great for prototyping concepts; for flexibility and for understanding all of the nuances of what's involved, they are not so good. Feel free to ignore this advice, but I'd start with a simpler construct like a filter or a PID controller. You can prototype this in MATLAB, Octave, or Scilab using nothing more than simple if/then/else and logic keywords, to create an algorithm that is a lot more understandable to implement in logic, and you don't have to rely on (and accept the limitations of) third-party IP.

If blinking an LED sounds like a good beginner's first step, then I'd heartily recommend that you become adept at HDL simulation, which involves the art of writing useful test benches. Looking at good-quality HDL code is a good way to get 'up to speed' quickly with the FPGA development process... or a good way to learn bad habits...


9 hours ago, D@n said:

Xilinx's MIG will then limit your design speed to about 82MHz or so.  In each 12ns clock, the memory controller will allow you to read 16*8=128 bits. 

@D@n, gee, those numbers sound low for a DDR3L device. I don't own one of these boards, so I have no experience with the Xilinx MIG tool for this device combination to argue with, but are you sure? The Spartan 6 hard controllers can do a variety of burst lengths and port widths. Are you using a 128-bit width? I didn't run into any external performance claims from Digilent for the board, but then again, I haven't tried too hard either. For reference, my 'lazy' DDR2 Atlys interface does 32 32-bit burst operations through the 16-bit 333 MHz DDR physical interface. Really, I'm curious.

This does raise a point about a dilemma confronting people with some degree of technical proficiency, though none in FPGA development, who want to try putting ideas into hardware and don't know what platform to select. In a corporate design scenario you start out with specifications and design your hardware to suit, but if you are buying off-the-shelf hardware, without FPGA experience and on a small budget, getting started is tricky.


11 hours ago, oliviersohn said:

I didn't understand what you meant by the need for a preprocessing step, do you think of truncating the response?

I'll just post a link here: https://ccrma.stanford.edu/~jos/filters/Minimum_Phase_Polynomials.html

It's not necessary, but it's a common mathematical shortcut for cabinet-modeling-like applications: it calculates a shorter impulse response with an identical magnitude response.

It doesn't preserve time dispersion, so it doesn't work for reverb-type applications.


@oliviersohn, @D@n

Since you lucked out and got an XC7A100T-1 based board, you may be able to do better than Dan's experience with the XC7A35T-1L board. I was able to create a MIG project based on the Digilent Git project and DDR UCF files. It should allow clocking the controller logic at 150 MHz and getting 600 Mbps data rates (2:1 clock ratio); I can't verify this, as I don't have your board. I was discouraged to see that the burst length can only be 8 and that the controller won't expand the native data width. Most discouraging is finding that after 10 years the MIG tool is as unpleasant to work with as ever... 1 step forward, 2 steps back, 2 steps forward, 4 steps back... rinse (scream) and repeat.

I don't know if the MIG project file supplied by Digilent was created in ISE, but I couldn't import it as suggested... at least I could read the XML in a text editor.



Hi @enrik,

Welcome to the forums! You would not be able to use the USB-UART bridge from Verilog/VHDL, because the USB-UART bridge is wired directly to the Zynq processing system (PS). If you are trying to use the PS UART without going through the USB-UART bridge, then you can use the EMIO or MIO pins, depending on what you are trying to do. Here is a Xilinx forum thread that discusses this.

thank you,

Jon

[Attachment: ZYNQ.jpg]

Link to comment
Share on other sites

This topic is now archived and is closed to further replies.