Posts posted by D@n

  1. @zygot,

    1 hour ago, zygot said:

    I sent Dan an email with a description answering that question. He's read the mail but so far has chosen not to make any sort of reply

    Yes, I have read your message.  I'm not sure I'm ready to respond to it, and so I have not.  Let's just say I'm still mulling it over.

    1 hour ago, zygot said:

    I suppose I should start a new post thread discussing roadblocks to success that FPGA vendors throw in the way of customers who want to nudge the odds of success more towards themselves rather than the vendors

    Go for it.  Perhaps the discussion that follows (if any) will help me form an opinion one way or another.

    Dan

     

  2. @asmi

    5 hours ago, asmi said:

    I know Dan foamed a lot about how bad this auto-generated code is, but I can't remember a single case when I actually had problems with it (modified of course to suit my needs) in real designs.

    I'm finding about one complaint on Xilinx's forums every week or two.  The complaint is typically from a user whose design is locking up for what appear to be completely inexplicable reasons.  Digging further typically reveals that they are using one of Xilinx's demo designs--either the AXI-lite or the full AXI one.

    Whether or not you run into one of these problems really depends on how you configure the interconnect.  If you configure it for "area optimization", you won't be likely to see the bug.  Another key criterion has to do with how the bus is driven.  One user experienced lockups when issuing an unaligned access from a MicroBlaze CPU.  (Help!  The FPGA engineer left the company, and I'm just the S/W guy, but ...)  Others have run into problems when trying to interact with one of these designs using a DMA of some sort.  Another recent issue dealt with connecting a 64-bit AXI bus to a 32-bit Xilinx-generated demo peripheral.  It seems a key to triggering either bug would therefore be two accesses in close succession--much as I outlined in my write-ups.

    As you can see, whether or not the bug gets triggered is highly dependent upon the use case.  Worse, when attempting to apply hypothesis testing to find where to look for the bugs (if I change A, the design fails, therefore the bug is in A somewhere), you'll often get sent looking for the bug in the wrong part of the design.

    Dan

  3. @zygot,

    17 minutes ago, zygot said:

    I'm always loathe to assert that something can't be done, seems like a guaranteed way to lose a bet.

    Lol.  You won't find me putting money down for such a bet either.

    17 minutes ago, zygot said:

    Writing an AXI master or slave in an HDL is one thing. Packaging and integrating that IP into form that can be used seamlessly by the FPGA vendors'  HW/SW tools is another. So @D@n, what's the chance of seeing a compact demo of your IP that does that and can be replicated by user's of this forum using Vitis and Vivado? I'm absolutely sure that more than a few readers of the Digilent Forums would find it to be interesting and useful.

    @zygot,

    At the risk of taking this thread far off topic, let me ask: what sort of demo would you like to see?

    I have an AXI performance measurement tool that needs to be tested out somewhere.  It's a solution looking for a problem at this point--much like the demo you would like to see.  So, again, what sort of demo would you like to see?  What particular items are you interested in seeing?  What things would be useful to demonstrate?  I make no promises about implementing such a demo in the near future (my contract time is currently overbooked), but I'd be glad to have the suggestion for when I get a spare moment later.

    Dan

  4. @lowena,

    The "official" answer to how to move data from the PL to the PS is that you should build an AXI bus master in your PL design that can write directly to PS memory.  A program running in the PS can then check that memory for data written to it, and act accordingly.  You'll need to beware of the data cache (turn it off--lest you read out-of-date information from within software), and the MMU (lest you write to the wrong memory address).  Once you've dealt with those, writing an AXI master becomes quite doable.

    Xilinx will also try to push you towards using a DMA to move data from PL to PS.  Using a DMA is not a bad idea.  Just beware of the bugs in their S2MM (stream to memory-mapped) DMA implementation--lots of individuals have gotten hung up on those.  (Xilinx's official answer to their S2MM bugs is that they are misunderstood features--but that's a whole other discussion.)  There are also several ugly/nasty bugs in their example AXI slave designs--so much so that I'd recommend not using them.  Better alternatives exist.

    Dan

  5. @hamster,

    Impressive!

    Would you mind sharing the hardware you chose to use?  The clock speed of the delta-sigma converter?  It also looks like you biased your sine wave negative by half a sample.  Out of curiosity, was this on purpose or just the result of the way you truncated floats to integers when generating your table?

    I'm a bit surprised by the second harmonic you show in your pictures.  I had rather thought this technique would've done better than that.  Do you have any suggestions as to what might have caused that harmonic?

    Either way, good fun, and thanks for sharing!

    Dan

  6. @zygot,

    So ... I got to thinking: a virtual FIFO sounds like a really easy core to design--especially with an application like this one in mind.  Unlike many AXI cores, a virtual FIFO can hold its AXI burst length constant.  It can also maintain alignment with the memory it's working with--unlike other cores, which have to worry about crossing that 4kB boundary when handling transfers with sizes determined at runtime.  Even better, there's no end--so you don't have to check for transferring too much.  That simplifies the AXI master by quite a bit.  So ... I got distracted.

    Here's what I came up with.  You can see a generalized trace below--it's what you'd get from dropping the burst size from 256 down to 8, but at least it makes a decent picture.

    [Attached image: axivfifo.png]

    Xilinx declares that their design depends upon their S2MM and MM2S cores.  That seemed a bit heavyweight to me.  Those cores require an interface, a programmable data length, a resulting length that might end up different from the one programmed, a lot of TLAST processing, and more.  If you just want a FIFO, you can dump all of that junk and keep things simple.
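    The simplification above comes down to pointer arithmetic.  Here's a hypothetical sketch (Python, not the actual HDL; every name and size below is made up for illustration): with a constant, power-of-two burst length and burst-aligned pointers, no burst can ever cross a 4kB boundary, so the usual runtime boundary checks drop out entirely.

```python
MEM_BASE   = 0x1000_0000   # assumed DDR region reserved for the FIFO
MEM_WORDS  = 1 << 16       # size of the region, in bus words
BURST_LEN  = 8             # constant burst length (beats per burst)
WORD_BYTES = 4             # 32-bit bus

def next_burst_addr(word_ptr):
    """Return the byte address of the next burst and the wrapped pointer."""
    addr = MEM_BASE + word_ptr * WORD_BYTES
    # Because word_ptr only ever advances in whole bursts, addr is always
    # burst-aligned and a burst never straddles a 4kB page.
    word_ptr = (word_ptr + BURST_LEN) % MEM_WORDS
    return addr, word_ptr

ptr = 0
addrs = []
for _ in range(MEM_WORDS // BURST_LEN + 2):   # run past the wrap point
    addr, ptr = next_burst_addr(ptr)
    addrs.append(addr)
```

    Since the region size is a whole number of bursts, the pointer wraps back to the start cleanly, which is what makes the "no end to check for" observation work.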

    Thank you for pointing out the utility of something like this.  It was a fun diversion to design.

    Dan

     

  7. @zygot,

    If you just want to move from an AXI interface to a simpler interface, you can convert it to either AXI-lite or WB without too much hassle.  I just might have some bridges to handle that conversion lying around--bridges that will keep the entire bus running at 100% capacity.  That would handle your criterion of "a simple bus with address, data, and a handful of simple gating controls".  It's unfortunate that AXI-lite is a second-class citizen in Xilinx-land.  The AXI-lite protocol is quite capable, but many of the AXI peripherals that use it are not (cough, like the AXI BRAM controller that'd drop AXI-lite throughput to 25%).  Thankfully, the MIG core doesn't seem to mind one way or another.

    One of my own criteria when building my AXI data movers was that they should be able to handle 100% throughput even across burst boundaries.  Judging from Xilinx's spec, Xilinx's cores don't do this, and so there is a throughput difference between the two implementations.  A second difference is that I never limited the transfer to 256kB ... :D  Of course, I don't have a virtual FIFO to offer.  Never thought of building one.  If I did have to hack something together in an afternoon, it'd be a WB FIFO that then used a WB to AXI conversion (while maintaining 100% throughput ...)  Indeed, I did manage to build a fully verified WB stream to memory converter in a single morning, whereas the AXI equivalent took several days to get right.  Yes, there's a cost for all this added complexity.

    I think I might disagree with you about CPU design being one potential or even necessary user of such a complex bus structure.  It doesn't need to be so.  Indeed, IMHO AXI is waaayyy over designed--but that's another story.  That said, I've been burned with cache coherency issues, so I can see a purpose for a protocol that would help the CPU maintain cache coherency.  It's just that ... AXI isn't that.

    Dan

  8. @zygot,

    This sounds like a fun and perhaps even nicely paid task.  Nice.

    Help me understand an overview here ... was that four ADCs, each sampling at 100 Msps?  How many bits per ADC--16 bits?  How wide is the SDRAM you are working with?  The data was stored into DDR3 SDRAM via a virtual FIFO, right?  And then you came off the board over USB3, did you say?

    Can you give me any indication of  how close you came to the throughput limits of either the SDRAM memory or the USB3 offboard transport you used?  Just trying to understand how much of a challenge it was to achieve your objectives here.

    Were you using Xilinx's AXI crossbar, or was the virtual FIFO the only component that accessed memory?

    Looking forward to hearing more of this fun project,

    Dan

  9. @RCB,

    Why are you converting things to sign-magnitude form, rather than just leaving them in two's complement, again?

    I'm not certain what's going on.  Were this my own project, I'd use an FFT I'd be able to "see" inside of, so that I might debug the problem.  Specifically, I'd look for overflow problems within the FFT--at least, that's the only thing I can think of that might cause the bug you are referencing above.  It doesn't make sense, though, that you'd have overflow with one FFT and not with an identical FFT that only differed in output ordering.  You might wish to compare the .xml files of the two FFTs to see if they are truly as identical as you believe.

    You might also wish to try dropping the amplitude by a factor of 4x or perhaps even 64x to see if that makes a difference.  It might be that you have the scaling schedule messed up and that things are overflowing within.  It might also be that you aren't looking at all of the output bits with your bit-cut selection above--I can't tell by just looking at it from here.

    Dan

    P.S.  I don't work for Digilent, and do not get paid for answering forum posts.

  10. @RCB,

    Did you notice the glitch in your source signal in the second plot?  It's in both data[] and frame_data.  You'll want to chase down where that glitch is coming from.

    After looking at that source signal, I noticed that the incoming frequency in your first image didn't match the 1MHz frequency you described.  At 1MHz, you should have one wavelength inside of 1us.  In your first plot, it appears that one wavelength fits in 20us, for a frequency closer to 50kHz.

    Further, I don't get your comment about holding config_tvalid = 1.  If you have created an FFT that isn't configurable ...  then why are you configuring it?  It's been a while since I've read the book on the configuration --- did you hard code the scaling schedule into the FFT, or are you configuring that in real time?  I can't tell from what you are showing.  You also weren't clear about what config_tdata is.  Was that the all zeros value you were sending?

    Finally, the difference you are seeing between natural order and bit-reversed order is not explained by the simple difference between the two orderings.  There's something else going on in your design.

    Dan

  11. @Luke Abela,

    I recently had the opportunity to write a data processing application that used an FPGA as an "accelerator".  Sadly, it probably slowed down processing, but the infrastructure is something you are more than welcome to examine and work with if you would like.  Data was sent to the FPGA using UDP packets over ethernet, read on the FPGA, assembled into larger packets for an FFT engine, processed, and then returned.

    Dan

  12. @Davie,

    No, I don't think that will work.  You can read many of my thoughts above.  How about this, though: Why not build it, and try it, and then share with us the things you learned in the process?  I'd be willing to look over anything you post, and see if I can offer any insights into things you get confused with along the way.

    Dan

  13. @Ahmed Alfadhel

    To understand what's going on, check out table 8 of the datasheet on page 15.  Basically, the DAC provides outputs between 0 and max, where 0 is mapped to zero and all ones is mapped to the max.  In other words, you should be plotting your data as unsigned.

    To convert from your current two's complement representation to an unsigned representation where zero (idle) is in the middle of the range, rather than at the far end, just toggle the MSB.
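    As a quick illustration (a Python sketch; the 12-bit width is an assumption based on the discussion), toggling the MSB maps two's complement onto the DAC's unsigned, offset-binary range:

```python
def twos_to_offset(sample, bits=12):
    """Map a two's complement sample to unsigned offset-binary
    by toggling the MSB."""
    mask = (1 << bits) - 1
    return (sample & mask) ^ (1 << (bits - 1))

# Most negative maps to 0, idle maps to mid-scale, most positive to max:
lo  = twos_to_offset(-2048)   # -> 0
mid = twos_to_offset(0)       # -> 2048 (mid-scale)
hi  = twos_to_offset(2047)    # -> 4095 (max)
```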

    Dan

  14. @FR,

    Since you haven't provided me with enough information to really answer what's going on, here are some guesses:

    • You mentioned that your FFT and FIFO are both running at 100MHz.  May I assume that this is your system clock rate?
    • Looking at your image above, it appears as though you have a much lower data rate than 100MHz.  Can you tell me what your data rate is?
    • I notice that you are using a FIFO.  Can you explain the purpose of this FIFO within your design?  If the data rate going into the FFT is at 100MHz, then the FIFO really only makes sense if you have bursty data at a rate faster than 100MHz.
    • I have strong reason to believe that your tlast isn't quite right.  Can you verify that if TLAST && !TVALID, then TLAST will remain true on the next clock?
    • Indeed, is your TLAST generation done at the rate of your incoming data?  Or is your counter independent of incoming data samples?
    • I understand you double checked your FIFO with MATLAB.  You can read about my experiences with double checking my FIFO here, and the problems that remained unchecked.

    These are just some ideas I had.  They are by no means definitive.  It is difficult to be definitive without more information about your design.

    Dan

  15. @skandigraun,

    I'm not a physicist, so others might correct me here, but as I understand things, audio waves are compression waves.  To "read" them, you need a diaphragm that will move as the compression wave moves, and then you can read the position of this diaphragm over time.  The PMic does this with a MEMS microphone.  Consider this to be the meaning of those twelve bits.

    Be careful with that twelfth bit: it is a sign bit.  You may need to extend it to the left some to understand it properly.  For example, { int v; v = (sample << 20) >> 20; } (assuming a 32-bit int and an arithmetic right shift).

    It is possible to get volume by simply averaging the absolute values of the various samples.  While crude, the estimate should work.
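    A minimal sketch of that volume estimate (Python, with a made-up test tone): for a pure sine of amplitude A, the mean absolute value works out to roughly 2A/pi, so the crude estimate tracks amplitude as claimed.

```python
import math

def volume(samples):
    """Crude loudness estimate: mean of the absolute sample values."""
    return sum(abs(s) for s in samples) / len(samples)

# A sine tone of amplitude 1000 should measure roughly 2*1000/pi ~ 637.
tone = [1000 * math.sin(2 * math.pi * k / 64) for k in range(640)]
est = volume(tone)
```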

    Getting frequency is harder.  Doing that requires a Fourier transform.  However, sound is very often composed of many frequencies, as the attached picture shows.  In that picture, time goes from left to right, frequency from bottom to top, and energy comes out of the page.  It's taken from the opening of the Cathedral's recording of "Echoes from the Burning Bush."  The clip starts with laughter, but otherwise has speech within it.  I would particularly draw your attention to how speech has a fundamental frequency associated with it, followed by many harmonics of that same frequency--as shown in the picture.  The result is that it can be difficult to say which frequency is in use, as many are present at the same time.

    One of the books I have on my shelf is Cohen's "Time Frequency Analysis."  In it, Leon Cohen goes through and compares many algorithms for frequency evaluation.  At one time I had a paper written that proved that the Short Time Fourier Transform, among his list but widely criticized, was the *only* frequency estimation method that preserved certain key properties of spectral energy estimation: 1) all energy values should be non-negative, 2) frequency shifts should produce frequency shifts in the estimate, 3) time shifts should produce time shifts in the estimate, and 4) the estimate should achieve the "best" time-frequency resolution as measured by the uncertainty function.  Perhaps I'll find a venue for publishing it in the future.  For now, you might wish to study the discrete-time Short Time Fourier Transform, which is appropriate for the data coming out of the PMic.

    At one time, I tried to build a digital tuner from sampled data.  Such a tuner requires exactly what you are asking for: knowing the frequency of the incoming data.  Further, it requires the assumption that there is only one incoming frequency, even when multiple are present (as the diagram shows).  To get there, I evaluated the autocorrelation signal, formed by taking the inverse Fourier transform of the magnitude squared of the output of a Fourier transform, and looked for the biggest peak.  This operation, taking place in the time domain, usually but not always found the fundamental frequency I was looking for.
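    A toy version of that idea (a Python sketch, computed directly in the time domain rather than via the inverse-FFT shortcut, with a made-up test signal): the lag of the biggest off-zero autocorrelation peak estimates the fundamental period.

```python
import math

def fundamental_period(x, min_lag=10):
    """Return the lag (in samples) of the largest autocorrelation peak,
    skipping small lags so the zero-lag peak doesn't win."""
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, len(x) // 2):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_val:
            best_val, best_lag = r, lag
    return best_lag

# A sine with a 50-sample period should yield an estimate of 50.
sig = [math.sin(2 * math.pi * k / 50) for k in range(400)]
```

    The min_lag guard is the crude stand-in for the "biggest peak" search: every signal correlates strongly with itself at tiny lags, so those must be excluded.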

    One more thought: you can find forward and inverse Fourier transform code, in Verilog, here, just in case you need it.  ;)

    Hope that helps,

    Dan

    [Attached image: burning-bush.png]

  16. @Yannick,

    FPGAs can't represent fractions.  Looking at your pictures above, you have 8-bit values coming out of your DDS.  Hence, the range of these values should be (at most) between -128 and 127.  According to the "unit circle" description above, 8-bit numbers are clipped to being between -64 and +64.  (There really isn't any +/- 0.5 within an FPGA, but one might think of these values as representing +/- 0.5, since they are nearly half of their full range.)

    Multiplying two such values together should give you something in the range of -4096 and 4096 (you might think of this as -0.25 to 0.25).  Although this could fit into 14 bits, you've got it in 16.  Not a problem, just unused capacity.  Moving on ...

    If your coefficients are 16 bits, then they should have values between +/- 32767 (ignoring -32768 for now).  Multiplying your 16-bit value with a 16-bit coefficient nominally gives you a 32-bit value.  (You are only using 14 bits, so you could spare a bit or two here if necessary ...)  If you have 16 such coefficients, log_2(16)=4, so adding the results of these multiplies together might give you an additional 4 bits, bringing you to 36 bits.  If you instead had 256 such coefficients, log_2(256)=8, so adding the results of the multiplies together would give you an additional 8 bits instead, bringing you to 40 bits.
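    The bit-growth bookkeeping above can be written down as a couple of one-liners (a Python sketch; the widths are the ones from the example):

```python
import math

def product_bits(a_bits, b_bits):
    """Bits needed for the product of two two's complement values."""
    return a_bits + b_bits

def accumulator_bits(a_bits, b_bits, taps):
    """Bits needed after summing `taps` such products:
    the sum grows by ceil(log2(taps)) extra bits."""
    return product_bits(a_bits, b_bits) + math.ceil(math.log2(taps))

# 16-bit data times 16-bit coefficients:
#   16 taps  -> 32 + 4 = 36 bits
#   256 taps -> 32 + 8 = 40 bits
```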

    At this point, you are getting some really HUGE numbers.  You and I both know that your signal isn't that big.  How do you get back to what your signal was?  To do that, you have to track the bit math and the multiplies.  If you decide that you started with +/- 0.5 numbers, scaled by 2^7, then your next step left you with +/- 0.25 numbers scaled by 2^14 ... and so on.  The reality, though, is you really don't have that many bits.  You really only have about 8 bits of information, packed tightly into 40 bits.  (Or ... not so tightly :P )  At this point, you need to drop some bits.  Well, actually, you should've dropped bits aggressively as you went along--but that's more of a "logic is precious" comment than a "how it must be done" comment.  You can figure out how many bits to drop by tracking the maximum value and the standard deviation of any noise working its way through your system.

    How do you go about shedding such bits?  My first approach to doing so was to just drop the low order bits.  While doable, this will introduce a DC bias into your result.  (I had to dig into this when building my own FFT ...)  The solution I found was convergent rounding ... but I'll let you look that one up.

    Dan

    P.S. ... I hadn't noticed that English wasn't your native language ;)

  17. @Yannick,

    Not quite sure what problem you are having as the plots look good from here.  (I can't read the scale, though, on those images ...)

    1. In chart one, you create two 100 kHz signals and multiply them together.  That will create a signal near DC, and a signal at 200 kHz.  200kHz is significantly above your filter cutoff, so ... it's gone.  That leaves you with the signal near DC.  (I assume you are sampling in the MHz range still ...)  The fact that the signal near DC is not constant could be just a transient effect of your filter ... from here and with no more details I can't tell.
    2. In the second chart, the two 20kHz signals multiplied together create a signal at 40 kHz and one near zero again.  (If this doesn't make sense, work the double angle trig formulas and it should)  The 40kHz signal component is quite obvious on the chart.  Since 40kHz is below your cutoff, it passes right through without a problem.  You can also see an initial startup transient, much like I would expect.

    I would also expect the startup transients for your filter to be the same length--both for the DC transient as well as for the 40 kHz transient.  I can't tell from your charts if there's a difference between the two transients -- since the two charts are on different time-scales.

    Going back to your explanation above, I'm not sure it makes sense.  Ignoring the fixed-point issues, a DDS should produce a sine wave between -1 and 1, not -1/2 and 1/2.  Second, multiplying two sine waves together should produce a value that is also between -1 and 1.

    Now if you add the fixed-point issues back in, you'll need to multiply all of your numbers by 2^(N-1)-1 so that they will fit in an N-bit number.  Hence, your DDS output should be between -2^(N-1)+1 and 2^(N-1)-1.  If you assume all your inputs have N bits, then multiplying your two values should give you a result that fits in 2N bits.  (It won't quite use up the whole range ... [2^(N-1)-1]^2 is 2^(2N-2)-2^(N-1)+1 ...)  If you then run this through a filter having coefficients of N bits, your result will increase from 2N bits to 2N plus the number of bits in your filter taps, so in this example you'd end up with 3N bits for the multiply alone (neglecting the additional base-two logarithm of your filter length for the accumulator portion of the FIR).  If your coefficients have 2N bits each, your result goes from 2N to 4N bits, and so on.  I'm not sure where in this sequence you would get either a [-1/2,1/2] range or a [0,1] range.
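    The mixing arithmetic above is just the product-to-sum identity, which is easy to check numerically (a Python sketch; the 1 MHz sample rate is an assumption): multiplying two 20 kHz tones gives a DC term plus a 40 kHz term, each at half amplitude.

```python
import math

fs = 1_000_000   # assumed sample rate
f  = 20_000      # two 20 kHz tones, multiplied together

# cos(a)*cos(a) = 1/2 + 1/2*cos(2a): a DC term plus a 40 kHz component.
errs = []
for k in range(200):
    t = k / fs
    product  = math.cos(2 * math.pi * f * t) ** 2
    expected = 0.5 + 0.5 * math.cos(2 * math.pi * (2 * f) * t)
    errs.append(abs(product - expected))
max_err = max(errs)   # should sit at floating-point noise level
```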

    Dan

  18. @Yannick,

    Looks good to me!

    When doing digital signal processing, you really want to plan to use the highest gain that your processing will support.  Using integer math, a lowpass filter gain of 0dB means you have an allpass filter--not what you designed.  In order to maintain your performance, the coefficients had to be turned into integers.  What you should be looking for at this point is that the stop band remains as you would like it.  Since it remains at about 40dB, I'd say it's about as good as the original.  (You sure you only want 40dB?  I was always taught 70dB as a rule of thumb ...)

    To know how much this filter will "amplify" your incoming signal, just add all the coefficients together.  (Works for lowpass filters ...)

    As for how to make certain you aren't "amplifying" your signal, you sort of need to define what truth is in order to compare against it.  Is it 12-bit resolution you want?  Then after a filter, you may need to drop the lower bits to get back to 12.  However, the devil is in the details when it comes to maintaining your 12-bit range throughout your processing chain.  To handle things properly, you'll want to make certain that the constant 12'h800 and 12'h7ff signals pass through your processing chain (filter plus whatever else you will be doing to them) and turn into 12'h800 and 12'h7ff signals at the far end--without overflowing any of the math in the middle.
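    Both checks above can be sketched in a few lines (Python, with made-up toy taps): the DC gain of a lowpass FIR is the sum of its taps, and the 12-bit extremes should survive the filter-then-renormalize round trip without overflow.

```python
taps = [1, 4, 6, 4, 1]   # hypothetical integer lowpass taps
dc_gain = sum(taps)      # = 16: this filter "amplifies" DC by 16x

def steady_state(sample):
    """Output for a constant input, renormalized by the DC gain."""
    acc = sum(t * sample for t in taps)   # what the FIR converges to
    return acc >> 4                       # divide back out the gain of 16

# 12'h7ff (2047) should come back as 2047, and
# sign-extended 12'h800 (-2048) should come back as -2048.
```

    Here the gain is an exact power of two, so renormalizing is a plain shift; with arbitrary taps you'd round instead, which is where the convergent-rounding discussion elsewhere in this thread comes in.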

    Dan

  19. @mikeo2600,

    Welcome to the forum!  I have an Arty as well, and love it.  I'm very much an open source type, so you won't find me using any of the AXI peripherals and I tend to use the ZipCPU instead of MicroBlaze.  Still, you can find what I've done with it on GitHub if you'd like.

    If you need any help getting your own IP up and running, I'd be glad to help out.

    Dan
