
Vivado MIG corrupted reads in 2:1 mode


jb9631

Question

Hello, hope I'm not out of line asking this here. I've read through PG150 and UG586, I've used ILA, and yet I have problems with (title). Let me elaborate:

I'm using an Arty-S7 board with the 2 Gbit x16 DDR3 module. (The module on my sample is a PMF511816EBR-KADN, not a Micron MT41K128M16-125 unit as suggested in the Arty refman. If relevant -- as some timings differ -- I can attach both datasheets, as the former is difficult to find.) I've set up the MIG IP to interface with the memory, as so (versus Digilent's recommended part of MT41K128M16XX-15E):

[Two screenshots: MIG configuration settings]

Of course, as stated in the MIG docs (PG150 and UG586), only burst length of 8 is supported, but I want to use nCK_PER_CLK=2, as that gives a higher ui_clk (this would be preferred for some other parts of my design).

I've set up my design so that every BL8 write (of 128 bits) is followed by a BL8 read. If the read and write data match, the address is increased and the next 128-bit packet is written. If not, an error bit is raised (also, the design loops indefinitely at this point as the read data is never equal to the data written).
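The write-then-verify loop described above can be modelled in software before it is debugged in hardware. What follows is only a rough Python sketch, not the actual HDL: `FakeMemory`, `run_test`, and the address step of 8 (assuming app_addr counts 16-bit columns, so one BL8 burst covers 8 of them) are hypothetical names and assumptions made for illustration.

```python
import random

BURST_BITS = 128   # one BL8 burst on a x16 device
ADDR_STEP = 8      # assumption: app_addr counts 16-bit columns, 8 per burst


class FakeMemory:
    """Hypothetical stand-in for MIG plus DDR3; can corrupt reads at one address."""

    def __init__(self, bad_addr=None):
        self.cells = {}
        self.bad_addr = bad_addr

    def write(self, addr, data):
        self.cells[addr] = data

    def read(self, addr):
        data = self.cells.get(addr, 0)
        if addr == self.bad_addr:
            data ^= 1  # flip one bit to imitate a corrupted read
        return data


def run_test(mem, n_bursts, seed=0):
    """Write a burst, read it back, advance on match.

    Returns the first failing address, or None if every burst matched.
    """
    rng = random.Random(seed)
    addr = 0
    for _ in range(n_bursts):
        word = rng.getrandbits(BURST_BITS)
        mem.write(addr, word)
        if mem.read(addr) != word:
            return addr  # this is where the error bit would be raised
        addr += ADDR_STEP
    return None


print(run_test(FakeMemory(), 16))               # None
print(run_test(FakeMemory(bad_addr=0x40), 16))  # 64
```

Unlike the hardware design, the sketch stops at the first mismatch instead of looping forever, which makes the failing address easy to report.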

I've outlined my frustrations already on Xilinx's own support forums, but those seem to be less active than Digilent's: https://support.xilinx.com/s/question/0D52E00006sD5uWSAS/understanding-mig-with-bl8-and-nckperclk-2 The posts there show how I execute read commands, and the last post shows the timing of the write commands. Evidently I must be doing at least something correctly, as most application runs "only" hit a problem after some successful accesses, no earlier than address 0x40. Here's an ILA snap of when things go sour:

[Screenshot: ILA capture of the failing read]

Notice I am attempting to write 0x2b35a8b2079e4f05f0d39e1e711e9d37 (of course divided into two 64-bit values, as seen in app_wdf_data), and the previous burst of writes was 0x55b89069096fe4f27a2a4afd930a93b5 (see the initial state of r128_ddr_rd_buffer, as seen on the very top, which matches with r128_2_rx_buff[1], as seen in third place from the bottom) but the read data is completely incomprehensible. At no point in the application run was "0x0c4c8592009551480da1d2d9bd68a5e1" (the data read from MIG) written, yet MIG calmly asserts app_rd_data_valid and app_rd_data_end one clock period later.
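For reference, the split of the 128-bit value into two 64-bit words can be checked with a few lines of Python. This is a sketch only: `split_burst` and `join_burst` are hypothetical helper names, and which half appears on app_wdf_data first is not stated in the post, so the "low half, then high half" ordering here is an assumption.

```python
MASK64 = (1 << 64) - 1


def split_burst(value128):
    """Return (low 64 bits, high 64 bits) of a 128-bit burst value."""
    return value128 & MASK64, value128 >> 64


def join_burst(low64, high64):
    """Reassemble a 128-bit burst value from its two 64-bit halves."""
    return (high64 << 64) | low64


burst = 0x2b35a8b2079e4f05f0d39e1e711e9d37
low, high = split_burst(burst)
print(f"{low:016x} {high:016x}")  # f0d39e1e711e9d37 2b35a8b2079e4f05
assert join_burst(low, high) == burst
```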

I would appreciate anyone with knowledge of MIG's workings chiming in, as I am currently out of ideas, other than trying to change nCK_PER_CLK to 4 and attempting 4:1 mode, but my gut tells me I might be doing something elementary very incorrectly. I can provide code or code snippets or pseudocode as requested, along with the aforementioned memory datasheets.

Thanks in advance!

--jb

Edited by jb9631
corrected RAM timings of PM module, added Micron module timings as comparison

7 answers to this question


  • 0

I recommend that you stick with the memory parts listed in the reference manual for your board.

Make sure that you are writing a sufficient amount of data to support whatever the DDR PHY data rate is. If the UI data width is half the width of a full DDR burst, then you will have to write 2 UI data words to the UI FIFO per write command, and each read command will return its data as two UI words, each holding half of the DDR burst.

Make sure that you understand how to handle the situation when the UI rdy signal is de-asserted. Make sure that your device supports the clocking rates needed for your MIG choices.
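To illustrate the rdy handling: on the MIG user interface a command is only accepted in a cycle where app_en and app_rdy are both high, so the command has to be held until that happens. The toy Python model below sketches that rule. `issue_commands` and the rdy trace are made up for illustration, and app_en is assumed to stay high for as long as a command is pending.

```python
def issue_commands(commands, rdy_trace):
    """Return (cycle, command) pairs showing when each command is accepted.

    rdy_trace holds the value of app_rdy for each clock cycle. A pending
    command stays on the bus (app_en held high) until app_rdy is high.
    """
    accepted = []
    pending = list(commands)
    for cycle, rdy in enumerate(rdy_trace):
        if not pending:
            break
        if rdy:  # app_en && app_rdy: the command is taken this cycle
            accepted.append((cycle, pending.pop(0)))
    return accepted


print(issue_commands(["WR 0x00", "RD 0x00"], [1, 0, 0, 1]))
# [(0, 'WR 0x00'), (3, 'RD 0x00')]
```

A design that drops app_en or changes the command while app_rdy is low would lose or corrupt that command, which is the situation the advice above warns about.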

I've posted a tutorial here: https://forum.digilentinc.com/topic/22197-a-guide-to-using-ddr-in-the-all-hdl-design-flow/ that might help.

A good place to start is to implement the MIG example design on your hardware. There's a lot to be gleaned from the exercise.

Edited by zygot

  • 0

Thanks a bunch for the reply. To start off, I am sticking with the memory part on my Arty board. I.e., the reference manual lists a Micron part, but the actual board came with a slightly different (undocumented by Digilent, even!) PieceMakers unit. Thus the timing values provided are slightly off, and the pinout of the address and DQ pins are also slightly different.

You are correct that reading/writing takes two ui_clk periods here. Since it is a x16 device, and I'm in 2:1 mode, the app_data vectors are 64 bits in length, but since MIG-7 only supports BL8 for DDR3, the data written must be 128 bits. I think I've gotten to understanding this, as shown in my ILA screencap above, and as described in UG586 (and, partly, only for 4:1, here: https://support.xilinx.com/s/article/62568?language=en_US).
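The width arithmetic in this post can be written out as a quick sanity check. This is a small Python sketch; the function names are hypothetical, and the formula (UI word width = DQ width x 2 transfers per DDR clock x nCK_PER_CLK) is the relationship described above.

```python
def app_data_width(nck_per_clk, dq_width):
    """UI word width: DQ width x 2 edges per DDR clock x DDR clocks per ui_clk."""
    return dq_width * 2 * nck_per_clk


def ui_beats_per_burst(nck_per_clk, dq_width, burst_length=8):
    """Number of ui_clk beats needed to move one full DDR burst."""
    burst_bits = dq_width * burst_length
    return burst_bits // app_data_width(nck_per_clk, dq_width)


print(app_data_width(2, 16), ui_beats_per_burst(2, 16))  # 64 2
print(app_data_width(4, 16), ui_beats_per_burst(4, 16))  # 128 1
```

So for a x16 part, 2:1 mode needs two 64-bit beats per BL8 burst, while 4:1 mode moves the whole 128 bits in one beat.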

I'll go ahead and read the linked tutorial, thanks for that! I've also looked at the example project generated by MIG, and have simulated it, but I didn't gather much from it. The traffic generator's output, to me, looks wholly incomprehensible, e.g. not changing address bits but instead writing two different values to the same address in memory. I know there has to be some logic behind it but I'm sure it could have been done more elegantly. The simulated waveform is so obfuscated with seemingly unconnected toggling of signals that I didn't even gather the bit about "two ui_clk periods per operation" until I had glanced over UG586 a few more times. That's my rant about the example design there, anyway.


  • 0

The MIG example design is fully synthesizable and can be run on hardware as long as your platform has the resources to support the ILA and VIO components. The example design is, in my opinion, overly complicated and hard to follow. The tutorial remedies some of that.

The MIG IP, like many of the Xilinx IP, could use some cleanup and love. I've never found the Xilinx User Forums to be particularly helpful. Xilinx's position on its tools seems to be: Everything works as it should, and if it doesn't you can try tracking down all of the official errata and design advisories (that may or may not be germane to your tools version), and if you still can't figure it out then it's your fault, because everything that we provide is as it should be, and while we don't actually address all of the issues presented by customers with official design advisories, that's not our problem.

If you have any questions regarding following the tutorial, I'd appreciate you posting them there. I've made it clear why I don't think that it's possible to create a general tutorial to fit all platforms and designs and why the tutorial needs to be interactive.

 

 

Edited by zygot

  • 0
18 hours ago, asmi said:

What? DDR3 pinout is a JEDEC standard, all memory modules have identical pinout.

My apologies, after looking again I can confirm only DQ pins are swapped around. Still, my point about this not being mentioned anywhere in Digilent's constraints, reference manuals, schematic etc. stands... Here's a snap of the relevant datasheet pages for the Micron module vs the PieceMakers one.

[Screenshot: datasheet pinout pages, Micron module vs PieceMakers module]

18 hours ago, zygot said:

If you have any questions regarding following the tutorial, I'd appreciate you posting them there. I've made it clear why I don't think that it's possible to create a general tutorial to fit all platforms and designs and why the tutorial needs to be interactive.

I've started reading the tutorial and already your suggestion to look at the FPGA part datasheet is invaluable. I never would have thought of looking there for what I thought was a generic thing -- In my mind the fact that a user-facing clock is one half or one quarter of the external DDR clock is unconnected to the frequency of the external DDR clock itself. (I'd imagined it as a clock divider or two, evidently that isn't solely the case.) As you rightfully note, however, the maximum throughput differs for 2:1 and 4:1 modes.

The conclusion: Because the Arty board has a 100 MHz oscillator, the 7 series FPGAs cannot do more than 620 Mbps in 2:1 mode, and the S50 part limits the DDR clock period to 3000-3300 ps, one cannot configure MIG in 2:1 mode and expect it to work on this board. The 100 MHz oscillator pretty much necessitates setting the DDR period to 3077 ps, or 325 MHz, which works out to 650 Mbps. Though in hindsight it does surprise me that I was able to get the core to work as well as I did -- at one point (before realizing bursts ought to be 128 bits), I was doing 64-bit bursts (wiring app_wdf_end high), and I was able to access an entire gigabit of memory without a problem (i.e. exactly half of the memory module).
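The numbers behind that conclusion are easy to reproduce. The Python sketch below only restates the arithmetic from this post: the 620 Mbps limit for 2:1 mode and the 3077 ps period are the figures quoted above, and the helper names are hypothetical.

```python
def ddr_clock_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


def data_rate_mbps(period_ps):
    """DDR transfers data on both clock edges, so the rate is 2x the clock."""
    return 2 * ddr_clock_mhz(period_ps)


LIMIT_2TO1_MBPS = 620  # 2:1 mode limit quoted in the post
period = 3077          # ps, DDR clock period chosen for the 100 MHz oscillator

clock = ddr_clock_mhz(period)
rate = data_rate_mbps(period)
print(round(clock, 1), round(rate, 1), rate <= LIMIT_2TO1_MBPS)
# 325.0 650.0 False

# Resulting ui_clk in MHz for 2:1 and 4:1 modes at this DDR clock
print(round(clock / 2, 2), round(clock / 4, 2))
# 162.5 81.25
```

At a 3077 ps period the data rate lands above the 2:1 limit, which is why 4:1 mode (with its slower ui_clk) is the workable option here.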

My original question is thus answered. In short, what I was attempting to do is not possible.

I already have a question about timing regarding part 1 of your tutorial. I'll post it in that thread and ask somebody to delete it if it's too obvious, hehe


  • 0
4 hours ago, jb9631 said:

My apologies, after looking again I can confirm only DQ pins are swapped around. Still, my point about this not being mentioned anywhere in Digilent's constraints, reference manuals, schematic etc. stands... Here's a snap of the relevant datasheet pages for the Micron module vs the PieceMakers one.

[Screenshot: datasheet pinout pages, Micron module vs PieceMakers module]

Nothing is swapped around here - PMF part just uses different naming - Micron uses DQxx names for both first data lane (DQ0-DQ7) and the second one (DQ8-DQ15), while PMF use per-lane notation - DQL0-DQL7 for the first data lane, and DQU0-DQU7 for the second one. The actual pins ball locations are still the same. Which is why it should not matter if you use one or the other - FPGA pinout should remain the same.
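The naming correspondence can be spelled out as a small lookup. This is an illustrative Python sketch with a hypothetical helper name; it only encodes the mapping described above (DQL0-DQL7 are DQ0-DQ7, DQU0-DQU7 are DQ8-DQ15) and says nothing about ball locations.

```python
def micron_name(pmf_name):
    """Map a per-lane name such as 'DQU3' to the flat name 'DQ11'."""
    if len(pmf_name) < 4 or not pmf_name.startswith("DQ"):
        raise ValueError(pmf_name)
    lane, index = pmf_name[2], int(pmf_name[3:])
    if lane not in ("L", "U") or not 0 <= index <= 7:
        raise ValueError(pmf_name)
    return f"DQ{index + (8 if lane == 'U' else 0)}"


print(micron_name("DQL0"), micron_name("DQL7"))  # DQ0 DQ7
print(micron_name("DQU0"), micron_name("DQU7"))  # DQ8 DQ15
```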


  • 0

Not to advance the discussion any further, but just to bring closure to my own guessings and ramblings, I'll jot down how I got my DDR3 working. If anyone finds themselves reading this months or years later, take this to heart: Read zygot's tutorial, at least the first part of it. There's enough wisdom there to nearly make this thing work. Second, read Xilinx's and Digilent's documentation.

Digilent's mig.prj IP template might not be accurate (or indeed, usable!), but it gives several hints that one might otherwise miss, as they are NOT stated in the reference manual. (The refman says to leave most MIG config options at their default settings, but apparently the folks over at Digilent forgot that Xilinx changes these defaults with nearly every release.) What saved me from giving up was noticing that Digilent's design uses 4 bank machines (default is 2) and normal ordering of commands (not sequential -- as UG586 states, these "out of order" operations are hidden from the user, and both reads and writes seem to occur in the order that commands are given). Together, these two details removed the remainder of my problems: once they were set, the IP core finally had enough time to process my commands between writes. In all honesty, my operations didn't happen in such quick succession that I would think of speed as an issue, at about 60 clock periods between successive operations, but apparently -- not for the first time in this thread! -- I was mistaken.

In my case the example project is not runnable as Vivado 2019.2 reports some critical warnings and errors (naturally, how else) that I don't have much interest in solving. If my confusion about this IP as a newcomer isn't testament enough to the subpar quality of the example, then a rant from somebody more experienced, like zygot's, should be...

Thanks to both of you for the help and discussion. I'll go add my findings to the thread I opened over on Xilinx's forums, too, in case a fellow confused soul chances upon it in the future.
