jb9631

  1. Hey all, quick update on this interface's progress. tl;dr: I've made a change to the PHY component of this interface, and, in my testing at least, it is now possible to communicate with the DDR3 chip at much higher frequencies than MIG allows. My Spartan chip "only" allowed me to test up to 928 MT/s, where communication was still robust. After some thinking, tinkering, and testing, I've come up with a solution to the clocking instability at higher frequencies. I still don't have proper timing constraints, but the problem before seemed to be an obvious delay between the address and command lines, which were routed from the logic directly to an output buffer, and the data bus (DQ, DQS, DM), which was processed through OSERDES. The trick I've employed now, at a slight cost in FPGA area usage and write/read latency for the end user, is to route all of the mentioned signals (cmd/addr/data) through OSERDES. As far as I'm aware, the IO blocks manage data transfer from the slow logic clock domain to the fast memory clock domain themselves, and the external lines, clocked by the fast clock, are inherently in sync. Relying on this seems to work well: I've done some testing by raising frequencies, but I hit a stop at the maximum BUFG frequency of my Spartan 7, 464 MHz, at which point my controller still works. For reference, Xilinx's MIG only works up to 333 MHz. How's 40% for a sequential performance increase? :) Write leveling is still impossible, so using more than one SDRAM chip is out of the question: the DQ lines in a multi-chip arrangement go to each chip individually, but the command bus employs a fly-by topology, introducing skew. I'm not too worried about this, as the MIG IP on my Spartan doesn't use write leveling, either. That's all from my end, at least for the time being. Anyone reading this is of course welcome to contribute, ask, test etc.
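For anyone curious, the cmd/addr trick described above looks roughly like the fragment below. This is a hedged sketch, not a copy-paste from the repository: only a subset of the OSERDESE2 ports is shown, and the signal names (clk_mem, clk_logic, addr_q, ddr3_addr_out, rst) are made up for illustration.

```verilog
// Hypothetical fragment: serializing one address bit through OSERDESE2
// in DDR mode with DATA_WIDTH = 4 (so CLKDIV = CLK/2, matching the
// 2:1 PHY-to-logic clock ratio), so that cmd/addr leave the FPGA
// through the same primitive type as DQ/DQS/DM.
OSERDESE2 #(
    .DATA_RATE_OQ   ("DDR"),
    .DATA_RATE_TQ   ("SDR"),
    .DATA_WIDTH     (4),
    .SERDES_MODE    ("MASTER"),
    .TRISTATE_WIDTH (1)
) oserdes_addr0 (
    .OQ     (ddr3_addr_out[0]),  // to the output buffer / pad
    .CLK    (clk_mem),           // fast memory-domain clock
    .CLKDIV (clk_logic),         // slow logic-domain clock (half rate)
    .D1     (addr_q[0]),         // same value on all four slots:
    .D2     (addr_q[0]),         // cmd/addr are effectively SDR signals
    .D3     (addr_q[0]),         // pushed through a DDR serializer
    .D4     (addr_q[0]),
    .OCE    (1'b1),
    .RST    (rst),
    .TCE    (1'b0)
);
```

Since cmd/addr only change at the logic clock rate, the same value is presented on all four D inputs; the point is simply that the line now leaves the FPGA through the same clocking structure as the data bus.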
If you happen to have a faster 7 series FPGA connected to a DDR3 chip, and are willing to do some testing at higher frequencies (1066 or even 1333 MT/s), please do let me know. Edited to add: While the interface seems to work at 928 MT/s, this means that the bus frequency is 464 MHz, and, because the interface has a 2:1 PHY to logic clock ratio, the internal logic runs at 232 MHz. Designing (top) modules that run at this frequency on a -1 speed grade Spartan 7 is not trivial. For example, a 128 bit comparator will fail timing at this frequency unless pipelined. There are three solutions to this that I can think of, listed here in what I deem to be increasing difficulty/complexity: (1) Simply pipeline your design where it meets the interface's high frequency. (2) Employ one of many CDC techniques, such as an asynchronous FIFO, to enable your modules to run at different frequencies than the interface. (3) Kindly ask Xilinx to provide us with documentation for the SERDES primitives in the "MEMORY_DDR3" mode, as currently this mode is not supported, and it is the only one that will result in a 4:1 PHY to internal clock ratio. I've seen that there is a company called Xylon that sells a DDR3 interface for 7 Series devices that operates with a 4:1 clock ratio, but the license for that is 3500 € yearly, the delivered product is just encrypted VHDL, there are no reviews of it that I could find, and it is still twice as large as my own work. I also remember a user on here asking about running MIG on an Arty S7-25 board and having had difficulties due to the IP core's FPGA area usage. I hope that it isn't against some unwritten rule of forum bon ton to tag @Mathias despite their inactivity.
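To illustrate option (1), here's a minimal sketch of a pipelined 128-bit equality comparator (module and signal names are made up; the 32-bit slice width is an arbitrary choice):

```verilog
// Hypothetical sketch: 128-bit equality compare split into four
// 32-bit slices compared in parallel, with the slice results
// AND-ed together in a second register stage. Two cycles of
// latency, but each stage is a short combinational path.
module cmp128_pipelined (
    input  wire         clk,
    input  wire [127:0] a,
    input  wire [127:0] b,
    output reg          equal
);
    reg [3:0] slice_eq;
    integer i;
    always @(posedge clk) begin
        // Stage 1: four independent 32-bit compares
        for (i = 0; i < 4; i = i + 1)
            slice_eq[i] <= (a[32*i +: 32] == b[32*i +: 32]);
        // Stage 2: combine the registered slice results
        equal <= &slice_eq;
    end
endmodule
```

The result lags the inputs by two cycles, but each stage is now a short LUT chain that should close timing at 232 MHz far more easily than a flat 128-bit compare.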
  2. Thank you for the kind words. Sadly, while I do have intimate knowledge of the project, I cannot in good faith recommend it be used without further work w.r.t. timing. Still, I think it is one of few examples of open experimentation with SERDES and IDELAY blocks available online. The documentation woes you mention, I've come to know all too well in my time making this. I've alluded to this once before, but since it's relevant to this thread specifically, I'll go ahead and cite the worst offender: I found out the hard way (read: after literal weeks of experimentation) that the ISERDESE2 OCLKB pin cannot be driven by an MMCM/PLL. You'd think MMCM outputs with a controlled 180° phase shift would be better than a local inversion, but apparently I'm a fool for believing this. Naturally, this isn't mentioned anywhere in Xilinx's docs, and you're left to find out about it yourself, if you happen to figure it out at all! To add insult to injury, UG471 (the document that should state this) says that the OCLKB pin "is shared with the OSERDESE2 CLKB pin." The joke here is that OSERDESE2 does not have a CLKB pin. I'd say your experience with both FPGA vendors goes to show that the grass on the other side isn't greener. Perhaps both are a brown-ish hue. One of the big takeaways from putting together a low-level project like this is that, though I may buy an FPGA chip, and own it physically, what I may do with it is purposefully limited by the manufacturer through their tools and their documentation. Pleas for Xilinx to document the PHASER primitives are public, and often have replies from Xilinx employees, yet Xilinx refuses to do so. Instead, documents like UG953 show page upon page of this same text, verbatim: No instantiation, no inference, no use, and no modification. Only to be used by Xilinx. The only explanation I can think of is a sort of stubbornness to force users into using their IP.
Because without primitive documentation, building functioning IP cores becomes impossible. The only way to run a DDR3 SDRAM chip with a 7-series FPGA (at decent clock frequencies, anyway) is through MIG. Perhaps I'm being too negative, but I for one am an advocate for openness. To finish this post on a positive note, I will say that, regardless of the hurdles encountered (and some, justifiably, given up on), there is a lot to be learned from such an experience. The end result might be far from perfect, but the knowledge gained is invaluable, as it covers topics from transmission lines, data integrity, and data eyes, to the workings of high speed interfaces and some "gotchas" within high-speed FPGA designs. Not all of this knowledge is directly applicable (I cannot, for example, stick an oscilloscope probe onto one of the SDRAM chip's pins), but it's a building block that might come in handy in other contexts. In a sense, I am grateful that whoever put me up to this challenge had little clue what exactly the challenge was, having said at one point that they could complete this project within two weeks (implying triviality), but choosing to delegate the task to me instead.
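For the record, the OCLKB workaround amounts to nothing more exotic than a local inversion at the instantiation site. A hedged fragment (ports unrelated to the point are omitted, and the signal names are mine):

```verilog
// Hypothetical fragment: drive ISERDESE2's OCLKB by locally inverting
// the OCLK net, instead of using a separate 180-degree-shifted
// MMCM/PLL output (which, in my experience, does not work).
wire clk_mem;  // fast memory-domain clock from the MMCM/PLL
ISERDESE2 #(
    .INTERFACE_TYPE ("MEMORY"),
    .DATA_RATE      ("DDR"),
    .DATA_WIDTH     (4),
    .IOBDELAY       ("IFD"),
    .NUM_CE         (1)
) iserdes_dq (
    // ... data and CLK/CLKDIV ports omitted for brevity ...
    .OCLK  (clk_mem),
    .OCLKB (~clk_mem)  // local inversion, absorbed into the primitive
);
```

The tools absorb the inversion into the ISERDESE2 clocking, so no extra clock routing or MMCM output is involved; as far as I can tell, that is exactly why it works where a phase-shifted MMCM output does not.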
  3. Hi again and thank you! In hindsight, it's not much, and much of the logic could be done much better. In my defense, I'd only just started learning Verilog when I started this project. :) The core isn't very complex, and nearly half of it is just primitive declarations. The PHY mostly--but not completely--follows Xilinx's own design (for DDR2 on Virtex-4), described in better detail than I could manage in XAPP721: https://docs.xilinx.com/v/u/en-US/xapp721 A related memory core that I've linked to before on this forum, focused solely on DLL-disabled functionality (<125 MHz), could also give some insights into putting together a PHY: https://github.com/ultraembedded/core_ddr3_controller (though this design ignores both write and read calibration). Most of my time was taken up by having to sift through endless Xilinx FPGA application notes and generic DDR3 documentation and piecing together a system that would produce some results. Zygot's guide was invaluable to me in getting Xilinx's MIG to behave, but I'll admit I didn't read through its entirety, so I can't comment on how usable his work might be in actually designing a memory core (versus using one). Fair warning, though: Two of the major issues of my design are (1) that write leveling is impossible to implement, leading to bad data being written at higher frequencies, and (2) my lack of knowledge with regards to timing constraints, since the address and command bus, the data lines, and the data strobe lines are technically crossing clock domains outside of the FPGA. I'm sure you can imagine the implications of ignoring timing constraints (mainly no help from Vivado's timing analysis). I discuss the problem of write leveling in more detail in the project's README. I also have a very simple testbench that helped me visualize when and how the signals toggle. (Many signals aren't clocked in the same domain, so ILA is of limited use.)
If anyone would like access to that, it's all currently very unhygienic code, but available in the core's sister repository, linked under the readme's "Example project" header. Good luck to you, your coworker, and any other soul brave enough to sift through this creation. If there are uncertainties, I am eager to offer clarifications. --jb
  4. Hi, This is my attempt at building an alternative controller to Xilinx's MIG. Its FPGA utilization is much lower than MIG's, and it enables support for DDR3's optional "DLL disabled" mode, for frequencies <125 MHz. It also enables read calibration, which, according to my research, is rare for a custom/open-source memory core. I haven't checked the datasheets of other FPGA parts, but the primitives used in this core (SERDES, IDELAY) should be generic to the entire 7-Series lineup, so there is wide use case potential. There are some limitations to how well a custom controller may be built. In my testing, the controller runs faultlessly at 125 MHz, with sequential read speeds effectively reaching 444 MB/s. A more exhaustive readme file is available in the github repository: https://github.com/someone755/ddr3-controller Any comments or contributions are welcome :) --jb
  5. Hello again after a long while, friend. I'm sorry for derailing another thread, I promise to keep this one short :) The hard external memory controller you mention is a strange concept, on two counts. First, I've read over XAPP721, a Xilinx note dealing with a DDR2 interface on the Virtex-4. It's a fascinating read that is still somewhat relevant in the age of the 7-series chips, as the primitives used--most notably the SERDES and IDELAY--are still around, and can be made to function with DDR3 (having managed to build a working PHY + controller from scratch for my Arty S7-50 -- not that I'd ever recommend doing this, but we discussed this once already, when I was only just starting). It's peculiar that Xilinx would go from an implementation in logic to a hard controller, and back to logic again, across the Virtex-4 -> Spartan-6 -> 7-series progression. Second, if this Spartan-6 hard controller was anything like the more modern 7-series MIG, there's a whole host of modes and/or primitives that simply aren't documented anywhere in Xilinx's datasheets or application notes (e.g. MIG uses phasers, but they aren't explained in docs like UG471 beyond noting that their existence is reserved "for use with MIG"). Much of the modern MIG core is a black box, at least as far as the PHY is concerned. If @escou64 would like another example of using the MIG core, I would be happy to provide them my own application. It's simple, but it's worked for me (though, granted, my tests were only up to 30 minutes long, and didn't do any sequential accesses). If you ask me, nobody is ever prepared to use the MIG without stumbling. (On this note: just today I found Dan's--of ZipCPU fame, also a poster on this forum--attempt at deciphering the workings of MIG and DDR3 on OpenCores, and even he had many difficulties, and I'd consider him to be pretty knowledgeable in the realm of FPGAs.) For me, the example MIG project that Xilinx provides is unreadable.
I will agree it might be prudent to explore, or perhaps even to delve into, the MIG's code if the project at hand is a serious implementation that will be used and worked on for years to come, but I found it simpler to handle it as a black box with such-and-such inputs/outputs that demand such-and-such operation to work properly. Maybe that's just because I didn't need the core to be robust, nor intended to work with it beyond testing it out; I just had to be certain that the Arty board I have on hand could communicate with the included DDR3 chip. One note about Xilinx's documentation: After months of work and reading Xilinx's application notes and user guides, I must warn that some information might be missing or outright wrong in the manufacturer's own documentation. Most of it is fine, but some crucial details might not be (in my case specifically, there was one such instance regarding an input pin to the ISERDESE2 and OSERDESE2 primitives), and you'll go mad before you figure out that it's not your fault something doesn't work. Back to the original poster: I, too, can vouch for zygot's tutorial. I didn't find the need to fully complete it before making my MIG design work, but it's a quality write-up worthy, in my opinion, of more than "just" a Digilent forum post. (God forbid this website is ever deleted!) To address your question specifically, I happen to have had a very similar issue: Make sure the FPGA chip you have can drive the frequencies you are requesting from it. I had a similar problem, with MIG refusing to work after a short runtime. I had been running the MIG core at 325 MHz (650 MT/s) in 2:1 mode. Eventually I stumbled upon the relevant timing datasheet for my chip (FYI: it's DS181 for Artix chips) and there I found "Table 16: Maximum Physical Interface (PHY) Rate for Memory Interface IP available with the Memory Interface Generator" (it might be named slightly differently in your relevant document).
In 2:1 mode, the MIG could only drive my DDR3L SDRAM at 620 MT/s (below my 650 MT/s requirement). I had to reconfigure MIG in 4:1 mode, where the chip is noted to be capable of 667 MT/s. This might be tangentially relevant to your question, or it might not apply at all; I haven't seen this behavior documented anywhere, but I experienced it firsthand. Going to 4:1 solved a bug where I could read and write a little, but then, after maybe a few minutes, the core would stop responding. Also, documentation on the 2:1 MIG mode is sparse to begin with. Digilent's own reference manual was of little help, as it instructs the user to change a few options, but leaves most of them untouched, or "default". Apparently, between somebody at Digilent writing the RM and today (or rather, H2 2019, as that's the version of Vivado I'm using), some of the options in the MIG configurator have changed their default values. As luck would have it, the default in my case is the 2:1 mode, in which my Arty board cannot function. --jb
  6. Hi @JColvin thanks for the response. I'm using an Arty S7-50 board. Revision B, I think, if my eyes can read the silkscreen correctly. I've found a relevant procedure over on GitHub (link), where somebody has shared a working dump of the FT2232HQ EEPROM, and flashed it using `ftdi-eeprom` on Linux. Still, if you would be kind enough to forward me the proper procedure, I would be delighted. Thanks in advance --jb
  7. Hello from me as well @JColvin. I am also having trouble with the FT2232 chip, but am unsure what exactly the issue is. First it would only be detected in Vivado, but I got no serial ports in Device Manager. After removing the device from Device Manager, the serial port works, but now the board is invisible to Vivado. The EEPROM content, as exposed by the FT Prog program, is:

Device: 0 [Loc ID:0x111]
Word MSB
0000: 0801 0403 6010 0700 FA80 0008 0000 129A ....`...........
0008: 34AC 1AE0 0000 0000 0056 0000 0000 0000 4........V......
0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0018: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0028: 0000 0000 0000 0302 0000 0000 0000 0000 ................
0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0038: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0048: 0000 0000 0000 0000 0000 0312 0044 0069 .............D.i
0050: 0067 0069 006C 0065 006E 0074 0334 0044 .g.i.l.e.n.t.4.D
0058: 0069 0067 0069 006C 0065 006E 0074 0020 .i.g.i.l.e.n.t.
0060: 0041 0064 0065 0070 0074 0020 0055 0053 .A.d.e.p.t. .U.S
0068: 0042 0020 0044 0065 0076 0069 0063 0065 .B. .D.e.v.i.c.e
0070: 031A 0032 0031 0030 0033 0035 0032 0041 ...2.1.0.3.5.2.A
0078: 0036 0042 0046 0042 0032 0000 0000 DB39 .6.B.F.B.2.....9

Thank you in advance for your help.
  8. This was a very strange thing indeed. I tracked down a program from FTDI themselves called "FT Prog". After scanning for connected devices, this utility told me the Vendor and Product IDs of the FT2232H chip on my Arty S7 board, as well as its Serial Number and other properties. Since searching for devices in Windows' own Device Manager is not possible, I downloaded a program named "USBDeview", which again scanned all devices connected to my computer (in the past or present). I was able to search for and find my board using the serial number obtained from FT Prog. Then I right-clicked the entry and chose the "Uninstall Selected Devices" option. After disconnecting and re-connecting my Arty board, a COM port appeared in Device Manager as per usual. I hope this adventure of mine saves any future tinkerers the time I needed to track down and fix the issue. edit: Or perhaps not. The board then decided that the serial port was okay, but the other port used by Vivado was suddenly gone. I managed to use WSL2 to follow the instructions posted here to restore the FT2232 chip's EEPROM to some value that restores Vivado functionality to the board: https://gist.github.com/rikka0w0/24b58b54473227502fa0334bbe75c3c1
  9. I've searched around a bit for a solution to this problem I've encountered recently, but to no avail. My Arty board is perfectly usable, it is programmable from Vivado and debuggable. However, I cannot communicate with the host computer via UART, as the FTDI chip apparently isn't picked up by Windows. The RX LED flickers just fine on the board, but my operating system doesn't seem to be aware of the serial port. In the meantime I am using a USB-UART Pmod that works just fine (detected by Windows), which is a workable solution, but I would prefer not using two USB cables per development board. Any advice is much appreciated.
  10. Very interesting board you have there. At the risk of going off topic, I'll point you to the fact that DDR chips have a DLL off mode that you can read more about in various datasheets. Where DLL on mode for DDR3 modules supports clock periods of 3300 ps and lower, DLL off mode allows for periods of 8 ns and higher; the only condition is to satisfy the refresh period of 7.8 µs. Of course MIG doesn't support this, but it's a nice thought experiment. And at least for the Arty S7-50, there's an actual PHY and controller built around it online (with proper DFI and AXI). Have a look at this project; the uploader claims it works with a 100 MHz clock. https://github.com/ultraembedded/core_ddr3_controller Should be much smaller than the entire MIG IP, though one can't vouch for high bandwidth operation, seeing as the PHY in this project requires a DDR clock plus a clock that's 4 times faster (here it's 100 MHz and 400 MHz). Then again, 200 MT/s might be enough for your application? (I am also currently in the process of creating a similar controller and PHY for higher -- DLL on -- speeds, albeit any real result might take months, if I manage to reach one at all. The basis of the PHY ought to be SERDES and IDELAY blocks, just as in MIG or in any other DDR project, really.)
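To make the refresh condition concrete: at the DLL-off limit of an 8 ns clock, the average refresh interval of 7.8 µs works out to 7800 / 8 = 975 cycles between REFRESH commands. Here's a hypothetical sketch of such a counter (the names and the req/ack handshake are illustrative, not taken from the linked core):

```verilog
// Hypothetical sketch: average-refresh-interval timer for a slow
// (DLL-off) DDR3 controller. tREFI = 7.8 us; at a 125 MHz logic
// clock (8 ns period) that is 7800 / 8 = 975 cycles per refresh.
module refresh_timer #(
    parameter integer TREFI_NS = 7800,  // average refresh interval
    parameter integer TCLK_NS  = 8      // logic clock period
)(
    input  wire clk,
    input  wire rst,
    input  wire refresh_ack,   // controller has issued REFRESH
    output reg  refresh_req    // time to schedule a REFRESH
);
    localparam integer TREFI_CYCLES = TREFI_NS / TCLK_NS;  // 975
    reg [$clog2(TREFI_CYCLES)-1:0] cnt;
    always @(posedge clk) begin
        if (rst) begin
            cnt         <= 0;
            refresh_req <= 1'b0;
        end else if (cnt == TREFI_CYCLES-1) begin
            cnt         <= 0;
            refresh_req <= 1'b1;
        end else begin
            cnt <= cnt + 1;
            if (refresh_ack)
                refresh_req <= 1'b0;
        end
    end
endmodule
```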
  11. Not to advance the discussion any further, but just to bring closure to my own guesses and ramblings, I'll jot down how I got my DDR3 working. If anyone finds themselves reading this months or years later, take this to heart: Read zygot's tutorial, at least the first part of it. There's enough wisdom there to nearly make this thing work. Second, read Xilinx's and Digilent's documentation. Digilent's mig.prj IP template might not be accurate (or indeed, usable!), but it gives several hints that one might otherwise miss, as they are NOT stated in the reference manual. (The refman states, for most MIG config options, to leave them at default settings, but apparently the folks over at Digilent forget that Xilinx changes these defaults nearly with every release.) What saved me from nearly giving up was noticing that Digilent's design uses 4 bank machines (the default is 2) and normal ordering of commands (not sequential -- as UG586 states, these "out of order" operations are hidden from the user, and both reads and writes seem to occur in the order that commands are given). These two details, together, removed the remainder of my problems, i.e. once enabled, the IP core finally had enough time to process my commands between writes (which, in all honesty, didn't happen in such quick succession that I would think of speed as an issue, at about 60 clock periods between successive operations, but apparently -- not for the first time in this thread! -- I was mistaken). In my case the example project is not runnable, as Vivado 2019.2 reports some critical warnings and errors (naturally, how else) that I don't have much interest in solving. If my confusion about this IP as a newcomer isn't testament enough to the subpar quality of the example, then a rant from somebody more experienced, like zygot's, should be... Thanks to both of you for the help and discussion.
I'll go add my findings to the thread I opened over on Xilinx's forums, too, in case a fellow confused soul chances upon it in the future.
  12. I'm from the school of thought that uses microcontrollers, since they're cheap, low power, simple to program, and simple to debug. We, perhaps unfairly, avoid FPGAs whenever we can, despite having had some practice with HDLs (I remember starting out with the Spartan 3). When we cannot, it means we're dealing with a data stream or data processing too fast or too precise for good old PICs or ARMs. For the workflow we're taught (literally, this stigma against FPGAs comes from school, or more accurately, university, where the entire staff agrees with this sentiment), it makes sense to only use the smallest/cheapest FPGA/PLD we can get away with, then deal with the rest of the application in the microcontroller as per usual. That's the opposite end of the spectrum for you. :) For what it's worth, I've worked with Zynq in the form of the Red Pitaya. The Linux functionality is nice, but in my opinion prohibitively expensive, and unnecessary unless you _really_ need Linux. A soft core or an additional microcontroller chip is much less expensive, and sufficient for most projects, in my opinion. I suppose the benefits and ease of integration might outweigh that cost (though it is of course difficult to transfer this price increase to the end user).
  13. Thanks again for this collection. I have a few questions, and I think it's best to start at nearly the very beginning: Here, you correctly note that the -187E part has a cycle time of 1.875 ns, which corresponds to a 533 MHz clock. You also note that your Nexys FPGA's maximum data rate is 800 MT/s, corresponding to a clock of 400 MHz. Why, then, would Digilent's reference manual recommend you choose a -125 part in MIG, when that one corresponds to an even higher clock of 800 MHz? Funnily enough, in my case the tables are turned, i.e. the part I have is a 1600 Mbps one, but the Digilent reference manual recommends I choose a 1333 Mbps part instead. This, at least, I can understand, because the datasheet (and likely the standard, too?) explicitly states that higher speed modules are backward compatible, i.e. a 1600 Mbps part is compatible with both 1333 Mbps and 1066 Mbps clocking. But if this were a case of downgrading to the part closest to the data transfer rate achievable by the FPGA (which evidently, it isn't), I'd expect the reference manual to suggest we both choose the 1066 part. Is this all just a matter of "choose the nearest part that MIG has available"? Or is there an inherent asterisk hidden in that statement that should read "...and hope it works well for you"? Would/should it be recommended to edit the memory timings by hand, especially in cases like the (my) Arty S7-50 board, where one module is listed in the reference manual, but another module is delivered instead, or, like in your case, where the refman suggests you use faster timings than your memory chip supports? This last part isn't a question, but a detail that wasn't immediately obvious to me: "Trcd/Trp/Tcl are in clock period units," yes, but to me, "clock period" would mean the period of the frequency that the memory is used at in the application (in the context of this tutorial, 2500 ps, for 400 MHz).
But, that is not the case -- the clock period used to calculate values to populate the "Timing Parameters" table in MIG must be the clock period of the memory (e.g. 1.25 ns for a 1600 Mbps part). Maybe I'm the odd one out here, as this is my first time working with DDR memory, and to everyone else this is as obvious as grass being green, but nowhere I've read is this explicitly stated.
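The point about clock-period units can be made concrete with a little integer arithmetic. A hypothetical sketch (the tRCD/tRP values are typical DDR3-1600 numbers, used purely as an example; nothing here comes from MIG itself):

```verilog
// The datasheet gives timings in ps; MIG's "Timing Parameters" table
// wants them in memory-clock cycles, where the memory clock is the
// DDR3 part's own (e.g. 1250 ps for a 1600 MT/s part) -- NOT the
// ui_clk that the user logic runs at.
module mig_timing_example;
    localparam integer TCK_PS  = 1250;   // 1600 MT/s memory clock period
    localparam integer TRCD_PS = 13750;  // illustrative DDR3-1600 tRCD
    localparam integer TRP_PS  = 13750;  // illustrative DDR3-1600 tRP

    // Round up: a timing minimum must be met, never undershot.
    localparam integer TRCD_CYCLES = (TRCD_PS + TCK_PS - 1) / TCK_PS; // 11
    localparam integer TRP_CYCLES  = (TRP_PS  + TCK_PS - 1) / TCK_PS; // 11
endmodule
```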
  14. My apologies, after looking again I can confirm only the DQ pins are swapped around. Still, my point about this not being mentioned anywhere in Digilent's constraints, reference manuals, schematic etc. stands... Here's a snap of the relevant datasheet pages for the Micron module vs the PieceMakers one. I've started reading the tutorial, and already your suggestion to look at the FPGA part datasheet is invaluable. I never would have thought of looking there for what I thought was a generic thing -- in my mind, the fact that the user-facing clock is one half or one quarter of the external DDR clock was unconnected to the frequency of the external DDR clock itself. (I'd imagined it as a clock divider or two; evidently that isn't solely the case.) As you rightfully note, however, the maximum throughput differs for 2:1 and 4:1 modes. The conclusion: Because the Arty board has a 100 MHz oscillator, the 7 series FPGAs cannot do more than 620 Mbps in 2:1 mode, and the S50 part limits the DDR clock period to 3000-3300 ps, one cannot configure MIG in 2:1 mode and expect it to work on this board. The 100 MHz oscillator pretty much necessitates setting the DDR period to 3077 ps, or 325 MHz. Though in hindsight it does surprise me that I was able to get the core to work as well as I did -- at one point (before realizing bursts ought to be 128 bits), I was doing 64-bit bursts (wiring app_wdf_end high), and I was able to access an entire gigabit of memory without a problem (i.e. exactly half of the memory module). My original question is thus answered. In short, what I was attempting to do is not possible. I already have a question about timing regarding part 1 of your tutorial. I'll post it in that thread and ask somebody to delete it if it's too obvious, hehe
  15. Thanks a bunch for the reply. To start off, I am sticking with the memory part on my Arty board. I.e., the reference manual lists a Micron part, but the actual board came with a slightly different (undocumented by Digilent, even!) PieceMakers unit. Thus the timing values provided are slightly off, and the pinout of the address and DQ pins is also slightly different. You are correct that reading/writing takes two ui_clk periods here. Since it is a x16 device, and I'm in 2:1 mode, the app_data vectors are 64 bits in length, but since MIG-7 only supports BL8 for DDR3, the data written must be 128 bits. I think I've come to understand this, as shown in my ILA screencap above, and as described in UG586 (and, partly, only for 4:1, here: https://support.xilinx.com/s/article/62568?language=en_US). I'll go ahead and read the linked tutorial, thanks for that! I've also looked at the example project generated by MIG, and have simulated it, but I didn't gather much from it. The traffic generator's output, to me, looks wholly incomprehensible, e.g. not changing address bits but instead writing two different values to the same address in memory. I know there has to be some logic behind it, but I'm sure it could have been done more elegantly. The simulated waveform is so obfuscated with seemingly unconnected toggling of signals that I didn't even gather the bit about "two ui_clk periods per operation" until I had glanced over UG586 a few more times. That's my rant about the example design there, anyway.