A custom DDR3 controller for the S7-50 board

jb9631 · July 14, 2022

Hi,

This is my attempt at building an alternative controller to Xilinx's MIG. Its FPGA utilization is much lower than MIG's, and it supports DDR3's optional "DLL disabled" mode, for frequencies below 125 MHz. It also implements read calibration, which, according to my research, is rare for a custom/open-source memory core.

I haven't checked the datasheets of other FPGA parts, but the primitives used in this core (SERDES, IDELAY) should be generic to the entire 7-Series lineup, so there is wide use case potential.

There are some limitations to how well a custom controller may be built. In my testing, the controller runs faultlessly at 125 MHz, with sequential read speeds effectively reaching 444 MB/s. A more exhaustive readme file is available in the github repository: https://github.com/someone755/ddr3-controller
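As a rough sanity check on the 444 MB/s figure, the sketch below compares it against the theoretical peak of the bus. It assumes a 16-bit data bus (the width is not stated in the post) and that MB means 10^6 bytes; the variable names are made up for illustration.

```python
# Theoretical peak bandwidth of a DDR bus versus the measured figure.
# ASSUMPTIONS: 16-bit DQ bus and MB = 10**6 bytes; neither is stated in the post.
CLOCK_MHZ = 125          # memory clock quoted in the post
BUS_WIDTH_BITS = 16      # assumed data bus width
MEASURED_MB_S = 444      # sequential read speed quoted in the post

transfers_per_us = CLOCK_MHZ * 2                   # DDR: two transfers per clock
peak_mb_s = transfers_per_us * BUS_WIDTH_BITS / 8  # bytes per microsecond = MB/s
efficiency = MEASURED_MB_S / peak_mb_s

print(f"peak: {peak_mb_s:.0f} MB/s, efficiency: {efficiency:.1%}")
```

Under those assumptions the measured rate is a bit under 90% of the peak, with the remainder presumably going to refresh, activate/precharge, and command overhead.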

Any comments or contributions are welcome :)

--jb

JColvin · July 15, 2022

I (and a co-worker of mine more familiar with controlling DDR) will need to check this out.

I'm glad to see that you also benefited from zygot's very detailed post on using DDR in an HDL design.

jb9631 · July 16, 2022

Hi again and thank you! In hindsight, it's not much, and much of the logic could be done much better. In my defense, I'd only just started learning Verilog when I started this project. :) The core isn't very complex, and nearly half of it is just primitive declarations. The PHY mostly--but not completely--follows Xilinx's own design (for DDR2 on Virtex-4), described in better detail than I could manage, in XAPP721: https://docs.xilinx.com/v/u/en-US/xapp721

A related memory core that I've linked to before on this forum, that's focused solely on DLL disabled functionality (<125 MHz) could also give some insights into putting together a PHY: https://github.com/ultraembedded/core_ddr3_controller (though this design ignores both write and read calibration).

Most of my time was taken up by sifting through endless Xilinx FPGA and generic DDR3 documents and application notes, and piecing together a system that would produce some results. Zygot's guide was invaluable to me in getting Xilinx's MIG to behave, but I'll admit I didn't read through its entirety, so I can't comment on how usable his work might be in actually designing a memory core (versus using one).

Fair warning, though: Two of the major issues of my design are (1) that write leveling is impossible to implement, leading to bad data being written at higher frequencies, and (2) my lack of knowledge with regard to timing constraints, since the address and command bus, the data lines, and the data strobe lines are technically crossing clock domains outside of the FPGA. I'm sure you can imagine the implications of ignoring timing constraints (mainly no help from Vivado's timing analysis). I discuss the problem of write leveling in more detail in the project's README.

I also have a very simple testbench made that helped me visualize when and how the signals toggle. (Many signals aren't clocked in the same domain, so ILA is of limited use.) If anyone would like access to that, it's all currently very unhygienic code, but available in the core's sister repository, linked under the readme's "Example project" header.

Good luck to you, your coworker, and any other soul brave enough to sift through this creation. If there are uncertainties, I am eager to offer clarifications.

--jb

Edited July 16, 2022 by jb9631

zygot · July 16, 2022

Nice work and good presentation. I whole-heartedly applaud such efforts, especially when they are published with useful citations. Making it easier to find similar efforts encourages like-minded experimentation. Beyond the very practical benefit of a self-directed educational exercise, being able to eschew vendor IP in favor of HDL sources that you understand is an invaluable asset. In some cases, vendors simply don't care about making their IP usable, or don't want to highlight faulty designs in their hard memory controllers.... so going your own way is a necessity. I've run into just this scenario for a board using a Cyclone V part and LPDDR2. The current tools specifically support the board but the IP is completely useless for a user wanting to have a high performance LPDDR2 design. The IP requires scripts that don't work on Windows, the hard external controller doesn't behave as the sorely incomplete documentation suggests that it should, etc. etc. I am unaware of any published design example that can be replicated demonstrating that the board is capable of burst read or write operation. The effective result is that a board that should be perfectly well suited to a large range of project implementations is rendered unusable for many of them because the external memory can't perform as advertised using the vendor's tools.

External memory IP isn't the only functionality that FPGA vendors use to compete with their customers. Ethernet, transceivers and just about any high performance interfaces are also examples where users might find it useful, or necessary, to develop their own IP in lieu of the vendor's offerings.

Edited July 16, 2022 by zygot

jb9631 · July 17, 2022

Thank you for the kind words. Sadly, while I do have intimate knowledge of the project, I cannot in good faith recommend it be used without further work w.r.t. timing. Still, I think it is one of few examples of open experimentation with SERDES and IDELAY blocks available online. The documentation woes you mention, I've come to know all too well in my time making this. I've alluded to this once before, but since it's relevant to this thread specifically, I'll go ahead and cite the worst offender:

I found out the hard way (read: after literal weeks of experimentation) that the ISERDESE2 OCLKB pin cannot be driven by a MMCM/PLL. You'd think MMCM outputs with controlled 180° phase shift would be better than a local inversion but apparently I'm a fool for believing this. Naturally, this isn't mentioned anywhere in Xilinx's docs, and you're left to find out about it yourself, if you happen to figure it out at all! To add insult to injury, UG471 (the document that should state this) says that the OCLKB pin "is shared with the OSERDESE2 CLKB pin." The joke here is that OSERDESE2 does not have a CLKB pin.

I'd say your experience with both FPGA vendors goes to show that the grass on the other side isn't greener. Perhaps both are a brown-ish hue.

One of the big takeaways from putting together a low-level project like this is that, though I may buy an FPGA chip, and own it physically, what I may do with it is purposefully limited by the manufacturer with their tools and their documentation. Pleas for Xilinx to document the PHASER primitives are public, and often have replies from Xilinx employees, yet Xilinx refuses to do so. Instead, documents like UG953 show page upon page of this same text, verbatim:

No instantiation, no inference, no use, and no modification. Only to be used by Xilinx. The only explanation I can think of is a sort of stubbornness to force users into using their IP. Because without primitive documentation, building functioning IP cores becomes impossible. The only way to run a DDR3 SDRAM chip with a 7-series FPGA (at decent clock frequencies, anyway) is through MIG. Perhaps I'm being too negative, but I for one am an advocate for openness.

To finish this post on a positive note, I will say that regardless of the hurdles encountered (and some, justifiably given up on), there is a lot to be learned from such an experience. The end result might be far from perfect, but the knowledge gained is invaluable, as it covers topics from transmission lines and data integrity and data eyes, to the workings of high speed interfaces and some "gotchas" within high-speed FPGA designs. Not all of this knowledge is directly applicable (I cannot, for example, stick an oscilloscope probe onto one of the SDRAM chip's pins), but it's a building block that might come in use in other contexts.

In a sense, I am grateful that whoever put me up to this challenge had little clue what exactly the challenge was, having said at one point that they could complete this project within two weeks (implying triviality), but choosing to delegate the task to me instead.

Edited July 17, 2022 by jb9631

zygot · July 17, 2022

15 hours ago, jb9631 said:

In a sense, I am grateful that whoever put me up to this challenge had little clue what exactly the challenge was

How's that ditty go.. "eyes wide open"..? In a sense the challenge turns out to have nothing to do with external memory controllers. There are things that you can learn from textbooks. There are things that you can learn in school. There's a lot more that you can learn from old battle worn engineers who've long fought the information wars with suppliers who claim to be your company's "partner". Product support doled out in tiers, according to how important your company is in your vendor's eyes, predates programmable logic by decades. It's possible to live an entire career in the magical world where electronic components work as expected and life is easy. If the components are high performance or complex, it won't take long for many engineers to find themselves at the mercy of a vendor who isn't forthcoming with information vital to a project's success because you happen to be insignificant to their market objectives. Anyway, without getting too expansive on a subject near and dear to my experiences, let's just say that if you want to do extraordinary things with complex silicon devices, you had better be prepared to find a way around unexpected obstacles thrown in your way. There are not many silicon devices as complicated as modern FPGAs and their connected-at-the-hip sibling, the tools. Understanding that what you see ( or even read ) is not exactly what you get is an important part of this obstacle evasion skill set. Don't get discouraged... sometimes you get lucky ( I once worked for a very small startup that the big boys, more accurately someone working for them, thought had future market interest, and saw first hand an elevation in information tier rating ), and sometimes you just lose a game in the competition between vendor and customer. There's almost always a path to getting to where you need to be though.. the important part is how long it takes you to get there.

From what little you've posted about yourself I'm guessing that you are no grizzled old-timer... but you do seem to have the hard won wisdom of one. I've really enjoyed your posts, and perspective... perhaps with more than a few winces of empathetic pain that resonates loudly. Extraordinary.... I see good things for your future. Keep asking questions. For everyone's benefit keep posting the journey.

jb9631 · September 11, 2022

Hey all, quick update on this interface's progress.

tl;dr: I've made a change to the PHY component of this interface, and, in my testing at least, it is now possible to communicate with the DDR3 chip at much higher frequencies than MIG allows. My Spartan chip "only" allowed me to test up to 928 MT/s, where communication was still robust.

After some thinking, tinkering, and testing, I've come up with a solution to the clocking instability at higher frequencies. I still don't have proper timing constraints, but the problem before seemed to be an obvious delay between the address and command lines that were routed from the logic to an output buffer directly, and the data bus (DQ, DQS, DM), which was processed through OSERDES. The trick I've employed now, at a slight cost to FPGA area usage and write/read latency for the end user, is to route all of the mentioned signals (cmd/addr/data) through OSERDES.

As far as I'm aware, the IO blocks manage data transfer from the slow logic clock domain to the fast memory clock domain themselves, and the external lines, clocked by the fast clock, are inherently in sync. Relying on this seems to work well: I've done some testing by raising frequencies, but I hit a stop at the maximum BUFG frequency of my Spartan 7, 464 MHz, at which point my controller still works. For reference, Xilinx's MIG only works up to 333 MHz. How's 40% for a sequential performance increase? :)
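For reference, the "40%" figure can be checked from the two clock frequencies quoted above. This is only a sketch of the arithmetic; it assumes sequential throughput scales linearly with the memory clock, which ignores any fixed latencies.

```python
# Relative gain in memory clock over MIG's limit, using the figures from the post.
# ASSUMPTION: sequential throughput scales linearly with the memory clock.
CUSTOM_MHZ = 464   # BUFG limit reached with the custom controller
MIG_MHZ = 333      # MIG limit quoted in the post

gain_pct = (CUSTOM_MHZ - MIG_MHZ) / MIG_MHZ * 100
print(f"clock increase over MIG: {gain_pct:.1f}%")
```

The result comes out slightly below 40%, so the quoted figure is a fair rounding.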

Write leveling is still impossible, so using more than one SDRAM chip is out of the question. The DQ lines in a multi-chip arrangement go to each chip individually, but the command bus employs a fly-by topology, introducing skew. I'm not too worried about this, as the MIG IP on my Spartan doesn't use write leveling, either.

That's all from my end, at least for the time being. Anyone reading this is of course welcome to contribute, ask, test etc. If you happen to have a faster 7 series FPGA connected to a DDR3 chip, and are willing to do some testing at higher frequencies (1066 or even 1333 MT/s), please do let me know.

Edited to add: While the interface seems to work at 928 MT/s, this means that the bus frequency is 464 MHz, and, because the interface has a 2:1 PHY to logic clock ratio, this means the internal logic runs at 232 MHz. Designing (top) modules that run at this frequency on a -1 speed grade Spartan 7 is not trivial. For example, a 128 bit comparator will fail timing at this frequency unless pipelined. There are 3 solutions to this that I can think of, listed here in what I deem to be increasing difficulty/complexity: (1) Simply pipeline your design where it meets with the interface's high frequency. (2) Employ one of many CDC techniques, such as an asynchronous FIFO, to enable your modules to run at different frequencies than the interface. (3) Kindly ask Xilinx to provide us the documentation for the SERDES primitives in the "MEMORY-DDR3" mode, as currently this mode is not supported, and is the only one that will result in a 4:1 PHY to internal clock ratio.
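The relationship between transfer rate, bus clock, and internal logic clock described above can be sketched as follows. The helper name `logic_clock_mhz` is made up for illustration, and the 4:1 case is hypothetical for this core, since it depends on the unsupported "MEMORY-DDR3" SERDES mode.

```python
# Derive the bus clock and the fabric (logic) clock from a DDR transfer rate.
# The function name and parameters are illustrative, not part of the controller.
def logic_clock_mhz(transfer_rate_mt_s: float, phy_to_logic_ratio: int) -> float:
    """Return the internal logic clock in MHz for a given DDR transfer rate."""
    bus_clock_mhz = transfer_rate_mt_s / 2   # DDR: two transfers per bus clock
    return bus_clock_mhz / phy_to_logic_ratio

rate = 928  # MT/s, the highest rate tested in the post
print(f"bus clock:       {rate / 2:.0f} MHz")
print(f"logic clock 2:1: {logic_clock_mhz(rate, 2):.0f} MHz")
print(f"logic clock 4:1: {logic_clock_mhz(rate, 4):.0f} MHz (hypothetical)")
```

At the same transfer rate, a 4:1 ratio would halve the logic clock compared with 2:1, which is why it would ease timing closure in user logic.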

I've seen that there is a company called Xylon that sells a DDR3 interface for 7 Series devices that operates with a 4:1 clock ratio, but the license for that is 3500 € yearly, the delivered product is just encrypted VHDL, there are no reviews I could find of it, and it is still twice as large as my own work.

I also remember a user on here asking about running MIG on an Arty S7-25 board and having had difficulties due to the IP core's FPGA area usage. I hope that it isn't against some unwritten rule of forum bon ton to tag @Mathias despite their inactivity.

Edited September 18, 2022 by jb9631
