Differential PMOD Challenge

zygot · October 7, 2016

How fast can YOU transmit and receive 1048576 32-bit words through 12 inches of cable without an error using only 4 differential wire pairs? Are you a hot-shot purveyor of digital interfaces? The Differential PMOD Challenge is your ticket to finding out. Part FPGA project, part intellectual exercise, and part "show me what you got" throw-down, this project is for anyone interested in digital interfaces, good old-fashioned self-improvement, or who just can't pass up a challenge.

Requirements:
- A Digilent Genesys2 or Nexys Video FPGA board
- A Xilinx HW-FMC-105-DEBUG interface board ( available from Xilinx or Avnet ) to provide a programmable
    clock device
- Vivado or ISE 14.7
- A cable connecting the SourcePMOD pins to the SinkPMOD pins. All of the demo projects use TMDS_33 IOSTANDARD
    buffers and 50 ohm pullup resistors to VCC3V3 on all 8 data pins are required. While not recommended, you
    can use single-ended IO.
- Python 2.6 or 2.7
- pyserial 2.7

Version notes:

The latest version of the code is pmod_challenge_R2.zip. Tester.py had issues that were fixed in the Anti_differential Challenge; the corrected version (hopefully) appears below.

pmod_challenge_R1.zip

pmod_challenge_R2.zip

Tester.py

D@n · October 21, 2016

Sounds like a fun challenge. I'd join in, but I don't have the necessary hardware. (Sorry.)

Dan

zygot · October 22, 2016

D@n,

The HW-FMC-105-DEBUG is available from a variety of sources for under $200 and not only provides an easy way to use many of the FMC connector IO but gives you a low jitter programmable clock source of 10-800 MHz. This project makes it programmable from a UART-equipped PC. I realize that the majority of people seeing this will say "Whaaaat?" and move on without looking at it. A small group will read through the source and say "cool". The included Si570_INF can be used in any project to gain access to internal FPGA resources via a UART, no additional expenses required. There's a lot of useful stuff in the source code for a wide audience. Hint to Digilent: why aren't you providing similarly useful, easy to use, and educational projects for your hardware?

Anyway, I appreciate all comments, pro or con. thanks

Zygot

D@n · October 22, 2016

@zygot,

I know what I'd do first though: Send a digital signal that does nothing but toggle the lines at the highest rate that I could both get them to toggle and reliably measure the toggle on the other end. I'd do this with one line only, so as not to confuse things (yet). I'd use OSERDES and ISERDES components to do this. Success here would determine your highest on-line bit rate. Traditionally, I'd take it to be 1/4 the fastest clock rate of the ISERDES module, but that assumes perfect analog transmission which would remain to be determined. For a product, I'd back off of this bit rate some distance, but for a competition I'd try to get as close to it as I could.

After you can demonstrate that highest frequency, all that would remain would be line protocol. As for that, it may be time to improve upon 8b/10b. Perhaps a 32b/something?

Dan

zygot · October 22, 2016

D@n,

All of that sounds like a plan. You don't actually need any hardware to try out your ideas as the project supplies Vivado-friendly testbenches to simulate your design. What's missing from the simulation is a "channel" simulator. I could have spent a few more months expanding this project but there's no point if I'm the only one in the entire universe who's interested..... perhaps I might try thinking about a "channel" simulator as simulation is what VHDL was designed for. The objective wouldn't be to provide an extremely accurate representation of a 12" cable but be good enough to exercise the data sink and have it do something similar to what the actual hardware would do....

D@n · October 22, 2016

@zygot,

You'll have to forgive me, I'm more of a logic focused guy than a physics/hardware focused individual: what channel effects are you expecting? I would expect an uncertain path length between the four channels, pseudo-synchronized clocks (i.e. same speed, but wandering phase, which really means they aren't quite at the same speed ...), and ... I'm not certain what else.

What would you place in your "channel" simulator?

Dan

zygot · October 23, 2016

D@n,

I suspect that very few people have bothered to read through the documentation or source. That's a shame because the project is about much more ( and a lot more interesting ) than simply achieving a maximum data rate through a very limited number of signals. I don't want to get into details before working through my analysis but basically our "channel" involves impedances (R, L and C) and what happens at the places where there are changes of impedances along the signal paths. As current flow changes direction (LVCMOS_33) or stops and starts (TMDS_33) some energy gets passed through these discontinuities and some gets reflected back to the source driver. This energy dissipates relatively slowly and causes undershoot, overshoot and reflections that are observed at the receiver. And of course there are issues caused by wire length mismatches ( different for n/p differential pair wire mismatches and different for pair to pair wire mismatches). I really encourage you to download the project and read through the documentation and source, especially the bi-phase implementation. I'm sure that you in particular will find it interesting. The project was envisioned as something that people with the specified hardware could experiment with. An accurate simulation of the wiring connecting the data source to the data sink is quite a complex undertaking.
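
To put rough numbers on the reflections described above: the fraction of an incident voltage wave that bounces back at an impedance step is given by the standard transmission-line reflection coefficient, (Z_load - Z_line) / (Z_load + Z_line). The sketch below is not part of the project, and the impedance values in it are made-up examples rather than measurements of the PMOD wiring or the loopback cable.

```python
def reflection_coefficient(z_load, z_line):
    """Voltage reflection coefficient at a step from z_line into z_load (ohms)."""
    return (z_load - z_line) / float(z_load + z_line)


def transmitted_power_fraction(z_load, z_line):
    """Fraction of the incident power that passes through the discontinuity."""
    gamma = reflection_coefficient(z_load, z_line)
    return 1.0 - gamma * gamma


# Hypothetical example values: a 50 ohm line meeting a matched,
# a higher, and a lower impedance.
for z_load in (50.0, 75.0, 25.0):
    gamma = reflection_coefficient(z_load, 50.0)
    passed = transmitted_power_fraction(z_load, 50.0)
    print("Z_load = %5.1f ohm: gamma = %+.3f, power passed = %.3f"
          % (z_load, gamma, passed))
```

A matched load gives a coefficient of zero; anything else sends part of each edge back toward the driver, which is where the overshoot and undershoot at the receiver come from.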

D@n · October 23, 2016

@zygot,

Yeah, ok, ... so I downloaded your project, read through the instructions, and became even more convinced that 1) I didn't have the necessary hardware, and that 2) it was out of my price range.

Still ... it would be fun to challenge some of your pieces. For example, your documentation states that your crude CDR will struggle at speeds faster than fclk/6. It shouldn't be too hard to build a resampler that could/would build its PLL and resample given a word of data sampled at fclk/2. Note the word "word" in that phrase. In order to use Xilinx FPGAs at their full speed, you'd want to use an ISERDES/OSERDES structure. This means that you would no longer be presented one bit per clock on each channel, but perhaps upwards of 14 bits per clock for a single channel. Now, using that word size, I think I could build a nice resampler that could/would lock onto a signal at fclk/2 and resample the results.
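
As a software illustration of the resampling idea, here is a minimal sketch of a static oversampling phase picker. It is deliberately simpler than the PLL-based resampler described above: it looks at a block of oversampled line samples, finds the phase where edges cluster, and then samples half a bit period away from it. The function names and the 4x oversampling factor are assumptions made for the example, not anything taken from the project.

```python
def pick_sample_phase(samples, oversample):
    """Return the phase (0..oversample-1) farthest from the observed edges."""
    edge_counts = [0] * oversample
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_counts[i % oversample] += 1
    # Edges cluster at one phase; sample half a bit period after it.
    edge_phase = edge_counts.index(max(edge_counts))
    return (edge_phase + oversample // 2) % oversample


def recover_bits(samples, oversample):
    """Keep one sample per bit period, taken at the chosen phase."""
    phase = pick_sample_phase(samples, oversample)
    return samples[phase::oversample]


# Hypothetical demo: 8 bits, 4 samples per bit, arriving one sample late.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
line = [0] + [b for b in bits for _ in range(4)]
print(recover_bits(line, 4) == bits)
```

A real design would have to track a drifting phase continuously and work on one ISERDES word per clock, which this block-at-a-time sketch does not attempt.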

It'd be a fun project, but, like I said, I don't have the hardware to do it.

Yours,

Dan

zygot · October 23, 2016

D@n,

No argument here.... I purposely set the bar low to encourage some to challenge my initial results. Obviously, if we can do an SDR, we can do a DDR. Just as obvious is that if we are using TMDS_33 IOSTANDARD IO then we could do a real TMDS interface.... I will argue that even without hardware you can do your ISERDES design and get it to pass the behavioural testbench. Do that, publish your data source and data sink and I will run the test on real hardware... I can counter any excuses... the project is that good! Understand that the testbed is designed to run at a wide range of bit rates so keep that in mind as you do your design.

D@n · October 24, 2016

Ok, so let me get this straight: If I just deliver you straight verilog, never tested on real hardware, just verilog implementing a communications scheme--you would bend the rules enough to allow that as a valid submission?

Dan

zygot · October 24, 2016

Hey, there's no pot of gold in this challenge, just an opportunity for personal glory. Just follow the rules. Create a new toplevel file based on one of the 4 that I've provided ( I suggest that you choose the Nexys Video as you'll need a license to build a bit file for the Kintex device ) connecting your new data source and data sink to the testbed and get the simulation to complete the 16-word transfer without errors using one of the Nexys Video testbenches ( the testbenches really don't care about what kind of data interface is being used ). I suggest that you try to create a bit file and get timing errors down to something reasonable ( that is part of the challenge ). Publish those three files here. If I can duplicate your simulation I'll try to create a configuration file for that board and run a test on my board with my loopback cable. I'll then publish the results. You are at a distinct disadvantage as being able to view the monitor signals with an oscilloscope might be very informative. The real world can be quite unkind. As you are the only other known fish in the pond it's the least that I can do. All that I ask is that you use the same testbed so that the software works and the test is consistent. The interface between the data sink and data source to Testbed.vhd is pretty straightforward. ( this reminds me of my first computer class where I spent hours typing into a card punch machine ( are you old enough to know about those? ) and submitted a bunch of cards for batch processing on a mainframe and then waited for a day or two to find out how good my code was... Ah, the bad old days...) If you turn my efforts into an embarrassment I'll feel obligated to respond in kind with another interface design of my own.

zygot · October 24, 2016

D@n,

It would be rude of me not to point out a few things to you since you don't have the project FPGA boards. As I mentioned, real life can be cruel to even well-thought out designs that are missing a few key details. The HDMI receivers ( read TMDS encoded SERDES signalling ) on all of the Digilent boards have special devices that include the 50 ohm pull-up resistors and do cable equalization to mitigate the nasty effects of the cable. The best that we can do is put our 50 ohm pull-ups a few inches away from the ISERDES pins; not optimal. A foot of any Ethernet cable will have SIGNIFICANT capacitive loading so trying to flip bits at a 500 MHz toggle rate is a pretty optimistic endeavour. Part of this project is understanding the Series 7 IO, part is understanding synthesis constraints to get a properly routed implementation that runs at the required clock rates, but a big part is understanding a little physics about what happens to the IO signals after they sojourn through a real world medium. Reading the Nexys Video user's manual and schematic, as well as looking over the literature for the HDMI devices on this board, is recommended.

D@n · October 24, 2016

Absolutely, but ... those things are out of my control. Hence, I only expect to build some components, and then to hear back how things work. I am not expecting to build top level files, 'cause I'll have no means to verify/validate them. My thought was that I might build ...

A circuit that outputs 14 bits per clock, which can then be placed into an OSERDES module. The circuit will have inputs for a reset line and a clock and ... we'll have to coordinate the rest of the input to this circuit--perhaps I can dig into your example and figure that out. My plan was to create 4 of these circuits, one for each wire. I had thought about feeding these circuits with a 32-bit value, and providing a busy signal so that you would know when the 32-bit value was accepted.
A second circuit upon receive that would input 14-bits per clock coming from an ISERDES module. This circuit would also have inputs for a reset line and a clock, as well as a (negotiable width, perhaps 32-bits?) output bus and an output strobe to say when those bits were valid.
A third circuit would be provided for the output channels to synchronize the four decoders together until one 128-bit word came out with a strobe on each clock. (If the strobe isn't valid, neither is the 128-bit word ...etc.) This circuit could then be used as part of a memory controller to write to a DDR3 SDRAM ... or, well ... how would you like to tell if the data is received correctly? It really doesn't make sense for me, the contestant, to score my own work, now does it? If you really want me scoring my own work, then I just achieved 80Tbits/second on my desk and if you believe that I've got some ocean front property in Arizona to sell you.

Without the hardware, it doesn't really make sense for me to build a controller for the programmable oscillator or, really, any other parts to the board. I would expect you to program the oscillator starting at slow speeds, and gradually speed it up until the design as a whole didn't work anymore. The speed where things then failed would be the "score" if you will of the communications system.

My thought was that I could transmit (and synchronize to) 7-bits per clock, and hope to use some error correcting coding on the back end to clean up any channel induced errors. A randomizer could be used to ensure proper channel loading, although if you are testing the channel with random bits it probably isn't required.

Dan

zygot · October 24, 2016

Ah... here is where we're about to have a disagreement.

You don't get to "score" anything, the Testbed.vhd module handles that. The whole testbed design is predicated on the assumption that an interface could be run at a wide range of bit rates and use a programmable clock to find out bit rates where the design might work for a given cable implementation. What sounds reasonable to me is that any new interfaces pass at least an HDL behavioural simulation using a fixed testbed interface. The testbed, if you look through the source code, decides if data coming out of the data sink is correct. The first word out of the data sink and all of the next 1048575 words had better match the data sink LFSR or your test run fails. The testbed doesn't care about how the data interface operates, only that the first 1048576 words from the data sink match the sink LFSR which is run lock-step with the source LFSR. It's pretty simple and handles ANY data interface design, if connected correctly. You need to do some work on the toplevel because the ILOGIC SERDES is missing from the list of things currently supported. Now, if you want to design a complicated interface that runs at a specific frequency, go for it but be aware that there aren't an infinite number of output frequencies available from the Si570, and that's what determines the data source bit rate. You ought to run the SDR simulation and look around at some of the testbed signals to figure out what's going on. I've written a stand-alone DDR interface for the Kintex but that's not going to be part of this project. Nor am I prepared to complete a half-baked concept. The fact is that you can implement anything that you want and simulate the interface at one or more frequencies without any effort except changing the name of your toplevel entity in the provided testbench. If your design doesn't work as a behavioural simulation it's not likely to work in real hardware.
Of course, an HDL design that works for a perfect "channel" in behavioural simulation won't necessarily cope with the reality of the real world and pass a single word using real hardware, I'm offering to test that last step, nothing more.
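
The pass/fail rule described above can be sketched in software. This is only a model of the idea: a source LFSR generates the words, and a sink LFSR started from the same seed checks the received words in lock-step. The 32-bit Galois LFSR below uses a commonly cited tap set (x^32 + x^22 + x^2 + x + 1) as an assumption; the actual polynomial, seed, and word ordering are whatever Testbed.vhd implements, and the function names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF
TAPS = 0x80200003  # assumed taps: x^32 + x^22 + x^2 + x + 1 (not from Testbed.vhd)


def lfsr_next(state):
    """Advance a 32-bit Galois LFSR by one step."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state & MASK32


def lfsr_words(seed, count):
    """Yield `count` successive 32-bit LFSR states following `seed`."""
    state = seed
    for _ in range(count):
        state = lfsr_next(state)
        yield state


def check_transfer(received, seed, count):
    """Compare the first `count` received words with a lock-step sink LFSR.

    Returns None on success, or the index of the first bad or missing word.
    """
    rx = iter(received)
    for index, expected in enumerate(lfsr_words(seed, count)):
        if next(rx, None) != expected:
            return index
    return None


# 16 words, like the simulation run; the hardware test uses 1048576.
sent = list(lfsr_words(0x1, 16))
print(check_transfer(sent, 0x1, 16))   # clean channel
sent[5] ^= 0x00000100                  # flip one bit in word 5
print(check_transfer(sent, 0x1, 16))   # index of the first bad word
```

Because the checker only sees words coming out of the data sink, it is indifferent to how the bits crossed the cable, which is the point made above about the testbed handling any interface design.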

D@n · October 24, 2016

Oh, I wouldn't pass you anything that doesn't work in a "perfect" channel. That's just doing due diligence.

However, I might pass you something that doesn't work on a Kintex or Genesys board. Sorry, don't have them, so I would have no idea if they did or didn't work. Further, I only have the webpack license, so there's only so much I can build or simulate. Therefore, my submission will be Verilog--not a top level module.

Further, without your required hardware, I cannot be responsible for setting up your frequency controller (the Si570). Again, I don't have that hardware, so I can't be responsible for setting it properly. Neither will I have the opportunity to determine at what speeds the simulated code breaks down.

Therefore, since I know that whatever I send you will work on a perfect channel, I would need someone (i.e. you) with hardware to walk that channel from perfection to its limit. Without that, I really can't participate.

Dan

zygot · October 24, 2016

D@n,

You just haven't spent the time to run a simulation of one of the provided designs; and you don't yet understand how the whole thing works.

I suggest that you build the Nexys Video SDR project as that's the easiest to understand. I'm pretty sure that the Web version of Vivado works with the Artix 200T device. Once you figure out how the testbed works you'll see that you really CAN simulate any new design... you DON'T need any hardware to do this. I've provided 2 data interfaces that show you how to hook up your data source and data sink to the testbed. If I'm wrong about the WEB version support of the 200T device then indeed you would be wasting your time. To find out, simply follow the steps in the README.txt file and try to create a bit file. If you don't get an error about needing a license file for the Artix device then you have no more excuses. Create your new data source and data sink so that they can communicate with the testbed FIFOs, do what's necessary to use the ISERDES and OSERDES in a new toplevel instantiating your new data sink and data source, and run a simulation instantiating your new toplevel. If testbed gets the correct 16 words out of your data sink then you've accomplished something. I could do all of this for you but then I'd be depriving you of any fun. If the behavioural simulation works then the next step would be to run a timing simulation on the routed design. AGAIN, NO HARDWARE REQUIRED. Yeah, there are a few more details to work out but nothing you can't handle. If you want to write a new toplevel module in Verilog that's fine though you'll have to also write a new testbench. Just use the Testbed.vhd file to complete the test setup.

Before anyone can participate in the challenge they have to have some understanding of how the basic test works.... that's why there are simulation testbenches. Forget about the Si570... just set the fmc_clk0_m2c clock period in your testbench to match whatever frequency you want the bit rate (or, in your case, the pclock) to run at... or leave it as is. The project was designed to use a simulation mode so that the Si570 and the software are non-issues in development. ( this isn't quite true if you are using the IDELAY but you don't need to do that. ) I've thought this project out pretty well.

zygot · October 24, 2016

Vivado HL WebPACK Edition supports the Artix®-7 (7A15T - 7A200T), Kintex®-7 (7K70T, 7K160T)

zygot · October 28, 2016

Since the response to the challenge has been pretty disappointing so far I decided to do a tutorial with a DDR interface showing how to use the test setup and add a new interface concept. The original project had an SDR interface that ran at 36 MB/s and a Bi-phase level interface that ran at 26 MB/s. The first iteration of my DDR tops out at 40 MB/s, which is roughly the best sustained data rate you can expect from USB 2.0 in practice, but without the software latencies. So far everything's been pretty basic. I may have topped out the capabilities of my loop-back cable implementation at a 120 MHz toggle rate... time to quit? Just when things are getting interesting?

While I work on improving upon my 40 MB/s data rate anyone can follow along. No hardware? No problem. Stuck with the free version of Vivado? No problem. Just target the Nexys Video board and you can try out your ideas using the Vivado simulator. You WILL have to actually download the project and do some reading however... even free isn't no cost.

cheers

zygot · November 25, 2016

Release 2 has some minor corrections and a DDR data interface that works up to 57 MB/s on the Genesys2 (46 MB/s on the Nexys Video) through my 12" CAT6 cable. Anyone can experiment using free Vivado tools by targeting the Nexys Video board. You can verify my designs or your own using the Vivado simulator. Can anyone improve on that? Stay tuned...

pmod_challenge_R2.zip

D@n · November 26, 2016

@zygot,

Two quick questions for you:

Have you had any takers on your challenge? and
How fast have you been able to get the interface to work?

Just curious,

Dan

zygot · November 26, 2016

1. No one has indicated to me that they have downloaded, read through, or even tried out the project so far... as far as I know I'm the only one in the universe with the necessary hardware to implement the physical interfaces ( though I doubt that to be true ).

2. 57/46 MB/s is where my testing tops out so far. But I have a way to go before running out of ideas. The LFSR data generation in the test bed has complicated my approaches to compression so far. But this is an esoteric exercise with practical side benefits; not the least of which is to demonstrate that we can implement some fairly interesting hardware using HDL and without being dependent on tool versions, version-dependent scripts and IP that someone else controls. I might get an argument on the IP part of that statement as the project IS dependent on some HDL IP that I provide in the form of a synthesized netlist... but that IP is only peripheral to the project, won't change, and (so far) is tool version independent.

Personally, I've played around with a few projects that I've never used directly but have proved to be valuable in expanding my perspective on how to approach unrelated projects.

In anticipation of question 3: My answer to question #1 is not unexpected or completely discouraging and I'll probably stop working on the project when I run out of ideas or time or interest... sometimes we do things for (im)fame and (mis)fortune, and sometimes just because it's there taunting us.

D@n · November 26, 2016

Let's see if I have this right, 57/46 MB/s ... that's equal to 9.913 x 10^6 bits per second right? Xilinx boasts having *really* high speed transceivers. Shouldn't you therefore be able to get closer to 10^9 bits / second? If not even 10^10 bits/second? (I haven't checked whether any of the PMod ports you are using actually has any of these GT connections, or whether the GT ports even work with LVCMOS3V3, etc ...)

Dan

zygot · November 26, 2016

Well that's the rub isn't it. You can't just pick out some spare IO from any IO bank, route the pins differentially to a connector and call it a high-speed interface. PMOD connectors will never be appropriate for interfaces using true transceivers. For a good high speed interface you also need access to the dedicated pins that connect to the FPGA clocking fabric. But just because something isn't optimal doesn't mean that it's useless. PMODs are nice for the way that they have been used to provide added functionality with devices having simple low speed interfaces. To a degree the impetus for this project is that a board supplier is selling boards with connector interfaces having no clear indication that they will ever be functional. Those differential PMODS really aren't great for anything but since you have 4 differential pairs you no longer have 8 pins to work with as the pair signals interfere with each other when used as single-ended IO. So it would actually be better if they were replaced with normal single-ended PMODs. I will say that these differential PMODs don't irritate me near as much as, say, the TUSB1210 OTG USB interface on the Genesys2 board; something that I can't see ever being useful.

If you've done years of design and think about it, toggling 8 pins at 150 MHz with sub-optimal termination, layout, pin assignment etc, etc. and doing something useful while using only FPGA IO driving and receiving signals through 12" of cable isn't all that shabby. At least that's the view from here.

It might be time for Digilent to consider offering an optional connector standard with a few more pins using a physical connector having better signal integrity characteristics... as long as users can create their own boards using inexpensive PCB design tools to take advantage of them.

zygot · November 26, 2016

Oh, and I'm not sure what calculator you're using but 57 MB/s is 456 Mb/s. Certainly not true transceiver level. But consider Gigabit Ethernet... 4 wire pairs, 1000 Mb/s, dedicated PHYs, complicated protocols, system latencies, etc. USB 2.0 is much the same at 480 Mb/s maximum peak data transfer but you'll never get much better than 40 MB/s sustained throughput ignoring latencies and restricted to very long data transfers like the 4 MB in the PMOD challenge. So putting things into perspective the simple DDR LVDS33 interface provided in the challenge might be a better choice....
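
The unit conversion is easy to check. The sketch below just multiplies by 8 and divides by the 4 pairs; the function names are made up for the example, and the per-pair figure is an average payload rate, not the actual line rate, since any protocol overhead on the wire is not accounted for.

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8.0


def per_pair_mbits(mbytes_per_s, pairs=4):
    """Average payload rate carried by each differential pair, in Mb/s."""
    return mbytes_to_mbits(mbytes_per_s) / pairs


for board, rate in (("Genesys2", 57.0), ("Nexys Video", 46.0)):
    print("%-11s: %4.0f MB/s = %4.0f Mb/s total, %5.1f Mb/s per pair"
          % (board, rate, mbytes_to_mbits(rate), per_pair_mbits(rate)))
```

For what it's worth, (57 / 46) * 8 comes out at about 9.913, which may be where the earlier figure came from: the slash in "57/46 MB/s" separates the two boards' results, it isn't a division.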

hamster · December 6, 2016

I'm late to the party, but...

...given that the Nexys Video can transmit at 435 MB/s (for 1080p video) using the standard SERDES pins, 4 MB should take about 9.6 ms; this is using three data pairs plus a clock pair. Sending the clock to the sink would be the way to go, as it avoids the need for clock recovery (where you really need to use the transceivers).

However, PMOD connectors are not very good for very high frequency signals - I once managed to get 500Mb/s through 0.1" connector and 200mm jumper wires, but it wasn't really a properly engineered solution. You would be very, very lucky to get 50MB/s through each pair.
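
For anyone who wants to compare the figures in this thread, here is a small sketch that turns a data rate into the time needed to move the challenge payload of 1048576 32-bit words. It assumes 1 MB/s means 10^6 bytes per second and ignores any setup or protocol overhead; the function names are made up for the example.

```python
WORDS = 1048576       # words in one challenge run
BYTES_PER_WORD = 4    # 32-bit words


def payload_bytes(words=WORDS):
    """Total payload of one challenge run, in bytes."""
    return words * BYTES_PER_WORD


def transfer_time_ms(rate_mbytes_per_s, words=WORDS):
    """Time to move the payload at the given rate (1 MB = 10**6 bytes)."""
    return payload_bytes(words) / (rate_mbytes_per_s * 1e6) * 1e3


for label, rate in (("1080p video estimate", 435.0),
                    ("Genesys2 DDR", 57.0),
                    ("Nexys Video DDR", 46.0)):
    print("%-20s %6.1f MB/s -> %6.2f ms" % (label, rate, transfer_time_ms(rate)))
```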
