DDR3

nes · May 28

I am using MT41K128M161JT-125K DDR3L memory and an Arty A7-35T FPGA board for image processing. I have some questions regarding the memory interface.

1. Can we interface a dual-port RAM with the Arty A7? If so, what part number of dual-port RAM would work?

2. How can we use the MT41K128M161JT-125K DDR3L memory for simultaneous read and write operations?

We need a dual-port RAM to store more than one frame of data (640x480) for image processing, because we have to write and read at the same time. All of the algorithms I have looked at need more than one frame of data, and I don't know whether any other algorithm could do the job without storing a whole frame. I would also like to know whether I can use the same DDR for two operations simultaneously.

Thanks

D@n · May 28

The DDR3 SDRAM on the Arty is not dual port. Sorry. If you need to both read and write the memory, then either reads or writes will need to wait while the other is active. On the other hand, if you are only doing 640x480, you should have plenty of clock cycles to multiplex the interface. Even better, the interface is wide--so you should be able to read/write many pixels per clock cycle--or up to 64b per clock at a (rough) 81MHz clock rate.

The other issue to be aware of is that DDR3 SDRAM access is ... challenging. It tends to be so challenging that most folks use Xilinx's MIG controller for it. The MIG exposes one of two interfaces to your design: either AXI or another interface often called "native". Most of my experience is with the AXI interface. Using AXI, you can read and write pixels from and to memory, but you will need to structure your design around the limitations of the memory. It's not going to be a one-clock-cycle read every time you need memory. For example, I'm looking at a design that takes roughly 25 clock cycles to read from DDR3 SDRAM through Xilinx's MIG. So typically, you'll want to burst requests to and from the memory at a rate much faster than your pixel rate, and then buffer things in between bursts. This makes for a different design architecture, so plan ahead.
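To illustrate the "burst fast, buffer in between" idea, here is a minimal Python behavioral model (not HDL, and not how the MIG itself works) of a read FIFO feeding a slow pixel consumer. The ~25-cycle latency comes from the paragraph above; the FIFO depth, the consumer rate, and the function name `simulate_read_fifo` are hypothetical. The refill rule is the one described later in this thread: once the FIFO (counting words already requested) is half empty, request enough to fill it back to the top.

```python
from collections import deque


def simulate_read_fifo(cycles, depth=64, latency=25, consume_every=4):
    """Behavioral model of a read FIFO refilled by burst reads.

    depth:         FIFO capacity in bus words (hypothetical value)
    latency:       cycles from a read request until its first word returns
    consume_every: the pixel consumer pops one word every N cycles, i.e. the
                   pixel rate is much slower than the bus rate
    Returns (min_fill, max_fill, underflows).
    """
    fill = depth            # start with a full FIFO
    in_flight = 0           # words requested but not yet returned
    bursts = deque()        # [cycle data starts arriving, words remaining]
    min_fill = max_fill = fill
    underflows = 0
    for t in range(cycles):
        # Half empty (counting outstanding words)? Request a refill to the top.
        if fill + in_flight <= depth // 2:
            need = depth - (fill + in_flight)
            bursts.append([t + latency, need])
            in_flight += need
        # The memory returns at most one bus word per cycle.
        if bursts and bursts[0][0] <= t:
            bursts[0][1] -= 1
            in_flight -= 1
            fill += 1
            if bursts[0][1] == 0:
                bursts.popleft()
        # The consumer drains at the pixel rate.
        if t % consume_every == 0:
            if fill > 0:
                fill -= 1
            else:
                underflows += 1
        min_fill = min(min_fill, fill)
        max_fill = max(max_fill, fill)
    return min_fill, max_fill, underflows


print(simulate_read_fifo(10_000))                                 # bus faster than pixels
print(simulate_read_fifo(10_000, latency=200, consume_every=1))   # refill arrives too late
```

Because requests only ever top the FIFO up to `depth`, it can never overflow; the question the model answers is whether half a FIFO's worth of data covers the request latency at your pixel rate.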

Dan

nes · May 29

Thank you Dan. I would also like to know whether I can use the same DDR for two different operations, working at different addresses, running simultaneously. For example, I am reading some data from the DDR after acquisition, and I want to write this data back to a different address while the read operation is still running in parallel. Also, could you suggest a dual-port RAM part, if you know of one, that can interface with the Arty A7-35T?

Thanks

Edited May 30 by nes

D@n · May 29

Yes/no. The DDR3 SDRAM device itself can only handle one operation at a time, whether read, write, or refresh. Typically the MIG controller plus AXI bus will handle any "sharing" issues by time slicing your requests, but the actual memory chip will still only ever handle one request at a time. Yes, DDR3 SDRAM requests can be overlapped. Whether or not the memory controller allows that, and whether or not the interconnect allows that, is another question.
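As a rough picture of what "time slicing" means, the following Python sketch grants one request per slot to a reader and a writer in round-robin order, so a single-ported memory still sees only one operation at a time. This is a toy model under stated assumptions, not the MIG's or the AXI interconnect's actual arbitration policy; `round_robin_arbiter` and the client names and addresses are hypothetical, and a real controller would grant whole bursts rather than single requests.

```python
from collections import deque


def round_robin_arbiter(queues):
    """Grant one request per slot, rotating priority among clients.

    queues: dict mapping client name -> deque of pending requests.
    Returns the grant order as a list of (client, request) tuples.
    Models a single-ported memory: exactly one operation per slot.
    """
    names = list(queues)
    grants = []
    last = -1                       # index of the most recently granted client
    while any(queues[n] for n in names):
        # Search for the next client with work, starting after the last grant.
        for offset in range(1, len(names) + 1):
            idx = (last + offset) % len(names)
            if queues[names[idx]]:
                grants.append((names[idx], queues[names[idx]].popleft()))
                last = idx
                break
    return grants


# Hypothetical reader and writer working on different address ranges.
reader = deque(f"read 0x{a:04x}" for a in (0x0000, 0x0010, 0x0020))
writer = deque(f"write 0x{a:04x}" for a in (0x8000, 0x8010))
for client, request in round_robin_arbiter({"reader": reader, "writer": writer}):
    print(client, request)
```

Both clients make progress, and from each client's point of view the operations look "simultaneous", but the memory itself is serviced strictly one request at a time.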

Let's walk through the performance you might expect. Let's say you run this memory at 81MHz. (Your memory speed will be limited by the IOs on the Artix, not by the memory.) Your memory appears to have a 16b bus width, which means you should be able to top out at 128 bits (memory width in pins times 8) of transfer to/from the memory per clock cycle of throughput. That's roughly 1.2GB/s. I would also suggest expecting a latency of about 25 clock cycles. Hence, pushing N 128b words into the device will require (N+C) clock cycles, where C is the 25-clock-cycle latency. (Reading is slightly slower than writing, but we can pretend the latency is the same for discussion.) It's a bit worse, however, since the controller will have to take the memory off-line periodically to issue refresh cycles. I think I recently read that this would increase your transfer time to 4/3(N+C) cycles, but I don't have measured numbers to back that up, and that seems kind of high.
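The arithmetic above can be packaged into a few helper functions for quick planning. The 81MHz clock, 128b word, 25-cycle latency, and optional 4/3 refresh factor are the rough figures from this post; the 8-bit grayscale pixel depth in the example is a hypothetical assumption, since the original question doesn't state one. The function names are illustrative only.

```python
import math

# Figures from the discussion above; treat them as rough planning numbers.
CLOCK_HZ = 81e6   # controller (user-interface) clock rate
WORD_BITS = 128   # 16-bit DDR3 bus x 8 transfers per controller clock
LATENCY = 25      # approximate request-to-data latency, in clock cycles


def peak_bytes_per_sec(clock_hz=CLOCK_HZ, word_bits=WORD_BITS):
    """Best-case throughput: one full bus word every clock cycle."""
    return clock_hz * word_bits / 8


def words_per_frame(width, height, bits_per_pixel, word_bits=WORD_BITS):
    """Bus words needed for one frame, assuming pixels are packed densely."""
    return math.ceil(width * height * bits_per_pixel / word_bits)


def transfer_cycles(n_words, latency=LATENCY, refresh_factor=1.0):
    """(N + C) cycles, optionally scaled by an assumed refresh overhead."""
    return refresh_factor * (n_words + latency)


# Hypothetical example: one 640x480 frame of 8-bit grayscale pixels.
n = words_per_frame(640, 480, 8)
for factor in (1.0, 4 / 3):
    cycles = transfer_cycles(n, refresh_factor=factor)
    print(f"{n} words, refresh factor {factor:.2f}: "
          f"{cycles:.0f} cycles = {cycles / CLOCK_HZ * 1e6:.1f} us")
print(f"peak: {peak_bytes_per_sec() / 1e9:.2f} GB/s")
```

Under these assumptions a 640x480 frame is 19,200 bus words, or about 19,225 cycles (roughly 0.24 ms at 81MHz) before refresh overhead, which is small compared to a video frame period. That is why multiplexing reads and writes on a single port is usually workable at this resolution.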

To give you an example of what you might do with this, I'm currently using a Nexys Video design to capture SONAR data, Fourier transform it, write the results to memory, and then read them back again to create a falling raster display--something like this project. I'm also capturing the images leaving the board for the HDMI display, running them through a QOI image compression algorithm, and then writing those captured images to memory. A CPU can also run instructions from this memory. An SD card controller also has a DMA which can autonomously move blocks of data to and from this memory to support the CPU. Now, as the engineer, I need to make sure there's enough memory bandwidth to support all of this activity.

There are two parts to the engineering involved.

On the one hand, you'll want to know that the device has the capacity to handle what you need. I recently built a 4x4 10Gb Ethernet switch, and every path through the switch went through memory. Early on in the hardware development of this project, I needed to answer the question: how much memory throughput would that require? Then, could the maximum DDR3 SDRAM bandwidth support it? Ideally, a 4x4 10Gb switch should be able to move 40Gb/s into memory and another 40Gb/s back out, so you want a throughput of 80Gb/s or so. This then drove our hardware requirements and necessitated purchasing a bigger memory chip. You'll need to do this kind of calculation for your algorithm to know whether it will even fit. As another example, I have a customer asking for an image processing algorithm to be applied to a video stream. Let's say it has 1920x1080 pixels arriving at 60 frames per second, where each pixel is initially sampled at 8b when entering the FPGA. The particular algorithm this customer wants would need to write that data to memory (1 transfer), read it back from memory (+1 more), and also read and write a scratch pad with twice the bit width (+2 for the read, +2 for the write). That will therefore require moving this image data in and out of memory at 6x the rate of the incoming data stream. Does the memory have the bandwidth to support that? That's where our engineering studies currently stand. That's just the first part of the task: rate estimation. You'll want to do that early on, so you know what the memory might be able to accomplish for you.
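The rate estimation described above boils down to "incoming stream rate times the number of memory transfers per pixel, compared against what the memory can sustain." Here is a minimal Python sketch of that budget. The 6x transfer count and the 80Gb/s switch figure come from the paragraph above; comparing the 1080p60 case against the Arty's 16-bit DDR3 figures (81MHz x 128b, derated by an assumed 3/4 for refresh) is purely illustrative, since the customer's hardware isn't stated. Function names are hypothetical.

```python
# Rough memory-bandwidth budget, in bits per second.
CLOCK_HZ = 81e6    # controller clock from the earlier estimate
WORD_BITS = 128    # bus word width per controller clock


def stream_bits_per_sec(width, height, fps, bits_per_pixel):
    """Raw rate of the incoming video stream."""
    return width * height * fps * bits_per_pixel


def required_bits_per_sec(width, height, fps, bits_per_pixel, transfers):
    """Memory traffic when every pixel crosses the memory bus `transfers` times."""
    return stream_bits_per_sec(width, height, fps, bits_per_pixel) * transfers


def available_bits_per_sec(efficiency=1.0):
    """Peak DDR3 rate, optionally derated (e.g. 3/4 for the assumed refresh cost)."""
    return CLOCK_HZ * WORD_BITS * efficiency


# 1080p60, 8b pixels: write (1) + read (1) + double-width scratch read (2)
# + double-width scratch write (2) = 6 transfers per incoming pixel.
need = required_bits_per_sec(1920, 1080, 60, 8, transfers=6)
have = available_bits_per_sec(efficiency=3 / 4)
print(f"need {need / 1e9:.2f} Gb/s, have ~{have / 1e9:.2f} Gb/s, fits: {need <= have}")

# 4x4 10Gb/s switch: 40Gb/s into memory plus 40Gb/s back out.
switch_need = 4 * 10e9 * 2
print(f"switch needs {switch_need / 1e9:.0f} Gb/s")
```

A result that "fits" on paper with little margin is a warning sign: real efficiency also depends on burst lengths, read/write turnarounds, and row changes, so this only tells you whether a design is plausible, not that it will work.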

The second task is building the components you'll need for your task. Many individuals here on Digilent's forums will recommend you use pre-built Xilinx IP. Xilinx offers a lot of it, and perhaps it might work for your purpose. That's great until you either 1) have to debug their IP, 2) end up with a need the Xilinx IP can't handle, or 3) want to switch to a non-Xilinx FPGA. As a result, this isn't my business model. Instead, I write my own IP components. That means I write IP that can be used to copy image data to/from memory. (You might find some of that here.) In this case, there are some rules to consider. First, I like to say the bus is like the bathroom in a crowded house: get in, do your business, and get out. Make your request of the bus, and get off the bus as soon as you can. Buffer requests as necessary. Use FIFOs. When reading from memory, once your FIFO is half empty, ask the bus to fill it back to the top; that way you can keep whatever is downstream happy with a continuous flow of data. The same goes for writing: once the FIFO is half full, dump the data to memory. Get on and get off the bus. When reading and writing, beware of coherency issues, since there will be delays between when requests are made and when they are completed. Moreover, you'll want to pack that image data. Got 24b of data per pixel and a 128b bus? Pack the data. Use all the bits in the bus word. Perhaps your image data will cross bus words, so that one 24b pixel straddles two 128b words. Get over it. Plan on it. This kind of throughput is something you'll need for all of your applications, not just this one. Do your job, do it once, do it well, and you can then reuse components over and over again.
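Here is a small Python sketch of the packing rule above: 24b pixels laid end to end across 128b bus words, so that 16 pixels fill exactly three words and some pixels straddle a word boundary (pixel 5, for instance, occupies bits 120 to 143). The LSB-first bit ordering and the function names are assumptions for illustration; your hardware may order bits differently.

```python
def pack_pixels(pixels, pixel_bits=24, word_bits=128):
    """Pack fixed-width pixels densely into bus words, LSB first.

    A pixel may straddle two consecutive bus words.
    Returns a list of integers, each at most word_bits wide.
    """
    mask = (1 << pixel_bits) - 1
    words = []
    acc = 0      # bit accumulator holding not-yet-emitted pixel bits
    nbits = 0    # number of valid bits in acc
    for p in pixels:
        acc |= (p & mask) << nbits
        nbits += pixel_bits
        while nbits >= word_bits:          # a full bus word is ready
            words.append(acc & ((1 << word_bits) - 1))
            acc >>= word_bits
            nbits -= word_bits
    if nbits:
        words.append(acc)                  # final partial word, zero-padded on top
    return words


def unpack_pixels(words, count, pixel_bits=24, word_bits=128):
    """Inverse of pack_pixels: recover `count` pixels from the bus words."""
    stream = 0
    for i, w in enumerate(words):
        stream |= w << (i * word_bits)
    mask = (1 << pixel_bits) - 1
    return [(stream >> (i * pixel_bits)) & mask for i in range(count)]


pixels = [(0x123456 * (i + 1)) & 0xFFFFFF for i in range(16)]
words = pack_pixels(pixels)
print(len(words), "words for", len(pixels), "pixels")
assert unpack_pixels(words, len(pixels)) == pixels
```

Packing costs some shifting logic on each side of the memory, but it means 16 pixels cost three bus transfers instead of 16 (one pixel per word), which matters directly for the bandwidth budget.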

Most of Xilinx's IP will use AXI video streams for the purpose of moving image data around. This is a great protocol for this purpose, although it has its limitations. In this format, TLAST marks the last pixel in a line, and TUSER marks the first pixel in a frame. VALID is used to indicate the presence of pixel data, and READY is used to indicate the downstream logic is ready to handle it. This is a great protocol, as long as you can guarantee video data will maintain its rate requirements. READY should never be held low for more than a couple of cycles between lines when processing video data. Only the downstream display driver should hold READY low for any longer than that. VALID has the same requirement, with the exception being when it comes from a video (not memory) source. Basically, the protocol allows you to shoot yourself in the foot. Be smart with it, and you'll have a chance.
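To make the TUSER/TLAST framing concrete, here is a small Python model that turns a frame into a sequence of AXI video stream beats and then recovers the frame size from the markers alone. It assumes one pixel per beat and models only the sideband flags, not the VALID/READY handshake timing; `Beat`, `video_beats`, and `frame_shape` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    tdata: int    # one pixel per beat in this sketch
    tuser: bool   # start of frame: set only on the first pixel of a frame
    tlast: bool   # end of line: set on the last pixel of every line


def video_beats(frame):
    """Yield the beats for one frame, given as a list of rows of pixels."""
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            yield Beat(tdata=pixel,
                       tuser=(x == 0 and y == 0),
                       tlast=(x == len(row) - 1))


def frame_shape(beats):
    """Recover (width, height) of one frame from its TUSER/TLAST markers."""
    width = height = x = 0
    for i, beat in enumerate(beats):
        if i == 0 and not beat.tuser:
            raise ValueError("frame must begin with TUSER set")
        x += 1
        if beat.tlast:
            if width and x != width:
                raise ValueError(f"line {height} has {x} pixels, expected {width}")
            width, height, x = x, height + 1, 0
    return width, height


frame = [[row * 4 + col for col in range(4)] for row in range(3)]
print(frame_shape(video_beats(frame)))
```

A checker like `frame_shape` is handy in a testbench: if a downstream block ever loses or duplicates a beat, the line-length check fails immediately instead of showing up later as a sheared image.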

I could go on. There's a lot to be said here, but I think I've more than answered your question at this point.

Dan
