
ADP3450 effective sample rates


AbbyM

Question

Hello,

I have been using the ADP3450 with a Raspberry Pi 4. I am running a C++ program that pulls data from 3 channels at a target sample rate of 6 MSPS, using the 16-bit signed data type, with the record acquisition mode, reading in a loop. But I am getting effectively only about 1 MSPS, no matter how fast I set the ADP's sample rate. I am using a USB cable which I have measured at 38-39 Mbps; that should equate to roughly 5 MSPS maximum, so I don't know what is happening to cause a much slower rate. My code is pasted below: the initialization function and the querying loop.

Also, when setting the sample rate to 1 MSPS just to verify, I noticed the USB transfer rate is only 8 Mbps, yet at or above 4.5 MSPS the USB transfer rate maxes out at 38-39 Mbps. At 8 Mbps this only equates to 0.5 MSPS, so why does the device send data at half the desired rate in this case?

For example, the file size at 6 MSPS is 830 MB per channel, but at 1 MSPS it is 190 MB when it should be 600 MB.

Thank you,

Abby

Code:

void initOscope(void) {
    char szError[512] = {0};
   
    cout << "Entered initOscope()\n"; //DEBUG
   
    printf("Open automatically the first available ADPro device\n");
    // if(!FDwfDeviceConfigOpen(-1, 1, &hdwf)) {
    if(!FDwfDeviceOpen(-1, &hdwf)) {
        FDwfGetLastErrorMsg(szError);
        printf("ADPro device open failed\n\t%s", szError);
        return;
    }
   
    //enable channels
    for(int c = 0; c < NUM_CHANNELS_USED; c++){
        FDwfAnalogInChannelEnableSet(hdwf, c, true);
    }

    //set pk2pk input range for all channels
    FDwfAnalogInChannelRangeSet(hdwf, -1, 5);
   
    //acquisition mode
    FDwfAnalogInAcquisitionModeSet(hdwf, acqmodeRecord);

    //sample rate
    FDwfAnalogInFrequencySet(hdwf, SAMPLE_RATE);

    FDwfAnalogInRecordLengthSet(hdwf, 0); // record length 0 = record indefinitely

    // wait at least 2 seconds with Analog Discovery for the offset to stabilize, before the first reading after device open or offset/range change
    Wait(2);

    // start
    FDwfAnalogInConfigure(hdwf, 0, true);

    cout << "Exiting initOscope()...\n"; //DEBUG
}

...................................

while (stopRecord == false && newFileFlagRf == false) {
    // get the samples for each channel
    for(c = 0; c < NUM_CHANNELS_USED; c++) {
        if(!FDwfAnalogInStatus(hdwf, true, &sts)) {
            cout << "FDwfAnalogInStatus() error\n";
        }
        if(sts == stsCfg || sts == stsPrefill || sts == stsArm) {
            // Acquisition not yet started.
            continue;
        }
        FDwfAnalogInStatusRecord(hdwf, &cAvailable, &cLost, &cCorrupted);

        if(!cAvailable) continue; // only fetch samples if any are available

        if (c == CH1_REF) {
            FDwfAnalogInStatusData16(hdwf, c, sampleFileBuf1, 0, cAvailable);
        }
        else if (c == CH2_REF) {
            FDwfAnalogInStatusData16(hdwf, c, sampleFileBuf2, 0, cAvailable);
        }
        else if (c == CH3_REF) {
            FDwfAnalogInStatusData16(hdwf, c, sampleFileBuf3, 0, cAvailable);
        }
    }
}

FDwfDeviceClose(hdwf); // close the device


Your call to FDwfAnalogInStatus() should go before the loop over the channels: a single call to FDwfAnalogInStatus() retrieves data for all enabled channels. The way you do it now, you make the call once per channel, and thereby drop many samples.

So instead of what you're doing now: call FDwfAnalogInStatus() once; then, if the returned status indicates samples are available, loop over the channels and call FDwfAnalogInStatusData16() once per channel.

This may solve your performance issue, too.
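
A minimal sketch of the restructured loop, reusing the variables from your snippet (error handling abbreviated):

while (stopRecord == false && newFileFlagRf == false) {
    // One status call retrieves data for ALL enabled channels.
    if(!FDwfAnalogInStatus(hdwf, true, &sts)) break;
    if(sts == stsCfg || sts == stsPrefill || sts == stsArm) continue; // not started yet

    FDwfAnalogInStatusRecord(hdwf, &cAvailable, &cLost, &cCorrupted);
    if(!cAvailable) continue;

    // Copy the already-fetched samples out, once per channel.
    FDwfAnalogInStatusData16(hdwf, CH1_REF, sampleFileBuf1, 0, cAvailable);
    FDwfAnalogInStatusData16(hdwf, CH2_REF, sampleFileBuf2, 0, cAvailable);
    FDwfAnalogInStatusData16(hdwf, CH3_REF, sampleFileBuf3, 0, cAvailable);
}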


Thank you very much, that change improved the throughput for me! I overlooked it because most of the examples use only one channel.

At 1 MSPS I am now getting right around the expected file size for a 5-minute run: 570 MB per channel.

However, I am still under the target of 6 MSPS. With that sample rate, the effective rate in the saved files is about 3.5 MSPS (roughly 2 GB per channel for 5 minutes).
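
Working backwards from the file size (assuming 16-bit samples and a 300-second run):

// Sanity check: infer the effective sample rate from the file size.
constexpr double seconds   = 300.0;   // 5-minute run
constexpr double fileBytes = 2.0e9;   // ~2 GB per channel
constexpr double effRate   = fileBytes / 2.0 / seconds; // ~3.3e6 S/s per channel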

Any other ways to increase the rate? 

I am using Standard Mode on the device. 


Hi @AbbyM

If you have fixed that issue, the code looks pretty close to optimal to me.

So to find a way forward, you will first need to determine what is bottlenecking your system. The obvious candidates are the USB-2 bus, the saving of data to file, and the Raspberry Pi itself.

To check those one by one, I recommend that you try the following:

(1) Connect the ADP3450 via gigabit Ethernet rather than USB. Gigabit Ethernet offers a lot more bandwidth: it can accommodate a bit over 100 MB/sec of "real-world" bandwidth, while with USB-2 you'd be happy to get 35 MB/sec.
(2) Disable file saving in your program, and just update a counter with the number of samples.
(3) Run the program on a more powerful computer (e.g. a system with a fast Intel or AMD CPU).

In general, processing 6 M samples/s on three channels is not entirely trivial. You may need to split your program into multiple threads, with one thread talking to the ADP3450 and forwarding the data it obtains to another thread that handles the saving to disk; see the sketch below.
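
A minimal sketch of that structure, in generic C++11 (acquireBlock() and writeBlock() are hypothetical placeholders for the DWF read and the file write):

#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::vector<short> acquireBlock();              // placeholder: one DWF record read
void writeBlock(const std::vector<short>&);     // placeholder: append to the open file

std::queue<std::vector<short>> q;
std::mutex m;
std::condition_variable cv;
std::atomic<bool> done{false};

void acquisitionThread() {
    while(!done) {
        std::vector<short> block = acquireBlock();
        {
            std::lock_guard<std::mutex> lock(m);
            q.push(std::move(block));
        }
        cv.notify_one();
    }
}

void writerThread() {
    for(;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []{ return !q.empty() || done; });
        if(q.empty() && done) break;
        std::vector<short> block = std::move(q.front());
        q.pop();
        lock.unlock();      // write with the lock released, so the
        writeBlock(block);  // acquisition thread is never stalled by disk I/O
    }
}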
 


OK cool.

I did try using Gb Ethernet before, but it was even slower than USB. I am not sure what is wrong there; it must be some low-level setting on the Raspberry Pi. Any thoughts?

I did have multithreading at one point; I can reimplement that to see if it helps.

Thanks,

Abby


About your previous experience with Ethernet: did you confirm that you had a proper 1 Gbit link (using, e.g., the "ethtool" command-line tool)? Lower performance over a gigabit Ethernet link than over USB-2 would be very surprising.

At 6 MS/sec, 3 channels, 16 bits per sample, you would need 36 MB/s of bandwidth just for the samples, not counting any overhead. That is impossible to sustain even over a dedicated USB-2 bus. So from first principles, gigabit Ethernet is not a choice: it's a must for your requirements.
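
Spelled out (a trivial computation, but it makes the requirement concrete):

// Raw bandwidth needed for the samples alone, ignoring protocol overhead.
constexpr double sampleRate     = 6e6;  // samples/s per channel
constexpr int    channels       = 3;
constexpr int    bytesPerSample = 2;    // 16-bit samples
constexpr double bytesPerSec    = sampleRate * channels * bytesPerSample; // 36e6 B/s = 36 MB/s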

About the threads: before re-implementing that, I think you first need to properly diagnose the issue, and that can easily be done by just receiving data from the ADPro in a single thread without writing it at all. The first order of business is to see what level of performance you can reach that way; if you can't get that to work at 6 MS/sec, adding threads won't help you. See the sketch below.
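
A minimal sketch of that diagnostic, reusing the DWF variables from your program (the once-per-second printout is just illustrative):

#include <chrono>
#include <cstdio>
#include <vector>

std::vector<short> scratch(1 << 20); // scratch buffer, sized generously
long long cTotal = 0;
auto t0 = std::chrono::steady_clock::now();
while(!stopRecord) {
    if(!FDwfAnalogInStatus(hdwf, true, &sts)) break;
    FDwfAnalogInStatusRecord(hdwf, &cAvailable, &cLost, &cCorrupted);
    if(!cAvailable) continue;
    for(int c = 0; c < NUM_CHANNELS_USED; c++)   // fetch, then throw away
        FDwfAnalogInStatusData16(hdwf, c, scratch.data(), 0, cAvailable);
    cTotal += cAvailable;                        // samples per enabled channel
    double dt = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    if(dt >= 1.0) {
        printf("%.2f MS/s per channel\n", cTotal / dt / 1e6);
        cTotal = 0;
        t0 = std::chrono::steady_clock::now();
    }
}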

Some thoughts about data rates.

In my, admittedly limited, experience developing hardware applications for the Raspberry Pi 3 and 4, I've concluded that for short data transfers it really doesn't matter from what interface to what resource: data rates can be pretty good. For sustained transfers, things get slow pretty fast. My impression is that the limitations are more of a CPU problem than an OS problem. I've connected FPGAs to RPi 3 and 4 boards via SPI and the unofficial "smi" interface, using DMA and with no other applications running other than an executable in a command terminal. I might be wrong, but my sense is that this just isn't a high-performance platform for moving lots of data between memory and peripherals.

As for Ethernet, figuring out sustainable data rates requires some experimentation. The absolute peak data rate for 1 GbE on a point-to-point connection is 125 MB/s, but that refers to every byte being communicated. Ethernet is packet-based, with substantial overhead for packet-related information. What is important for someone wanting to send a lot of data is the payload size, which differs per packet type: the more payload per packet, the higher the data-rate efficiency and the more user data that can be sent. And we haven't yet considered all of the other packet types that create a workable Ethernet communication channel and steal some of that bandwidth, or the OS considerations for maintaining data flow. What I can say with some degree of confidence is that if you only use jumbo packets, you'll get the best possible user payload rate for a given system. Something on the order of 30 MB/s is probably a reasonable expectation for a maximum sustainable data rate, not 100 MB/s, though this is highly dependent on the platform hardware and on the software running alongside your data application. If you do away with all of the Ethernet overhead and just blast data through an Ethernet pipe, you can get over 120 MB/s in both directions simultaneously; but that is without standard packets, a CPU, or a multitasking OS.

USB is a different animal. It is also packet-based, but there are many more ways to achieve dreadfully low data rates compared to what you might expect. Not sending data in blocks compatible with the USB native packet length is a good way to slow USB down, and there is a lot more software overhead with USB. I've done a lot of FPGA USB designs and can say that performance on Linux hosts is not the same as on Windows hosts, and that x64 hosts are not the same as ARM-architecture hosts.

Basically, I'm trying to say that you can't start with a clock frequency and easily extrapolate data rates for any particular application without a lot of experience and experimentation.

Hi @zygot

In my experience it is not very hard to sustain point-to-point rates of 100 MB/s over gigabit Ethernet, at least on a modern Intel/AMD system, using just a regular OS, plain TCP, and standard 1500-byte frames. Your 30 MB/sec number sounds really pessimistic to me.

I am not sure what the ADPro 3450 can do. It's a Zynq with a somewhat dated ARM, and I am unsure if it is fast enough to keep up.

EDIT: I just checked the numbers. On gigabit Ethernet, a transmitter can emit a single maximum-size regular packet once every 1538 clocks (at 125 MHz, one byte per clock). Assuming IPv4 and Ethernet framing, 1460 of those cycles transfer a TCP payload byte. That gives a theoretical maximum TCP-over-IPv4-over-gigabit-Ethernet bandwidth of 125 MHz * 1460/1538 ≈ 118.7 MB/sec.
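
In code form (just the arithmetic, nothing device-specific; the 1538 figure is 12 IFG + 8 preamble + 14 header + 1500 payload + 4 FCS byte times):

// Theoretical ceiling for TCP payload over gigabit Ethernet.
constexpr double rawBytesPerSec = 125e6;   // 1 Gbit/s = one byte per 8 ns
constexpr double tcpPayload     = 1460.0;  // 1500 MTU - 20 IPv4 - 20 TCP
constexpr double wireBytes      = 1538.0;  // full frame cost on the wire
constexpr double maxTcpRate     = rawBytesPerSec * tcpPayload / wireBytes; // ~118.7e6 B/s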

A good Ethernet cable will introduce negligible bit errors (I once ran a sustained FPGA-to-PC test at maximum frame rate for 24 hours with zero FCS errors), so the question then becomes to what extent the sender OS can push out this amount of data, and whether the receiver can keep up processing it. I am now actually curious about that, so I will do some proper benchmarking over the weekend :-)


OK, thanks for all that great input!

I just ran "ethtool" on the Pi 4 Model B I'm using. It appears to say it is set up for GbE, so perhaps some other setting is limiting it? I've attached the output.

I also tried another tool called "nethogs", and it shows the ADPro outputting between 8-15 MBps, hovering mostly around 12-13 MBps while I run the program. That would explain why the throughput is currently slower than USB, which I measured at about 30 MBps on our device. But the question is: what is causing such a slow rate?

[screenshot and ethtool.txt output attached]


Hi Abby,

OK, that's useful (if somewhat surprising) info.

I will try with the ADP3450 I have here over the weekend to see if I can find some way to do 6 MSPS for 3 channels, sustained.


Okay, I checked the TCP bandwidth from an ADPro 3450 (in Linux mode) to a fast Linux host, hopping over two entry-level Ethernet switches:

[iperf screenshot]

This demonstrates that the ADPro 3450 can sustain about 97 MB/sec of outgoing traffic over a TCP channel, which is not too shabby.

This at least shows that the hardware should be able to transmit the required 36 MB/sec with bandwidth to spare. However, it is still possible that we're running into some other performance issue, e.g. in the communication between the ARM and the programmable logic, or in a non-optimal user program.

 


Right, I did some network performance testing of Record-mode data acquisition. The results are surprising and, I must say, worrying.

It seems that the network bandwidth used while doing transfers is essentially independent of the requested sample rate (!?). Whether I sample 1, 2, or 3 channels, I see roughly 40 MB/sec flowing out of the ADPro into the PC that runs a DWF program, and this is true whether I sample at 1 MHz, 1 kHz, or 1 Hz.

Perhaps @attila can give some insight into why the ADPro device puts 40 MB/sec onto the network when I request, for example, a single-channel recording at 1 sample per second? This strongly suggests that the underlying protocol is extremely inefficient and/or that much more processing is done on the client side than I would ever have suspected. For example, if the "status" call simply causes a full dump of internal device memory to be transferred from the device to the PC for further processing, that could explain this.

The 40 MB/sec is in "Linux" mode. When I put the device in "Standard" mode, it transmits ~70 MB/sec over the network.
 

8 hours ago, attila said:

@reddish 

The record without DDRam buffering is not the most efficiently implemented. In this case, in each iteration the entire device buffer of 256/512 KiB is dumped.

OK. How can I control the DDRam buffering in the API?

On 11/21/2022 at 6:21 PM, AbbyM said:

@reddish  What type of Linux host did you use specifically?  

Glad you were getting the full spec'd rate for 3 channels.  Thanks for running that test!

 

Hi @AbbyM

> What type of Linux host did you use specifically?  

I describe multiple tests in some of my posts; which test are you referring to, specifically?

> Glad you were getting the full spec'd rate for 3 channels.

Hmm, I don't understand. I didn't write that I was getting the fully specced rate for three channels (I don't know what spec that would be, for starters). My tests show that (1) the gigabit Ethernet hardware isn't the bottleneck; and (2) the Digilent firmware does something awfully inefficient and weird when using the Analog-In record mode. That means we are limited not by the available transfer speed but by the quality of implementation of the record mode, which unfortunately bottlenecks the potential performance of the hardware by quite some margin.

 

In other words, the hardware is quite capable of sustaining 6 MS/sec on three or four channels (and it could sustain 10 MS/sec on 4 channels with optimal programming), but unfortunately the device firmware and/or PC-side software isn't capable of tapping that potential. 6 MS/sec for three channels is all you're going to get unless Digilent does some serious re-engineering of their low-level protocol (which I guess they won't).

 

11 hours ago, attila said:

Hi @reddish

The DDRam buffering is transparent to the user. The same record method can be used as with other devices, as in the examples. The option in the application was added because there was a bug with this buffering, but it is solved in newer versions.

OK.

Is my understanding correct that the DDRam buffering is only used in Standard mode, but not in Linux mode?

Cheers, Sidney


I'm using a Raspberry Pi 4 Model B, with the latest 64-bit version of the OS, Debian 11.

I was referring to when you got 70 MBps; the spec'd rate for Ethernet is 71 MBps. You can view it in the WaveForms application.

Thanks,

[screenshot from the WaveForms application]


Hi @AbbyM


Hmm, okay. I don't know what "typical transfer rates and latency" means without further explanation, so I tend to ignore numbers like that. As indicated, just sending TCP from the ADPro, I can sustain 97 MB/sec outgoing, which is a lot more. To measure that I used the standard iperf tool, transferring to a high-end Linux PC with a beefy Xeon processor. But I am fairly confident that I would measure nearly the same when transferring to a Raspberry Pi 3 or 4, as the bottleneck would be the much slower ARM processor in the ADPro, I think.

The sobering truth is that all these numbers mean very little when the firmware and software implement an inefficient protocol :-/


Hi @reddish

The typical transfer and capture rates were measured on my current setup, to allow comparison between the 10 connection methods.
The transfer rate covers not only the path between the host app and the device firmware/system, but also the path to/from the instrument/PL/FPGA.
For both tests you can find the scripts in WF SDK/ samples/ py/ Device_Speed.py and AnalogIn_Wps.py

The devices provide multiple instruments over one communication channel, so the record/streaming is chopped up, and the latency of this reduces the achievable rate.
As I said earlier, the ADP3X50 in Standard boot mode uses DDRam buffering, can capture up to 128 MiSamples @ 125 MHz on 1 channel, and offers the best record rate due to its large transfer chunks. For the other modes and devices you could use a device configuration with a larger oscilloscope buffer, such as the 2nd configuration for the ADP3X50. With the ADP3X50 the capture is done on the enabled 1, 2, or 3/4 channels; when 3 are enabled, the capture is done on 4 channels. See the configuration-selection sketch below.
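
A minimal sketch of picking the configuration with the largest scope buffer, using the SDK's enumeration calls (first device assumed, error handling omitted):

// Enumerate devices, then pick the configuration with the largest AnalogIn buffer.
int cDevice = 0, cConfig = 0, best = 0, bestSize = 0, size = 0;
FDwfEnum(enumfilterAll, &cDevice);
FDwfEnumConfig(0, &cConfig);                        // configurations of device 0
for(int i = 0; i < cConfig; i++) {
    FDwfEnumConfigInfo(i, DECIAnalogInBufferSize, &size);
    if(size > bestSize) { bestSize = size; best = i; }
}
FDwfDeviceConfigOpen(0, best, &hdwf);               // instead of FDwfDeviceOpen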

Fully DDRam buffered, 32 MiSamples @ 31.25 MHz / 4 ch, on USB or Ethernet:

[screenshot]

200 MiS @ 5 MHz / 4 ch (5 MHz x 2 B x 4 ch = ~40 MiBps) via USB:

[screenshot]

Ethernet is not that reliable for streaming. It may have hiccups or intermittent longer latency, which cause buffer overflows.

[screenshot]


Hi @attila

> The devices provide multiple instruments over one communication channel, so the record/streaming is chopped up, and the latency of this reduces the achievable rate.

I don't see why latency in itself would reduce bandwidth. Do you perhaps mean that the chopping process requires buffering, and that limited buffering space puts a limit on bandwidth?

> Ethernet is not that reliable for streaming. It may have hiccups or intermittent longer latency, which cause buffer overflows.

Ethernet packet transfer is best-effort by design; but with modern hardware, a direct point-to-point Ethernet link, and properly configured OS-level network buffers on the receiving end, there really should be zero hiccups and zero packet loss. A modern PC (and also a modern switch) is easily capable of handling 125 MB/sec, even with a regular OS (Windows/Linux/macOS).

If you experience hiccups, it may be worth investigating whether it is possible to increase the OS-level network buffers, either by configuring the OS or by configuring the receiving-side socket with a 'setsockopt()' call; for example:
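
A minimal sketch (POSIX sockets; 'fd' stands in for the receiving socket, and the 4 MiB value is just an example that the OS may clamp, see net.core.rmem_max on Linux):

#include <sys/socket.h>

int rcvbuf = 4 * 1024 * 1024;  // ask the kernel for 4 MiB of receive buffering
setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));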

Under normal circumstances, an Ethernet-based configuration should always beat USB-2 handily in terms of performance, provided the Ethernet feeds into a dedicated NIC.

 

I am still curious whether the DDRam mode only works in Standard boot mode. I would think so, as in Linux mode the DDR would be used by the OS, right?

 

Also, I am wondering about the achievable performance over the AXI transport, locally on the ADPro in Linux mode. My expectation would be that this turns bandwidth and latency into a non-issue, and that performance would be limited by the ARM processor. My experiments unfortunately show disappointing performance in that scenario. Did you test that as well?

 

Cheers, Sidney

6 hours ago, reddish said:

I would think so, as in Linux mode the DDR would be used by the OS, right?

I would think so as well, unless Digilent has figured out a way to fit a small custom Linux OS into the 256 KB PS/PL shared RAM.

Even the Z7020 can support DMA from the PL to the PS DDR controller at rates up to 1200 MiB/s. One problem is getting samples out of the DDR, which is shared by the OS and the AXI DMA. On top of that, the only way to feed the PS Ethernet GEM is via DMA supported by fairly complicated software buffer management. I suppose it's possible to fine-tune a Zynq system to support high streaming rates without losing data samples; I wouldn't think that it is easy or cheap to accomplish.

A better way would be to avoid the PS DDR altogether. For instance, a better design would have the PL connected to its own Ethernet PHY and DDR memory.

Another option would be to not have a processor in the loop at all. The Digilent ZMODs can be used with an Opal Kelly XEM7320 SYZYGY FPGA board, which has no Zynq but does have a USB 3.0 interface that can support streaming at well over 350 MiB/s. A PCIe interface would be even better; USB 3.0 on an ARM platform likely has lower sustainable data rates. This wouldn't be a packaged solution, however, the way the Digilent instrumentation products are.

Digilent's high-priced versions of the AD2 are interesting, but may not provide quite the performance boost over the cheap but very useful AD2 that users might expect.

I'm not quite sure that iperf is a good measurement of how well a system can stream large data sets. I suspect that the Raspberry Pi 3 or 4, which are not well suited for this kind of application, might do better at iperf than at getting Ethernet data into limited user-space DDR. Even on a high-performance PC there are many OS/software variables that might limit streaming application performance.

One thing that I would suggest for Raspberry Pi 3 or 4 users is to not run a graphical interface while trying to do data streaming.


Hi @zygot


> I'm not quite sure that iperf is a good measurement of how well a system can stream large data sets.

I think it's as good a generic test as you can make. iperf does not do fancy things to enhance performance; if a user program creates a socket, connects to a server, and starts write()ing data, which is the obvious thing to do, the bandwidth will be very close to what iperf reports. See the sketch below.
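
A bare-bones sender of that kind (POSIX sockets; the address is a placeholder and error handling is omitted). The throughput of this loop should land close to the iperf number:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(5001);                        // classic iperf's default port
    inet_pton(AF_INET, "192.168.1.2", &addr.sin_addr);    // placeholder receiver address
    connect(fd, (sockaddr*)&addr, sizeof(addr));
    char buf[65536] = {0};
    for(;;)                                               // push data as fast as TCP allows
        if(write(fd, buf, sizeof(buf)) <= 0) break;
    close(fd);
}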

The efficiency (i.e., the bandwidth a user program can use vs. the raw available channel bandwidth) is just pretty good for TCP over Ethernet, and certainly much better than for USB-2. I don't know about USB-3, Thunderbolt, and other more modern protocols; I get a headache trying to read their specifications, so I never felt attracted to dive in. At some point in the 80s or 90s, the idea of writing standards in the clearest and most concise way possible seems to have been lost, perhaps on purpose; it may be what happens when you leave standards-writing to industry-heavy committees rather than academia.

 

