Memory management Eclypse z7

Riccardo · August 30, 2022

Hello everyone

For my project, I'm trying to build an acquisition system with the Digilent Eclypse Z7 board, together with the ZMOD ADC1410-105.

I'm currently going through the customization of the baremetal example. What I have to do is to sample a 100s voltage transient.

In particular, I would like to sample the first 1ms with the minimum sampling period (10ns), then reduce the sampling rate for the following decades (let's say 100 samples per decade).

At this point I have some questions:

Is there any way to allocate a memory buffer longer than the maximum ADC buffer length?
   1. Currently I'm using malloc(), since the ZMOD libraries have a sort of protection on memory-related functions, but I'm not able to store more than 400us of sampled data.
   2. Is it safe to modify the ZMOD libraries?
Is there any way to change the sampling frequency from software? Currently the idea is to store a 200us buffer and average it in order to obtain the desired point for the decade (not yet implemented in the code, since the first part of the program is not working).

Here is the code I'm using and the code I'm referring to:

TRANSFER_LEN = 0x2710

main.cpp

image.png.99cd08188e4cfa65eb41443bebecbd3c.png

image.png.01f5c77deb4a763e63f0e4a8141d9b19.png

Memory function with size protection:

image.png.8ec38cb7ba4d8d87c14393d14fc89cdd.png

ZMOD protected function I can not use

image.png.3f6382c9add2a03dfb149a0a9f06e219.png

Sorry for the long question, and sorry for any silly mistakes; it's the first time I'm going through these processes and I'm not fully concentrated on this task.

Thank you for your help.

Riccardo

Edited August 30, 2022 by Riccardo

Riccardo · August 30, 2022

Sorry, forgot a part.

image.png.7c9cbf8994122c27dd6165c0714e967e.png

Here is what the Eclypse writes to the serial port. 40000 samples are (if I'm not wrong) 40000*32/8 = 160000 bytes, which is far below the memory of the Eclypse Z7 (isn't it?). But I'm not able to find a solution.
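As a quick sanity check of the arithmetic above, here is a minimal sketch. The helper names are illustrative and not part of the Zmod libraries; it assumes one 32-bit word per sample (both channels packed in the same word, as in the demo's data format) and a 100 MS/s sample rate.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store n_samples 32-bit words. */
static uint64_t capture_bytes(uint64_t n_samples) {
    return n_samples * sizeof(uint32_t);
}

/* Duration in nanoseconds of n_samples taken at sample_rate_hz. */
static uint64_t capture_duration_ns(uint64_t n_samples, uint64_t sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return n_samples * 1000000000ULL / sample_rate_hz;
}
```

With these, 40000 samples come out as 160000 bytes and 400us, which matches the limit observed above. The same arithmetic for a full 100s at 100 MS/s gives 10^10 samples, or 40 GB, which is why a continuous full-rate capture of the whole transient is not practical.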

I have not yet tried to sort out the problem related to the first message; I wanted to verify that I'm using the right functions.

Thank you

Riccardo · August 30, 2022

UPDATE:

Even if I try to force the ADC to store the data in a different memory location, the result is the same: I'm able to sample only 400us.

image.png.87b68936ee9d43176e9183d2f180dfcf.png

image.png.e445dec03be48753e384730112a04204.png

At this point I suspect I'm mistakenly assuming that the available memory is larger than it actually is.

This hypothesis comes from the fact that the ADC buffer is not full.

image.png.ea99d7891cb889ebfc08ed9ca32402bf.png

image.png.0fab835393bc00fde1e85b338839761f.png

Edited August 30, 2022 by Riccardo

artvvb · September 3, 2022

Hi @Riccardo,

Unfortunately, the currently released demos aren't able to perform acquisitions this large at the max sample rate without significant modifications. Data is buffered in circular buffers in BRAM, which are then streamed through DMA to DDR once they are filled and a trigger is detected. These buffers are by necessity pretty small, in order to not use up all of the BRAM resources on the chip. They were implemented in such a way that they aren't able to accept new data while they are streaming to DMA. Additionally, the DMA bandwidth between PS and PL is limited by how the AXI DMA controllers were configured and clocked: the AXI_S2MM transfers are 32 bits wide and clocked at the sample rate. This means that even just the AXI control and response beats in an AXI_S2MM burst are enough to drop the theoretical bandwidth below the max sample rate of the Zmod, let alone the round-trip time of the AXI4-lite transactions used to manage the DMA simple transfers. If you do repeated acquisitions by triggering, receiving data, and rearming the system, it's guaranteed that your data isn't going to be continuous; there will be time gaps.
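To make the bandwidth argument concrete, here is a rough sketch of the arithmetic. The function names and the burst overhead figure used in the example are assumptions chosen for illustration, not measured values from the demo.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second produced by the ADC stream: one word per sample. */
static uint64_t required_bytes_per_s(uint64_t sample_rate_hz, uint32_t word_bytes) {
    return sample_rate_hz * word_bytes;
}

/* Bytes per second a stream interface can carry when only data_beats out of
 * every total_beats clock cycles actually move data (the rest is overhead). */
static uint64_t available_bytes_per_s(uint64_t clk_hz, uint32_t bus_bytes,
                                      uint32_t data_beats, uint32_t total_beats) {
    assert(total_beats > 0 && data_beats <= total_beats);
    return clk_hz * bus_bytes * data_beats / total_beats;
}
```

At 100 MS/s with 4-byte words the stream needs 400 MB/s. A 4-byte interface clocked at 100 MHz delivers exactly 400 MB/s only if every cycle carries data; with a hypothetical 4 overhead cycles per 256-beat burst it drops to roughly 394 MB/s, which is already below what the ADC produces.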

To answer your questions directly:

You can >technically< allocate memory buffers longer than the ADC buffer and fill them with repeated acquisitions, but samples will not be continuous.
   1. When using malloc, you should be aware of the heap size defined in the linker script, to avoid overflows. You can modify this value to significantly increase the amount of memory available to malloc.
   2. Yes, feel free to modify the Zmod libraries and the hardware platform to fit your needs. However, when doing so, be aware of what the hardware is doing, and check the corresponding IP user guides.
There is no mechanism implemented for changing the sampling frequency from software.
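As an illustration of the malloc point, here is a minimal sketch of a checked allocation. The helper name is hypothetical. On a bare-metal build a NULL return usually means the heap configured in the linker script (typically the _HEAP_SIZE symbol in a Xilinx-generated lscript.ld) is too small for the request.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zeroed buffer of n_samples 32-bit words.
 * Returns NULL if the request is empty, would overflow size_t,
 * or does not fit in the heap. The caller must free() the result. */
static uint32_t *alloc_capture_buffer(size_t n_samples) {
    if (n_samples == 0 || n_samples > SIZE_MAX / sizeof(uint32_t)) {
        return NULL;
    }
    return (uint32_t *)calloc(n_samples, sizeof(uint32_t));
}
```

Checking the returned pointer before handing it to the DMA matters more for heap buffers than for local arrays: an array declared inside main() lives on the stack, so comparing it against NULL never detects an overflow.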

We've been working on a project which might better fit your requirements. It's not fully released yet, so documentation is currently minimal and the code is somewhat messy and might still have bugs, but it might work better than the existing materials for this. It attempts to solve the buffer size limitation by increasing the bandwidth of the AXI_S2MM interface and switching to using the Scatter Gather mode of the AXI DMA to minimize the amount of downtime between successive DMA bursts on a long acquisition. A prerelease for Vivado & Vitis 2021.1 is available here [https://digilent.com/reference/programmable-logic/eclypse-z7/demos/ddr-streaming] - see the "s2mm_cyclic_transfer_test" app. The longest buffer I've tested with is 1 second @ 100 MS/s, though that was some time ago. More recently, a 0x400000 sample-long buffer was successfully acquired. Data transfer rates from the board back to the host are still pretty slow, so for long acquisitions, expect to wait a while for things to come back, and for potentially large amounts of PC memory to be required to store acquired data. Unfortunately, it also does not currently have a method for changing the sample rate without modifying the hardware design.

Thanks,
Arthur

Edit: Either way, the hardware platform will still need modification to add something that can reduce the sample rate. The PL/PS data transfer mechanism in the newer project is likely better suited to this kind of thing since it can take a large number of samples into a large DDR buffer.

Edited September 3, 2022 by artvvb

Riccardo · September 7, 2022

Nice! This seems to be a good starting point for my objective.
I tried it and I am able to measure more than before (I reached 1ms at 100MS/s).

I'm currently stuck because I can't perform two consecutive measurements (even on the same memory buffer). I need this because 100s at 100MS/s demands too much memory, so we are performing a logarithmic measurement, with the first 1ms sampled at the maximum frequency and then 10 points per decade, each obtained as the mean value of a 200us acquisition. (In addition, the final project will work with three channels, so with two ZMOD ADC 1410-105.)
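The averaging step described here could look roughly like the sketch below. The bit layout is an ASSUMPTION made only for illustration (channel 0 in bits 31:16, channel 1 in bits 15:0, both signed 16-bit); the real layout is whatever the demo's ChannelData and RawDataToVolts helpers expect, so the unpacking should be replaced with those.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: channel 0 in bits 31:16, channel 1 in bits 15:0, signed 16-bit.
 * Replace with the demo's own unpacking for real data. */
static int16_t unpack_channel(uint32_t word, unsigned channel) {
    uint16_t half = (channel == 0) ? (uint16_t)(word >> 16)
                                   : (uint16_t)(word & 0xFFFFu);
    return (int16_t)half;
}

/* Mean raw value of one channel over a window of n packed words. */
static double window_mean(const uint32_t *buf, size_t n, unsigned channel) {
    assert(buf != NULL && n > 0);
    int64_t sum = 0;  /* 64-bit accumulator so long windows cannot overflow */
    for (size_t i = 0; i < n; i++) {
        sum += unpack_channel(buf[i], channel);
    }
    return (double)sum / (double)n;
}
```

A 200us window at 100 MS/s is 20000 words, so each logarithmic point would be window_mean(buffer, 20000, channel), converted to volts afterwards.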

Is there any HW limitation in doing this?

My code is the following:

Quote

int main () {
   // Initialize device drivers
   InputPipeline Pipe;

   // Initialize IP driver devices
   S2mmInitialize(&(Pipe.S2mm), DMA_ID);
   TriggerControl_Initialize(&(Pipe.Trig), TRIGGER_CTRL_BASEADDR);
   ManualTrigger_Initialize(&(Pipe.Man), MANUAL_TRIGGER_BASEADDR);
   AxiStreamSourceMonitor_Initialize(&(Pipe.TrafficGen), SOURCE_MONITOR_BASEADDR);
   ZmodScope_Initialize(&(Pipe.Scope), SCOPE_BASEADDR);
   UserRegisters_Initialize(&(Pipe.LevelTrigger), LEVELTRIGGER_BASEADDR);

   xil_printf("Done initializing device drivers\r\n");

   // INITIALIZE THE MEMORY
   const int buf1msL = 0x186A0;
   //0x186A0 @100Ms/s = 1ms
   // allocate the first 1ms buffer
   u32 Buffer1ms[buf1msL];
   // allocate buffer for the measures o mediate

   if ((Buffer1ms == NULL))
       xil_printf("error 1ms memory");
   else
       xil_printf("ok 1ms\r\n");

   const int int_l_buffer = 0x1388;
   const int n_samples_decade = 5;
   const int n_decades = 5;
   const int num_samples_log = n_samples_decade*n_decades;
   //0x1388 = 5000 samples @100Ms/s = 50us
   //2 samples more for the final mediate value and its timing
   u32 Buffers[num_samples_log][int_l_buffer];

   if ((Buffers == NULL))
               xil_printf("error 1ms memory\r\n");

   // sets all the memory buffers just located at 0
   memset(Buffer1ms, 0, buf1msL * sizeof(u32));
//   memset(BufferLog, 0, 900*sizeof(u32));
   memset(Buffers, 0, num_samples_log*int_l_buffer*sizeof(u32));

   ZmodScopeRelayConfig CouplingTestRelays = {0, 0, 1, 0};
   ZmodScopeRelayConfig GainTestRelays = {1, 0, 0, 0};
   ZmodScopeRelayConfig HighGainDcCoupling = {1, 1, 1, 1};
   xil_printf("ACQUISITION 100s STARTED\r\n");
   only_meas(Buffer1ms, buf1msL, &Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);

   int i;
   for(i=0;i<num_samples_log;i++){
       xil_printf("%d\r\n",i);
       only_meas(Buffers[i], int_l_buffer, &Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
   }
   xil_printf("ACQUISITION 100s ENDED\r\n");
   print_serial(Buffer1ms, int_l_buffer, GainTestRelays, &Pipe);
//   LevelTriggerAcquisition (&Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
//   MinMaxAcquisition(&Pipe, CouplingTestRelays);
//   MinMaxAcquisition(&Pipe, GainTestRelays);

}

Quote

XStatus only_meas(UINTPTR Buffer, u32 BufferLength, InputPipeline *InstPtr, ZmodScopeRelayConfig Relays, u32 TrigEnable, u16 Ch1Level, u16 Ch2Level) {
   xil_printf("entered in function\r\n");
   S2mmTransferHierarchy *S2mmPtr = &(InstPtr->S2mm);
   ManualTrigger *ManPtr = &(InstPtr->Man);
   TriggerControl *TrigPtr = &(InstPtr->Trig);
   AxiStreamSourceMonitor *TrafficGenPtr = &(InstPtr->TrafficGen);
   ZmodScope *ScopePtr = &(InstPtr->Scope);
   UserRegisters *LevelTriggerPtr = &(InstPtr->LevelTrigger);
   const u8 TestMode = 0;
   WriteZmodScopeRelayConfig(ScopePtr, Relays, TestMode);

   // Create a Dma Bd Ring and map the buffer to it
   S2mmAttachBuffer(S2mmPtr, Buffer, BufferLength);

   // Flush the cache before any transfer
   Xil_DCacheFlushRange(Buffer, BufferLength * sizeof(u32));

   const u32 TriggerPosition = 0;//BufferLength / 4;
   // Configure the trigger
   TriggerSetPosition (TrigPtr, BufferLength, TriggerPosition);
   TriggerSetEnable (TrigPtr, TrigEnable);

   u32 Levels = ((u32)(Ch1Level) << 16) | (Ch2Level);
   //   u32 Levels = ((u32)(Ch2Level) << 16) | (Ch1Level);
   UserRegisters_WriteReg(LevelTriggerPtr->BaseAddr, USER_REGISTERS_OUTPUT0_REG_OFFSET, Levels);

   AxiStreamSourceMonitorSetSelect(TrafficGenPtr, SWITCH_SOURCE_SCOPE);

   xil_printf("Initialization done\r\n");

   // Start up the input pipeline from back to front
   // Start the DMA receive
   S2mmStartCyclicTransfer(S2mmPtr);

   // Start the trigger hardware
   TriggerStart(TrigPtr);

   // FIXME: Start the data source first to ensure that the pipeline is flushed into an idle trigger module?

   // Start the Zmod data stream
   ZmodScope_StartStream(ScopePtr);

   // Apply a manual trigger
   sleep(1);
   ManualTriggerIssueTrigger(ManPtr);

   // Wait for trigger hardware to go idle
   xil_printf("Waiting for trigger...\r\n");
   while (!TriggerGetIdle(TrigPtr));

   // FIXME: maybe wait a bit to ensure that the RXEOF frame transfer has completed
   u32 *BufferHeadPtr = S2mmFindStartOfBuffer(S2mmPtr);
   if (BufferHeadPtr == NULL) {
       xil_printf("ERROR: No buffer head detected\r\n");
   }

   u32 BufferHeadIndex = (((u32)BufferHeadPtr - (u32)Buffer) / sizeof(u32)) % BufferLength;

   u32 TriggerDetected = TriggerGetDetected(TrigPtr);

   xil_printf("Buffer base address: %08x\r\n", Buffer);
   xil_printf("Buffer high address: %08x\r\n", ((u32)Buffer) + ((BufferLength-1) * sizeof(u32)));
   xil_printf("Length of buffer (words): %d\r\n", BufferLength);
   xil_printf("Index of buffer head: %d\r\n", BufferHeadIndex);
   xil_printf("Trigger position: %d\r\n", TriggerPosition);
   xil_printf("Index of trigger position: %d\r\n", (BufferHeadIndex + TriggerPosition) % BufferLength);
   xil_printf("Detected trigger condition: %08x\r\n", TriggerDetected);

   // Invalidate the cache to ensure acquired data can be read
   Xil_DCacheInvalidateRange((UINTPTR)Buffer, BufferLength * sizeof(u32));

   xil_printf("Transfer done\r\n");
}

Quote

XStatus print_serial(UINTPTR* Buffer, u32 BufferLength, ZmodScopeRelayConfig Relays, InputPipeline *InstPtr){
   xil_printf("Entered in printing\r\n");
   S2mmTransferHierarchy *S2mmPtr = &(InstPtr->S2mm);
   u32 *BufferHeadPtr = S2mmFindStartOfBuffer(S2mmPtr);
   u32 BufferHeadIndex = (((u32)BufferHeadPtr - (u32)Buffer) / sizeof(u32)) % BufferLength;
   for (u32 i = 0; i < BufferLength; i++) {
       u32 index = (i + BufferHeadIndex) % BufferLength;
       float ch1_mV = 1000.0f * RawDataToVolts(Buffer[index], 0, ZMOD_SCOPE_RESOLUTION, Relays.Ch1Gain);
       float ch2_mV = 1000.0f * RawDataToVolts(Buffer[index], 1, ZMOD_SCOPE_RESOLUTION, Relays.Ch2Gain);
//       const u16 ch1_raw = ChannelData(0, Buffer[index], ZMOD_SCOPE_RESOLUTION);
//       const u16 ch2_raw = ChannelData(1, Buffer[index], ZMOD_SCOPE_RESOLUTION);
       xil_printf("@%08x\t%08x\t%d\t%d\r\n", (u32)Buffer + index*sizeof(u32), Buffer[index], (int)ch1_mV, (int)ch2_mV);
   }
}

And this is the log of the serial port:

I'm sorry for these questions, but I'm a beginner in FPGA. Is there any kind of course/resource I can study in order to better understand how to solve certain problems with FPGAs and how to work with them?

Thank you very much

Edited September 7, 2022 by Riccardo

artvvb · September 8, 2022

Here's one possible option for getting around this:

Successive acquisitions by attaching new buffers is tricky, and the s2mm_transfer implementation needs more work to better support it. In the meantime, the DMA IP can be reset and reinitialized between acquisitions.

When a buffer is "attached" to the S2mm interface, a block of memory is allocated to hold the block descriptors for the DMA, which isn't easily switched out between acquisitions. The blocks described in this memory are "submitted" to hardware, which then uses them to figure out which memory addresses to write to while performing transfers. In order to pull transactions that haven't been completed from hardware, it seems to be necessary to reset the DMA wholesale - I need to look into this some more.

For more reading and documentation on the DMA mechanisms specifically, of what I've seen, there are various blog posts that can be found across the web, which are pretty often narrow for a specific application. Most importantly, there's Xilinx PG021 (https://docs.xilinx.com/r/en-US/pg021_axi_dma), which is focused on the hardware, and comments in driver headers like xaxidma.h. It's unfortunately pretty sparse.

For FPGA in general, there are plenty of decent intro materials, but this particular project is more complicated than you'd learn about in most intro materials, as, realistically, is anything involving a processor. From what I've seen, it's mostly just a matter of spending time getting used to Xilinx's IP ecosystem.

The changes to make to your code to do the reset follow:

Main was somewhat modified so that I could get a quick estimate of the latency between successive transfers. From this provided code, uncommenting the AxiStreamSourceMonitorSetSelect line hands control of the data element of the AXI4-stream going into the DMA over to a hardware counter (running at 125 MHz), which I used to estimate the downtime between two successive transfers (without the use of a sleep function in only_meas) to be about 40 ms - measured from the start of one acquisition to the start of the next. This includes the reset of the DMA core that occurs in the S2mmInitialize call. It's technically longer, depending on when a trigger actually occurs.

Mostly the changes just move S2mmInitialize into only_meas, make sure that an additional reset isn't performed in S2mmCleanup, and make sure that the end index of the buffer is computed before block descriptor memory for the attached buffer is freed in S2mmCleanup.

u32 only_meas(UINTPTR Buffer, u32 BufferLength, InputPipeline *InstPtr, ZmodScopeRelayConfig Relays, u32 TrigEnable, u16 Ch1Level, u16 Ch2Level) {
    xil_printf("entered in function\r\n");
    S2mmTransferHierarchy *S2mmPtr = &(InstPtr->S2mm);
    ManualTrigger *ManPtr = &(InstPtr->Man);
    TriggerControl *TrigPtr = &(InstPtr->Trig);
    AxiStreamSourceMonitor *TrafficGenPtr = &(InstPtr->TrafficGen);
    ZmodScope *ScopePtr = &(InstPtr->Scope);
    UserRegisters *LevelTriggerPtr = &(InstPtr->LevelTrigger);
    const u8 TestMode = 0;

    WriteZmodScopeRelayConfig(ScopePtr, Relays, TestMode);

    S2mmInitialize(S2mmPtr, DMA_ID);
    // Create a Dma Bd Ring and map the buffer to it
    S2mmAttachBuffer(S2mmPtr, Buffer, BufferLength);

    // Flush the cache before any transfer
    Xil_DCacheFlushRange(Buffer, BufferLength * sizeof(u32));


    const u32 TriggerPosition = 0;//BufferLength / 4;
    // Configure the trigger
    TriggerSetPosition (TrigPtr, BufferLength, TriggerPosition);
    TriggerSetEnable (TrigPtr, TrigEnable);

    u32 Levels = ((u32)(Ch1Level) << 16) | (Ch2Level);
    //    u32 Levels = ((u32)(Ch2Level) << 16) | (Ch1Level);
    UserRegisters_WriteReg(LevelTriggerPtr->BaseAddr, USER_REGISTERS_OUTPUT0_REG_OFFSET, Levels);

//    AxiStreamSourceMonitorSetSelect(TrafficGenPtr, SWITCH_SOURCE_GENERATOR);

    xil_printf("Initialization done\r\n");

    // Start up the input pipeline from back to front
    // Start the DMA receive
    S2mmStartCyclicTransfer(S2mmPtr);

    AxiStreamSourceMonitorSetEnable(TrafficGenPtr, 1);

    // Start the trigger hardware
    TriggerStart(TrigPtr);

    // FIXME: Start the data source first to ensure that the pipeline is flushed into an idle trigger module?

    // Start the Zmod data stream; only gets started once
    ZmodScope_StartStream(ScopePtr);

    // Apply a manual trigger
//    sleep(1);
    ManualTriggerIssueTrigger(ManPtr);

    // Wait for trigger hardware to go idle
    xil_printf("Waiting for trigger...\r\n");
    while (!TriggerGetIdle(TrigPtr));

    // FIXME: maybe wait a bit to ensure that the RXEOF frame transfer has completed
    u32 *BufferHeadPtr = S2mmFindStartOfBuffer(S2mmPtr);
    if (BufferHeadPtr == NULL) {
        xil_printf("ERROR: No buffer head detected\r\n");
    }

    u32 BufferHeadIndex = (((u32)BufferHeadPtr - (u32)Buffer) / sizeof(u32)) % BufferLength;

    u32 TriggerDetected = TriggerGetDetected(TrigPtr);

    xil_printf("Buffer base address: %08x\r\n", Buffer);
    xil_printf("Buffer high address: %08x\r\n", ((u32)Buffer) + ((BufferLength-1) * sizeof(u32)));
    xil_printf("Length of buffer (words): %d\r\n", BufferLength);
    xil_printf("Index of buffer head: %d\r\n", BufferHeadIndex);
    xil_printf("Trigger position: %d\r\n", TriggerPosition);
    xil_printf("Index of trigger position: %d\r\n", (BufferHeadIndex + TriggerPosition) % BufferLength);
    xil_printf("Detected trigger condition: %08x\r\n", TriggerDetected);

    // Invalidate the cache to ensure acquired data can be read
    Xil_DCacheInvalidateRange((UINTPTR)Buffer, BufferLength * sizeof(u32));

    S2mmCleanup(S2mmPtr);
    xil_printf("Transfer done\r\n");

    return BufferHeadIndex;
}


XStatus print_serial(UINTPTR* Buffer, u32 BufferLength, u32 BufferHeadIndex, ZmodScopeRelayConfig Relays){
    xil_printf("Entered in printing\r\n");
    for (u32 i = 0; i < BufferLength; i++) {
        u32 index = (i + BufferHeadIndex) % BufferLength;
        float ch1_mV = 1000.0f * RawDataToVolts(Buffer[index], 0, ZMOD_SCOPE_RESOLUTION, Relays.Ch1Gain);
        float ch2_mV = 1000.0f * RawDataToVolts(Buffer[index], 1, ZMOD_SCOPE_RESOLUTION, Relays.Ch2Gain);
//        const u16 ch1_raw = ChannelData(0, Buffer[index], ZMOD_SCOPE_RESOLUTION);
//        const u16 ch2_raw = ChannelData(1, Buffer[index], ZMOD_SCOPE_RESOLUTION);
        xil_printf("@%08x\t%08x\t%d\t%d\r\n", (u32)Buffer + index*sizeof(u32), Buffer[index], (int)ch1_mV, (int)ch2_mV);
    }
    return XST_SUCCESS;
}


int main () {
    // Initialize device drivers
    InputPipeline Pipe;

    // Initialize IP driver devices
//    S2mmInitialize(&(Pipe.S2mm), DMA_ID);
    TriggerControl_Initialize(&(Pipe.Trig), TRIGGER_CTRL_BASEADDR);
    ManualTrigger_Initialize(&(Pipe.Man), MANUAL_TRIGGER_BASEADDR);
    AxiStreamSourceMonitor_Initialize(&(Pipe.TrafficGen), SOURCE_MONITOR_BASEADDR);
    ZmodScope_Initialize(&(Pipe.Scope), SCOPE_BASEADDR);
    UserRegisters_Initialize(&(Pipe.LevelTrigger), LEVELTRIGGER_BASEADDR);

    ZmodScopeRelayConfig GainTestRelays = {1, 0, 0, 0};

    xil_printf("Done initializing device drivers\r\n");

    const u32 buf1_length = 0x1000;
    u32 buf1[buf1_length];
    u32 buf1_head;

    const u32 buf2_length = 0x1000;
    u32 buf2[buf2_length];
    u32 buf2_head;

    memset(buf1, 0, buf1_length * sizeof(u32));
    memset(buf2, 0, buf2_length * sizeof(u32));

    xil_printf("BUF1 ACQUISITION STARTED\r\n");
    buf1_head = only_meas(buf1, buf1_length, &Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
    xil_printf("BUF1 ACQUISITION ENDED\r\n");
    xil_printf("BUF2 ACQUISITION STARTED\r\n");
    buf2_head = only_meas(buf2, buf2_length, &Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
    xil_printf("BUF2 ACQUISITION ENDED\r\n");

    print_serial(buf1, buf1_length, buf1_head, GainTestRelays);
    print_serial(buf2, buf2_length, buf2_head, GainTestRelays);
//    LevelTriggerAcquisition (&Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
//    MinMaxAcquisition(&Pipe, CouplingTestRelays);
//    MinMaxAcquisition(&Pipe, GainTestRelays);

}

In order to reset the DMA before each transfer, and to clean up the block descriptor space afterwards, change the S2mmCleanup function in s2mm_transfer.c as follows.

void S2mmCleanup(S2mmTransferHierarchy *InstPtr) {
//	// Spin down the hardware.
//	XAxiDma_Pause(&(InstPtr->Dma));
//	XAxiDma_Reset(&(InstPtr->Dma));
//	while (!XAxiDma_ResetIsDone(&(InstPtr->Dma)));

	// Clean up Bds and deallocate the BdSpace
	InstPtr->BufferBaseAddr = NULL;
	InstPtr->BufferLength = 0;
	InstPtr->NumBds = 0;
	free((void*)(InstPtr->BdSpace)); // FIXME: throws data abort exception
	InstPtr->BdSpace = NULL;
}

Riccardo · September 27, 2022

Sorry for the long time between your advice and my response.

I tried the reset as you suggested and the code is working better: now I can allocate a lot more memory than before. I think there is still some work to do; in particular, I want to eliminate all the time lost in the ZMOD configuration inside "only_meas" before the actual acquisition, so that I can minimize the time between two subsequent acquisitions.

My question is: do you think I need to change something inside the HW block diagram to perform the measurement described in the previous posts, or is it only a matter of SW code?

artvvb · October 4, 2022

Hey, sorry for the delay, I've needed to pull some numbers to better answer this, which I still don't fully have yet. From some initial timing of the modified only_meas, it looks like the turnaround time from the start of one transfer to the start of the next transfer, including a DMA reset and reinitialization (which appears to be taking place surprisingly fast), is somewhat less than 50 ms. I should have more info on this later this week.

I'm confident that hardware changes would solve the issue, but will be potentially complicated and specific to your application. A state machine and several counters, integrated into the trigger detector module, could be used to allow some incoming stream beats to pass through to the DMA while the rest are discarded. Essentially, after a start signal, all samples are passed on to DMA until counter rollover at a software-settable limit occurs. Then, the counter is reset and counts up to another limit while samples are discarded. The process then repeats as needed. There are some unused AXI4-lite-connected registers (if I recall correctly, four "outputs" & four "inputs") included in the project that could be used to wire something like this up, but there would still need to be some amount of communication state machine that would need to be designed - depending on the number of "transmit and then block periods" required, it might be necessary to do something like polling a bit that asks the processor to provide a new counter rollover value.

This is of course more complex and specific than finding a better solution in software. For what it's worth, it would also only require a single s2mm transfer be performed which would grab all of the data you intend to pull in, making the software side potentially quite a bit less complex. An immediate trigger (enable and set the manual trigger bit before starting the detector) would allow you to specify a fixed transaction length for the whole thing. More complex trigger behavior (like if each subsequent period that a capture takes place in needs to start at a new trigger event) would require more extensive modifications.

Hopefully that roughly 50 ms start-of-acquisition to start-of-next-acquisition time is sufficient. I'll see if I can provide some better information on that property of the system in the coming days.

Thanks,

Arthur

Riccardo · October 4, 2022

Hello Arthur, thank you very much for what you are doing for me.

Over the past few days I have been testing the board with some code. Now I'm doing some consecutive acquisitions with this code:

Quote

int main () {

int num_buf = 0x6;
//struct timeval tv1[num_buf], tv2[num_buf];
//clock_t eval_time_t[num_buf][2];
// Initialize device drivers
InputPipeline Pipe;

// Initialize IP driver devices
// S2mmInitialize(&(Pipe.S2mm), DMA_ID);
TriggerControl_Initialize(&(Pipe.Trig), TRIGGER_CTRL_BASEADDR);
ManualTrigger_Initialize(&(Pipe.Man), MANUAL_TRIGGER_BASEADDR);
AxiStreamSourceMonitor_Initialize(&(Pipe.TrafficGen), SOURCE_MONITOR_BASEADDR);
ZmodScope_Initialize(&(Pipe.Scope), SCOPE_BASEADDR);
UserRegisters_Initialize(&(Pipe.LevelTrigger), LEVELTRIGGER_BASEADDR);

ZmodScopeRelayConfig GainTestRelays = {1, 0, 0, 0};

xil_printf("Done initializing device drivers\r\n");

const u32 buf1_length = 0x186A0;
u32 buf1[buf1_length];
u32 buf1_head;

const u32 buf2_length = 0x4e20;
u32 buf2[num_buf][buf2_length];
u32 buf2_head[num_buf];
int i = 0;

memset(buf1, 0, buf1_length * sizeof(u32));
for(i=0;i<num_buf;i++){
   memset(buf2[i], 0, buf2_length * sizeof(u32));
}
xil_printf("BUF1 ACQUISITION STARTED\r\n");
buf1_head = only_meas(buf1, buf1_length, &Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
xil_printf("BUF1 ACQUISITION ENDED\r\n");
xil_printf("BUF2 ACQUISITION STARTED\r\n");
clock_t eval_time_t[2];
eval_time_t[0]= clock();
for(i=0;i<num_buf;i++){
   //xil_printf("BUF2.%d ACQUISITION STARTED\r\n", i);
   buf2_head[i] = only_meas(buf2[i], buf2_length, &Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
   //xil_printf("BUF2.%d ACQUISITION ENDED\r\n", i);
}
eval_time_t[1]= clock();
xil_printf("BUF2 ACQUISITION ENDED\r\n");
//print_serial(buf1, buf1_length, buf1_head, GainTestRelays);
for(i=0;i<num_buf;i++){
   double time = (double) (eval_time_t[1]-eval_time_t[0])/CLOCKS_PER_SEC;
   xil_printf("time = %f\n",time);
}
for(i=0;i<num_buf;i++){
   print_serial(buf2[i], buf2_length, buf2_head[i], GainTestRelays);
}

// LevelTriggerAcquisition (&Pipe, GainTestRelays, 0b00011, 0x0000, 0x01F0);
// MinMaxAcquisition(&Pipe, CouplingTestRelays);
// MinMaxAcquisition(&Pipe, GainTestRelays);

}

The only difference in only_meas is that the lines writing debug messages to the serial port are commented out.

What I'm acquiring is a descending sawtooth wave, 2Vpp at 10Hz, so that all the 200us windows fall within the same period (hoping for a delay between two acquisitions of less than a few ms). What I found is in the attached image.

This seems to lead to a delay that changes over time and is less than 50ms (with a 50ms delay I would have seen a very different waveform): around 3ms between the 0u-200u and 200u-400u windows, and 1.1ms between the 800u-1m and 1m-1.2m windows. Does this seem plausible?
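The delay estimate from the sawtooth can be written down explicitly. A descending ramp of 2Vpp at 10Hz falls at 2 V x 10 Hz = 20 V/s, so the voltage drop between the end of one window and the start of the next gives the gap directly. The sketch below uses a hypothetical helper name and assumes both windows lie on the same ramp (no wrap-around in between).

```c
#include <assert.h>

/* Time gap in seconds between two acquisition windows of a descending
 * sawtooth, given the last voltage of the previous window and the first
 * voltage of the next one. Valid only if both lie on the same ramp. */
static double sawtooth_gap_s(double v_end_prev, double v_start_next,
                             double vpp, double freq_hz) {
    assert(vpp > 0.0 && freq_hz > 0.0);
    double slope = vpp * freq_hz;  /* magnitude of dV/dt in V/s */
    return (v_end_prev - v_start_next) / slope;
}
```

For example, a drop of 62.5mV between windows corresponds to a gap of about 3.1ms, and a gap of 50ms would show up as a full 1V step, half the waveform's amplitude.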

I think that the best option for me would be to bypass the internal buffer of the ZMOD ADC IP and handle all the incoming data directly with a custom PL interface for the ADC, even if it would be very expensive in terms of HW programming.

What I need is a delay that is stable and around a few microseconds.

EDIT:
I found that the DDR3L memory is not accessible from the PL, but is hard-wired to the PS system, is that right? In that case, is the DMA IP the only possible way to access the DDR3L memory?

Edited October 5, 2022 by Riccardo

artvvb · October 5, 2022

Quote

I found that the DDR3L memory is not accessible from the PL, but is hard-wired to the PS system, is that right? In that case, is the DMA IP the only possible way to access the DDR3L memory?

That's accurate. The alternative would be to buffer everything in BRAM, but that's a pretty limited resource. Using all of it, 630 kB, with each sample from each channel taking up 2 bytes, means a single channel buffer length of ~3.20 ms, if I did the math right. You could still discard samples before they go into the memory as I attempted to describe above, so the buffer would be split up over multiple acquisitions.
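For reference, the BRAM estimate works out as in the short sketch below. The helper name is illustrative, and it reads "630 kB" as 630 x 1024 bytes, which is an assumption about the unit.

```c
#include <assert.h>
#include <stdint.h>

/* Longest single-channel capture, in nanoseconds, that fits in a memory of
 * mem_bytes when each sample takes bytes_per_sample at sample_rate_hz. */
static uint64_t capture_window_ns(uint64_t mem_bytes, uint32_t bytes_per_sample,
                                  uint64_t sample_rate_hz) {
    assert(bytes_per_sample > 0 && sample_rate_hz > 0);
    uint64_t samples = mem_bytes / bytes_per_sample;
    return samples * 1000000000ULL / sample_rate_hz;
}
```

With 630 x 1024 bytes, 2 bytes per sample and 100 MS/s this gives 322560 samples, or about 3.23ms; reading the figure as 630000 bytes gives 3.15ms instead. Either way it is a few milliseconds for one channel, and proportionally less with more channels.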

You likely also want to continue using the low level IP, since it does a lot of the heavy lifting of getting the ADC on the Zmod initialized. If you were to go to a PL-only design, you should still use this. The low pass filter demo might provide some insight into how to use the IP in a PL-only system.

Commenting out the prints is a good call, I missed it. Saves a *bunch* of time in the loop. There might be a similar thing going on with the mallocing and freeing of the block descriptor space for the DMA in the S2mmAttachBuffer and S2mmCleanup functions. The inability to recall pending blocks from the DMA, which is what brings in the requirement to reset it, is really painful.

Here's how I'm using the xtime header to measure the time between acquisitions currently:


    const u32 buf1_length = 0x1000;
    u32 buf1[buf1_length * 50];
    u32 buf1_head;

    // Zero the whole array, not just the first acquisition's slice
    memset(buf1, 0, sizeof(buf1));

    XTime func_time_clocks;
    XTime time0[50], time1[50];
    double func_time_us;
    u32 trig_en = 0b00001; // manual only

    for (u32 acq = 0; acq < 50; acq++) {
        XTime_GetTime(&(time0[acq]));
        // buf1 is a u32 array, so the offset is counted in words, not bytes
        buf1_head = only_meas(buf1 + acq * buf1_length, buf1_length, &Pipe, GainTestRelays, trig_en, 0x0000, 0x01F0);
        XTime_GetTime(&(time1[acq]));
    }
    for (u32 acq = 0; acq < 50; acq++) {
        func_time_clocks = (time1[acq]-time0[acq]);
        func_time_us = 1.0 * (func_time_clocks) / (COUNTS_PER_SECOND / 1000000);
        xil_printf("Acquisition %d:\r\n", acq);
        xil_printf("    %llu clocks\r\n", func_time_clocks * 2);
        xil_printf("    %d.%02d us\r\n", (int)func_time_us, (int)((int)(func_time_us * 100) % 100));
    }

Note that I haven't looked at what the data actually looks like after acquisition. With (most) prints removed from the loop, it's taking ~5000 us for the first acquisition, then ~2700 us for subsequent ones on my machine. I'm also currently getting some errors after a reset, which suggest there might be problems with the data from subsequent acquisitions in my implementation.
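Since xil_printf has no floating-point formatting, the tick-to-microsecond conversion can also be done entirely in integer arithmetic. The sketch below uses hypothetical names; counts_per_second stands in for the platform's COUNTS_PER_SECOND, and the values in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t us;          /* whole microseconds */
    uint32_t hundredths;  /* fractional part, 0..99 */
} ElapsedUs;

/* Convert timer ticks to microseconds with two decimal places.
 * The intermediate product overflows for intervals of several minutes,
 * which is far longer than the acquisition times measured here. */
static ElapsedUs ticks_to_us(uint64_t ticks, uint64_t counts_per_second) {
    assert(counts_per_second > 0);
    /* Scale to hundredths of a microsecond before dividing, so nothing
     * is lost to integer truncation. */
    uint64_t centi_us = ticks * 100000000ULL / counts_per_second;
    ElapsedUs e = { centi_us / 100, (uint32_t)(centi_us % 100) };
    return e;
}
```

The two fields can then be printed with a "%d.%02d us" style format, as in the loop above. For instance, 270050 ticks of a 100 MHz counter come out as 2700.50 us.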

IMO, the best option would be to make limited hardware changes that allow the DMA to take a single acquisition to let it be continuously run without reset, by inserting a module into the input stream that discards samples that aren't cared about. This is what I previously described. Are triggers necessary for the captures taking place after the first burst, or would you prefer to have them take place at fixed times after the initial event? The latter ought to be easier to implement, as it wouldn't require modification of the trigger module. Either way, doing stuff in PL gives you tight control over when each "sub-acquisition" starts, by using some counters to toggle the tvalid signal in the axi stream after the trigger module high and low as needed.

Thanks,

Arthur

Riccardo · October 6, 2022

All clear.

15 hours ago, artvvb said:

Note that I haven't looked at what the data actually looks like after acquisition. With (most) prints removed from the loop, it's taking ~5000 us for the first acquisition, then ~2700 us for subsequent ones on my machine.

The fact is that this elapsed time between two acquisitions is too high for my application.

The initial idea of using an FPGA acquisition system instead of an oscilloscope is related to the timing: the logarithmic measurement I described above comes from the fact that I found the continuous buffer resource to be limited. The real target is to continuously acquire waveforms, ignoring more samples every decade of time in order to "reduce" the actual sampling frequency (as the image explains), since the most important events occur in the first 10 ms of the acquisition.

Seen this way, even the buffer of the ZMOD IP is somewhat limiting, which is why I suggested creating a DIY version of it. I feel like the real bottleneck is the data-transfer interface between the ZMOD IP and the AXI DMA.

15 hours ago, artvvb said:

The alternative would be to buffer everything in BRAM, but that's a pretty limited resource.

Could this be a way to store data continuously and let the AXI DMA IP save it to DDR in the meantime?

15 hours ago, artvvb said:

Are triggers necessary for the captures taking place after the first burst, or would you prefer to have them take place at fixed times after the initial event?

To answer: no, it is not necessary to trigger the acquisitions after the first one; only the first one must be triggered. But this topic leads to the second critical aspect of this setup: the not perfectly equal time between acquisitions would make the measurement not precise enough for my application.

EDIT:

Is the ZMOD IP buffer the FIFO highlighted in the following figure inside "Datapath"?

(image taken from the ZMOD Scope Controller IP user guide)

Because the user guide says this:

So the next question is (every question I'm asking is something I'm studying myself, but I'm happy if you have the answer before me): is the AXI DMA block able to transfer (let's say) instantaneously one single data incoming from the ADC scope controller IP?

Edited October 6, 2022 by Riccardo

artvvb · October 6, 2022

Quote

The fact is that this elapsed time between two acquisitions is too high for my application.

[...] the not perfectly equal time between acquisitions would make the measurement not precise enough for my application.

Understood, makes sense. Hardware changes are necessary in that case.

The DMA implementation in the streaming project is capable of handling the full sample rate coming out of the low level IP, and can also handle arbitrarily slower sample rates, but is not able to reduce the sample rate on its own. The DMA IP has internal buffers which are used to store up data that is then sent to DDR in a burst when either that buffer is filled, or when a piece of data accompanied by a "last" signal indicating the end of the stream is sent. Data does not need to be sent to the DMA continuously. If data doesn't arrive, then the burst is just not sent until enough data has been received.

Quote

is the AXI DMA block able to transfer (let's say) instantaneously one single data incoming from the ADC scope controller IP?

The above might somewhat answer this question. One sample of data can be sent to the DMA at any time. However, it isn't sent to DDR until enough data has been pushed in behind it (or it is indicated to be the last in the stream).

The Zmod Scope Controller is designed to provide data at a fixed sample rate, and the buffer inside of it is only intended to move data between the two clock domains. Reducing the sample rate requires additional hardware, sitting in the stream somewhere between the DMA and the Scope controller, to do the downsampling/decimation of the data.

Hope this helps,

Arthur

Riccardo · November 4, 2022

Hello artvvb.
I'm trying to change the hardware of the DDR streaming application. Every time I try to open the project this window appears:

image.png.e5badb0161a2b582543d53d03f016381.png

The problem remains even during implementation. Is there something I did wrong? Currently I'm using Vivado 2021.1. I tried to look for solutions, but it seemed that the only way is to reinstall Vivado.

In my project I removed the possibility of performing the calibration and the relay control externally, so the interface for the ADC is now the following (with all the controls for the trigger and so on), so I'm not even using those ports.

image.png.c020f08c1ce7713045c43038c6f8dee0.png

Thank you for your time

Riccardo

artvvb · November 4, 2022

You aren't doing anything wrong. The project works with those critical warnings even when the external calibration ports are enabled. I'll see if I can track it down regardless, there's an interface definition in the vivado-library IP repo that should be getting used by both the AWG and Scope IPs to describe those ports. The AWG IP uses the same interface, so it could be that it's what is causing the errors to appear currently.

Thanks,

Arthur

bpwilliams · April 10

On 10/5/2022 at 6:29 PM, artvvb said:

That's accurate. The alternative would be to buffer everything in BRAM, but that's a pretty limited resource. Using all of it, 630 kB, with each sample from each channel taking up 2 bytes, means a single channel buffer length of ~3.20 ms, if I did the math right. You could still discard samples before they go into the memory as I attempted to describe above, so the buffer would be split up over multiple acquisitions.

I'm trying to just increase the BRAM size to take ~3.2 ms of continuous data.

So far, as a test I have increased the kBufferSize from 14 to 15 which should give me ~320 us of data.

I changed kBufferSize in several modules :
'AXI_zmodADC1410_v1_0.vhd' (default and customization options)
'Circular_Buffer.vhd'
'Circular_Buffer--Block Memory Generator' (16384-->32768)

Also had to change lines 94 and 96 in Circular_Buffer.vhd:
line 94: addra : IN STD_LOGIC_VECTOR(13 DOWNTO 0); --> addra : IN STD_LOGIC_VECTOR(kBufferSize-1 DOWNTO 0);
line 96: addrb : IN STD_LOGIC_VECTOR(13 DOWNTO 0); --> addrb : IN STD_LOGIC_VECTOR(kBufferSize-1 DOWNTO 0);

On the software side:
Changed 'zmodadc1410.h'

line

#define ZMODADC1410_MAX_BUFFER_LEN 0x3FFF --> #define ZMODADC1410_MAX_BUFFER_LEN 0x7FFF

'zmodadc1410.cpp' (perhaps a mistake here?)

    uint16_t ZMODADC1410::channelData(uint8_t channel, uint32_t data)
    {
        //return (channel ? (data >> 2) : (data >> 18)) & 0x00003FFF;
        return (channel ? (data >> 2) : (data >> 18)) & 0x00007FFF;
    }

Finally, I changed the acquisition length to 32768.

I am still getting the first 160 us of data but just zero values for the rest.

Any idea what I missed? Are there any changes necessary to the DMA structure?

Thanks!

artvvb · April 10

2 hours ago, bpwilliams said:

'zmodadc1410.cpp' (perhaps a mistake here?)

    uint16_t ZMODADC1410::channelData(uint8_t channel, uint32_t data)
    {
        //return (channel ? (data >> 2) : (data >> 18)) & 0x00003FFF;
        return (channel ? (data >> 2) : (data >> 18)) & 0x00007FFF;
    }

The mask here represents pulling out one of two 14-bit values packed into a 32-bit word, so it should stay 3FFF.

2 hours ago, bpwilliams said:

Any idea what I missed? Are there any changes necessary to the DMA structure?

For the DMA, please confirm the width of buffer length register setting is large enough, it's an IP setting in the block design.

I'm not sure where else to look, it looks like the window position register width in the AXI controller is also controlled by kBufferSize in the VHDL source, and the corresponding definition in the C++ sources defines it as up to 26 bits wide: https://github.com/Digilent/zmodlib/blob/f2f491971aa43fa23d3d2a1d6640d6f97ad69318/ZmodADC1410/zmodadc1410.h#L49C9-L49C40. Same for the S2MM length register width in software: https://github.com/Digilent/zmodlib/blob/f2f491971aa43fa23d3d2a1d6640d6f97ad69318/Zmod/zmod.h#L46C9-L46C45.

I would also consider adding an ILA to the design (if possible, it also requires BRAM...), to take a look at the AXI stream signals going into the DMA.

Thanks,

Arthur

bpwilliams · April 11

3 hours ago, artvvb said:

For the DMA, please confirm the width of buffer length register setting is large enough, it's an IP setting in the block design.

Awesome, this was the trick! I also fixed the mask error I introduced. The DMA buffer length register width was set to 16; I changed it to 17.
I can now collect 320 us.

I'll let you know what the maximum capture time ends up being. Thanks so much! ⭐ ⭐ ⭐ ⭐

bpwilliams · April 11

I was able to get 1.3 ms of data with a block-RAM depth of 131072 (per channel). This corresponds to kBufferSize = 17 and DMA Buffer Length Register = 19.

When I tried a design to get 2.6 ms, it gave me an error that I was out of block-RAM.

Thanks again!

Brian

artvvb · April 11

There's a utilization report that tells you where BRAM is being used: Implementation -> Open Implemented Design -> Report Utilization, in the Flow Navigator. Cutting the AWG output hardware from the design would get you some BRAM back that could be used in the Scope inputs. There might be some other places where BRAMs are being used, like in FIFOs for clock domain crossings. If you need substantially more record length, you'll need to either switch to a different architecture that doesn't rely on BRAMs, or downsample in hardware.

This is an example utilization report from another project, 15 out of 16 BRAMs are being used by an ILA, out of the total 140 available:
