
How to use and manage memory with my FPGA?


GalD101


Hi @GalD101

Lots of questions, good. I'll try to cover all or at least most of them.

>> A bit of an unrelated question: in our case, the good old reg (now logic) will be flipflops only
>> (that is, we will only use ffs for memory (reg)? because I think I once saw that it's also possible to use LUTs instead?)

It seems like you're mixing up a few things.

There are several kinds of memory in your FPGA. First is BlockRAM, we'll get to that later; that's used for "bulk memory".

The second is the flip-flops that are part of the "sea of configurable logic", sitting close to lookup tables (LUTs) that are used to implement combinatorial logic. Currently, we're only using those to store state for our little FSMs.

The LUTs themselves also need to be configured with the data that describes your design. In the end, that configuration is stored in small memory cells inside the LUTs, and in part of the LUTs those cells can be repurposed as memory that's accessible to your design as well; that's called "distributed RAM".

>> Why did you choose to use an "always_ff" block in line 44 of my_first_fsm.sv

The simple (and hopefully sufficient for you) answer is: because I want to inform Vivado of my intent that the logic described in the block is intended to be translated into flip-flops (memory; state). If Vivado finds anything inside the always block that's not consistent with that, it could generate an error message. I'm not sure to what extent Vivado will do this in practice; I only found out about this feature very recently. The Harris & Harris book recommends it; they seem to know what they are talking about; so for the time being, I follow their advice.

>> Also, why did you use a blocking assignment inside the function?

I want the function to be just that: a (pure) function. A non-blocking assignment introduces a side-effect; it says: "Schedule an event to update the left-hand-side that should execute once I am done executing the function body".

I didn't try, but I /think/ that non-blocking assignments inside a function are not even allowed.

Essentially, the way I write my code advocates the use of blocking assignments as much as possible, because it is easier to reason about them (for us fickle human beings); they are executed in a very simple and obvious way, one after the other. One of the big plusses of the style I propose is that, within the 'next_state' function as the central place where the module's functionality is defined, you can program pretty much as if you're programming C. Assignments are executed, and after their execution, the left-hand-side values are updated. That's an easy-to-understand model, and it's precisely what non-blocking assignments don't do. There, the effect of an assignment is only visible at some time in the future, not immediately after the execution of the assignment. That adds a lot of semantic complexity that I like to avoid. I have but a small brain.

I think you may be a bit confused by the terminology often used in tutorials and even standards. Saying that statements are executed concurrently is not precisely what happens, at least during simulation. If you run VHDL or (System)Verilog code in a simulator, you will find that those statements are very much executed in a certain deterministic order, prescribed by the standard; rather than speaking about "concurrent assignment", I think it's better to speak about "deferred" or "delayed" assignment.
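To make the "deferred assignment" idea a bit more concrete, here is a toy model in Python (not HDL, and not how a real simulator is implemented; the function names are made up for this sketch). It mimics two assignments that try to swap `a` and `b`: blocking assignments take effect immediately, while non-blocking ones are collected and only applied after all right-hand sides have been evaluated.

```python
def run_blocking(a, b):
    """Toy model of:  a = b;  b = a;  (each assignment takes effect at once)."""
    a = b  # a is updated right away
    b = a  # so this reads the NEW value of a
    return a, b


def run_nonblocking(a, b):
    """Toy model of:  a <= b;  b <= a;  (updates are deferred)."""
    pending = []
    pending.append(("a", b))  # right-hand side is evaluated now, update is scheduled
    pending.append(("b", a))  # still reads the OLD value of a
    values = {"a": a, "b": b}
    for name, value in pending:  # scheduled updates are applied afterwards
        values[name] = value
    return values["a"], values["b"]


print(run_blocking(1, 2))     # (2, 2)
print(run_nonblocking(1, 2))  # (2, 1)
```

The blocking version loses the old value of `a`; the deferred version swaps the two, because every right-hand side was read before any update happened.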

The way the simulator behaves does not have a lot of bearing on what is truly important, which is what the SystemVerilog compiler will turn your source code into, in terms of hardware resources. The important thing to remember there (for now) is that the function 'next_state' is turned into combinatorial logic, whereas the non-blocking assignment in the always_ff block prompts the compiler to start using flip-flop resources to hold the state, into which the calculated next_state is moved whenever a rising CLK edge happens.

>> I think I didn't understand what "event control" is

That's a term coming from the "discrete event simulator" model of Verilog. You may remember that I wrote that you can run (System)Verilog and VHDL code on your PC, and that these simulators are built around a discrete event simulator model. The "event control" terminology that you encounter refers to that.

What is confusing to you (and to me) is the naked fact that these pieces of HDL code can be executed in two entirely different environments: inside a simulator on a PC, or on FPGA hardware. A lot of texts/tutorials mix up the terminology of those two completely different execution models, which is super, super confusing for beginners.

For now, I would ask you to trust me on this: all functionality you will need for your photon coincidence detector can be implemented in about 10 small modules that follow the template of the 'my_first_fsm' module code, with a 'next_state' function that's responsible for expressing the combinatorial logic, and a 'state_type' responsible for declaring whatever internal state the module needs. Everything else is, to some extent, unnecessary confusion. Reading tutorials, often written by people who only half understand what they are doing themselves (the quality of tutorial writers varies tremendously, from complete idiots to super capable people), can cause more confusion than enlightenment, especially given the fact that you're still in the baby-steps phase of using an HDL.
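The shape of that template can be sketched as a small Python model. This is only an illustration of the idea, not the actual 'my_first_fsm' code: the `State` fields, the tiny `HI` and `LO` values, and the `simulate` helper are all assumptions made up for this sketch. A pure `next_state` function plays the role of the combinatorial logic, and the loop plays the role of the flip-flops taking over the new state on every rising CLK edge.

```python
from typing import NamedTuple

HI = 3  # hypothetical: clock cycles the output stays high
LO = 2  # hypothetical: clock cycles the output stays low


class State(NamedTuple):
    """Plays the role of 'state_type': everything the module remembers."""
    counter: int
    out: int


def next_state(s: State) -> State:
    """Pure function: computes the next state from the current one."""
    counter = (s.counter + 1) % (HI + LO)
    out = 1 if counter < HI else 0
    return State(counter, out)


def simulate(cycles: int) -> list:
    """Each loop iteration stands in for one rising CLK edge."""
    s = State(counter=0, out=1)
    trace = []
    for _ in range(cycles):
        trace.append(s.out)
        s = next_state(s)  # the 'flip-flops' take over the computed state
    return trace


print(simulate(10))  # [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
```

All the interesting behavior lives in `next_state`, which can be read top to bottom like ordinary sequential code.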

What will help, is at some point to write some super trivial SystemVerilog code, and run Vivado's simulator on it. This de-mystifies a lot of what happens in simulation. However, that is only modestly useful to understand the behavior of your code inside the FPGA, because there, stuff can actually happen truly at the same time. The simulator model of delayed assignment with non-blocking assignments is, to some extent, just a crude approximation of that.

I hope this helps your understanding, but I fear it could only cause more confusion :-/

>> I also find it a bit confusing that there is no longer a distinction between a reg and a wire with the logic keyword.

The problem with the old distinction was that the name 'reg' strongly suggested that the value would end up in a flip-flop, but that wasn't always the case. So effectively, the 'reg' annotation was sometimes (often) a lie; no register would be generated in the hardware, at all. That doesn't help comprehension.

"logic" is a more neutral term, and the compiler will infer by the way you use the variable if it should instantiate a flip-flop for it, or not. If you assign (via a non-blocking assignment) to it inside an always block that's sensitive to posedge(CLK), it will say: "ok, this requires a flip-flop". If not, it's just a combinatorial circuit.

>> Another unrelated question that came across my mind (I'm using the clocking wizard again to generate a
>> 100MHz clock). Why can't I choose a value of M to be 25 and D to be 3 instead of 50 and 6 respectively?

Because that would violate two frequency constraints imposed by the hardware. Read back around where I explained the MMCM, specifically, this message:

With D==3, f_pfd would end up at 4 MHz. That is not valid: it must be at least 10 MHz, according to the Spartan-7 datasheet.
Also, with D==3 and M==25, f_vco would end up at 100 MHz. That is invalid, because f_vco must be at least 600 MHz, again according to the datasheet.

The choices I made are the obvious way to make 100 MHz. There are a few other solutions that satisfy all constraints, but this one is the easiest.
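The two checks can be written down as a few lines of Python. This is a sketch under stated assumptions: a 12 MHz input clock (inferred from f_pfd being 4 MHz at D==3), the usual relations f_pfd = f_in / D and f_vco = f_in * M / D, and only the two lower limits mentioned above. Upper limits and the allowed ranges of M and D are not checked, and `check_mmcm` is a made-up helper name.

```python
F_IN_MHZ = 12.0        # assumption: input clock implied by f_pfd = 4 MHz at D == 3
F_PFD_MIN_MHZ = 10.0   # lower limit quoted above
F_VCO_MIN_MHZ = 600.0  # lower limit quoted above


def check_mmcm(m, d, f_in_mhz=F_IN_MHZ):
    """Return (f_pfd, f_vco, violated lower limits) for multiplier m and input divider d."""
    f_pfd = f_in_mhz / d
    f_vco = f_in_mhz * m / d
    problems = []
    if f_pfd < F_PFD_MIN_MHZ:
        problems.append("f_pfd below 10 MHz")
    if f_vco < F_VCO_MIN_MHZ:
        problems.append("f_vco below 600 MHz")
    return f_pfd, f_vco, problems


print(check_mmcm(25, 3))  # (4.0, 100.0, ['f_pfd below 10 MHz', 'f_vco below 600 MHz'])
print(check_mmcm(50, 1))  # (12.0, 600.0, [])
```

The second call is a hypothetical combination that passes both lower limits under these assumptions; dividing its 600 MHz f_vco by an output divider of 6 would give 100 MHz.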

>> Why didn't you instantiated the fsm with RESET ?

I did? I just tied it to 0 for the time being.

It is generally a good idea to add a synchronous RESET option to all your little finite-state machines. However, right now, I choose not to connect that to, for example, a BUTTON. That will come later.

Edited by reddish

>> That's something

Indeed, but what?

I have no idea what I'm looking at. Context with scope pictures like these is very much needed, unless there's enough info in the screen to tell. Here, that's not the case.

Edited by reddish

12 minutes ago, reddish said:

I did? I just tied it to 0 for the time being.

This is what I saw in your file:
my_first_fsm fsm(
        .CLK(CLK),
        .FSM_OUT(PIO_OUT)
    );

shouldn't it be
my_first_fsm fsm(
        .CLK(CLK),
        .RESET(0),
        .FSM_OUT(PIO_OUT)
    );


1 minute ago, GalD101 said:

This is what I saw in your file:

I think you were looking at the old "classic Verilog" file. The SystemVerilog file I attached earlier does have the .RESET(0).


2 minutes ago, reddish said:

>> That's something

Indeed, but what?

I have no idea what I'm looking at. Some context with the scope pictures is needed.

My mistake, it's similar to the program you wrote but I changed the values of HI and LO to 868 and NUM_COUNTER_BITS to 11
localparam HI = 868;
localparam LO = 868;
localparam NUM_COUNTER_BITS = 11;


13 minutes ago, GalD101 said:

My mistake

Ok. Protip: when dropping a scope picture, zoom out a bit to see a few periods; perhaps enable the "measure all" function of your scope; and enable the "trigger rate" counter. Those provide all kinds of information that makes it much easier to assess what the picture means.

If you can confirm that you made a 115200 baud (57600 Hz) signal, we're ready to start sending serial data to the PC, which will be a good milestone. Awesome!
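For reference, the frequency to expect on the scope can be worked out in a few lines of Python. This sketch assumes that HI and LO are the number of 100 MHz clock cycles the output spends high and low, respectively, and that one full square-wave period therefore contains two bit periods.

```python
F_CLK_HZ = 100_000_000  # 100 MHz clock from the clocking wizard
HI = 868                # assumed: clock cycles the output is high
LO = 868                # assumed: clock cycles the output is low

period_cycles = HI + LO                  # clock cycles per full square-wave period
f_signal_hz = F_CLK_HZ / period_cycles   # frequency the scope should report
baud = 2 * f_signal_hz                   # one high and one low bit period per cycle

print(period_cycles)          # 1736
print(round(f_signal_hz, 1))  # 57603.7
print(round(baud, 1))         # 115207.4
```

That is within about 0.01 % of the nominal 115200 baud, so a scope reading of roughly 57.6 kHz is the thing to look for.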

Can you confirm this, and tell me what kind of OS you're running on the computer you're using (Windows or Linux)?

Edited by reddish

image.png
Woot! Success!

Okay I will write up a bit about serial communications next. Give me 30 minutes.

Today, we will get to the point where we send an A character (or many of them!) from the FPGA to your PC. That's the goal.
 


>> How can I check this? I remember you said the relation between baud and Hz is a factor of 2
 

My mistake. I meant to write 115200 baud a.k.a. 57600 Hz.


I know you are writing an explanation, but I'm curious. When I researched how to transfer data, I arrived at the conclusion that a protocol such as UART (which is asynchronous) won't be good enough for my needs and that I should be using something else, such as SPI. Can you please explain to me why a 115200 baud rate would suffice? It's also apparent that this device can reach a faster baud rate, judging from the device manager on Windows. Picture is attached.

baud rate.png


59 minutes ago, GalD101 said:

My mistake, it's similar to the program you wrote but I changed the values of HI and LO to 868 and NUM_COUNTER_BITS to 11
localparam HI = 868;
localparam LO = 868;
localparam NUM_COUNTER_BITS = 11;

It also works identically with the Verilog (not SystemVerilog) code after I modified these lines
wire [10:0] next_counter = (r_counter + 1) % (11'd1736);
wire       next_fsm_out = (next_counter < 10'd867) ? 1'b1 : 1'b0;


The device is probably capable of 4 Mbaud, but I need to check the FTDI chip. You will only reach that in Linux though; in my experience, Windows cannot reliably handle anything beyond 230400 baud. Pretty terrible OS.

First, let's look at using the device as an event tagger.

You could format messages like this:

TTTTTTTTTTTT<space>EE<newline>

Where the 12 T's are a 48-bit hexadecimal timestamp (at 10 ns resolution) and the 2 E's are the status of some pins.

That's 16 characters per event.
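Seen from the PC side, one such message could be produced or checked with a short Python helper. This is only a sketch of the text format: `format_event` is a made-up name, and uppercase hex digits and an 8-bit pin status are assumptions that the format above does not pin down.

```python
def format_event(timestamp, pins):
    """Format one event as 12 hex digits, a space, 2 hex digits and a newline."""
    if not 0 <= timestamp < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if not 0 <= pins < 2**8:
        raise ValueError("pin status must fit in 8 bits")
    return f"{timestamp:012X} {pins:02X}\n"


msg = format_event(timestamp=123_456_789, pins=0b11)
print(repr(msg))  # '0000075BCD15 03\n'
print(len(msg))   # 16
```

With the timestamp counted in 10 ns ticks, the example value 123,456,789 corresponds to about 1.23 seconds after the counter started.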

Each character will take 9 baud periods, at 7N1, which is pretty efficient. I propose to use a text-based encoding because it will make debugging easier, and framing (== recognition of messages) trivial. Going to a binary message format is possible but won't give you a lot of extra events per second.
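To see where the 9 baud periods come from, here is a small Python sketch that builds the line levels for one character. It assumes the usual UART conventions: a low start bit, data bits sent least significant bit first, and a high stop bit. `frame_7n1` is a made-up helper name.

```python
def frame_7n1(char):
    """Return the line levels for one character: start bit, 7 data bits (LSB first), stop bit."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError("7N1 can only carry 7-bit characters")
    data_bits = [(code >> i) & 1 for i in range(7)]  # least significant bit first
    return [0] + data_bits + [1]  # start bit is low, stop bit is high


bits = frame_7n1("A")
print(bits)       # [0, 1, 0, 0, 0, 0, 0, 1, 1]
print(len(bits))  # 9
```

Each entry in the list lasts one baud period, so every character costs 9 periods on the wire regardless of which character it is.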

So that's 144 baud periods per event.

At 4 Mbaud, this could handle about 27,000 events per second.

It's not a lot, but not uselessly low. If we could get to a situation where you can do an experiment where the two photon detectors together produce less than 27,000 events per second, we're good to go.

You probably want more though. That means we will go beyond a timetagger, and implement an FPGA-side coincidence detector.

Format could be TTTTTTTTTTTT<space>UUUUUUUUUUUU<newline>, where the T's and U's are timestamps of close-by photons, according to the algorithm I proposed wayyyyy back in the beginning of this thread.

That's 26 characters or 234 baud periods per message, meaning you could handle upward of 17,000 "closely spaced photon detection events" per second. That algorithm will simply discard all photons seen that are not close to photons on the "other" channel. I'd prefer a timetagger, but this could work.
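The throughput arithmetic for both message formats can be checked with a few lines of Python. The sketch counts every character of a message, including the space and the newline, at 9 baud periods each, and assumes the link really runs at 4 Mbaud with no gaps between characters. The helper names are made up.

```python
BAUD = 4_000_000      # 4 Mbaud, the rate hoped for above
PERIODS_PER_CHAR = 9  # 7N1: 1 start bit + 7 data bits + 1 stop bit

tagger_msg = "T" * 12 + " " + "E" * 2 + "\n"        # time-tagger format
coincidence_msg = "T" * 12 + " " + "U" * 12 + "\n"  # coincidence format


def baud_periods(msg):
    """Number of baud periods needed to send one message."""
    return len(msg) * PERIODS_PER_CHAR


def messages_per_second(msg, baud=BAUD):
    """Upper bound on messages per second for a back-to-back stream."""
    return baud / baud_periods(msg)


for name, msg in [("time tagger", tagger_msg), ("coincidence", coincidence_msg)]:
    print(f"{name}: {len(msg)} chars, {baud_periods(msg)} baud periods, "
          f"{messages_per_second(msg):.0f} messages/s")
# time tagger: 16 chars, 144 baud periods, 27778 messages/s
# coincidence: 26 chars, 234 baud periods, 17094 messages/s
```

These are upper bounds; any idle time between characters or messages lowers the real rate.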

An SPI protocol makes little sense for this. The FPGA would act as a master and push events to some receiver. What would you want to use for the receiver? It would have to be able to handle a lot of messages. And you would connect it to some PIO pins, which are not very well suited to reliable communication. On the PCB, between the FPGA and the FTDI chip, 4 Mbaud is quite doable. But via PIO pins? I would not like it.

This is the point where you should get a ballpark number for the number of photons and/or coincidences that you expect per second (remember that I asked that wayyyyyyyyyyyyyyyyyyyyy back in the beginning?). Please come up with some numbers for me. If it's 10 million, we can stop. If it's 10,000, we're golden. Anything in between: we'll have to calculate.

Also, for now, read this:

https://en.wikipedia.org/wiki/Serial_communication


2 minutes ago, GalD101 said:

It also works identically with the Verilog (not SystemVerilog) code after I modified these lines
wire [10:0] next_counter = (r_counter + 1) % (11'd1736);
wire       next_fsm_out = (next_counter < 10'd867) ? 1'b1 : 1'b0;

No, it does not work identically.

You never got to the point where you identified the off-by-one bug that I claim is in the second line. Please stare at that second line intently for a bit, and tell me what's wrong.


3 minutes ago, reddish said:

No, it does not work identically.

You never got to the point where you identified the off-by-one bug that I claim is in the second line. Please stare at that second line intently for a bit, and tell me what's wrong.

that's weird because I programmed the device with it and it seems to show the same behavior. I probably missed something when compiling. Thanks for letting me know I'll check


>> that's weird because I programmed the device with it and it seems to show the same behavior

That's because it shows almost the same behavior, but not quite.

Have you stared at that second line intently for a few minutes yet? I'm not kidding, you should do that.
 


1 hour ago, reddish said:

the 2 E's are the status of some pins.

What will this status represent?

1 hour ago, reddish said:

Each character will take 9 baud periods

Why 9? Is it because each character is 4 bits so you doubled it and added 1 for the stop bit?

1 hour ago, reddish said:

at 7N1

7N1 means 1 start bit, 7 bits of data, no parity bit, and 1 stop bit?

 

1 hour ago, reddish said:

So that's 144 baud periods per event.

Why 144?

 

1 hour ago, reddish said:

At 4 Mbaud, this could handle about 27,000 events per second.

Because 4M / 144 approximately equals 27,000?

1 hour ago, reddish said:

That's 26 characters or 234 baud periods per message

Why 234?

1 hour ago, reddish said:

meaning you could handle upward of 17,000

Because 4M / 234 approximately equals 17,000?

 

1 hour ago, reddish said:

This is the point where you should get a ballpark number for the number of photons and/or coincidences that you expect per second

I asked my prof about this; he said he needs to check, so I'll update you when I have an answer.

Edited by GalD101

1 hour ago, reddish said:

This is the point where you should get a ballpark number for the number of photons and/or coincidences that you expect per second

"The single photon detector dead time is about 50ns therefore we do not need to measure faster than 20MHz. The system should be ready to count the coincidence rate as high as 20MHz."


9 minutes ago, GalD101 said:

"The single photon detector dead time is about 50ns therefore we do not need to measure faster than 20MHz. The system should be ready to count the coincidence rate as high as 20MHz."

Note that this is the maximum, so my prof would like me to build a system that can handle the max, but it would be OK if it were a bit less than the max.
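As a rough sanity check of what that maximum would mean for a serial link, the Python sketch below converts the dead time into an event rate and compares it with a 4 Mbaud link. The 16-character message and the 9 baud periods per character are taken from the format discussed earlier in the thread; reporting every single event over the serial port is an assumption of this sketch.

```python
DEAD_TIME_S = 50e-9       # detector dead time quoted above
BAUD = 4_000_000          # assumed serial link rate (4 Mbaud)
PERIODS_PER_MSG = 16 * 9  # assumed: one 16-character message per event at 7N1

max_event_rate = 1 / DEAD_TIME_S                   # events per second
required_baud = max_event_rate * PERIODS_PER_MSG   # baud needed to report every event

print(f"{max_event_rate:.0f}")        # 20000000
print(f"{required_baud / BAUD:.0f}")  # 720
```

So reporting every event at the full 20 MHz would need several hundred times more bandwidth than a 4 Mbaud link offers.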
