
D@n

Posts posted by D@n

  1. @engrpetero,

    Yes, that is me. I am the author of the ZipCPU blog and Twitter feed. To prove it, let me predict that the next article will be about formally verifying an SD card block data receiver. I might even add some hardware lessons learned--since I now have the device (somewhat) running in hardware. (Yes, the article is mostly written ...)

    But back to your struggles ... device lockup can be a challenge to debug. It's usually caused by a bus slave (or master) that doesn't obey the rules of the bus. I'm not sure whether you are using AXI or AXI-Lite. AXI has more rules than AXI-Lite, and AXI-Lite is usually the easier of the two to work with. Lockup with AXI-Lite is typically caused by a request that never gets acknowledged. Classic examples would be 1) the number of (BVALID && BREADY)s doesn't match the number of (AWVALID && AWREADY)s or the number of (WVALID && WREADY)s, or likewise 2) the number of (RVALID && RREADY)s doesn't match the number of (ARVALID && ARREADY)s. Bus lockups in AXI-Lite can also take place if you expect AWVALID && WVALID to arrive on the same clock cycle, or if you wait for READY before asserting VALID--such as waiting on BREADY before asserting BVALID. Another problem I've seen recently involved someone getting the address of their peripheral wrong, so the design then locked up when they tried to access a non-existent device.

    The way to deal with this problem is (easiest to hardest): 1) to formally verify anything before it touches hardware (you knew I would say that), 2) to use an internal logic analyzer to watch these signals, or 3) if all else fails, assign some LEDs to the task. Key signals to look for would be VALIDs stuck without READY. You could also count the request and response handshakes listed above and key an LED to the counters not matching. Another useful LED might be a toggle that just tells you whether your IP was accessed at all from the ARM.
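    As a rough illustration of the counter idea, here is a Python behavioral model (not RTL) of what such counters would track on the write side. The class and method names are hypothetical. The read side would count (ARVALID && ARREADY)s against (RVALID && RREADY)s the same way. In hardware you'd presumably only light the LED once `pending()` has stayed true for a long time, since it is briefly true during every healthy write.

```python
from dataclasses import dataclass


@dataclass
class AxilWriteMonitor:
    """Software model of request/response counters for the AXI-Lite write
    channels.  Call step() once per clock with the sampled signals."""
    aw: int = 0  # AWVALID && AWREADY handshakes
    w: int = 0   # WVALID && WREADY handshakes
    b: int = 0   # BVALID && BREADY handshakes

    def step(self, awvalid, awready, wvalid, wready, bvalid, bready):
        self.aw += int(bool(awvalid and awready))
        self.w += int(bool(wvalid and wready))
        self.b += int(bool(bvalid and bready))

    def pending(self):
        # An accepted address or data beat that hasn't been answered yet
        return self.aw != self.b or self.w != self.b

    def error(self):
        # More responses than complete requests is a protocol violation
        return self.b > min(self.aw, self.w)


# AW and W arriving on different clocks is legal, so the slave must handle it
mon = AxilWriteMonitor()
mon.step(1, 1, 0, 0, 0, 0)          # address accepted
mon.step(0, 0, 1, 1, 0, 0)          # data accepted on a later clock
print(mon.pending())                # True: still waiting on BVALID
mon.step(0, 0, 0, 0, 1, 1)          # write response returned
print(mon.pending(), mon.error())   # False False
```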

    Others on this forum will tell you the easiest way to deal with this is not to use AXI at all--but it is kind of hard to avoid with the basic Zynq type of platforms.  (Just giving you a heads up, lest this conversation get off track ...)

    Dan

  2. zygot said:

    I'm not referring to resource hungry programmable controllers like RISC-V either.

    RISC-V cores aren't necessarily resource hungry.  I've seen quite a few that've given me a run for my money when trying to generate low-resource IP.  These can actually get really small--once you throw away all of the expensive features.  In fact, I know of one that will work using only 125 LUTs.  All that's to say, it's not necessarily the RV IP that is the resource consumer.

    Quote

    I wouldn't get out the back-hoe to weed my garden when a simple hand-held tool would do a better job and be a lot quicker. If you can design and debug a Verilog module, then do you really need 1 or more processors to get your project completed? Do you really want to spend your time on multiple software development projects using proprietary tools, plus complicated Verilog design only to find that your system concept is too hard to understand and get working?

    Yes/no. While I understand the sentiment, a lot of software engineers trying to access hardware will initially view this as the "easiest" approach. A CPU can (presumably) convert a hardware design problem into a software problem, and so it is often estimated to be a simpler task than properly learning how to engineer hardware designs in the first place. That said, a CPU needs 1) a ROM to get its first instructions from (typically provided by flash), 2) some RAM for its stack, 3) the special purpose Verilog module this engineer wants, 4) a serial port for debugging purposes, 5) an interconnect to connect all of these together, 6) a linker script, 7) software to load the program into ROM in the first place, and then 8) software to copy the program from ROM to RAM, and ... @zygot has a point here. All of that put together constitutes 1) a lot of complexity that will need to be debugged when things aren't working (and no, nothing works the first time), and 2) a significant logic cost, which will limit your ability to use the full resources of the part you've purchased.

    Dan

  3. If I recall correctly, MicroBlaze does not work well with other MicroBlazes. The specification only warrants one in a system. At least, that was the answer given to me when I pointed out that Xilinx's MIG controller couldn't handle exclusive access.

    Backing up a bit. To do multiple CPUs in a design (properly), you need some way of communicating between them. This is typically done via atomic accesses over the bus. AXI calls this capability "exclusive access", and there's quite a protocol behind it. When I was studying how to implement exclusive access on my own, I discovered the MIG did not support it. I think MicroBlaze had instruction support for it, but the MIG did not. While digging, I seem to recall someone saying that the documentation specifically disallowed MicroBlaze from ever working in a multi-CPU system, so this would never be an issue.
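    To make "exclusive access" a bit more concrete, here's a simplified Python model of what a slave with an exclusive-access monitor does. It assumes one reservation per AXI ID and matches on the exact address only. The real AXI rules are stricter: the exclusive write must also match the size and length of the exclusive read. So treat this as a sketch rather than a reference. `ExclusiveSlave` and `atomic_increment` are hypothetical names. Note the failure case: a failed exclusive write returns OKAY and leaves memory alone. A slave that doesn't support exclusive access at all answers the exclusive read with OKAY rather than EXOKAY.

```python
OKAY, EXOKAY = 0b00, 0b01   # AXI BRESP/RRESP encodings


class ExclusiveSlave:
    """Simplified memory slave with an exclusive-access monitor:
    one reservation per AXI ID, tracked at whole-address granularity."""

    def __init__(self):
        self.mem = {}
        self.reservations = {}   # AXI ID -> reserved address

    def read(self, axi_id, addr, exclusive=False):
        data = self.mem.get(addr, 0)
        if not exclusive:
            return data, OKAY
        self.reservations[axi_id] = addr
        return data, EXOKAY      # EXOKAY tells the master exclusives are supported

    def write(self, axi_id, addr, data, exclusive=False):
        if exclusive:
            if self.reservations.get(axi_id) != addr:
                return OKAY      # failed exclusive: memory is left unchanged
            del self.reservations[axi_id]
        self.mem[addr] = data
        # Any write to this address breaks every other reservation on it
        self.reservations = {i: a for i, a in self.reservations.items() if a != addr}
        return EXOKAY if exclusive else OKAY


def atomic_increment(slave, axi_id, addr):
    """Retry loop a CPU would run for an atomic read-modify-write."""
    while True:
        value, _ = slave.read(axi_id, addr, exclusive=True)
        if slave.write(axi_id, addr, value + 1, exclusive=True) == EXOKAY:
            return value + 1


# Two masters race: the second one's exclusive write fails and must retry
s = ExclusiveSlave()
v1, _ = s.read(1, 0, exclusive=True)
v2, _ = s.read(2, 0, exclusive=True)
print(s.write(1, 0, v1 + 1, exclusive=True) == EXOKAY)   # True
print(s.write(2, 0, v2 + 1, exclusive=True) == EXOKAY)   # False
print(atomic_increment(s, 2, 0))                         # 2
```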

    In other words--you are doing something new, and untested.

    Dan

  4. Quote

    As for using a 1 GbE PHY as a modem to connect FPGA boards together, or even an FPGA board and a PC together for data transport, this is an almost ideal approach (at least from a hardware perspective), as long as you eschew FPGA tool vendor IP and use the all HDL approach. I've been using both UARTs and Ethernet extensively for such purposes for years.

    I'm still new to this party as of this past year, and ... no, I am not using any FPGA tool vendor IP beyond the required IO hardware macro blocks (IDDR, OSERDES, etc.). Yes, it took a bit more work to do--especially since I handled almost all of the network via RTL. Still, I find your quote, "... this is an almost ideal approach", encouraging. Thank you.

    Dan

  5. @Morocco_Brittany56,

    Quote

    While working on a project, I arrived at a point where I need some Ram or Fifo which I can write using microblaze (AXI) and read data from using a Native port (In my case I will write UART code to read data from Fifo).

    @zygot has a point.  Why would you use Vivado generated IP for this?  Their generated designs are typically broken.  Not only that, this is easy enough to do.

    Let's start at the top. You want some kind of "Ram or Fifo". Building a RAM or a FIFO is a first-year beginner project. It's not something that should require Vivado IP. You should have a "Ram or Fifo" in your own back pocket that you've tested, verified and trust. Here is one of mine, for example. Frankly, they're a lot easier to work with in RTL than when using Vivado IP. Among other things, you can debug your own. You can't very well debug Vivado's IPs. (I know ... I submitted some bug reports back in 2018, and they still hadn't been fixed the last time I checked ...)
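    For anyone wondering what such a back-pocket FIFO has to do, here's a Python behavioral model of a synchronous FIFO. It's a sketch, not the RTL linked above. It assumes a power-of-two depth and uses the common trick of giving each pointer one extra bit. That way "full" (pointers differ by exactly the depth) and "empty" (pointers equal) can be told apart without a separate counter. The class name is hypothetical.

```python
class SyncFifo:
    """Behavioral model of a synchronous FIFO with a power-of-two depth.
    Each pointer carries one extra bit so full and empty are distinct."""

    def __init__(self, log2_depth):
        self.depth = 1 << log2_depth
        self.mem = [0] * self.depth
        self.wr = 0
        self.rd = 0
        self.ptr_mask = (self.depth << 1) - 1   # address bits plus one extra

    def fill(self):
        return (self.wr - self.rd) & self.ptr_mask

    def full(self):
        return self.fill() == self.depth

    def empty(self):
        return self.wr == self.rd

    def push(self, value):
        if self.full():
            return False                         # RTL would ignore or flag this
        self.mem[self.wr & (self.depth - 1)] = value
        self.wr = (self.wr + 1) & self.ptr_mask
        return True

    def pop(self):
        if self.empty():
            return None
        value = self.mem[self.rd & (self.depth - 1)]
        self.rd = (self.rd + 1) & self.ptr_mask
        return value


fifo = SyncFifo(log2_depth=2)               # 4 entries
print([fifo.push(v) for v in range(5)])     # [True, True, True, True, False]
print(fifo.pop(), fifo.fill())              # 0 3
```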

    Quote

    Does this type of IP exist? Or did someone develop it before?

    Absolutely! I even have my own AXI-Lite UART, and it has an integrated FIFO within it. I built it years ago, and I would encourage you to do the same.

    In my case, though, I merged the FIFO into the UART. That way the CPU has a consistent interface. It can read/write the UART registers, or the UART FIFO, and they're both in (roughly) the same spot. I mean, you could keep the two separate if you wanted to, but why would you? The UART software driver would then need to interact with two IPs (the UART and the FIFO) just to send anything. It just makes more sense to give them both the same register map.

    As for the comments about a 13MBaud UART ... I'm all ears. Sometimes I can get mine to operate at or near 4MBaud, but 1-2MBaud is more common with the FTDI chips I find on most FPGA boards. (On the other hand, on one project I replaced the UART with a Gb Ethernet IP and was just shocked at how fast the interface went in comparison ...)
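    Part of what limits the baud rate is the integer divider. This Python sketch assumes a 100MHz system clock and a UART that uses a simple integer clocks-per-baud divider; both are assumptions. It shows why 4MBaud is easy while 13MBaud is not: 100MHz/13MBaud isn't an integer, and the nearest divider lands several percent off.

```python
def baud_setting(clk_hz, baud):
    """Nearest integer clocks-per-baud divider, the baud rate it really
    produces, and the resulting error in percent."""
    clocks = round(clk_hz / baud)
    actual = clk_hz / clocks
    error_pct = 100.0 * (actual - baud) / baud
    return clocks, actual, error_pct


for baud in (1_000_000, 2_000_000, 4_000_000, 13_000_000):
    clocks, actual, err = baud_setting(100_000_000, baud)
    print(f"{baud:>10,} baud: {clocks:3d} clocks/baud, actual {actual:,.0f}, error {err:+.2f}%")
```

    The total mismatch between the two ends has to stay within a few percent for a standard 10-bit character. A -3.8% error on one side alone is therefore already marginal before the FTDI side's own divider is even considered.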

    Dan

     

  6. @hlittle, Thanks for pointing out the difference!  I now see what you are talking about.  The "write-protect" pin from the PModSD is now an NC on the PMod Micro, but the schematic shows that a circuit might be populated ... got it.  I'll look forward to hearing the answer as well.

    Dan

  7. From the reference alone, there are no pins left to turn the power on or off. You have the pins you need to control the SD card, but no way to power it up or down short of unplugging it. You might wish to look into sending a reset command to the device, but controlling power is off the table.
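    If you do go the reset route, the usual software reset is CMD0 (GO_IDLE_STATE). This sketch assumes the card is being run in SPI mode. That first CMD0 has to carry a valid CRC, since the card isn't yet in SPI mode when it receives it. The Python below builds the six-byte command frame with its CRC-7 (polynomial x^7 + x^3 + 1). `crc7` and `sd_command` are hypothetical helper names. Keep in mind that CMD0 only resets the card's state machine; it won't help if the card truly needs a power cycle.

```python
def crc7(data):
    """CRC-7 (polynomial x^7 + x^3 + 1, i.e. 0x09) over SD command bytes."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            feedback = ((byte >> bit) & 1) ^ ((crc >> 6) & 1)
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= 0x09
    return crc


def sd_command(index, arg):
    """Start bit 0, transmission bit 1, 6-bit index, 32-bit big-endian
    argument, then the CRC-7 followed by the end bit."""
    frame = bytes([0x40 | (index & 0x3F)]) + arg.to_bytes(4, "big")
    return frame + bytes([(crc7(frame) << 1) | 1])


print(sd_command(0, 0).hex())   # 400000000095  (CMD0, GO_IDLE_STATE)
```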

    Dan

  8. That particular software was rebuilt for a Pi years ago.  As I recall, the rebuild didn't require changes--it just needed to be recompiled with the ARM compiler.  The software works in a Linux system environment, and the network and UART interfaces are all well defined there and have been for years.

    As for senior projects, there's a challenge on the instructor's side that I've seen quite often over the years, and that is estimating the complexity of the project. Instructors often have no real clue what the students are up to, and so they wildly over- or underestimate the student workload. I suppose it just goes with the territory. It's just something I've often seen, and so something I'm sympathetic to when reading about these things. In this case, the project can go from nearly impossible to trivial depending upon details the student hasn't (yet) shared with us.

  9. If this is a software project, it's pretty easy.  I wrote this in an afternoon.  It does almost exactly what you are looking for (though it splits data across two TCP/IP network streams), and it'd be fairly easy to adjust back to a single stream again if you wished to.  (You might even be able to find the single-stream version in the git history, for exactly what you've described.)

    If this is an RTL project, then yeah, it'd be a bit more ambitious.  The key question here, though, is where is the hardware/software boundary?

    Dan

  10. @artvvb,

    I would also ask Digilent to post and maintain PDF copies of all of their manuals.

    It is common for hardware to outlive its support, or for newer versions of hardware to have newer documents that then get confused with the manuals for older products. My solution to this has always been to download PDF copies of the user guides and other spec sheets at the time of purchase. That guarantees that 1) I won't later get confused by an update for a product I don't have, and 2) I'll still be able to keep and maintain the manual long after the company that built the product has stopped supporting it.

    Thanks!

    Dan

  11. There should be no combinatorial loops within the design.  (If there were, I'd fix them ...)

    Can you post the generated design you are using somewhere?  And (even better) can you list/find all the components of this loop?  One thing that doesn't make sense from your comment above is that the *genmpy_o_r_reg[25]_O bit is an output of a flip-flop.  I wouldn't expect that in a combinatorial loop, but maybe it starts the loop?  Again, that'd be why I'd want to see the logic you've generated if you could post it somewhere for evaluation.
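    For anyone trying to list the components of such a loop, the search itself is cycle detection in the design's connectivity graph, with flip-flops breaking the paths. Here's a Python sketch over a hypothetical `fanout` dictionary that maps each node name to the nodes it drives. Extracting that graph from an actual netlist is not shown. The sketch also reflects why a flip-flop output shouldn't be part of a combinatorial loop: paths into registers are never followed.

```python
def find_comb_loop(fanout, registers):
    """Return one combinatorial loop as a list of node names, or None.

    fanout:    dict mapping each node to the nodes it drives
    registers: set of flip-flop nodes; they only change on a clock edge,
               so no combinatorial path continues through them.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in fanout.get(node, ()):
            if nxt in registers:
                continue                  # a flip-flop input ends the path
            state = color.get(nxt, WHITE)
            if state == GREY:             # back to a node on the current path
                return stack[stack.index(nxt):]
            if state == WHITE:
                loop = visit(nxt)
                if loop:
                    return loop
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(fanout):
        if start not in registers and color.get(start, WHITE) == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return None


netlist = {"ff": ["add"], "add": ["mux"], "mux": ["ff", "lut"], "lut": ["add"]}
print(find_comb_loop(netlist, registers={"ff"}))   # ['add', 'mux', 'lut']
```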

    Thanks!

    Dan

  12. @kalainan,

    Are you sure you know what you are asking for? Phase is one of the most meaningless and irrelevant outputs of an FFT. Why? Well, because phase has to be referenced to something. Do you really want to reference phase to a local clock which cannot be calibrated in any way?

    Likewise, if you are new to the FFT, then I think you will find the bit-reversed order difficult to work with. If you weren't using a bit-reversed order, I could tell you which bin to expect each output in. With a bit-reversed order things get a bit more difficult, and I'd have to think about it.
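    For reference, here's how bit-reversed order maps output positions to bins, as a short Python sketch. It assumes a power-of-two FFT size and that output position p holds bin bit_reverse(p). The helper names are hypothetical. Bins at or above N/2 are reported as negative frequencies.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def output_bins(fft_size):
    """Which frequency bin appears at each output position of a
    bit-reversed-order FFT."""
    bits = fft_size.bit_length() - 1
    return [bit_reverse(pos, bits) for pos in range(fft_size)]


def bin_frequency(k, fft_size, sample_rate):
    """Center frequency of bin k; bins from N/2 upward are negative."""
    if k >= fft_size // 2:
        k -= fft_size
    return k * sample_rate / fft_size


print(output_bins(8))               # [0, 4, 2, 6, 1, 5, 3, 7]
print(bin_frequency(5, 8, 8000))    # -3000.0
```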

    Digging deeper, I'd never use an FFT without some amount of overlap and a window function. Unfortunately, any amount of overlap will really mess with the phase you are trying to measure. So ... you are going to need some knowledge about what you are doing. The FFT alone is rarely sufficient for such purposes. This, however, is a much longer discussion, and one to have with your DSP instructor.
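    Here's a small Python illustration of the overlap problem under assumed parameters: N=16, 50% overlap, a tone centered on bin 3, and a periodic Hann window. Each frame's phase is measured relative to that frame's own start, so it advances by 2*pi*K0*HOP/N from one frame to the next. Here that is 3*pi, so the measured phase flips by 180 degrees every frame even though the input never changed. The helper names are hypothetical.

```python
import cmath
import math

N, HOP, K0 = 16, 8, 3    # FFT size, 50% overlap, tone centered on bin 3


def hann(n_len):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def dft_bin(frame, k):
    """A single DFT bin, computed directly."""
    n_len = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_len)
               for n, x in enumerate(frame))


def frame_bin(signal, m, k=K0):
    """Window frame m, which starts at sample m*HOP, and return bin k."""
    start = m * HOP
    frame = [w * x for w, x in zip(hann(N), signal[start:start + N])]
    return dft_bin(frame, k)


signal = [math.cos(2 * math.pi * K0 * n / N) for n in range(N + 4 * HOP)]
for m in range(4):
    X = frame_bin(signal, m)
    # Same magnitude every frame, but the phase jumps by 3*pi (a sign flip)
    print(m, round(abs(X), 6), round(math.degrees(cmath.phase(X))))
```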

    Dan

  13. > ... the devil is in the details

    It's hard not to snicker at this comment in a rather paternal manner.  The comment is just too true.  The snickering comes from remembering all of the times in the last couple of decades when I've ended up learning this principle all too well.

    Yes, I made a living for years skating on the edge of what was physically possible.  It was a lot of fun.  Yes, you can often use a wrench like you would a hammer.  But as @zygot said, the devil is in the details.  You may need to get to know those details well.

    In your case, yes, it is theoretically possible to make an antenna to capture anything in the range of an FPGA's input capture capabilities.  Yes, a 1-bit FPGA input can act as an ADC.  Yes, such a 1-bit ADC may be sufficient for many purposes.  I remember presenting a proof to the boss that a 1-bit signal would be sufficient for GPS processing.  (I don't know about Zigbee ... I didn't do that analysis.)
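    For those who want a number to go with the 1-bit claim: the classic result is that sign-only quantization of a weak signal buried in Gaussian noise costs a factor of 2/pi in correlator SNR, or about 1.96dB. This Python sketch computes that loss for a known +/-a signal in unit-variance noise. The function name is hypothetical. Note the key assumption: the signal of interest must sit below the noise. A strong nearby interferer breaks that assumption and captures the sign bit, which is exactly why the pre-amp/filter questions matter.

```python
import math


def one_bit_snr_ratio(a):
    """SNR after sign-only (1-bit) quantization divided by the SNR before it,
    for a known +/-a signal in unit-variance Gaussian noise, measured by
    correlating against the known +/-1 sequence."""
    # Before quantization: (a*c + n)*c = a + n*c has mean a and variance 1
    snr_linear = a ** 2
    # After: sign(a*c + n)*c is +/-1 with mean 2*Phi(a) - 1 = erf(a/sqrt(2))
    m = math.erf(a / math.sqrt(2))
    snr_1bit = m ** 2 / (1 - m ** 2)
    return snr_1bit / snr_linear


for a in (0.01, 0.1):
    print(a, round(10 * math.log10(one_bit_snr_ratio(a)), 2), "dB")
```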

    Just to add to the discussion, though, let me ask: have you given any thought to the type of pre-amp you might use? Will you need a band pass filter? If not, then what would happen if you don't use a proper pre-amp/filter and the signal you are interested in just happens to be dwarfed by something else nearby? Would you still be able to receive the Zigbee signal of interest? Or, if something went wrong, would you have enough lab equipment (and of the right type) to figure out what went wrong? These could easily become critical details you may wish to consider early in your project.

    Dan

  14. Build a design that then allows you to program the flash?  Here's an example project where I did just that. 

    • The project actually contains two top levels that I would choose between.  There's the alternate top level, which I used to program the flash and examine I/Os, and the normal top level which I used for my ultimate design.
    • Once both top levels were built, I would load the alternate top level onto the board via JTAG.
    • Once the alternate top level was loaded onto the board, I could then load a bit file into flash.  This also loaded the software I needed for my homegrown CPU, the ZipCPU.
    • If I wanted, I could then load the normal top level bit file via JTAG as well.  This was useful if the project design changed, but the CPU instructions did not.

    The flash controller had two parts to it.  One part allowed me to read from the flash, whereas a second interface allowed me to write arbitrary commands to the flash.  You can read more about the design approach here if you would like.

    A serial port interface gave me access to read and write values on the bus from outside the FPGA. This meant that I could write a small piece of flash control software for that purpose. The actual program that drove this controller was one I called zipload. This not only loaded the bitfile provided to it, but would also parse an ELF file in order to place the software at the right flash address.
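    For a flavor of what such a loader has to do, here's a Python sketch of splitting a write into page program commands. The opcodes (0x06 write enable, 0x02 page program), the 3-byte address, and the 256-byte page are common SPI NOR values, but they're assumptions here: check your flash's data sheet. The key rule is that one page program must not cross a page boundary. On many parts a page program that runs past the end of a page wraps back to the start of that same page. The sector must also be erased first and the status register polled until each write completes; neither step is shown. The function name is hypothetical.

```python
WREN = 0x06            # write enable (common SPI NOR opcode, assumed)
PAGE_PROGRAM = 0x02    # page program (common SPI NOR opcode, assumed)
PAGE_SIZE = 256        # typical SPI NOR page size (assumed)


def page_program_commands(address, data):
    """Split a write into (write-enable, page-program) command pairs so
    that no single page program crosses a page boundary."""
    commands = []
    offset = 0
    while offset < len(data):
        addr = address + offset
        room = PAGE_SIZE - (addr % PAGE_SIZE)      # bytes left in this page
        chunk = data[offset:offset + room]
        header = bytes([PAGE_PROGRAM]) + addr.to_bytes(3, "big")
        commands.append((bytes([WREN]), header + chunk))
        offset += len(chunk)
    return commands


for wren, cmd in page_program_commands(0x0000F0, bytes(300)):
    print(wren.hex(), cmd[:4].hex(), len(cmd) - 4, "data bytes")
```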

    One of the things I learned along the way was that it was easy to get the flash out of sync with the hardware controller.  This meant that the controller required a way to return the flash to its basic SPI based configuration no matter what mode it was in.  If the controller ever gets out of sync, you can usually recover it by powering down the board and starting both up again.

    I also learned that there are special write protect registers within the flash.  Some of these can be set only once and never cleared.  You should be able to look up the data sheet for your flash and then read these registers.  If the write protect is truly set internally, then you may need a new flash chip.  (Might be easier to get a new board.)  Read the registers, check the data sheet, and adjust as appropriate.
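    Reading the status register is usually the first step. This sketch assumes the layout many SPI NOR parts use for status register 1: bit 0 write-in-progress, bit 1 write-enable latch, bits 2-4 block-protect, and bit 7 status-register write disable. Bit positions, extra protect bits, and the one-time-programmable lock bits vary from part to part, so treat this layout as hypothetical until you've checked your data sheet. A nonzero block-protect field means some or all of the array is protected from writes.

```python
def decode_status(sr):
    """Decode status register 1 using a layout common to many SPI NOR
    flashes (assumed here; check your part's data sheet)."""
    return {
        "write_in_progress": bool(sr & 0x01),      # bit 0
        "write_enable_latch": bool(sr & 0x02),     # bit 1
        "block_protect": (sr >> 2) & 0x07,         # bits 4:2, BP2..BP0
        "status_write_disable": bool(sr & 0x80),   # bit 7
    }


print(decode_status(0x9C))   # all three BP bits set, status writes disabled
```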

    Just my two cents.

    Dan

  15. @HomaGOD,

    Let's see ... when I built with the keypad, I used the COLumns as FPGA outputs, and the ROWs as FPGA inputs. Here's how I went about reading it:

    1. First, output zeros on all of the COLumns.  This is the normal state of the keypad.  You'll sit here until something happens.
    2. If any of the ROWs, treated as inputs, produce a value other than VCC, then a button has been pressed.
    3. You can then output VCC on two of the COLumns. If the ROW inputs don't change, then these two columns were not responsible for the button--repeat with the other two outputs.
    4. Once you've narrowed down which two of the COLumns hold the button press, you can set VCC out on three of the columns: the two you've already ruled out, and one of the two remaining candidates.
    5. Your goal is to find the one COLumn, which when set to zero, leaves the ROW at zero--because the key is pressed.  That column plus the row then gives you the key you need.
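    The scan above can be modeled in Python to check the logic before writing any RTL. This is a sketch. `read_rows` is a hypothetical model of a 4x4 keypad where a pressed key pulls its ROW low only while its COLumn is driven low, and only one key is assumed to be pressed at a time.

```python
NUM_ROWS, NUM_COLS = 4, 4


def read_rows(col_out, pressed):
    """Hypothetical keypad model: ROWs are pulled up (1) unless the pressed
    key connects that ROW to a COLumn currently driven low (0)."""
    rows = [1] * NUM_ROWS
    if pressed is not None:
        row, col = pressed
        if col_out[col] == 0:
            rows[row] = 0
    return rows


def scan(pressed):
    """Return (row, col) of the pressed key, or None, following the steps above."""
    # Steps 1-2: drive every COLumn low and look for a ROW that isn't VCC
    rows = read_rows([0] * NUM_COLS, pressed)
    if all(rows):
        return None
    row = rows.index(0)

    # Step 3: drive VCC on COLumns 0 and 1; if nothing changes, it's 2 or 3
    if read_rows([1, 1, 0, 0], pressed) == rows:
        candidates = [2, 3]
    else:
        candidates = [0, 1]

    # Steps 4-5: drive VCC everywhere except the first remaining candidate
    col_out = [1] * NUM_COLS
    col_out[candidates[0]] = 0
    if read_rows(col_out, pressed)[row] == 0:
        return row, candidates[0]
    return row, candidates[1]


print(scan((2, 3)))   # (2, 3)
print(scan(None))     # None
```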

    Beware of bouncing!  Once the key is pressed, you will have to wait for it to settle before reading it.  Looking back over my notes, I waited for 100,000 ticks after registering the ROWs weren't all VCC before I went and tried to figure out which COLumn was responsible.
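    A common alternative to waiting a fixed count is a slightly different scheme than the one described above: restart the counter on every change, and only accept a value once it has been steady for the full count. At an assumed 100MHz clock, 100,000 ticks is 1ms. Here's a Python sketch of that scheme, using a tiny count so the behavior is easy to see. The function name is hypothetical.

```python
def debounce(samples, settle_ticks):
    """Return the debounced value for each input sample.  The counter restarts
    on every change; the output only follows the input after it has been
    steady for settle_ticks further samples."""
    if not samples:
        return []
    stable = last = samples[0]
    count = 0
    out = []
    for s in samples:
        if s != last:
            last = s          # input moved: restart the settle count
            count = 0
        elif count < settle_ticks:
            count += 1
        if count >= settle_ticks:
            stable = last
        out.append(stable)
    return out


# A ROW line that bounces (1, 0, 1, 0) before settling low
print(debounce([1, 1, 0, 1, 0, 0, 0, 0, 0], settle_ticks=3))
# [1, 1, 1, 1, 1, 1, 1, 0, 0]
```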

    Also, this isn't the only way to handle the PMod keypad. I remember doing this in college (decades ago ...) and reversing the directions of the pins in the process. In this case, the pull-ups can help you to know which way to go. For example, the COLumns have no pull-ups on them--so they work better as outputs than as inputs.

    Dan

  16. @tnkumar,

    I think if I were going to learn AXI, (once I'd gotten past the reference guides), I'd start with learning about skidbuffers.  I know of no other way to meet AXI's requirements, and maintain 100% throughput, than to use some skidbuffers.  (Often times you can cheat, and skip the skidbuffer, but the result--while it might work--isn't AXI compliant.)
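    For those who haven't met one, here's a cycle-level Python model of a skidbuffer following the usual structure. A single extra register catches the incoming item when the downstream side stalls, so the upstream READY can be registered without losing throughput. It's a behavioral sketch, not RTL. The class name and the i_valid/i_ready/o_valid/o_ready signal names are hypothetical stand-ins.

```python
class SkidBuffer:
    """Cycle-level model: o_ready is registered (it is just !r_valid),
    yet a new item can still pass straight through on every clock."""

    def __init__(self):
        self.r_valid = False
        self.r_data = None

    def outputs(self, i_valid, i_data):
        """Combinational outputs for the current clock cycle."""
        o_ready = not self.r_valid                  # to the upstream source
        o_valid = i_valid or self.r_valid           # to the downstream sink
        o_data = self.r_data if self.r_valid else i_data
        return o_ready, o_valid, o_data

    def clock(self, i_valid, i_data, i_ready):
        """Register update at the clock edge; i_ready is the downstream READY."""
        o_ready, o_valid, _ = self.outputs(i_valid, i_data)
        if (i_valid and o_ready) and (o_valid and not i_ready):
            # Accepted from upstream, but downstream stalled: hold the item
            self.r_valid, self.r_data = True, i_data
        elif i_ready:
            self.r_valid = False


def simulate(items, ready_pattern):
    """Push items through against a repeating downstream READY pattern.
    Returns (items received downstream, clock cycles used)."""
    sb = SkidBuffer()
    received, idx, cycle = [], 0, 0
    while len(received) < len(items):
        i_valid = idx < len(items)
        i_data = items[idx] if i_valid else None
        i_ready = ready_pattern[cycle % len(ready_pattern)]
        o_ready, o_valid, o_data = sb.outputs(i_valid, i_data)
        if o_valid and i_ready:
            received.append(o_data)
        sb.clock(i_valid, i_data, i_ready)
        if i_valid and o_ready:
            idx += 1          # the source only moves on after a handshake
        cycle += 1
    return received, cycle


print(simulate(list(range(10)), [True]))    # full throughput: 10 items, 10 cycles
print(simulate(list(range(10)), [True, False, False, True, True, False]))
```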

    Once you know what a skidbuffer is and how to use it, I'd then move on to AXI lite.  Here's my favorite, documented, AXI-lite example.  I use it for everything as my starting point.  (Don't use Xilinx's example design--it's broken.  It'll work for a while, enough to give you the confidence to believe it works, and then it'll suddenly fail on you when you are not expecting it.  This'll leave you looking all over your design for the bug and ... in all the wrong places.)

    If you really want to learn AXI in its full glory, then you'll need to understand AXI addressing.  It's ... not simple.  I have a routine I've highly optimized for this purpose that really helps me unwrap bursts.  Once you can unwrap burst addressing, all that's left is to count beats in a burst, and make sure that the AXI ID returned matches the AXI ID requested.
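    The address arithmetic itself is easy to sketch, even if getting it small and fast in hardware isn't. This Python version follows the AXI spec's FIXED, INCR, and WRAP rules for a burst of `length` beats of `size_bytes` each. `burst_addresses` is a hypothetical name. It assumes WRAP bursts start size-aligned with a length of 2, 4, 8, or 16, and it only checks the 4kB boundary rule for INCR bursts.

```python
def burst_addresses(start, size_bytes, length, burst):
    """Address of each beat of an AXI burst, where size_bytes = 2**AxSIZE
    and length = AxLEN + 1."""
    if burst == "FIXED":
        return [start] * length
    aligned = (start // size_bytes) * size_bytes
    if burst == "INCR":
        # Only the first beat may be unaligned; later beats are aligned
        addrs = [start] + [aligned + n * size_bytes for n in range(1, length)]
        if start // 4096 != (addrs[-1] + size_bytes - 1) // 4096:
            raise ValueError("INCR bursts may not cross a 4kB boundary")
        return addrs
    if burst == "WRAP":
        if length not in (2, 4, 8, 16) or start != aligned:
            raise ValueError("WRAP needs 2/4/8/16 beats and an aligned start")
        total = size_bytes * length
        lower = (start // total) * total          # wrap boundary
        return [lower + (start - lower + n * size_bytes) % total
                for n in range(length)]
    raise ValueError("unknown burst type: " + burst)


print([hex(a) for a in burst_addresses(0x38, 4, 4, "WRAP")])
# ['0x38', '0x3c', '0x30', '0x34']
```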

    If after all of this you still want more, then consider this discussion on converting an AXI-lite bus master to an AXI bus master. The big difference between the two, in this application, is exclusive access. Beware, however ... Xilinx's IP doesn't support exclusive access. MicroBlaze doesn't use it. The MIG doesn't support it. Etc. If you need something that'll use exclusive access, then you might need an example slave to work from that supports it. For most users, though, exclusive access isn't a requirement and the capability can be safely ignored.

    Dan

  17. I suppose you could use the GPIO outputs, but the more general solution would be to build an AXI-lite peripheral.

    Don't use the Vivado generated AXI-lite example.  It's broken.  You can have Vivado build the example for you, but if you do that then rip the guts out of it.  Replace it with something looking like this, and then you'll have a nicely working AXI-lite peripheral that you can do a lot of things with.

    Once you have the basic AXI-lite slave up and running, you'll want to modify it with something like:

    // Write to one of two registers
    
    always @(posedge S_AXI_ACLK)
    if (axil_write_ready)
    begin
      case(skd_awaddr) // Assuming you left this as byte addressing ...
      0: reg_A <= skd_wdata;
      4: reg_B <= skd_wdata;
      endcase
    end
    
    // Read from either register, or their sum
    always @(posedge S_AXI_ACLK)
    if (axil_read_ready)
    begin
      axil_read_data <= 0;
      case(skd_araddr)
      0: axil_read_data <= reg_A;
      4: axil_read_data <= reg_B;
      8: axil_read_data <= reg_A + reg_B;
      endcase
    end

    ... or, at least, something like that.  It's been a while since I've looked at the specific register names to know that I got them right.  Bottom line, though, is that any good bus slave will have some kind of interaction roughly like that above, and a good AXI-lite slave is not really any different--once you get past the challenge of decoding AXI-lite in the first place.
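    One thing the snippet above skips is WSTRB. A full AXI-Lite slave should only update the byte lanes whose strobe bits are set. Here's a Python model of the same two-register slave that honors WSTRB, assuming a 32-bit data bus and byte addressing. `apply_wstrb` and `TwoRegisterSlave` are hypothetical names. The model also shows that the sum register wraps at 32 bits, just as a 32-bit adder would.

```python
MASK32 = 0xFFFFFFFF


def apply_wstrb(old, new, wstrb):
    """Byte lane i comes from new only if bit i of WSTRB is set."""
    result = old
    for lane in range(4):
        if wstrb & (1 << lane):
            lane_mask = 0xFF << (8 * lane)
            result = (result & ~lane_mask) | (new & lane_mask)
    return result & MASK32


class TwoRegisterSlave:
    """Register map from the snippet above: A at 0, B at 4, A+B at 8."""

    def __init__(self):
        self.reg_a = 0
        self.reg_b = 0

    def write(self, addr, data, wstrb=0xF):
        if addr == 0:
            self.reg_a = apply_wstrb(self.reg_a, data, wstrb)
        elif addr == 4:
            self.reg_b = apply_wstrb(self.reg_b, data, wstrb)

    def read(self, addr):
        if addr == 0:
            return self.reg_a
        if addr == 4:
            return self.reg_b
        if addr == 8:
            return (self.reg_a + self.reg_b) & MASK32   # 32-bit adder drops the carry
        return 0                                         # unmapped addresses read as zero


slave = TwoRegisterSlave()
slave.write(0, 0x11223344)
slave.write(0, 0xAABBCCDD, wstrb=0b0101)   # update byte lanes 0 and 2 only
print(hex(slave.read(0)))                   # 0x11bb33dd
```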

    Dan

  18. ISE will use the beginning of your flash for your bit file.  Once the bit file has been copied to the flash, the rest of the flash is yours to do with as you please.  Well, to be a touch more precise, it will use the beginning of your flash for your bin file--a bin file is a bit file minus a small (36byte?) header, so they're almost the same thing.  The flash size on the S6 is 16MB, and of the two bit files I have lying around, one is 334kB and the other is 273kB.  You should therefore have plenty of room left over.  I'd be tempted to grab the last 15MB, but you could probably even grab another half MB or more.
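    To put numbers on it, assume 64kB erase sectors (check your flash's data sheet). You'd want your own data to start on the first sector boundary past the configuration image, so that erasing your data never touches the bitstream. A Python sketch of that arithmetic, treating kB as 1024 bytes (the function name is hypothetical):

```python
FLASH_BYTES = 16 * 1024 * 1024      # 16MB flash, per the post above
SECTOR_BYTES = 64 * 1024            # assumed erase-sector size


def first_free_address(image_bytes, sector=SECTOR_BYTES):
    """Round the configuration image size up to the next erase-sector boundary."""
    return -(-image_bytes // sector) * sector


for kb in (334, 273):               # the two bit file sizes mentioned above
    start = first_free_address(kb * 1024)
    free_mb = (FLASH_BYTES - start) / (1024 * 1024)
    print(f"{kb}kB image: user data from {start:#x}, {free_mb:.3f}MB free")
```

    For the larger bit file this starts user data at 0x60000 and leaves about 15.6MB free, in line with the estimate above.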

    I do have a Verilog example that uses flash memory on the S6. It was a test of an earlier version of the ZipCPU--a CPU designed to work in an area efficient environment. The design places what I call a "multi-tasking OS" onto the S6. It then runs a series of (mostly) independent programs, time slicing between them. Due to the lack of RAM on the S6, the majority of the software is placed into flash. For performance reasons, a small piece of code was placed into block RAM--mostly for interrupt handling and such. Further, due to the lack of area, the design that loaded the flash in the first place was separate from the design having the CPU within it. (Both are in the same repository.)

    Dan

  19. @mdarmanu,

    How "real time" do you want your measurement to be? If you count clock cycles on a 100MHz clock, the count times 10ns should equal wall time. In my estimation, the clocks Digilent uses tend to be accurate to within about 1-20ppm or so.
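    The arithmetic is simple enough to sketch in Python, assuming the 100MHz clock above and taking the ppm figure as a worst-case bound. A measurement of 1,234,567 cycles (about 12.35ms) would then be off by at most roughly 247ns at 20ppm. The function names are hypothetical.

```python
def elapsed_seconds(cycles, clk_hz=100_000_000):
    """Convert a cycle count to seconds, trusting the nominal clock rate."""
    return cycles / clk_hz


def worst_case_error(seconds, ppm):
    """Worst-case timing error for an oscillator off by up to +/- ppm."""
    return seconds * ppm * 1e-6


t = elapsed_seconds(1_234_567)
print(t, worst_case_error(t, 20))   # about 0.0123 s, off by at most about 2.5e-07 s
```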

    If this isn't "good enough" for you, you'll need to start with a better oscillator.  The GPS PMod can help, but ... it's not perfect.  It will only produce a pulse to lock on to once per second.  For some problems, running the algorithm many thousands of times over and then measuring seconds in this manner may be good enough--but it is a lot harder than the first method.
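    If you do use the GPS PMod, one way to apply it is to count your own clock between successive PPS pulses, which measures the oscillator's actual rate. A Python sketch of the arithmetic follows; the function names and the example count are hypothetical. Averaging over many seconds would reduce the effect of any jitter on the PPS edge itself.

```python
def ppm_offset(cycles_per_pps, nominal_hz=100_000_000):
    """Oscillator error in ppm, from the clock cycles counted between two
    successive GPS PPS pulses."""
    return (cycles_per_pps - nominal_hz) / nominal_hz * 1e6


def corrected_seconds(cycles, cycles_per_pps):
    """Convert a cycle count to seconds using the measured clock rate."""
    return cycles / cycles_per_pps


# Hypothetical measurement: 100,000,150 cycles between PPS edges
print(ppm_offset(100_000_150))                    # about +1.5 ppm
print(corrected_seconds(1_234_567, 100_000_150))
```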

    Bottom line: "It depends" on your requirement, and how accurate you need things to be.

    Dan
