Vivado slowness reality check

gwideman · February 23, 2018

Hi folks.

Could someone give me a reality check on how fast the Vivado tool chain is expected to run? I mean the entire Synthesis, Implementation, Generate Bitstream sequence.

Here's what I get on a simple "hello world" design involving a single top.v with a simple 40-bit counter, sending 8 outputs to pins, no additional IP, targeting a CMod A7-35. I am using pretty much out-of-the-box settings in Vivado, so far as I am aware.

If I make a minor change to top.v, then click on "Generate Bitstream", this invokes the entire chain (no surprise), which takes a total of over 3 minutes to run, using 100% of one CPU, and sometimes up to an additional 50% of another.

Windows 7, on a 6-core AMD Phenom CPU with 8 GB of RAM. There does not seem to be undue memory paging and not all that much disk activity in general, so this seems to be CPU bound.

Is this normal? Or am I unwittingly asking it to go particularly slow somehow?

Graham

xc6lx45 · February 25, 2018

For comparison: My labToy project on CMOD A7 35 builds in 3:40 min (excluding clock IP, measured on my wristwatch by resetting synthesis, then "generate bitstream").
It's not a large project - about 20 % of DSP used and slices touched - but not trivial either.
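
For a repeatable number instead of a wristwatch reading, the same "reset synthesis, then generate bitstream" measurement can be scripted from the Vivado Tcl console. This is only a sketch: it assumes a project is open and that its runs carry the default names `synth_1` and `impl_1`.

```tcl
# Assumes an open project with the default run names synth_1 and impl_1.
set t0 [clock seconds]

# Force a full rebuild, as when resetting synthesis in the GUI.
reset_run synth_1
launch_runs synth_1
wait_on_run synth_1

# Run implementation through to bitstream generation.
launch_runs impl_1 -to_step write_bitstream
wait_on_run impl_1

puts "Total build time: [expr {[clock seconds] - $t0}] s"
```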

A hello-world project compiles in maybe 1 min, give or take some. But my desktop was built for the job (water-cooled i7 4930 @ 4.5G, 32G quad-channel RAM, M.2 SSD).

Most of this doesn't help with a one-LED design, but there are a number of things that will slow down the run considerably:

- Use correct timing constraints:

On 2/23/2018 at 2:30 PM, jamey.hicks said:

...Build times increase with design size, but they increase faster if the toolchain has to work hard to try to meet timing constraints...

For example, a LED driven from logic clocked at 200 MHz can be very difficult to route (but at the 12 MHz crystal frequency it shouldn't matter much).
A simple

set_false_path -to [get_ports LED]

makes it "don't-care".

- Throw in extra registers where appropriate, especially between blocks (which tend to be physically separate). Most of the time, it does not matter whether the signal arrives one or two clock cycles late, and some spare registers will simplify implementation. This is especially useful for register rebalancing.

- For the extra registers, it may make sense to use a "don't touch" attribute. E.g. in Verilog:

   // Delay-chain registers; DONT_TOUCH stops equivalent copies from being merged
   (* DONT_TOUCH = "TRUE" *) reg [5:0]  wa [1:NWRDELAY];
   (* DONT_TOUCH = "TRUE" *) reg [17:0] wd [1:NWRDELAY];
   (* DONT_TOUCH = "TRUE" *) reg        we [1:NWRDELAY];

When I have multiple parallel instances of a timing-critical block, the input registers are logically equivalent and get optimized away, and then P&R takes ages because timing is so difficult. The "don't touch" attribute keeps them separate, possibly using a couple more FFs than strictly necessary.

- Removal of redundant logic can take a long time. For example, when I simulate pipelined DSP like the "labToy" generators I simply carry all data all the way through the pipeline, even though most of it isn't needed. Optimization will eventually remove it, but the cost is runtime. The LabToy example includes 8 instances each with a 6-lane 14-cycle 18-bit wide pipeline, and it adds minutes to the synthesis time if I don't remove the unused ends of delay chains in the source code.

- Read and understand every warning, and read the timing report. "The compiler is my friend."
For example, with PLL blocks it is easy to create duplicate clocks with the same frequency (one from the constraints file, one from the IP block). Timing analysis tries to (and will eventually) sort out all possible interactions, but it takes a lot of time and can create meaningless but difficult routing constraints.

- Fix "critical warnings" related to timing. Even if common sense says the design will work (e.g. a classroom demo with buttons), Vivado will waste a lot of time trying the impossible.
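
As a rough illustration of the constraint-related points above, a minimal XDC sketch for a hello-world design on the 12 MHz board clock might look like the following. The port names `sysclk`, `led` and `btn` are hypothetical and need to match the top-level module; the clock name `sys_clk` is likewise arbitrary.

```tcl
# Hypothetical port names (sysclk, led, btn); use the names from your own top level.

# Exactly one definition of the 12 MHz input clock (period in ns).
# If a clock IP block already constrains this input, a second definition
# here is the kind of duplicate clock described above.
create_clock -name sys_clk -period 83.333 [get_ports sysclk]

# LEDs and push buttons have no meaningful timing requirement,
# so take them out of timing analysis.
set_false_path -to   [get_ports {led[*]}]
set_false_path -from [get_ports {btn[*]}]
```

Running `report_clocks` in the Tcl console on the synthesized design lists the clocks Vivado knows about, which is one way to spot a duplicate.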

jpeyron · February 23, 2018

Hi @gwideman,

It depends on the number of CPUs you choose as well as the RAM size/type your PC has. It also greatly depends on the board and project size you are generating a bitstream for. I have had this process take anywhere from 10 to 45 minutes. I have an i7 with 16 GB of RAM on Windows 7.

cheers,

Jon

gwideman · February 23, 2018

I chose 6 CPUs, but it only appears to use 1 to 2 as I mentioned, and there is 8 GB of RAM in the system. The board is the Cmod A7 Artix 7-35. The project is about as minimal as you can get and still have it do something observable. Well, I suppose it could be reduced to a 1-bit counter with 1 output. But I wanted it to be visible on LEDs. :-).
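
One possible explanation for seeing only one to two busy cores: as far as the Vivado documentation describes it, the job count chosen when launching runs sets how many runs execute in parallel, while multithreading inside a single run is governed by the `general.maxThreads` parameter, whose default on Windows is 2. A sketch for the Tcl console, assuming a 6-core machine:

```tcl
# Show the current thread limit for this session.
get_param general.maxThreads

# Allow up to 6 threads; the setting applies to the current Vivado session.
set_param general.maxThreads 6
```

Individual steps have their own caps, and a design this small may not offer enough work to spread across more cores, so any speedup could be slight.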

jamey.hicks · February 23, 2018

Vivado is not fast, but it's a big improvement over its predecessor (Xilinx ISE). I found Altera Quartus to be quite slow also.

You can enable reuse of place and route results when you make small changes to the design and that will save some time during development. It still has to do synth_design and opt_design before reusing placement/routing results, but it does save time.
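
A minimal non-project-mode sketch of that reuse, assuming a routed checkpoint from an earlier build is available. The file names and paths are hypothetical, and the part string is an assumption for the Cmod A7-35T:

```tcl
# Hypothetical file names; the part is assumed to be the Cmod A7-35T device.
read_verilog top.v
read_xdc top.xdc
synth_design -top top -part xc7a35tcpg236-1
opt_design

# Reuse placement and routing from an earlier routed checkpoint
# wherever the new netlist still matches the old one.
read_checkpoint -incremental ./reference/top_routed.dcp

place_design
route_design

# Save this result so it can serve as the reference for a later build.
write_checkpoint -force ./top_routed.dcp
write_bitstream -force ./top.bit
```

In project mode the equivalent is to point the implementation run at a reference checkpoint in the run settings.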

Build times increase with design size, but they increase faster if the toolchain has to work hard to try to meet timing constraints.

gwideman · February 26, 2018

Thanks @xc6lx45, for providing a very thoughtful answer.
