Latency Information

The latency information presents the execution profile of each compute unit in the binary container. When analyzing this data, keep in mind that all values are measured from the compute unit boundary through the custom logic. In-system latencies associated with data transfers to global memory are not included in these values. In addition, latency numbers are reported only for compute units targeted at the FPGA fabric. The following is an example of the latency report:

Latency Information (clock cycles)
Compute Unit     Kernel Name    Module Name    Start Interval  Best Case  Avg Case  Worst Case
---------------  -------------  -------------  --------------  ---------  --------  ----------
smithwaterman_1  smithwaterman  smithwaterman  29468           29467      29467     29467

The latency report contains the following fields:

Start interval
Best case latency
Average case latency
Worst case latency

The start interval is the number of clock cycles that must elapse between successive invocations of a compute unit for a given kernel. It therefore limits how quickly the runtime can issue the application's ND range data tiles to a compute unit.
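As a rough back-of-the-envelope sketch, the start interval and latency together bound the number of cycles one compute unit needs to process several tiles. This assumes the runtime issues a new tile as soon as the start interval allows, and it ignores global memory transfers, which the report does not include. The function name estimate_total_cycles is hypothetical and is not part of any tool API.

```c
#include <stdint.h>

/* Hypothetical helper: estimated cycles for one compute unit to finish
 * n_tiles ND range data tiles. Assumes back-to-back issue every
 * start_interval cycles and ignores global memory transfer time. */
uint64_t estimate_total_cycles(uint64_t start_interval,
                               uint64_t latency,
                               uint64_t n_tiles)
{
    if (n_tiles == 0) {
        return 0;
    }
    /* The last tile starts (n_tiles - 1) intervals after the first one
     * and completes `latency` cycles after it starts. */
    return (n_tiles - 1) * start_interval + latency;
}
```

With the values from the example report, estimate_total_cycles(29468, 29467, 4) returns 117871 cycles. Because the start interval in this example is longer than the latency, successive invocations of smithwaterman_1 do not overlap.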

The best, average, and worst case latency numbers indicate how many clock cycles the compute unit takes to generate the results of one ND range data tile for the kernel. If the kernel has no data-dependent computation loops, these values are identical, as in the example above. Data-dependent loop execution introduces data-specific latency variation, which the report captures as a difference between the best, average, and worst case values.
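The toy sketch below illustrates a data-dependent loop; it is not how the tool computes its estimates. The hypothetical function scan_iterations searches a tile of fixed, hypothetical size TILE_SIZE and exits early on the first match, so the number of iterations executed, used here as a stand-in for latency, depends on the tile's contents even though the loop bound is fixed.

```c
#include <stddef.h>

#define TILE_SIZE 8 /* hypothetical fixed tile size */

/* Returns the number of loop iterations executed while searching the
 * tile for `key`. The bound is fixed, but the early exit makes the
 * iteration count (and so the latency) depend on the data. */
size_t scan_iterations(const int tile[TILE_SIZE], int key)
{
    for (size_t i = 0; i < TILE_SIZE; ++i) {
        if (tile[i] == key) {
            return i + 1; /* match found: exit early */
        }
    }
    return TILE_SIZE; /* no match: every element was examined */
}
```

A match in the first element gives the best case (1 iteration), and a tile with no match gives the worst case (TILE_SIZE iterations); a loop without the early exit would always run TILE_SIZE iterations.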

The interval and latency numbers are reported as "undef" for kernels that meet one or more of the following conditions:

Do not have an explicit reqd_work_group_size(x,y,z) attribute
Have loops with variable bounds
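For the first condition, an OpenCL kernel declares its work-group size by placing __attribute__((reqd_work_group_size(x, y, z))) on the kernel function. The second condition is sketched below in plain C, with hypothetical function names and a hypothetical constant N: the first loop has a compile-time trip count, while the second loop's trip count is a run-time argument, which is the kind of variable bound that prevents a static latency estimate.

```c
#include <stddef.h>

#define N 64 /* hypothetical compile-time trip count */

/* Fixed bound: the trip count is known at compile time, so the loop's
 * latency can be determined statically. */
int sum_fixed(const int data[N])
{
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Variable bound: the trip count `n` is known only at run time, so the
 * loop's latency cannot be determined statically. */
int sum_variable(const int *data, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}
```

Both functions compute the same kind of result; only the second one, whose bound depends on its arguments, would fall under the variable-bounds condition above.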