Profile Summary Descriptions
The profile summary collects a number of useful statistics about your OpenCL application, which give you a general idea of where its functional bottlenecks lie. The profile summary consists of the following tables:
- OpenCL API Calls - This table displays the profile data for all OpenCL host API function calls executed in the host application.
- Kernel Execution - This table displays the profile data for all kernel functions scheduled and executed.
- Compute Unit Utilization - This table displays the profile data for all compute units on the FPGA.
- Data Transfer: Host and Global Memory - This table displays the profile data for all read and write transfers between the host and device memory over the PCIe® link.
  - Number of Transfers: Number of host data transfers (Note: may include printf transfers).
  - Transfer Rate (MB/s): (Total Bytes Sent) / (Total Time in µs).
  - Average Bandwidth Utilization (%): (Transfer Rate) / (Max. Transfer Rate), where Max. Transfer Rate = 5.0 GBps.
  - Average Size (KB): (Total KB sent) / (Number of Transfers).
  - Total Time (ms): Total time spent on host data transfers, in milliseconds.
- Data Transfer: Kernels and Global Memory - This table displays the profile data for all read and write transfers between the FPGA and device memory.
  - Number of Transfers: Number of transactions monitored on the device (Note: may include printf transfers).
  - Transfer Rate (MB/s): (Total Bytes Sent) / (Compute Unit Total Time), where Total Bytes Sent is the sum of bytes across all transactions.
  - Average Bandwidth Utilization (%): (Transfer Rate) / (Max. Transfer Rate), where Max. Transfer Rate = 0.6 * 10.7 GBps ≈ 6.4 GBps.
  - Average Size (KB): (Total KB sent) / (Number of Transactions).
  - Average Time (ms): (Total latency of all transactions) / (Number of Transactions).
- Top Data Transfer: Kernels and Global Memory - This table displays the profile data for the top data transfers between the FPGA and device memory.
  - Average Bytes per Transfer: (Total Read Bytes + Total Write Bytes) / (Total Read Transactions + Total Write Transactions).
  - Transfer Efficiency (%): (Average Bytes per Transfer) / min(4K, (Memory Bit Width / 8) * 256). The AXI4 specification limits a burst to at most 256 beats and at most 4K bytes (a burst must not cross a 4 KB address boundary).
  - Transfer Rate (MB/s): (Total Data Transfer) / (Compute Unit Total Time).
  - Average Bandwidth Utilization (%): (Transfer Rate) / (0.6 * Max. Theoretical Rate).
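The Data Transfer: Host and Global Memory formulas can be checked by hand with a small Python sketch. It assumes decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which is what makes bytes per microsecond numerically equal to MB/s. The class and field names are hypothetical and are not part of any Xilinx tool API.

```python
from dataclasses import dataclass

# Assumption: decimal units (1 KB = 1e3 B, 1 MB = 1e6 B, 1 GB = 1e9 B), so that
# bytes per microsecond is numerically equal to MB/s.
HOST_MAX_RATE_MBPS = 5.0 * 1000.0  # Max. Transfer Rate of 5.0 GBps, in MB/s


@dataclass
class HostTransferStats:
    """Hypothetical container for one Host and Global Memory row."""
    num_transfers: int
    total_bytes: int
    total_time_us: float

    @property
    def transfer_rate_mbps(self) -> float:
        # (Total Bytes Sent) / (Total Time in microseconds)
        return self.total_bytes / self.total_time_us

    @property
    def bandwidth_utilization_pct(self) -> float:
        # Transfer Rate / Max. Transfer Rate, expressed as a percentage
        return 100.0 * self.transfer_rate_mbps / HOST_MAX_RATE_MBPS

    @property
    def average_size_kb(self) -> float:
        # (Total KB sent) / (number of transfers)
        return (self.total_bytes / 1e3) / self.num_transfers

    @property
    def total_time_ms(self) -> float:
        return self.total_time_us / 1e3


stats = HostTransferStats(num_transfers=4, total_bytes=8_000_000, total_time_us=4_000.0)
print(stats.transfer_rate_mbps, stats.bandwidth_utilization_pct,
      stats.average_size_kb, stats.total_time_ms)
# → 2000.0 40.0 2000.0 4.0
```

In this made-up example, 8 MB moved in 4 ms gives 2000 MB/s, which is 40% of the 5.0 GBps ceiling.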
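The Data Transfer: Kernels and Global Memory metrics follow the same pattern, but divide by the Compute Unit Total Time and use the 0.6 * 10.7 GBps ceiling. The sketch below assumes decimal units and that Compute Unit Total Time and transaction latency are given in milliseconds; the text does not state their units. All names are hypothetical.

```python
from dataclasses import dataclass

# Assumptions: decimal units, and times supplied in milliseconds.
DEVICE_MAX_RATE_MBPS = 0.6 * 10.7 * 1000.0  # 0.6 * 10.7 GBps (about 6.4 GBps), in MB/s


@dataclass
class KernelTransferStats:
    """Hypothetical container for one Kernels and Global Memory row."""
    num_transactions: int
    total_bytes: int          # sum of bytes across all transactions
    cu_total_time_ms: float   # Compute Unit Total Time
    total_latency_ms: float   # summed latency of all transactions

    @property
    def transfer_rate_mbps(self) -> float:
        # bytes / microseconds == MB/s under decimal units
        return self.total_bytes / (self.cu_total_time_ms * 1e3)

    @property
    def bandwidth_utilization_pct(self) -> float:
        return 100.0 * self.transfer_rate_mbps / DEVICE_MAX_RATE_MBPS

    @property
    def average_size_kb(self) -> float:
        return (self.total_bytes / 1e3) / self.num_transactions

    @property
    def average_time_ms(self) -> float:
        return self.total_latency_ms / self.num_transactions


k = KernelTransferStats(num_transactions=1000, total_bytes=6_420_000,
                        cu_total_time_ms=2.0, total_latency_ms=0.5)
print(k.transfer_rate_mbps, round(k.bandwidth_utilization_pct, 1),
      k.average_size_kb, k.average_time_ms)
# → 3210.0 50.0 6.42 0.0005
```

Because the ceiling already includes the 0.6 factor, a kernel reaching about 3.2 GBps is reported here as roughly 50% utilized, not 30%.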
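For the Top Data Transfer table, the Transfer Efficiency denominator is the largest burst AXI4 allows for a given memory width: 256 beats of (Memory Bit Width / 8) bytes, capped at 4K bytes. A minimal Python sketch, assuming the memory width is given in bits and that the transfer rate and Max. Theoretical Rate share the same units; the function names are illustrative only.

```python
# AXI4 limits: at most 256 beats per burst and at most 4K bytes per burst.
AXI4_MAX_BURST_BEATS = 256
AXI4_MAX_BURST_BYTES = 4096


def average_bytes_per_transfer(read_bytes: int, write_bytes: int,
                               read_tx: int, write_tx: int) -> float:
    return (read_bytes + write_bytes) / (read_tx + write_tx)


def max_burst_bytes(memory_bit_width: int) -> int:
    # min(4K, Memory Bit Width / 8 * 256)
    return min(AXI4_MAX_BURST_BYTES, memory_bit_width // 8 * AXI4_MAX_BURST_BEATS)


def transfer_efficiency_pct(avg_bytes: float, memory_bit_width: int) -> float:
    return 100.0 * avg_bytes / max_burst_bytes(memory_bit_width)


def bandwidth_utilization_pct(rate_mbps: float, max_theoretical_mbps: float) -> float:
    # Transfer Rate / (0.6 * Max. Theoretical Rate), as a percentage
    return 100.0 * rate_mbps / (0.6 * max_theoretical_mbps)


avg = average_bytes_per_transfer(3072, 1024, 2, 2)  # 4096 bytes over 4 transactions
print(avg, max_burst_bytes(512), max_burst_bytes(32))
# → 1024.0 4096 1024
print(transfer_efficiency_pct(avg, 512), transfer_efficiency_pct(avg, 32))
# → 25.0 100.0
```

The same 1 KB average transfer is only 25% efficient on a 512-bit interface, where a full burst could carry 4K bytes, but 100% efficient on a 32-bit interface, where 256 beats carry exactly 1K bytes.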