Building for the Target FPGA/Platform

The SDAccel™ development environment generates custom logic for every compute unit in the binary container. It is therefore normal for this build step to take longer than the other steps in the SDAccel application compilation flow.

The steps in compiling compute units targeting the FPGA fabric are as follows:

1. Generate a custom compute unit for a specific kernel.
2. Instantiate the compute units in the OpenCL® binary container.
3. Connect the compute units to memory and infrastructure elements of the target device.
4. Generate the FPGA programming file.
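
The order of these steps can be sketched as a small Python model. This is only an illustration of the flow described above, not the SDAccel tool or any of its APIs: the names `ComputeUnit`, `BinaryContainer`, and `build_binary_container`, the kernel names, and the infrastructure labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeUnit:
    """Hypothetical stand-in for the custom logic generated from one kernel."""
    kernel: str
    index: int
    connections: list = field(default_factory=list)

    @property
    def name(self):
        return f"{self.kernel}_{self.index}"


@dataclass
class BinaryContainer:
    """Hypothetical stand-in for the binary container holding compute units."""
    name: str
    compute_units: list = field(default_factory=list)
    programming_file_ready: bool = False


def build_binary_container(name, kernel_counts, infrastructure):
    """Walk through the four steps in order for a toy container.

    kernel_counts maps a kernel name to the number of compute units wanted;
    infrastructure lists the (assumed) device elements each unit connects to.
    """
    container = BinaryContainer(name)
    for kernel, count in kernel_counts.items():
        for i in range(count):
            cu = ComputeUnit(kernel, i)          # step 1: generate a compute unit
            container.compute_units.append(cu)   # step 2: instantiate it in the container
    for cu in container.compute_units:
        cu.connections = list(infrastructure)    # step 3: connect to device infrastructure
    container.programming_file_ready = True      # step 4: placeholder for file generation
    return container


if __name__ == "__main__":
    demo = build_binary_container("demo", {"vadd": 2, "mmult": 1}, ["memory", "control"])
    for cu in demo.compute_units:
        print(cu.name, "->", cu.connections)
```

In the real flow, step 4 is by far the most time-consuming; the model above only marks where it happens.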

Custom compute units for a given kernel are generated with the production-proven Xilinx® Vivado® High-Level Synthesis (HLS) tool, which is the compute unit generator in the SDAccel Environment. Based on the characteristics of the target device in the solution, the SDAccel Environment invokes the compute unit compiler to generate custom logic that maximizes performance while minimizing compute resource consumption on the FPGA fabric. For some coding styles, the compiler cannot automatically optimize a compute unit for maximum performance without additional user input. The SDAccel Environment Profiling and Optimization Guide describes the user input that can be provided to the SDAccel Environment to optimize the implementation of kernel operations into a custom compute unit.

After all compute units have been generated, they are connected to the infrastructure elements provided by the target device in the solution. The infrastructure elements of a device are the memory, control, and I/O data planes that the device developer has defined to support an OpenCL application. The SDAccel Environment combines the custom compute units and the base device infrastructure to generate an FPGA binary, which is used to program the Xilinx device during application execution.

IMPORTANT: The SDAccel Environment always generates a valid FPGA hardware design, but it does not generate an optimal allocation of the available bandwidth in the control and memory data planes. The user can manually optimize data bandwidth usage by selecting, for each compute unit, its connection points into the memory and control data planes.
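
To illustrate why the choice of connection points matters, the following Python sketch spreads compute units across memory connection points so that the estimated bandwidth demand on each point stays balanced. It is a toy model under stated assumptions, not an SDAccel feature: the function `assign_connection_points`, the compute unit and bank names, and the demand figures (in arbitrary units) are hypothetical, and the greedy heuristic shown is one simple strategy among many.

```python
def assign_connection_points(demands, banks):
    """Greedily map each compute unit to the least-loaded connection point.

    demands: dict of compute unit name -> estimated bandwidth demand
             (hypothetical figures, arbitrary units).
    banks:   list of connection point names (hypothetical).
    Returns (assignment, load): the chosen bank per compute unit and the
    total demand placed on each bank.
    """
    load = {bank: 0.0 for bank in banks}
    assignment = {}
    # Place the most demanding compute units first; break ties by name
    # so the result is deterministic.
    for cu, demand in sorted(demands.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(banks, key=lambda bank: load[bank])
        assignment[cu] = target
        load[target] += demand
    return assignment, load


if __name__ == "__main__":
    demands = {"cu_a": 8.0, "cu_b": 6.0, "cu_c": 4.0, "cu_d": 2.0}
    assignment, load = assign_connection_points(demands, ["bank0", "bank1"])
    print(assignment)
    print(load)
```

With these example figures, connecting all four compute units to one point would place a total demand of 20 on it, whereas the balanced mapping places 10 on each of the two points. Actual demand depends on the kernel access patterns and must be measured for a real design.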