Building the Device Binary
The kernel code is written in C, C++, OpenCL™ C, or RTL, and is built by compiling the kernel code into a Xilinx® object (XO) file, and linking the XO files into a Xilinx binary (.xclbin) file, as shown in the following figure.
The process, as outlined above, has two steps:
- Build the Xilinx object (XO) files from the kernel source code.
  - For C, C++, or OpenCL kernels, the v++ -c command compiles the source code into Xilinx object (XO) files. Multiple kernels are compiled into separate XO files.
  - For RTL kernels, the Vivado IP packager command produces the XO file to be used for linking. Refer to RTL Kernels for more information.
  - You can also create kernel object (XO) files working directly in the Vitis™ HLS tool. Refer to Compiling Kernels with Vitis HLS for more information.
- After compilation, the v++ -l command links one or multiple kernel objects (XO), together with the hardware platform XSA file, to produce the Xilinx binary .xclbin file.

The v++ command can be used from the command line, in scripts, or in a build system like make, and can also be used through the Vitis IDE as discussed in Using the Vitis IDE.

Compiling Kernels with the Vitis Compiler
There are multiple v++ options that need to be used to correctly compile your kernel. The following is an example command line to compile the vadd kernel:

v++ -t sw_emu --platform xilinx_u200_xdma_201830_2 -c -k vadd \
-I'./src' -o'vadd.sw_emu.xo' ./src/vadd.cpp
The various arguments used are described below. Note that some of the arguments are required.
- -t <arg>: Specifies the build target, as discussed in Build Targets. Software emulation (sw_emu) is used as an example. Optional. The default is hw.
- --platform <arg>: Specifies the accelerator platform for the build. This is required because runtime features and the target platform are linked as part of the FPGA binary. To compile a kernel for an embedded processor application, specify an embedded processor platform: --platform $PLATFORM_REPO_PATHS/zcu102_base/zcu102_base.xpfm.
- -c: Compile the kernel. Required. The kernel must be compiled (-c) and linked (-l) in two separate steps.
- -k <arg>: Name of the kernel associated with the source files.
- -o'<output>.xo': Specify the shared object file output by the compiler. Optional.
- <source_file>: Specify source files for the kernel. Multiple source files can be specified. Required.
The above list is a sample of the extensive options available. Refer to Vitis Compiler Command for details of the various command line options. Refer to Output Directories of the v++ Command to get an understanding of the location of various output files.
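As a hedged sketch of how multiple kernels in one project are each compiled into a separate XO file, the following repeats the vadd command and adds a second, hypothetical vmult kernel; the vmult name and source file are assumptions, not part of the example code above:

# Each kernel gets its own v++ -c invocation and its own XO file
v++ -t sw_emu --platform xilinx_u200_xdma_201830_2 -c -k vadd \
    -I'./src' -o'vadd.sw_emu.xo' ./src/vadd.cpp
v++ -t sw_emu --platform xilinx_u200_xdma_201830_2 -c -k vmult \
    -I'./src' -o'vmult.sw_emu.xo' ./src/vmult.cpp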
Compiling Kernels with Vitis HLS
The use model described for the Vitis core development kit is a top-down approach, starting with C/C++ or OpenCL code, and working toward compiled kernels.
However, you can also directly develop the kernel to produce a Xilinx object (XO) file to be paired for linking using v++
to produce the .xclbin. This approach can be used for C/C++ kernels using the Vitis HLS tool, which is the focus of this section, or
RTL kernels using the Vivado Design Suite. Refer to
RTL Kernels for more information.
The approach of developing the kernel directly, either in RTL or C/C++, to produce an XO file, is sometimes referred to as the bottom-up flow. This allows you to validate kernel performance and perform optimizations within Vitis HLS, and export the Xilinx object file for use in the Vitis application acceleration development flow. Refer to the Vitis HLS Flow for more information on using that tool.
The benefits of the Vitis HLS bottom-up flow can include:
- Design, validate, and optimize the kernel separately from the main application.
- Enables a team approach to design, with collaboration on host program and kernel development.
- Specific kernel optimizations are preserved in the XO file.
- A collection of XO files can be used and reused like a library.
Creating Kernels in Vitis HLS
Generating kernels from C/C++ code for use in the Vitis core development kit follows the standard Vitis HLS process. However, because the kernel is required to operate in the Vitis software platform, the standard kernel requirements must be satisfied (see Kernel Properties). Most importantly, the interfaces must be modeled as AXI memory interfaces, except for scalar parameters which are mapped to an AXI4-Lite interface. Vitis HLS automatically defines the interface ports to meet the standard kernel requirements when using the Vitis Bottom Up Flow as described here.
The process for creating and compiling your HLS kernel is outlined briefly below. You should refer to Creating a New Vitis HLS Project in the Vitis HLS Flow documentation for a more complete description of this process.
- Launch Vitis HLS to open the integrated design environment (IDE), and create a new project.
- In the New Vitis HLS Project wizard, specify the Project name, define the Location for the project, and click Next.
- In the Add/Remove Files page, click Add Files to add the kernel source code to the project. Select Top Function to define the kernel function by clicking the Browse button, and click Next when done.
- You can specify a C-based simulation test bench if you have one available, by clicking Add Files, or skip this by clicking Next.
  TIP: As discussed in the Vitis HLS documentation, the use of a test bench is strongly recommended.
- In the Solution Configuration page, you must specify the Clock Period for the kernel.
- Choose the target platform by clicking the browse button (…) in the Part Selection field to open the Device Selection Dialog box. Select Boards, and select the target platform for your compiled kernel, as shown below. Click OK to select the platform and return to the Solution Configuration page.
- In the Solution Configuration page, select the Vitis Kernel Flow Target drop-down menu under Flow Target, and click Finish to complete the process and create your HLS kernel project.
  IMPORTANT: You must select the Vitis Kernel Flow Target to generate the Xilinx object (XO) file from the project.
When the HLS project has been created, you can run C synthesis to compile the kernel code. Refer to the Vitis HLS documentation for a complete description of the HLS tool flow.
After synthesis is completed, the kernel can be exported as an XO file for use in the Vitis core development kit. The export command is available from the main menu.
Specify the file location, and the kernel is exported as a Xilinx object XO file.
The XO file can be used as an input file during the v++
linking process. Refer to Linking the Kernels for more information. You can also add it to an
application project in the Vitis IDE, as discussed
in Creating a Vitis IDE Project.
However, keep in mind that HLS kernels, created in the bottom-up flow described here, have certain limitations when used in the Vitis application acceleration development flow. Software emulation is not supported for applications using HLS kernels, because duplicated header file dependencies can create issues. GDB debug is not supported in the hardware emulation flow for HLS kernels, or RTL kernels.
Vitis HLS Script for Creating Kernels
If you run HLS synthesis through Tcl scripts, you can edit the following script to create HLS kernels as previously described:
# Define variables for your HLS kernel:
set projName <proj_name>
set krnlName <kernel_name>
set krnlFile <kernel_source_code>
set krnlTB <kernel_test_bench>
set krnlPlatform <target_part>
set path <path_to_project>
#Script to create and output HLS kernel
open_project $projName
set_top $krnlName
add_files $krnlFile
add_files -tb $krnlTB
open_solution "solution1"
set_part $krnlPlatform
create_clock -period 10 -name default
config_flow -target vitis
csim_design
csynth_design
cosim_design
export_design -flow impl -format xo -output "./hlsKernel/hlsKernel.xo"
Run the HLS kernel script by using the following command after setting up your environment as discussed in Setting Up the Vitis Environment.
vitis_hls -f <hls_kernel_script>.tcl
Linking the Kernels
The kernel compilation process results in a Xilinx object (XO) file whether the kernel is written in C/C++, OpenCL C, or RTL. During the linking stage, XO files from different kernels are linked with the platform to create the FPGA binary container file (.xclbin) used by the host program.
The following is an example command line to link the vadd kernel object file and produce the vadd kernel binary:

v++ -t sw_emu --platform xilinx_u200_xdma_201830_2 --link vadd.sw_emu.xo \
-o'vadd.sw_emu.xclbin' --config ./connectivity.cfg
This command contains the following arguments:
- -t <arg>: Specifies the build target. Software emulation (sw_emu) is used as an example. When linking, you must use the same -t and --platform arguments as specified when the input (XO) file was compiled.
- --platform <arg>: Specifies the platform to link the kernels with. To link the kernels for an embedded processor application, you simply specify an embedded processor platform: --platform $PLATFORM_REPO_PATHS/zcu102_base/zcu102_base.xpfm
- --link: Link the kernels and platform into an FPGA binary file (xclbin).
- <input>.xo: Input object file. Multiple object files can be specified to build into the .xclbin.
- -o'<output>.xclbin': Specify the output file name. The output file in the link stage will be an .xclbin file. The default output name is a.xclbin.
- --config ./connectivity.cfg: Specify a configuration file that is used to provide v++ command options for a variety of uses. Refer to Vitis Compiler Command for more information on the --config option.
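For illustration, a hardware build that links two previously compiled object files might look like the following sketch; the vmult.hw.xo file and the hw versions of the object files are assumptions that parallel the compile example earlier:

v++ -t hw --platform xilinx_u200_xdma_201830_2 --link vadd.hw.xo vmult.hw.xo \
    -o'app.hw.xclbin' --config ./connectivity.cfg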
Beyond simply linking the Xilinx object (XO) files, the linking process is also where important architectural details are determined. In particular, this is where the number of compute units (CUs) to instantiate into hardware is specified, connections from kernel ports to global memory are assigned, and CUs are assigned to SLRs. The following sections discuss some of these build options.
Creating Multiple Instances of a Kernel
By default, the linker builds a single hardware instance from a kernel. If the host program will execute the same kernel multiple times, due to data processing requirements for instance, then it must execute the kernel on the hardware accelerator in a sequential manner. This can impact overall application performance. However, you can customize the kernel linking stage to instantiate multiple hardware compute units (CUs) from a single kernel. This can improve performance as the host program can now make multiple overlapping kernel calls, executing kernels concurrently by running separate compute units.
Multiple CUs of a kernel can be created by using the connectivity.nk
option in the v++
config
file during linking. Edit a config file to include the needed options, and specify it in
the v++
command line with the --config
option, as described in Vitis Compiler Command.
For example, for the vadd kernel, two hardware instances can be implemented in the config file as follows:
[connectivity]
#nk=<kernel name>:<number>:<cu_name>.<cu_name>...
nk=vadd:2
Where:
<kernel_name>
- Specifies the name of the kernel to instantiate multiple times.
<number>
- The number of kernel instances, or CUs, to implement in hardware.
<cu_name>.<cu_name>...
- Specifies the instance names for the specified number of instances. This is optional; when not specified, the CU names default to <kernel_name>_1, <kernel_name>_2, and so on.
The config file is specified on the v++ command line using the --config option:

v++ --config vadd_config.cfg ...
In the vadd
example above, the result is two
instances of the vadd
kernel, named vadd_1
and vadd_2
.
You can use the xclbinutil command to examine the contents of the xclbin file. Refer to xclbinutil Utility.

The following example creates three CUs of the vadd kernel, named vadd_X, vadd_Y, and vadd_Z, in the xclbin binary file:
[connectivity]
nk=vadd:3:vadd_X.vadd_Y.vadd_Z
Mapping Kernel Ports to Memory
The link phase is when the memory ports of the kernels are connected to
memory resources which include DDR, HBM, and PLRAM. By default, when the xclbin
file is produced during the v++
linking process, all kernel memory interfaces are connected to the same global
memory bank (or gmem
). As a result, only one kernel
interface can transfer data to/from the memory bank at one time, limiting the performance of
the application due to memory access.
While the Vitis compiler can automatically connect CU to global memory resources, you can also manually specify which global memory bank each kernel argument (or interface) is connected to. Proper configuration of kernel to memory connectivity is important to maximize bandwidth, optimize data transfers, and improve overall performance. Even if there is only one compute unit in the device, mapping its input and output arguments to different global memory banks can improve performance by enabling simultaneous accesses to input and output data.
Use the --connectivity.sp option to distribute connections across different memory banks.

The following example is based on the Kernel Interfaces example code. Start by assigning the kernel arguments to separate bundles to increase the available interface ports, then assign the arguments to separate memory banks:
- In C/C++ kernels, assign arguments to separate bundles in the kernel code prior to compiling them:

void cnn( int *pixel,   // Input pixel
          int *weights, // Input Weight Matrix
          int *out,     // Output pixel
          ...           // Other input or Output ports

#pragma HLS INTERFACE m_axi port=pixel offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=weights offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
Note that the memory interface inputs pixel and weights are assigned different bundle names in the example above, while out is bundled with pixel. This creates two separate interface ports.
IMPORTANT: You must specify bundle= names using all lowercase characters to be able to assign them to a specific memory bank using the --connectivity.sp option.
- Edit a config file to include the --connectivity.sp option, and specify it in the v++ command line with the --config option, as described in Vitis Compiler Command. For example, for the cnn kernel shown above, the connectivity.sp option in the config file would be as follows:

[connectivity]
#sp=<compute_unit_name>.<argument>:<bank name>
sp=cnn_1.pixel:DDR[0]
sp=cnn_1.weights:DDR[1]
sp=cnn_1.out:DDR[2]
Where:

- <compute_unit_name> is an instance name of the CU as determined by the connectivity.nk option, described in Creating Multiple Instances of a Kernel, or is simply <kernel_name>_1 if multiple CUs are not specified.
- <argument> is the name of the kernel argument. Alternatively, you can specify the name of the kernel interface as defined by the HLS INTERFACE pragma for C/C++ kernels, including m_axi_ and the bundle name. In the cnn kernel above, the ports would be m_axi_gmem and m_axi_gmem1.
  TIP: For RTL kernels, the interface is specified by the interface name defined in the kernel.xml file.
- <bank_name> is denoted as DDR[0], DDR[1], DDR[2], and DDR[3] for a platform with four DDR banks. You can also specify the memory as a contiguous range of banks, such as DDR[0:2], in which case XRT will assign the memory bank at run time. Some platforms also provide support for PLRAM, HBM, HP, or MIG memory, in which case you would use PLRAM[0], HBM[0], HP[0], or MIG[0]. You can use the platforminfo utility to get information on the global memory banks available in a specified platform. Refer to platforminfo Utility for more information. In platforms that include both DDR and HBM memory banks, kernels must use separate AXI interfaces to access the different memories. DDR and PLRAM access can be shared from a single port.
IMPORTANT: Customized bank assignments might also need to be reflected in the host code in some cases, as described in Assigning DDR Bank in Host Code.
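Putting the options together, the following is a hedged sketch of a config file that creates two CUs of the cnn kernel and maps each CU's arguments to its own DDR banks; the two-CU split and the bank choices are assumptions layered on the syntax shown above, for a platform with four DDR banks:

[connectivity]
# Two CUs of the cnn kernel (instance names follow the default pattern)
nk=cnn:2:cnn_1.cnn_2
# Give each CU its own set of memory banks
sp=cnn_1.pixel:DDR[0]
sp=cnn_1.weights:DDR[1]
sp=cnn_1.out:DDR[0]
sp=cnn_2.pixel:DDR[2]
sp=cnn_2.weights:DDR[3]
sp=cnn_2.out:DDR[2]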
Connecting Directly to Host Memory
The PCIe® Slave-Bridge IP is provided on some data center platforms to let kernels access host memory directly. Configuring the device binary to connect to host memory requires changing the connection specified by the --connectivity.sp option as shown below. It also requires changes to the accelerator card setup and your host application as described at Host-Memory Access in the XRT documentation.
[connectivity]
## Syntax
##sp=<cu_name>.<interface_name>:HOST[0]
sp=cnn_1.m_axi_gmem:HOST[0]
In the command syntax above, the CU name and interface name are specified the same as for other --connectivity.sp connections, but the bank name is hard-coded to HOST[0].
HBM Configuration and Use
Some algorithms are memory bound, limited by the 77 GB/s bandwidth available on DDR-based Alveo cards. For those applications there are HBM (High Bandwidth Memory) based Alveo cards, providing up to 460 GB/s memory bandwidth. For the Alveo implementation, two 16-layer HBM (HBM2 specification) stacks are incorporated into the FPGA package and connected into the FPGA fabric with an interposer. A high-level diagram of the two HBM stacks is as follows.
This implementation provides:
- 8 GB HBM memory
- 32 HBM segments of 256 MB each, called pseudo channels (PCs)
- An independent AXI channel for communication with the FPGA through a segmented crossbar switch per pseudo channel
- A two-channel memory controller per two PCs
- 14.375 GB/s max theoretical bandwidth per PC
- 460 GB/s (32 × 14.375 GB/s) max theoretical bandwidth for the HBM subsystem
Although each PC has a theoretical max performance of 14.375 GB/s, this is less than the theoretical max of 19.25 GB/s for a DDR channel. To get better than DDR performance, designs must efficiently use multiple AXI masters into the HBM subsystem. The programmable logic has 32 HBM AXI interfaces that can access any memory location in any of the PCs on either of the HBM stacks through a built-in switch providing access to the full 8 GB memory space. For more detailed information on the HBM, refer to AXI High Bandwidth Controller LogiCORE IP Product Guide (PG276).
Connection to the HBM is managed by
the HBM Memory Subsystem (HMSS) IP, which enables
all HBM PCs, and automatically connects the XDMA to
the HBM for host access to global memory. When used
with the Vitis compiler, the HMSS is automatically
customized to activate only the necessary memory controllers and ports as specified by
the --connectivity.sp
option to connect both the user
kernels and the XDMA to those memory controllers for optimal bandwidth and latency.
Refer to the Using HBM Tutorial for additional
information and examples.
In the following config file example, the kernel input ports in1 and in2 are connected to HBM PCs 0 and 1 respectively, and the output buffer out is written to HBM PCs 3 and 4. Each HBM PC is 256 MB, giving a total of 1 GB of memory access for this kernel.
[connectivity]
sp=krnl.in1:HBM[0]
sp=krnl.in2:HBM[1]
sp=krnl.out:HBM[3:4]
The HBM ports are located in the bottom SLR of the device. The HMSS
automatically handles the placement and timing complexities of AXI interfaces crossing
super logic regions (SLR) in SSI technology devices. By default, without specifying the
--connectivity.sp
or --connectivity.slr
options on v++
, all
kernel AXI interfaces access HBM[0] and all kernels are assigned to SLR0. However, you
can specify the SLR assignments of kernels using the --connectivity.slr
option. Refer to Assigning Compute Units to SLRs for more information.
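Combining the two options, the sketch below pins the CU to the bottom SLR alongside the HBM ports it uses; the krnl_1 instance name is illustrative, and the bank choices mirror the example above:

[connectivity]
sp=krnl_1.in1:HBM[0]
sp=krnl_1.in2:HBM[1]
sp=krnl_1.out:HBM[3:4]
# Keep the CU in SLR0, where the HBM ports are located
slr=krnl_1:SLR0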
Random Access and the RAMA IP
HBM performs well in applications where sequential data access is required. However, for applications requiring random data access, performance can vary significantly depending on the application requirements (for example, the ratio of read and write operations, minimum transaction size, and size of the memory space being addressed). In these cases, the addition of the Random Access Memory Attachment (RAMA) IP to the target platform can significantly improve random memory access efficiency in cases where the required memory exceeds the 256 MB limit of a single HBM PC. Refer to RAMA LogiCORE IP Product Guide (PG310) for more information.
Add the RAMA IP to the target platform during the system linking process using
the following v++
command option to specify a Tcl
script to define the ports of interest:
v++ -l --advanced.param compiler.userPreSysLinkOverlayTcl=<path_to>/user_tcl_file.tcl
In the Tcl script, each AXI master interface requiring random access is identified using the following command:

hbm_memory_subsystem::ra_master_interface <Endpoint AXI master interface> [get_bd_cells hmss_0]
The following example has two AXI master ports (M00_AXI
and M01_AXI
) for random
access:
hbm_memory_subsystem::ra_master_interface [get_bd_intf_pins dummy/M00_AXI] [get_bd_cells hmss_0]
hbm_memory_subsystem::ra_master_interface [get_bd_intf_pins dummy/M01_AXI] [get_bd_cells hmss_0]
validate_bd_design -force
It is important to end the Tcl script with the validate_bd_design
command as shown above to allow the information to be
collected correctly by the HBM subsystem, and the
block design to be updated.
PLRAM Configuration and Use
Alveo accelerator cards contain HBM DRAM and DDR DRAM memory resources. In some accelerator cards, an additional memory resource available is internal FPGA PLRAM (UltraRAM and block RAM). Supporting platforms typically contain instances of PLRAM in each SLR. The size and type of each PLRAM can be configured on the target platform before kernels or Compute Units are linked into the system.
The PLRAM configuration can be modified during system linking by specifying a Tcl script on the v++ command line as follows:

v++ -l --advanced.param compiler.userPreSysLinkOverlayTcl=<path_to>/user_tcl_file.tcl

Within the Tcl script, the PLRAM specification is updated using the following command:

sdx_memory_subsystem::update_plram_specification <memory_subsystem_bdcell> <plram_resource> <plram_specification>
The <plram_specification>
is a Tcl
dictionary consisting of the following entries (entries below are the default values for
each instance in the platform):
{
SIZE 128K # Up to 4M
AXI_DATA_WIDTH 512 # Up to 512
SLR_ASSIGNMENT SLR0 # SLR0 / SLR1 / SLR2
READ_LATENCY 1 # To optimise timing path
MEMORY_PRIMITIVE BRAM # BRAM or URAM
}
In the example below, PLRAM_MEM00
is changed to
be 2 MB in size and composed of UltraRAM; PLRAM_MEM01
is changed to be 4 MB in size and composed of UltraRAM. PLRAM_MEM00
and PLRAM_MEM01
correspond to
the --connectivity.sp memory resources PLRAM[0] and
memory resources PLRAM[0] and
PLRAM[1].
# Setup PLRAM
sdx_memory_subsystem::update_plram_specification [get_bd_cells /memory_subsystem] PLRAM_MEM00 \
  { SIZE 2M AXI_DATA_WIDTH 512 SLR_ASSIGNMENT SLR0 READ_LATENCY 10 MEMORY_PRIMITIVE URAM }

sdx_memory_subsystem::update_plram_specification [get_bd_cells /memory_subsystem] PLRAM_MEM01 \
  { SIZE 4M AXI_DATA_WIDTH 512 SLR_ASSIGNMENT SLR0 READ_LATENCY 10 MEMORY_PRIMITIVE URAM }
validate_bd_design -force
save_bd_design
The READ_LATENCY
is an important attribute,
because it sets the number of pipeline stages between memories cascaded in depth. This
varies by design, and affects the timing QoR of the platform and the eventual kernel
clock rate. In the example above for PLRAM_MEM01
:
- 4 MB of memory are required in total.
- Each UltraRAM is 32 KB (64 bits wide). 4 MB ÷ 32 KB → 128 UltraRAMs in total.
- Each PLRAM instance is 512 bits wide → 8 UltraRAMs are required in width.
- 128 total UltraRAMs with 8 UltraRAMs in width → 16 UltraRAMs in depth.
- A good rule of thumb is to pick a read latency of depth/2 + 2 → in this case, READ_LATENCY = 10.
This allows a pipeline on every second UltraRAM, resulting in the following:
- Good timing performance between UltraRAMs.
- Placement flexibility; not all UltraRAMs need to be placed in the same UltraRAM column for cascade.
Specifying Streaming Connections between Compute Units
The Vitis core development kit supports streaming data transfer between two kernels, allowing data to move directly from one kernel to another without having to transmit back through global memory. However, the process has to be implemented in the kernel code itself, as described in Streaming Data in User-Managed Never-Ending Kernels, and also specified during the kernel build process.
The streaming data ports of kernels can be connected during v++
linking using the --connectivity.sc
option. This option can be specified at the command
line, or from a config
file that is specified using
the --config
option, as described in Vitis Compiler Command.
To connect the streaming output port of a producer kernel to the streaming
input port of a consumer kernel, set up the connection in the v++
config file using the connectivity.stream_connect
option as follows:
[connectivity]
#stream_connect=<cu_name>.<output_port>:<cu_name>.<input_port>:[<fifo_depth>]
stream_connect=vadd_1.stream_out:vadd_2.stream_in
Where:

- <cu_name> is an instance name of the CU as determined by the connectivity.nk option, described in Creating Multiple Instances of a Kernel.
- <output_port> or <input_port> is the streaming port defined in the producer or consumer kernel.
- [:<fifo_depth>] inserts a FIFO of the specified depth between the two streaming ports to prevent stalls. The value is specified as an integer.
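For instance, a sketch of the same connection with an explicit FIFO inserted between the two ports; the depth value of 64 is illustrative:

[connectivity]
# 64-deep FIFO between producer and consumer streams
stream_connect=vadd_1.stream_out:vadd_2.stream_in:64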
Assigning Compute Units to SLRs
Currently, Xilinx devices on Data Center accelerator cards use stacked silicon consisting of several Super Logic Regions (SLRs) to provide device resources, including global memory. For best performance, when assigning ports to global memory banks, as described in Mapping Kernel Ports to Memory, it is best that the CU instance is assigned to the same SLR as the global memory it is connected to. In this case, you will want to manually assign the kernel instance, or CU, to that SLR to ensure the best performance.
A CU can be assigned to an SLR using the connectivity.slr
option in a config file. The syntax of the connectivity.slr
option in the config file is as follows:
[connectivity]
#slr=<compute_unit_name>:<slr_ID>
slr=vadd_1:SLR2
slr=vadd_2:SLR3
Where:

- <compute_unit_name> is an instance name of the CU as determined by the connectivity.nk option, described in Creating Multiple Instances of a Kernel, or is simply <kernel_name>_1 if multiple CUs are not specified.
- <slr_ID> is the SLR number to which the CU is assigned, in the form SLR0, SLR1, ...
The assignment of a CU to an SLR must be specified for each CU
separately, but is not required. If an assigned CU is connected to global memory located
in another SLR, the tool will automatically insert SLR crossing registers to help with
timing closure. In the absence of an SLR assignment, the v++
linker is free to assign the CU to any SLR.
The config file is specified during the v++ linking process using the --config option:

v++ -l --config config_slr.cfg ...
Managing Clock Frequencies
The --clock options described here are only supported by platform shells with fixed status clocks, as described in Identifying Platform Clocks. On older platform shells without fixed clocks, kernels operate at the default operating frequency of the platform.

Generally, in embedded processor platforms, and also in newer Data Center accelerator cards, the device binary can connect multiple kernels to the platform with different clock frequencies. Each kernel, or unique instance of the kernel, can connect to a specified clock frequency, or multiple clocks, and different kernels can use different clock frequencies generated by the platform.
During the compilation process (v++
-c
), you can specify a kernel frequency using the --hls.clock
command. This lets you compile the kernel targeting the specified frequency, and lets
the Vitis HLS tool perform validation of the kernel
logic at the specified frequency. However, while this is just an implementation target for compilation, it does provide optimization and feedback.
During the linking process, when the kernels are connected to the
platform to build the device binary, you can specify the clock frequency for the kernels
using the --clock Options of the v++
command. Therefore the process for managing clock
frequencies is as follows:
- Compile the HLS code at a specified frequency using the Vitis compiler:
v++ -c -k <krnl_name> --hls.clock freqHz:<krnl_name>
  TIP: freqHz must be in Hz (for example, 250000000Hz is 250 MHz).
- During linking, specify the clock frequency or clock ID for each clock signal in a kernel with the following command:
v++ -l ... --clock.freqHz <freqHz>:kernelName.clk_name
You can specify the --clock
option
using either a clock ID from the platform shell, or by specifying a frequency for the
kernel clock. When specifying the clock ID, the kernel frequency is defined by the
frequency of that clock ID on the platform. When specifying the kernel frequency, the
platform attempts to create the specified frequency by scaling one of the available
fixed
platform clocks. In some cases, the clock
frequency can only be achieved in some approximation, and you can specify the --clock.tolerance
or --clock.default_tolerance
to indicate an acceptable range. If the
available fixed clock cannot be scaled within the acceptable tolerance, a warning is
issued and the kernel is connected to the default clock.
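As a hedged illustration of the link-time syntax shown above, the command below requests a 300 MHz kernel clock; the instance and clock names (vadd_1 and ap_clk) and the placeholder platform are assumptions:

v++ -l -t hw --platform <platform> vadd.hw.xo -o vadd.hw.xclbin \
    --clock.freqHz 300000000:vadd_1.ap_clk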
Identifying Platform Clocks
You can identify the clocks available on the target platform, and their status, using the platforminfo -v command. For example, the following command returns verbose information related to a newer shell for the U200 platform:
platforminfo -v -p xilinx_u200_gen3x16_xdma_1_202110_1 -o pfmClocks.txt
=================
Clock Information
=================
...
...
Clock Index: 2
Frequency: 50.000000
Name: ii_level0_wire_ulp_m_aclk_ctrl_00
Pretty Name: PL 2
Inst Ref: ii_level0_wire
Comp Ref: ii_level0_wire
Period: 20.000000
Normalized Period: .020000
Status: fixed
...
In the example above you can see that Clock
Index: 2 is a fixed
status clock. In
fact, although not shown here, this platform shell provides three fixed status clocks
which can be used to drive multiple clock frequencies in linked kernels. You can use the
--clock.xxx
options to drive kernel clocks as
explained above.
For older shells, such as xilinx_u200_xdma_201830_2, there are no fixed platform clocks to drive clock frequencies in linked kernels. This can be determined by reviewing the Clock Information reported by the following command:
platforminfo -v -p xilinx_u200_xdma_201830_2 -o pfmClocks.txt
On legacy platforms without fixed-status clocks, you can use the v++ --kernel_frequency option to specify the clock frequency as described in Vitis Compiler General Options.
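For example, a hedged sketch of overriding the kernel clock on such a legacy platform; the 300 MHz value is illustrative, and the exact argument format is described in Vitis Compiler General Options:

v++ -l -t hw --platform xilinx_u200_xdma_201830_2 --kernel_frequency 300 \
    vadd.hw.xo -o vadd.hw.xclbin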
Managing Vivado Synthesis and Implementation Results
In most cases, the Vitis environment completely abstracts away the underlying process of synthesis and implementation of the programmable logic region, as the CUs are linked with the hardware platform and the FPGA binary (xclbin) is generated. This removes the application developer from the typical hardware development process, and the need to manage constraints such as logic placement and routing delays. The Vitis tool automates much of the FPGA implementation process.
However, in some cases you might want to exercise some control over
the synthesis and implementation processes deployed by the Vitis compiler, especially when large designs are being implemented.
Towards this end, the Vitis tool offers some
control through specific options that can be specified in a v++
configuration file, or from the command line. The following are
some of the ways in which you can interact with and control the Vivado synthesis and implementation results.
- Using the --vivado options to manage the Vivado tool.
- Using multiple implementation strategies to achieve timing closure on challenging designs.
- Using the -to_step and -from_step options to run the compilation or linking process to a specific step, perform some manual intervention on the design, and resume from that step.
- Interactively editing the Vivado project, and using the results for generating the FPGA binary.
Using the -vivado and -advanced Options
Using the --vivado
option, as
described in --vivado Options, and the --advanced
option as described in --advanced Options, you can perform a number of
interventions on the standard Vivado synthesis
or implementation.
- Pass Tcl scripts with custom design constraints or scripted
operations.
You can create Tcl scripts to assign XDC design constraints to objects in the design, and pass these Tcl scripts to the Vivado tools using the PRE and POST Tcl script properties of the synthesis and implementation steps. For more information on Tcl scripting, refer to the Vivado Design Suite User Guide: Using Tcl Scripting (UG894). While there is only one synthesis step, there are a number of implementation steps as described in the Vivado Design Suite User Guide: Implementation (UG904). You can assign Tcl scripts for the Vivado tool to run before the step (PRE), or after the step (POST). The specific steps you can assign Tcl scripts to include the following:
SYNTH_DESIGN, INIT_DESIGN, OPT_DESIGN, PLACE_DESIGN, ROUTE_DESIGN, WRITE_BITSTREAM.
  TIP: There are also some optional steps that can be enabled using the --vivado.prop run.impl_1.steps.phys_opt_design.is_enabled=1 option. When enabled, these steps can also have Tcl PRE and POST scripts.
An example of the Tcl PRE and POST script assignments follows:
--vivado.prop run.impl_1.STEPS.PLACE_DESIGN.TCL.PRE=/…/xxx.tcl
In the preceding example a script has been assigned to run before the PLACE_DESIGN step. The command line is broken down as follows:
- --vivado is the v++ command-line option to specify directives for the Vivado tools.
- prop is a keyword to indicate you are passing a property setting.
- run. is a keyword to indicate that you are passing a run property.
- impl_1. indicates the name of the run.
- STEPS.PLACE_DESIGN.TCL.PRE indicates the run property you are specifying.
- /.../xx.tcl indicates the property value.
TIP: Both the --advanced and --vivado options can be specified on the v++ command line, or in a configuration file specified by the --config option. The example above shows the command line use, and the following example shows the config file usage; a consolidated config sketch also follows this list. Refer to Vitis Compiler Configuration File for more information.
- Setting properties on run, file, and fileset design objects.
This is very similar to passing Tcl scripts as described above, but in this case you are passing values to different properties on multiple design objects. For example, to use a specific implementation strategy such as Performance_Explore and disable global buffer insertion during placement, you can define the properties as shown below:

[vivado]
prop=run.impl_1.STEPS.OPT_DESIGN.ARGS.DIRECTIVE=Explore
prop=run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=Explore
prop=run.impl_1.{STEPS.PLACE_DESIGN.ARGS.MORE OPTIONS}={-no_bufg_opt}
prop=run.impl_1.STEPS.PHYS_OPT_DESIGN.IS_ENABLED=true
prop=run.impl_1.STEPS.PHYS_OPT_DESIGN.ARGS.DIRECTIVE=Explore
prop=run.impl_1.STEPS.ROUTE_DESIGN.ARGS.DIRECTIVE=Explore
In the example above, the Explore value is assigned to the STEPS.XXX.DIRECTIVE property of various steps of the implementation run. Note the syntax for defining these properties is:

<object>.<instance>.<property>=<value>
Where:

- <object> can be a design run, a file, or a fileset object.
- <instance> indicates a specific instance of the object.
- <property> specifies the property to assign.
- <value> defines the value of the property.
In this example the object is a run, the instance is the default implementation run, impl_1, and the property is an argument of the different step names, in this case the DIRECTIVE, IS_ENABLED, and {MORE OPTIONS} arguments. Refer to --vivado Options for more information on the command syntax.
- Enabling optional steps in the Vivado implementation process.
The build process runs Vivado synthesis and implementation to generate the device binary. Some of the implementation steps are enabled and run as part of the default build process, and some of the implementation steps can be optionally enabled at your discretion.
Optional steps can be listed using the --list_steps command, and include: vpl.impl.power_opt_design, vpl.impl.post_place_power_opt_design, vpl.impl.phys_opt_design, and vpl.impl.post_route_phys_opt_design.
An optional step can be enabled using the --vivado.prop option. For example, to enable the PHYS_OPT_DESIGN step, use the following config file content:

[vivado]
prop=run.impl_1.steps.phys_opt_design.is_enabled=1
When an optional step is enabled as shown above, the step can be specified as part of the -from_step/-to_step command as described below in Running --to_step or --from_step, or a Tcl script can be run before or after the step as described in --linkhook Options.
- Passing parameters to the tool to control processing.
The --vivado option also allows you to pass parameters to the Vivado tools. The parameters are used to configure the tool features or behavior prior to launching the tool. The syntax for specifying a parameter uses the following form:

--vivado.param <object>.<parameter>=<value>
The keyword param indicates that you are passing a parameter for the Vivado tools, rather than a property for a design object. You must also define the <object> it applies to, the <parameter> that you are specifying, and the <value> to assign it.
In the following example, project indicates the current Vivado project, writeIntermediateCheckpoints is the parameter being passed, and the value 1 enables this Boolean parameter.

--vivado.param project.writeIntermediateCheckpoints=1
- Managing the reports generated during synthesis and implementation.
  IMPORTANT: You must also specify --save-temps on the v++ command line when customizing the reports generated by the Vivado tool to preserve the temporary files created during synthesis and implementation, including any generated reports.
You might also want to generate or save more than the standard reports provided by the Vivado tools when run as part of the Vitis tools build process. You can customize the reports generated using the --advanced.misc option as follows:

[advanced]
misc=report=type report_utilization name synth_report_utilization_summary steps {synth_design} runs {__KERNEL__} options {}
misc=report=type report_timing_summary name impl_report_timing_summary_init_design_summary steps {init_design} runs {impl_1} options {-max_paths 10}
misc=report=type report_utilization name impl_report_utilization_init_design_summary steps {init_design} runs {impl_1} options {}
misc=report=type report_control_sets name impl_report_control_sets_place_design_summary steps {place_design} runs {impl_1} options {-verbose}
misc=report=type report_utilization name impl_report_utilization_place_design_summary steps {place_design} runs {impl_1} options {}
misc=report=type report_io name impl_report_io_place_design_summary steps {place_design} runs {impl_1} options {}
misc=report=type report_bus_skew name impl_report_bus_skew_route_design_summary steps {route_design} runs {impl_1} options {-warn_on_violation}
misc=report=type report_clock_utilization name impl_report_clock_utilization_route_design_summary steps {route_design} runs {impl_1} options {}
The syntax of the command line is explained using the following example:
misc=report=type report_bus_skew name impl_report_bus_skew_route_design_summary steps {route_design} runs {impl_1} options {-warn_on_violation}
misc=report=
- Specifies the --advanced.misc option as described in --advanced Options, and defines the report configuration for the Vivado tool. The rest of the command line is specified in name/value pairs, reflecting the options of the create_report_config Tcl command as described in Vivado Design Suite Tcl Command Reference Guide (UG835).
type report_bus_skew
- Relates to the -report_type argument, and specifies the type of the report as the report_bus_skew. Most of the report_* Tcl commands can be specified as the report type.
name impl_report_bus_skew_route_design_summary
- Relates to the -report_name argument, and specifies the name of the report. Note this is not the file name of the report, and generally this option can be skipped as the report names will be auto-generated by the tool.
steps {route_design}
- Relates to the -steps option, and specifies the synthesis and implementation steps that the report applies to. The report can be specified for use with multiple steps to have the report regenerated at each step, in which case the name of the report will be automatically defined.
runs {impl_1}
- Relates to the -runs option, and specifies the name of the design runs to apply the report to.
options {-warn_on_violation}
- Specifies various options of the report_* Tcl command to be used when generating the report. In this example, the -warn_on_violation option is a feature of the report_bus_skew command.
IMPORTANT: There is no error checking to ensure the specified options are correct and applicable to the report type specified. If you indicate options that are incorrect the report will return an error when it is run.
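Pulling several of these controls together, the following is a hedged sketch of a single config file that sets a pre-placement Tcl hook, enables an optional step, and passes a tool parameter; the Tcl script path is hypothetical, and the sketch assumes the [vivado] section accepts prop= and param= entries in the same way the --vivado.prop and --vivado.param command-line forms do:

[vivado]
# Run a custom constraints script before placement (path is hypothetical)
prop=run.impl_1.STEPS.PLACE_DESIGN.TCL.PRE=/path/to/pre_place.tcl
# Enable the optional physical optimization step
prop=run.impl_1.STEPS.PHYS_OPT_DESIGN.IS_ENABLED=true
# Ask Vivado to write intermediate checkpoints
param=project.writeIntermediateCheckpoints=1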
Running Multiple Implementation Strategies for Timing Closure
For challenging designs, it can take multiple iterations of Vivado implementation using multiple different strategies
to achieve timing closure. This topic shows you how to launch multiple implementation
strategies at the same time in the hardware build (-t
hw
), and how to identify and use successful runs to generate the device binary
and complete the build.
As explained in --vivado Options the
--vivado.impl.strategies
command enables you to
specify multiple strategies to run in a single build pass. The command line would look
as follows:
v++ --link -s -g -t hw --platform xilinx_zcu102_base_202010_1 -I . \
--vivado.impl.strategies "Performance_Explore,Area_Explore" -o kernel.xclbin hello.xo
In the example above, the Performance_Explore
and Area_Explore
strategies are run simultaneously in the Vivado
build to see which returns the best results. You can specify ALL to have all available strategies run within the tool.
You can also determine this option in a configuration file in the following form:
#Vivado Implementation Strategies
[vivado]
impl.strategies=Performance_Explore,Area_Explore
The Vitis compiler automatically picks the first completed run that meets timing to proceed with the build process and generate the device binary. However, you can also direct the tool to wait for all runs to complete and pick the best results from the completed runs before proceeding. This requires the compiler.multiStrategiesWaitOnAllRuns parameter, passed with the --advanced.param option as shown:
[advanced]
param=compiler.multiStrategiesWaitOnAllRuns=1
compiler.multiStrategiesWaitOnAllRuns=0 represents the default behavior. If you want v++ to wait for all runs to complete, and collect their report files, change the parameter value to 1.
As discussed in Link Summary: Multiple Strategies and Timing Reports, Vitis analyzer displays the implementation results for all strategies that have been allowed to run to completion. This includes an overview of the implementation results, as well as a Timing Summary report. You can use this feature to review the different strategies and results.
You can also manually review the results of all implementation strategies
after they have completed. Then, use the results of any of the implementation runs by
using the --reuse_impl
option as described in Using -to_step and Launching Vivado Interactively.
Using -to_step and Launching Vivado Interactively
The Vitis compiler lets you
stop the build process after completing a specified step (--to_step
), manually intervene in the design or files in some way, and
then continue the build by specifying a step the build should resume from (--from_step
). The --from_step
directs the Vitis
compiler to resume compilation from the step where --to_step
left off, or some earlier step in the process. The --to_step
and --from_step
are described in Vitis Compiler General Options.
The --to_step
and --from_step
options are sequential build options that require you to
use the same project directory when launching v++ --link
--from_step
as you specified when using v++
--link --to_step
. The Vitis compiler also
provides a --list_steps
option to list the
available steps for the compilation or linking processes of a specific build target.
For example, the list of steps for the link process of the hardware build can be
found by:
v++ --list_steps --target hw --link
This command returns a number of steps, both default steps and
optional steps that the Vitis compiler goes
through during the linking process of the hardware build. Some of the default steps
include: system_link
, vpl
, vpl.create_project
, vpl.create_bd
, vpl.generate_target
, vpl.synth
,
vpl.impl.opt_design
, vpl.impl.place_design
, vpl.impl.route_design
, and vpl.impl.write_bitstream
.
Optional steps include: vpl.impl.power_opt_design, vpl.impl.post_place_power_opt_design, vpl.impl.phys_opt_design, and vpl.impl.post_route_phys_opt_design. An optional step must be enabled before it can be specified with --from_step or --to_step, as previously described in Using the -vivado and -advanced Options.

Launching the Vivado IDE for Interactive Design
For example, with the --to_step
command, you can launch the build process to Vivado synthesis and then start the Vivado IDE on the project to manually place and route the design. To
perform this you would use the following command syntax:
v++ --target hw --link --to_step vpl.synth --save-temps --platform <PLATFORM_NAME> <XO_FILES>
You must specify --save-temps when using --to_step to preserve any temporary files created by the build process. This command specifies the link process of the hardware build, runs the build through the synthesis step, and saves the temporary files produced by the build process.
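After any manual intervention, the same build can be resumed; the following is a hedged sketch that resumes from the same step the earlier build stopped at, and it assumes the same project directory and the step name reported by --list_steps:

v++ --target hw --link --from_step vpl.synth --save-temps --platform <PLATFORM_NAME> <XO_FILES>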
You can launch the Vivado tool
directly on the project built by the Vitis
compiler using the --interactive
command. This opens the Vivado project found at <temp_dir>/link/vivado/vpl/prj in your build directory,
letting you interactively edit the design:
v++ --target hw --link --interactive --save-temps --platform <PLATFORM_NAME> <XO_FILES>
When invoking the Vivado IDE in this mode, you can open the synthesis or implementation runs to manage and modify the project. You can change the run details as needed to close timing and try different approaches to implementation. You can save the results to a design checkpoint (DCP), or generate the project bitstream (.bit) to use in the Vitis environment to generate the device binary.
After saving the DCP from within the Vivado IDE, close the tool and return to the Vitis environment. Use the --reuse_impl
option to apply a previously implemented DCP file in the
v++ command line to generate the xclbin
.
The --reuse_impl option is an incremental build option that requires you to use the same project directory when resuming the Vitis compiler with --reuse_impl that you specified when using --to_step to start the build. The following command completes the linking process by using the specified DCP file from the Vivado tool to create the project.xclbin from the input files.

v++ --link --platform <PLATFORM_NAME> -o'project.xclbin' project.xo --reuse_impl ./_x/link/vivado/routed.dcp

Alternatively, if you generated a bitstream in the Vivado IDE, you can use the --reuse_bit option to create the device binary from that bitstream:

v++ --link --platform <PLATFORM_NAME> -o'project.xclbin' project.xo --reuse_bit ./_x/link/vivado/project.bit
Additional Vivado Options
Some additional switches that can be used in the v++
command line or config file include the
following:
- --export_script/--custom_script: Edit and use Tcl scripts to modify the compilation or linking process.
- --remote_ip_cache: Specify a remote IP cache directory for Vivado synthesis.
- --no_ip_cache: Turn off the IP cache for Vivado synthesis. This causes all IP to be re-synthesized as part of the build process, scrubbing out cached data.
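For example, a hedged sketch of pointing the link step at a shared IP cache directory; the cache path, object file, and platform placeholder are assumptions:

v++ -l -t hw --platform <PLATFORM_NAME> --remote_ip_cache /shared/ip_cache \
    vadd.hw.xo -o vadd.hw.xclbin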
Controlling Report Generation
The v++
-R
option (or --report_level
) controls
the level of information to report during compilation or linking for hardware emulation
and system targets. Builds that generate fewer reports will typically run more
quickly.
The command line option is as follows:
$ v++ -R <report_level>
Where <report_level>
is one of the
following options:
- -R0: Minimal reports and no intermediate design checkpoints (DCP).
- -R1: Includes R0 reports plus:
  - Identifies design characteristics to review for each kernel (report_failfast).
  - Identifies design characteristics to review for the full post-optimization design.
  - Saves post-optimization design checkpoint (DCP) file for later examination or use in the Vivado Design Suite.
  TIP: report_failfast is a utility that highlights potential device usage challenges, clock constraint problems, and potential unreachable target frequency (MHz).
- -R2: Includes R1 reports plus:
  - Includes all standard reports from the Vivado tools, including saved DCPs after each implementation step.
  - Design characteristics to review for each SLR after placement.
- -Restimate: Forces Vitis HLS to generate a System Estimate report, as described in System Estimate Report.
  TIP: This option is useful for the software emulation build (-t sw_emu).
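For instance, a hedged sketch of a hardware link command requesting the most detailed reporting level; the object and output file names follow the earlier examples:

v++ -l -t hw -R2 --platform <PLATFORM_NAME> vadd.hw.xo -o vadd.hw.xclbin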