Migrating to a New Target Platform

This migration guide is intended for users who need to migrate their accelerated SDAccel™ environment application from one target platform to another. For example, moving an application from a Virtex® UltraScale+™ VCU1525 Acceleration Development Board to a U200 Acceleration Development Board.

The following topics are addressed as part of this guide:

  • An overview of the design migration process, including the physical aspects of FPGA devices.
  • Any changes to the host code and design constraints if a new release is used.
  • Controlling kernel placements and DDR interface connections.
  • Timing issues in the new shell which might require additional options to achieve performance.

Design Migration

When migrating an application implemented in one target platform to another, it is important to understand the differences between the target platforms, and the impact those differences have on the design.

Key considerations:

  • Is there a change in the release?
  • Does the new target platform contain a different shell?
  • Do the kernels need to be redistributed across the Super Logic Regions (SLRs)?
  • Does the design meet the required frequency (timing) performance in the new platform?

The following diagram summarizes the migration flow described in this guide, and the topics to consider during the migration process.

Figure: Shell Migration Flowchart



IMPORTANT: Before starting to migrate a design it is important to understand the architecture of an FPGA and the shell.

Understanding an FPGA Architecture

Before migrating any design to a new target platform, you should have a fundamental understanding of the FPGA architecture. The following diagram shows the floorplan of a Xilinx® FPGA device. The concepts to understand are:

  • SSI Devices
  • SLRs
  • SLR routing resources
  • Memory interfaces

Figure: Physical View of Xilinx FPGA with Four SLR Regions



TIP: The FPGA floorplan shown above is for an SSI device with four SLRs, where each SLR contains a DDR memory interface.

Stacked Silicon Interconnect Devices

An SSI device is one in which multiple silicon dies are connected together via a silicon interposer and packaged into a single device. An SSI device enables high-bandwidth connectivity between multiple dies by providing a much greater number of connections. It also imposes much lower latency and consumes dramatically lower power than either a multiple-FPGA or a multi-chip module approach, while enabling the integration of massive quantities of interconnect logic, transceivers, and on-chip resources within a single package. The advantages of SSI devices are detailed in Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency.

Super Logic Region

An SLR is a single FPGA die slice contained in an SSI device. Multiple SLR components are assembled to make up an SSI device. Each SLR contains the active circuitry common to most Xilinx FPGA devices. This circuitry includes large numbers of:

  • LUTs
  • Registers
  • I/O Components
  • Gigabit Transceivers
  • Block Memory
  • DSP Blocks

One or more kernels may be implemented within an SLR. A single kernel may not be implemented across multiple SLRs.

SLR Routing Resources

The custom hardware implemented on the FPGA is connected via on-chip routing resources. There are two types of routing resources in an SSI device:

Intra-SLR Resources
Intra-SLR routing resources are the fast resources used to connect the hardware logic within an SLR. The SDAccel environment automatically uses the most optimal resources to connect the hardware elements when implementing kernels.
Super Long Line (SLL) Resources
SLLs are routing resources running between SLRs, used to connect logic from one region to the next. These routing resources are slower than intra-SLR routes. However, when a kernel is placed in one SLR and the DDR it connects to is in another, the SDAccel environment automatically implements dedicated hardware to use the SLL routing resources without any impact on performance. More details on managing placement are provided in Modifying Kernel Placement.

Memory Interfaces

Each SLR contains one or more memory interfaces. These memory interfaces are used to connect to the DDR memory where the data in the host buffers is copied before kernel execution. Each kernel will read data from the DDR memory and write the results back to the same DDR memory. The memory interface connects to the pins on the FPGA and includes the memory controller logic.

Understanding Shells

In the SDAccel development environment, a shell is the hardware design that is implemented onto the FPGA before any custom logic or accelerators are added. The shell defines the attributes of the FPGA used in the target platform and is composed of two regions:

  • Static region which contains kernel and device management logic.
  • Dynamic region where the custom logic of the accelerated kernels is placed.

The figure below shows an FPGA with the shell applied.

Figure: Shell on an FPGA with Four SLR Regions



The static region of the shell cannot be modified by the user and contains the logic required to operate the FPGA and to transfer data to and from the dynamic region. The static region, shown above in gray, might exist within a single SLR or, as in the example above, might span multiple SLRs. The static region contains:

  • DDR memory interface controllers
  • PCIe® interface logic
  • XDMA logic
  • Firewall logic, etc.

The dynamic region is the area shown in white above. This region contains all the reconfigurable components of the shell and is the region where all the accelerator kernels are placed.

Because the static region consumes some of the hardware resources available on the device, the custom logic to be implemented in the dynamic region can only use the remaining resources. In the example shown above, the shell defines that all four DDR memory interfaces on the FPGA can be used. This will require resources for the memory controller used in the DDR interface.

Details on how much logic may be implemented in the dynamic region of each shell are provided in the SDx Environments Release Notes, Installation, and Licensing Guide. This topic is also addressed in Modifying Kernel Placement, later in this guide.

Migrating Releases

Before migrating to a new target platform, you should also determine whether you need to target the new platform with a different release of the SDAccel environment. If you do intend to target a new release, it is highly recommended to first target the existing platform using the new software release, confirm that no changes are required, and then migrate to the new target platform.

There are two steps to follow when targeting a new release with an existing platform:

  • Host Code Migration
  • Release Migration

IMPORTANT: Before migrating to a new release, it is recommended that you review the SDx Environments Release Notes, Installation, and Licensing Guide.

Host Code Migration

In the 2018.3 release of the SDAccel environment, there are some fundamental changes to how the Xilinx Runtime (XRT) environment and shell(s) are installed. In previous releases, both the XRT environment and shell(s) were automatically installed with the SDAccel environment. This change has implications for the setup required to compile the host code.

Refer to the SDx Environments Release Notes, Installation, and Licensing Guide for details on the 2018.3 installation.

The XILINX_XRT environment variable is used to specify the location of the XRT environment and must be set before you compile the host code. After the XRT environment has been installed, the XILINX_XRT environment variable can be set by sourcing the /opt/xilinx/xrt/setup.csh or /opt/xilinx/xrt/setup.sh file, as appropriate. Also ensure that your LD_LIBRARY_PATH variable points to the XRT installation area.

To compile and run the host code, make sure you source the <SDX_INSTALL_DIR>/settings64.csh or <SDX_INSTALL_DIR>/settings64.sh file from the SDAccel installation.
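
For example, on a Linux system using bash, the environment can be set up with commands similar to the following sketch. The XRT install path is the default location mentioned above, and <SDX_INSTALL_DIR> is your SDAccel installation directory:
# Set up the Xilinx Runtime (XRT); this defines XILINX_XRT
source /opt/xilinx/xrt/setup.sh
# Set up the SDAccel tools
source <SDX_INSTALL_DIR>/settings64.sh
# Verify that XILINX_XRT is set and that LD_LIBRARY_PATH includes the XRT libraries
echo $XILINX_XRT
echo $LD_LIBRARY_PATH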

If you are using the GUI, it will automatically incorporate the new XRT location and generate the makefile when you build your project.

However, if you are using your own custom makefile, you need to make the following changes, illustrated in the compile-line sketch after this list:

  • In your makefile, do not use the XILINX_SDX environment variable, which was used in prior releases.
  • The XILINX_SDX variables and paths must be updated to use the XILINX_XRT environment variable:
    • Include directories are now specified as: -I${XILINX_XRT}/include and -I${XILINX_XRT}/include/CL
    • Library path is now: -L${XILINX_XRT}/lib
    • The OpenCL™ library is libxilinxopencl.so, so use -lxilinxopencl in your makefile
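
As a hedged sketch only, a host compile rule in such a makefile might invoke the compiler as follows. The source file name, output name, and the additional -lpthread and -lrt libraries are assumptions for illustration and depend on your project:
# Hypothetical host compile/link line using the XRT include and library paths
g++ -std=c++11 -I${XILINX_XRT}/include -I${XILINX_XRT}/include/CL \
  host.cpp -o host.exe \
  -L${XILINX_XRT}/lib -lxilinxopencl -lpthread -lrt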

Release Migration

After migrating the host code, build the code on the existing target platform using the new release of the SDAccel development environment. Verify that the project runs correctly in the SDAccel environment using the new release, completes successfully, and meets the timing requirements.

Issues which can occur when using a new release are:

  • Changes to C libraries or library files.
  • Changes to kernel path names.
  • Changes to the HLS pragmas or pragma options embedded in the kernel code.
  • Changes to C/C++/OpenCL compiler support.
  • Changes to the performance of kernels: this may require adjustments to the pragmas in the existing kernel code.

Address these issues using the same techniques you would use during the development of any kernel. At this stage, ensure the throughput performance of the target platform using the new release meets your requirements. If there are changes to the final timing (the maximum clock frequency), you can address these when you have moved to the new target platform. This is covered in Address Timing.

Modifying Kernel Placement

The primary issue when targeting a new platform is ensuring that an existing kernel placement will work in the new target platform. Each target platform has an FPGA defined by a shell. As shown in the figure below, the shell(s) can be different.

  • The shell of the original platform on the left has four SLRs, and the static region is spread across all four SLRs.
  • The shell of the target platform on the right has only three SLRs, and the static region is fully-contained in SLR1.

Figure: Comparison of Shells of the Hardware Platform



This section explains how to modify the placement of the kernels.

Implications of a New Hardware Platform

The figure below highlights the issue of kernel placement when migrating to a new target platform, or shell. In the example below:

  • The existing kernel, kernel_B, is too large to fit into SLR1 of the new target platform because most of that SLR is consumed by the static region.
  • The existing kernel, kernel_D, must be relocated to a new SLR because the new target platform does not have four SLRs like the existing platform.

Figure: Migrating Platforms – Kernel Placement



When migrating to a new platform, you need to take the following actions:

  • Understand the resources available in each SLR of the new target platform, as documented in the SDx Environments Release Notes, Installation, and Licensing Guide.
  • Understand the resources required by each kernel in the design.
  • Use the xocc linker options (--slr and --sp) to specify which SLR each kernel is placed in, and which DDR bank each kernel connects to.

These items are addressed in the remainder of this section.

Determining Where to Place the Kernels

To determine where to place kernels, two pieces of information are required:

  • Resources available in each SLR of the shell of the hardware platform (.dsa).
  • Resources required for each kernel.

With these two pieces of information you will then determine which kernel or kernels can be placed in each SLR of the shell.

Keep in mind when performing these calculations that 10% of the available resources can be used by system infrastructure (a worked example follows the list):

  • Infrastructure logic can be used to connect a kernel to a DDR interface if it has to cross an SLR boundary.
  • In an FPGA, it is never possible to use 100% of the available resources because signal routing also requires resources.
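
For example, using the platform described in Table 1 below, SLR0 lists approximately 388K CLB LUTs in its dynamic region; budgeting roughly 10% of those LUTs for infrastructure leaves on the order of 350K LUTs for kernel logic in that SLR. These figures are illustrative only; always use the numbers published for your specific shell.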

Available SLR Resources

The resources available in each SLR of the shells provided by Xilinx can be found in the SDx Environments Release Notes, Installation, and Licensing Guide. The table below shows an example shell. In this example you can see:

  • The SLR description indicates which SLR contains static and/or dynamic regions.
  • The resources available in each SLR (LUTs, Registers, RAM, etc.) are listed.

This allows you to determine what resources are available in each SLR.

Table 1. SLR Resources of a Hardware Platform

Area | SLR 0 | SLR 1 | SLR 2
SLR description | Bottom of device; dedicated to dynamic region. | Middle of device; shared by dynamic and static region resources. | Top of device; dedicated to dynamic region.
Dynamic region pblock name | pfa_top_i_dynamic_region_pblock_dynamic_SLR0 | pfa_top_i_dynamic_region_pblock_dynamic_SLR1 | pfa_top_i_dynamic_region_pblock_dynamic_SLR2
Compute unit placement syntax | set_property CONFIG.SLR_ASSIGNMENTS SLR0 [get_bd_cells <cu_name>] | set_property CONFIG.SLR_ASSIGNMENTS SLR1 [get_bd_cells <cu_name>] | set_property CONFIG.SLR_ASSIGNMENTS SLR2 [get_bd_cells <cu_name>]

Global memory resources available in dynamic region
Memory channels; system port name | bank0 (16 GB DDR4) | bank1 (16 GB DDR4, in static region); bank2 (16 GB DDR4, in dynamic region) | bank3 (16 GB DDR4)

Approximate available fabric resources in dynamic region
CLB LUT | 388K | 199K | 388K
CLB Register | 776K | 399K | 776K
Block RAM Tile | 720 | 420 | 720
UltraRAM | 320 | 160 | 320
DSP | 2280 | 1320 | 2280

Kernel Resources

The resources for each kernel can be obtained from the System Estimate report.

The System Estimate report is available in the Assistant view after either the Hardware Emulation or System run is complete. An example of this report is shown below.

Figure: System Estimate Report



  • FF refers to the CLB Registers noted in the platform resources for each SLR.
  • LUT refers to the CLB LUTs noted in the platform resources for each SLR.
  • DSP refers to the DSPs noted in the platform resources for each SLR.
  • BRAM refers to the block RAM Tile noted in the platform resources for each SLR.

This information can help you determine the proper SLR assignments for each kernel.
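
For example, if the System Estimate report showed a kernel requiring roughly 150K LUTs and 250K FFs (hypothetical figures), that kernel would fit comfortably in SLR0 or SLR2 of the platform in Table 1, but would consume most of the dynamic region of SLR1, which also hosts the static region.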

Assigning Kernels to SLRs

Each kernel in a design can be assigned to an SLR region using the xocc --slr command line option. When placing kernels, it is recommended to also assign the specific DDR memory bank that the kernel connects to, using the xocc --sp command line option. The following example demonstrates these two command line options.

The figure below shows an example where the existing target platform shell has four SLRs, and the new target platform has a shell with three SLRs, and the static region is also structured differently between the target platforms. In this migration example:

  • Kernel_A is mapped to SLR0.
  • Kernel_B, which no longer fits in SLR1, is remapped to SLR0, where there are available resources.
  • Kernel_C is mapped to SLR2.
  • Kernel_D is remapped to SLR2, where there are available resources.

The kernel mappings are illustrated in the figure below.

Figure: Mapping of Kernels Across SLRs



Specifying Kernel Placement

For the above example, the kernels are placed using the following xocc command line options.
xocc --slr kernel_A:SLR0 \
  --slr kernel_B:SLR0 \
  --slr kernel_C:SLR2 \
  --slr kernel_D:SLR2
With these command line options, each of the kernels is placed as shown in the figure above.

Specifying Kernel DDR Interfaces

You should also specify the kernel DDR memory interface when specifying kernel placements. Specifying the DDR interface ensures automatic pipelining of kernel connections to a DDR interface in a different SLR, avoiding a degradation in timing that could reduce the maximum clock frequency.

In this example, using the kernel placements in the above figure:

  • Kernel_A is connected to Memory Bank 0.
  • Kernel_B is connected to Memory Bank 1.
  • Kernel_C is connected to Memory Bank 2.
  • Kernel_D is connected to Memory Bank 1.

The following xocc command line performs these connections:
xocc --sp kernel_A.arg1:bank0 \
  --sp kernel_B.arg1:bank1 \
  --sp kernel_C.arg1:bank2 \
  --sp kernel_D.arg1:bank1
IMPORTANT: When using the --sp option to assign kernel ports to memory banks, you must specify the --sp option for all interfaces/ports of the kernel. Refer to "Customization of DDR Bank to Kernel Connection" in the SDAccel Environment Programmers Guide for more information.
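
Putting the SLR assignments and memory bank assignments together, a complete link command for this example might look like the following sketch. The platform name, kernel object file, and output file name are placeholders, and kernels with more than one memory interface need an --sp option for each interface:
xocc --link --target hw --platform <target_platform> \
  --slr kernel_A:SLR0 --slr kernel_B:SLR0 \
  --slr kernel_C:SLR2 --slr kernel_D:SLR2 \
  --sp kernel_A.arg1:bank0 --sp kernel_B.arg1:bank1 \
  --sp kernel_C.arg1:bank2 --sp kernel_D.arg1:bank1 \
  -o kernels.xclbin kernels.xo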

Address Timing

Perform a system run; if it completes with no timing violations, the migration is successful.

If timing has not been met, you might need to specify some custom constraints to help close timing. Refer to the UltraFast Design Methodology Guide for the Vivado Design Suite (UG949) for more information on meeting timing.

Custom Constraints

Custom placement and timing constraints are passed to the Vivado® tools using the xocc --xp option. Custom Tcl constraints for floorplanning of the kernels will need to be reviewed in the context of the new target platform (.dsa). For example, if a kernel moved to a different SLR in the new shell, the corresponding placement constraints for that kernel will also need to be modified.
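
As an illustrative sketch only, one way to apply a constraints file is to attach it as a Tcl pre-hook on an implementation step through a Vivado run property passed with --xp; the property name and file path below are assumptions and should be verified against your tool version:
# Assumed mechanism: pass a Vivado run property through --xp to hook a Tcl file before opt_design
xocc --xp "vivado_prop:run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=/path/to/constraints.tcl" ...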

In general, timing is expected to be comparable between different target platforms that are based on the same Virtex UltraScale+ VU9P device. Any custom Tcl constraints for timing closure will need to be evaluated and might need to be modified for the new platform.

Additionally, any non-default options that are passed to xocc or to the Vivado tools using the xocc --xp switch will need to be updated for the new shell.

Timing Closure Considerations

Design performance and timing closure can vary when moving across SDx™ releases or shell(s), especially when one of the following conditions is true:

  • Floorplan constraints were needed to close timing.
  • Device or SLR resource utilization was higher than the typical guideline:
    • LUT utilization was higher than 70%
    • DSP, RAMB, and UltraRAM utilization was higher than 80%
    • FD (flip-flop) utilization was higher than 50%
  • High effort compilation strategies were needed to close timing.

The utilization guidelines provide a threshold above which compilation of the design can take longer, or performance can be lower than initially estimated. For larger designs, which usually require more than one SLR, specify the kernel/DDR association on the xocc command line, and verify that any floorplan constraint ensures the following:

  • The utilization of each SLR is below the recommended guidelines.
  • The utilization is balanced across SLRs if one type of hardware resource needs to be higher than the guideline.

For designs with overall high utilization, increasing the amount of pipelining in the kernels, at the cost of higher latency, can greatly help timing closure and achieving higher performance.

For a quick review of all the aspects listed above, use the fail-fast reports generated throughout the SDx flow when using one of the following two options (a usage sketch follows this list):

  • xocc -R 1
    • report_failfast is run at the end of each kernel synthesis step
    • report_failfast is run after opt_design on the entire design
    • The opt_design DCP is saved
  • xocc -R 2
    • Same reports as with -R 1, plus:
    • report_failfast is run post-placement for each SLR
    • Additional reports and intermediate DCPs are generated
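
For instance, the report level can simply be added to the link command; the platform, object file, and output names below are placeholders:
xocc --link --target hw --platform <target_platform> -R 2 \
  -o kernels.xclbin kernels.xo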

All reports and DCPs can be found in the implementation directory, including kernel synthesis reports:

<runDir>/_x/link/vivado/prj/prj.runs/impl_1

For more information about timing closure and the fail-fast report, see the UltraFast Design Methodology Timing Closure Quick Reference Guide (UG1292).