SDAccel Development Environment What's New for 2017.4

The following SDAccel™ Development Environment updates are included in this release.

5.x DSAs and features

Table 1. xilinx_vcu1525_dynamic_5_0

| Area | SLR 0 | SLR 1 | SLR 2 |
|---|---|---|---|
| General information | | | |
| SLR description | Bottom of device; dedicated to dynamic region. | Middle of device; shared by dynamic and static region resources. | Top of device; dedicated to dynamic region. |
| Dynamic region pblock name | pfm_top_i_dynamic_region_pblock_dynamic_SLR0 | pfm_top_i_dynamic_region_pblock_dynamic_SLR1 | pfm_top_i_dynamic_region_pblock_dynamic_SLR2 |
| Compute unit placement syntax¹ | set_property CONFIG.SLR_ASSIGNMENTS SLR0 [get_bd_cells <cu_name>] | set_property CONFIG.SLR_ASSIGNMENTS SLR1 [get_bd_cells <cu_name>] | set_property CONFIG.SLR_ASSIGNMENTS SLR2 [get_bd_cells <cu_name>] |
| Global memory resources available in dynamic region² | | | |
| Memory channels; system port name | bank0 (16GB DDR4) | bank1 (16GB DDR4, in static region); bank2 (16GB DDR4, in dynamic region) | bank3 (16GB DDR4) |
| Approximate available fabric resources in dynamic region | | | |
| CLB LUT | 388K | 199K | 388K |
| CLB Register | 776K | 399K | 776K |
| Block RAM Tile | 720 | 420 | 720 |
| URAM | 320 | 160 | 320 |
| DSP | 2280 | 1320 | 2280 |

  1. Dynamic platforms will by default place a kernel in the same SLR as the memory bank that it accesses. Details on how this may be controlled are provided in the User-specified SLR assignments for Kernels section of the SDAccel Environment User Guide.
  2. Approximately 20K CLB LUTs and 20K CLB Registers are required for each mapped memory channel (except for bank1, in static region). A minimum of 12K CLB LUTs and 18K CLB Registers are also required for the SmartConnect network, with additional resources required for each mapped memory channel and each compute unit.
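
As a rough illustration of the arithmetic in note 2, the following Python sketch subtracts the quoted per-channel and minimum SmartConnect costs from the Table 1 totals. The function name and the choice to sum all three SLRs are assumptions made for illustration. Note 2 also mentions additional, unquantified SmartConnect resources per memory channel and compute unit, so the result is best read as an optimistic upper bound rather than a guaranteed budget.

```python
# Approximate figures, in thousands, copied from Table 1 and note 2.
DYNAMIC_REGION_K = {
    "CLB LUT": 388 + 199 + 388,       # SLR 0 + SLR 1 + SLR 2
    "CLB Register": 776 + 399 + 776,  # SLR 0 + SLR 1 + SLR 2
}
PER_CHANNEL_K = {"CLB LUT": 20, "CLB Register": 20}
SMARTCONNECT_MIN_K = {"CLB LUT": 12, "CLB Register": 18}
STATIC_REGION_BANKS = {"bank1"}  # note 2 excludes bank1 from the per-channel cost


def estimate_remaining_k(mapped_banks):
    """Hypothetical helper: upper bound on fabric left for kernels (in thousands)."""
    # Only banks mapped in the dynamic region are charged the per-channel cost.
    charged = [b for b in set(mapped_banks) if b not in STATIC_REGION_BANKS]
    remaining = {}
    for resource, total in DYNAMIC_REGION_K.items():
        overhead = SMARTCONNECT_MIN_K[resource] + PER_CHANNEL_K[resource] * len(charged)
        remaining[resource] = total - overhead
    return remaining


print(estimate_remaining_k(["bank0", "bank1", "bank3"]))
# {'CLB LUT': 923, 'CLB Register': 1893}
```

With bank0, bank1, and bank3 mapped, only bank0 and bank3 are charged, so the estimate drops by 52K LUTs and 58K registers from the 975K and 1951K totals.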

Table 2. xilinx_kcu1500_dynamic_5_0

| Area | SLR 0 | SLR 1 |
|---|---|---|
| SLR description | Bottom of device; shared by dynamic and static region resources. | Top of device; dedicated to dynamic region. |
| Dynamic region pblock name | pfm_top_i_dynamic_region_pblock_dynamic_SLR0 | pfm_top_i_dynamic_region_pblock_dynamic_SLR1 |
| Compute unit placement syntax¹ | set_property CONFIG.SLR_ASSIGNMENTS SLR0 [get_bd_cells <cu_name>] | set_property CONFIG.SLR_ASSIGNMENTS SLR1 [get_bd_cells <cu_name>] |
| CLB LUT | 264K | 325K |
| CLB Register | 529K | 651K |
| Block RAM Tile | 876 | 1080 |
| DSP | 2217 | 2760 |

Memory channels; system port name: bank0 (4GB DDR4), bank1 (4GB DDR4), bank2 (4GB DDR4), bank3 (4GB DDR4)

  1. Dynamic platforms will by default place a kernel in the same SLR as the memory bank that it accesses. Details on how this may be controlled are provided in the User-specified SLR assignments for Kernels section of the SDAccel Environment User Guide.
  2. Approximately 20K CLB LUTs and 20K CLB Registers are required for each mapped memory channel. A minimum of 12K CLB LUTs and 18K CLB Registers are also required for the SmartConnect network, with additional resources required for each mapped memory channel and each compute unit.
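
The compute unit placement syntax listed in Table 1 and Table 2 follows one pattern per SLR, so the Tcl line for a given compute unit can be generated mechanically. The Python sketch below is one way to do that while rejecting SLR names a platform does not have. The helper name and the compute unit name vadd_1 are hypothetical and do not come from the release notes.

```python
# SLR names per platform, as listed in Table 1 and Table 2.
SLRS_BY_PLATFORM = {
    "xilinx_vcu1525_dynamic_5_0": ("SLR0", "SLR1", "SLR2"),
    "xilinx_kcu1500_dynamic_5_0": ("SLR0", "SLR1"),
}


def slr_assignment_tcl(platform, cu_name, slr):
    """Hypothetical helper: build the placement command for one compute unit."""
    valid = SLRS_BY_PLATFORM[platform]
    if slr not in valid:
        raise ValueError(f"{platform} has no {slr}; expected one of {valid}")
    # Same form as the 'Compute unit placement syntax' row of the tables.
    return f"set_property CONFIG.SLR_ASSIGNMENTS {slr} [get_bd_cells {cu_name}]"


# 'vadd_1' is an illustrative compute unit name.
print(slr_assignment_tcl("xilinx_kcu1500_dynamic_5_0", "vadd_1", "SLR1"))
# set_property CONFIG.SLR_ASSIGNMENTS SLR1 [get_bd_cells vadd_1]
```

Requesting SLR2 on xilinx_kcu1500_dynamic_5_0 raises a ValueError, since Table 2 lists only SLR 0 and SLR 1 for that platform.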

The 2017.4 release supports 4.x DSAs for backward compatibility, with the same xocc interfaces as in 2017.2. For 5.x DSAs, use the xocc option changes described in the SDAccel Migration Summary.

Regardless of DSA version, any existing RTL kernel must be repackaged with 2017.4 before it can run in 2017.4.

Starting with 2018.2, xocc will support only 5.x DSAs (and the corresponding xocc options); 4.x DSAs will be deprecated in that release.

SDx™ GUI

  • The Vivado® IDE may be launched directly from the SDx GUI.
    • This allows experienced hardware designers, or those familiar with the Vivado Design Environment, to make implementation changes to the hardware (detailed timing closure, etc.) and save the results.

      In addition, a pre-synthesized or a pre-implemented Vivado Design Checkpoint (.dcp) file can be brought in from the launched Vivado session, and directly used in the SDx session to complete the remaining flow without having to start from the beginning.

      Changes made during the Vivado session are also automatically captured in SDx for subsequent runs.

  • RTL Kernel Wizard has been enhanced to support additional types of packaging options, including pre-compiled kernels and netlist (.dcp) based kernels.
  • Dataflow is an important feature in xocc (for both C/C++ and OpenCL kernels). Several design rule checks (DRCs), with extended documentation, have been added for C/C++/OpenCL designs that use dataflow.
  • A DRC window highlighting key DRCs in the user C/C++/OpenCL code is available at the bottom of the SDx GUI, next to the Console tab.

Kernel Performance Enhancements

  • Improved data transfer rates are now provided through the following means.
    • Automatic memory coalescing and widening. The automatic widening of the data transfer may be disabled by adding the nounroll pragma to for-loops.
    • Manually specified memory coalescing using the new Xilinx OpenCL attribute xcl_zero_global_work_offset, which may be applied when clEnqueueNDRangeKernel is called without a global_work_offset.
    • Xilinx highly recommends that you use a correctly specified global_work_offset.
  • The work group size is now automatically inferred based on OpenCL semantics.

  • Whole-function vectorization is provided through the vec_type_hint attribute on NDRange.
    • It is highly recommended to use vec_type_hint for improved performance.
    • Whole-function vectorization may increase the size of the hardware implementation.
  • Sub-functions are now automatically inlined to improve performance.
    • This may be disabled using the noinline pragma and OpenCL attribute.

Xilinx SDAccel Runtime

  • Support is now provided for the OpenCL API clCreateSubDevices.
    • Sub-devices may be created for each Compute Unit (CU) allowing for multiple independent command queues for each CU.
    • Each sub-device may include only one CU.
  • Improved Linux Driver support
    • Drivers now use the Linux DMA_BUF framework allowing data sharing across all Linux devices.
    • Device data (temperature, current, etc.) can now be exported through the Linux sysfs framework.

RTL Kernel Enhancements

  • Support for compile time parameterization of RTL kernels via the xocc command line.
  • A new xocc command line option allows a single RTL Kernel (.xo file) to be instantiated as multiple kernel instances. In addition, these separate instances can be queued independently from each other.
  • RTL kernels can now be pre-compiled, which reduces SDx compile time by skipping synthesis under SDx.
  • RTL Kernels may be created from a Xilinx checkpoint (.dcp) file.
  • Encryption support is now provided for RTL Kernels.

XOCC Enhancements

  • --ini_file switch can be used to pass a set of advanced --xp style switches to xocc using a single file (similar to the use of an xocc.ini file).

  • --report_dir switch allows report files generated under SDx runs to be copied to a separate directory for easy access.

  • --log_dir switch allows log files generated under SDx runs to be copied to a separate directory for easy access.

  • --temp_dir switch allows a user specified directory to be used for generation of temporary files.

  • --interactive switch allows Vivado to be launched from within the xocc environment, with the right project loaded.

  • --reuse_synth switch allows a pre-synthesized Vivado Design Checkpoint (.dcp) file to be brought in and used directly in SDx flow to complete implementation and xclbin generation.

  • --reuse_impl switch allows a pre-implemented and timing closed Vivado Design Checkpoint (.dcp) file to be brought in and used directly in SDx flow to do xclbin generation.

  • --remote_ip_cache switch allows usage of a user specified IP cache location. This will improve iterative SDx flow run times.

  • --user_ip_repo_paths switch allows usage of additional read-only IP cache locations, as well as custom IP definitions to be used in SDx.

  • --no_ip_cache switch can be used to turn off all usage of IP caches. This is generally not recommended except for debugging purposes.
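
To show how the directory-related switches above might be combined in a scripted build, the following Python sketch assembles them into an argument list. The switch names come from the list above. The helper name, the directory names, and the assumption that each switch takes its directory as the next argument are illustrative and should be checked against the xocc documentation before use.

```python
def xocc_output_dir_args(report_dir=None, log_dir=None, temp_dir=None):
    """Hypothetical helper: build the directory-related part of an xocc command line.

    Assumes each switch is followed by a single directory path argument.
    """
    args = []
    for flag, value in (
        ("--report_dir", report_dir),
        ("--log_dir", log_dir),
        ("--temp_dir", temp_dir),
    ):
        if value is not None:  # omit switches the caller did not request
            args += [flag, str(value)]
    return args


# Directory names are placeholders.
print(xocc_output_dir_args(report_dir="reports", log_dir="logs"))
# ['--report_dir', 'reports', '--log_dir', 'logs']
```

Returning a list rather than a single string keeps each path as its own argument, which avoids quoting problems if the list is later passed to a process launcher.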

Profile Features

  • Profile instrumentation of kernels is now enabled through the xocc compile option --profile_kernel.
  • Profile reports are generated using the sdx_analyze utility.
  • Profile Summary Report Enhancements:
    • Data Transfer table now displays information on a compute unit/port basis including kernel arguments and DDR bank.
    • Compute Unit Table now reports clock frequency per Compute Unit.

Debug Features

  • Kernel Debug is now supported through GDB and TCF in Hardware Emulation. This provides the ability to:
    • Start and stop at intermediate points in the execution of the kernels.
    • Inspect both kernel arguments and global memory.
  • Application Debug: The following new gdb extensions provide enhanced debug information:
    • xprint kernel: Displays all NDRange events that are pending and their arguments
    • xprint all: Displays all valid OpenCL objects
    • xstatus all: Provides visibility into the IPs instantiated on the platform
    • A new xocc command option, --dk, inserts a Light Weight Protocol Checker IP into the system to debug AXI protocol violations.

Emulation Features

  • XCL_EMULATION_MODE: The XCL_EMULATION_MODE environment variable now requires a value of sw_emu or hw_emu. Setting it to sw_emu runs the application in software emulation mode; setting it to hw_emu runs it in hardware emulation mode. Unset the XCL_EMULATION_MODE variable to disable emulation.
  • Memory checks for out-of-bounds accesses and invalid read or write operations (writing to a read-only device or vice versa) are provided during Software Emulation.
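
The three XCL_EMULATION_MODE states described above (sw_emu, hw_emu, or unset) can be checked in a launch script before the host application starts. The Python sketch below is a hypothetical pre-flight check, not part of the SDAccel runtime. The function name and the decision to raise an error on any other value are assumptions made for illustration.

```python
import os

VALID_MODES = ("sw_emu", "hw_emu")


def emulation_mode(environ=None):
    """Hypothetical pre-flight check for XCL_EMULATION_MODE.

    Returns 'sw_emu', 'hw_emu', or None when the variable is unset
    (emulation disabled). Any other value is rejected.
    """
    env = os.environ if environ is None else environ
    value = env.get("XCL_EMULATION_MODE")
    if value is None:
        return None
    if value not in VALID_MODES:
        raise ValueError(
            f"XCL_EMULATION_MODE={value!r} is not one of {VALID_MODES}"
        )
    return value


print(emulation_mode({"XCL_EMULATION_MODE": "hw_emu"}))  # hw_emu
print(emulation_mode({}))                                # None
```

Passing a dictionary instead of reading os.environ directly makes the check easy to exercise without changing the real environment.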