Programming Model

The Vitis™ core development kit supports heterogeneous computing using the industry-standard OpenCL™ framework (https://www.khronos.org/opencl/). The host program executes on the processor (x86 or Arm®) and, using the OpenCL programming paradigm, offloads compute-intensive tasks through the Xilinx Runtime (XRT) to hardware kernels running in the programmable logic (PL).
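The following is a minimal sketch of this model using the standard OpenCL C API from a C++ host program. The binary file name (vadd.xclbin), the kernel name (vadd), and the buffer sizes are illustrative assumptions, and error checking is omitted for brevity.

    #include <CL/cl.h>
    #include <fstream>
    #include <iterator>
    #include <vector>

    int main() {
        // Discover an accelerator platform and device managed by XRT.
        // (Real code would search for the Xilinx platform by name.)
        cl_platform_id platform;
        clGetPlatformIDs(1, &platform, nullptr);
        cl_device_id device;
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, nullptr);

        cl_int err;
        cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
        cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);

        // Load the FPGA binary (xclbin) and program the device with it.
        std::ifstream f("vadd.xclbin", std::ios::binary);             // assumed file name
        std::vector<unsigned char> bin((std::istreambuf_iterator<char>(f)),
                                        std::istreambuf_iterator<char>());
        const unsigned char* image = bin.data();
        size_t size = bin.size();
        cl_program program = clCreateProgramWithBinary(context, 1, &device, &size,
                                                       &image, nullptr, &err);
        clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
        cl_kernel kernel = clCreateKernel(program, "vadd", &err);     // assumed kernel name

        // Allocate buffers in device global memory and set the kernel arguments.
        const int n = 1024;
        std::vector<int> a(n, 1), b(n, 0);
        cl_mem in  = clCreateBuffer(context, CL_MEM_READ_ONLY,  n * sizeof(int), nullptr, &err);
        cl_mem out = clCreateBuffer(context, CL_MEM_WRITE_ONLY, n * sizeof(int), nullptr, &err);
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &in);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), &out);
        clSetKernelArg(kernel, 2, sizeof(int), &n);

        // Transfer the input, execute the kernel, and read back the result.
        clEnqueueWriteBuffer(queue, in, CL_TRUE, 0, n * sizeof(int), a.data(), 0, nullptr, nullptr);
        clEnqueueTask(queue, kernel, 0, nullptr, nullptr);
        clEnqueueReadBuffer(queue, out, CL_TRUE, 0, n * sizeof(int), b.data(), 0, nullptr, nullptr);
        clFinish(queue);
        return 0;
    }

The same flow applies whether the host is an x86 server connected over PCIe or an embedded Arm processor connected through AXI; only the target platform and the xclbin change.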

Device Topology

In the Vitis core development kit, targeted devices can include Xilinx® MPSoCs or UltraScale+™ FPGAs connected to a processor, such as an x86 host through a PCIe bus, or an Arm processor through an AXI4 interface. The FPGA contains a programmable region that implements and executes hardware kernels.

The FPGA platform contains one or more global memory banks. Data transfers from the host machine to the kernels, and from the kernels back to the host, pass through these global memory banks. The kernels running in the FPGA can have one or more memory interfaces. The connections from the global memory banks to those memory interfaces are configurable, and their characteristics are determined by the kernel compilation options.
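For example, the connection of a kernel memory interface to a specific global memory bank can be specified when the FPGA binary is linked, using the --connectivity.sp option in a configuration file passed to v++ with --config. The compute unit name vadd_1, its argument names, and the DDR bank assignments below are illustrative assumptions:

    [connectivity]
    sp=vadd_1.in1:DDR[0]
    sp=vadd_1.in2:DDR[1]
    sp=vadd_1.out:DDR[2]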

Multiple kernels can be implemented in the PL of the Xilinx device, allowing for significant application acceleration. A single kernel can also be instantiated multiple times. The number of instances of a kernel is programmable, and determined by linking options specified when building the FPGA binary. For more information on specifying these options, refer to Linking the Kernels.
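As an example of such a linking option, the number of kernel instances can be set with the --connectivity.nk option in a v++ configuration file; the kernel name vadd and the instance names below are illustrative assumptions:

    [connectivity]
    nk=vadd:3:vadd_1.vadd_2.vadd_3

Each resulting instance, or compute unit, can then execute tasks independently.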

Kernel Properties

In the Vitis application acceleration development flow, kernels are the processing elements executing in the PL region of the Xilinx device. The Vitis software platform supports kernels written in C/C++, RTL, or OpenCL C. Regardless of the source language, all kernels have the same properties and must adhere to the same set of requirements. This is what allows the system compiler linker and the Xilinx Runtime (XRT) to seamlessly interact with the kernels.

This topic describes the properties and requirements of kernels in the Vitis application acceleration flow. Subsequent topics discuss how these requirements are satisfied for each specific source language.

Kernel Execution Modes

There are three kernel execution modes, described in the following table. These modes are mutually exclusive; each kernel can operate in only one of these modes. However, kernels with different execution modes can be linked together by the Vitis linker to form the FPGA binary.

Sequential Mode
  • The kernel is started by the host application using an API call.
  • Once the kernel is done, it notifies the host application.
  • The kernel can only be restarted once the current task is completed.
  • This is the legacy mode for kernels using memory-based data transfers.

Pipelined Mode
  • The kernel is started by the host application using an API call.
  • Once the kernel is ready for new data, it notifies the host application.
  • The kernel can be restarted before its current task is completed.
  • This improves performance, as multiple invocations of the kernel can be overlapped.
  • This is the default mode for kernels using memory-based data transfers.

Free-Running Mode
  • The kernel starts as soon as the device is programmed with the xclbin.
  • The kernel runs continuously and synchronizes on the availability of data.
  • Free-running mode is not supported for kernels described in OpenCL C.
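In C/C++ kernels compiled with the HLS tool, the execution mode is typically selected through the block-level control protocol applied to the function return: ap_ctrl_hs corresponds to sequential execution, ap_ctrl_chain to pipelined execution, and ap_ctrl_none to free-running execution. The following sketch shows a hypothetical free-running kernel; the kernel name, stream widths, and processing are illustrative assumptions.

    #include "ap_axi_sdata.h"
    #include "hls_stream.h"

    // Hypothetical free-running kernel: starts when the device is programmed and
    // runs continuously, synchronizing only on the availability of stream data.
    extern "C" void incr(hls::stream<ap_axiu<32, 0, 0, 0>>& in,
                         hls::stream<ap_axiu<32, 0, 0, 0>>& out) {
    #pragma HLS INTERFACE ap_ctrl_none port=return   // no start/done handshake with the host
    #pragma HLS INTERFACE axis port=in
    #pragma HLS INTERFACE axis port=out
        while (true) {
            ap_axiu<32, 0, 0, 0> v = in.read();      // blocks until data is available
            v.data += 1;
            out.write(v);
        }
    }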

Kernel Interfaces

Kernel interfaces are used to exchange data with the host application, other kernels, or device I/Os. Three types of interfaces are allowed, each designed for a particular kind of data transfer. It is common for kernels to have multiple interfaces of different types.

Functional Properties

The following table describes the functional properties of kernel interfaces.

Table 1. Functional Properties
Register
  • Designed for transferring scalar values between the host application and the kernel.
  • Register reads and writes are initiated by the host application.
  • The kernel acts as a slave.

Memory Mapped
  • Designed for bi-directional data transfers with global memory (DDR, PLRAM, HBM).
  • The access pattern is usually random.
  • Introduces additional latency for memory transfers.
  • The kernel acts as a master accessing data stored in global memory.
  • The base address of the data is sent through the register interface.
  • The host application allocates a buffer sized for the dataset.
  • Free-running kernels cannot have memory-mapped interfaces.

Streaming
  • Designed for uni-directional data transfers between kernels, or between the host application and kernels.
  • The access pattern is sequential.
  • Does not use global memory.
  • Offers better performance than memory-mapped transfers.
  • The data set is unbounded.
  • A sideband signal can be used to indicate the last value in the stream.
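As a sketch of how these interface types typically surface in a C/C++ kernel, scalar arguments passed by value map to register transfers, pointer arguments map to memory-mapped transfers, and hls::stream arguments map to streaming transfers. The kernel name, argument names, and data widths below are illustrative assumptions.

    #include "ap_axi_sdata.h"
    #include "hls_stream.h"

    // Hypothetical kernel illustrating the three interface types as C/C++ arguments.
    extern "C" void gather(int count,                                     // scalar   -> register
                           const int* table, int* result,                 // pointers -> memory mapped
                           hls::stream<ap_axiu<32, 0, 0, 0>>& indices) {  // stream   -> streaming
        for (int i = 0; i < count; ++i) {
            int idx = indices.read().data.to_int();  // sequential, unbounded stream access
            result[i] = table[idx];                  // random access into global memory
        }
    }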

Implementation Requirements

Each interface type must be implemented using a specific hardware protocol. This is what allows the system compiler linker to integrate and compose kernels with the platform. The following table describes the requirements for mapping interfaces to hardware.

Table 2. Implementation Requirements
Register
  • Register interfaces must be implemented using an AXI4-Lite interface.
  • A kernel can have no more than one AXI4-Lite interface.

Memory Mapped
  • Memory-mapped interfaces must be implemented using AXI4 master interfaces.
  • A kernel can have one or more AXI4 master interfaces.
  • Different memory-mapped arguments can be transferred through the same AXI4 master.

Streaming
  • Streaming interfaces must be implemented using AXI4-Stream interfaces.
  • A kernel can have one or more AXI4-Stream interfaces.
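The sketch below shows how a hypothetical memory-mapped kernel might bind its arguments to these protocols using HLS interface pragmas: the pointer arguments use AXI4 master interfaces, while the scalars, base addresses, and the control handshake are exposed on the single AXI4-Lite interface. Argument names and bundle names are illustrative assumptions.

    // Hypothetical kernel showing the mapping of interfaces to hardware protocols.
    extern "C" void scale(int factor, const int* in, int* out, int n) {
    #pragma HLS INTERFACE m_axi     port=in  offset=slave bundle=gmem0  // pointer -> AXI4 master
    #pragma HLS INTERFACE m_axi     port=out offset=slave bundle=gmem1  // pointer -> AXI4 master
    #pragma HLS INTERFACE s_axilite port=factor                         // scalar  -> AXI4-Lite register
    #pragma HLS INTERFACE s_axilite port=n                              // scalar  -> AXI4-Lite register
    #pragma HLS INTERFACE s_axilite port=return                         // start/done control register
        for (int i = 0; i < n; ++i)
            out[i] = factor * in[i];
    }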

Clock and Reset Requirements

C/C++/OpenCL C Kernel
  • A C/C++/OpenCL C kernel does not require any input from the user for clock and reset ports. The HLS tool always generates RTL with the clock port ap_clk and the reset port ap_rst_n.

RTL Kernel
  • Requires a clock port, which must be named ap_clk.
  • An optional second clock port, if present, must be named ap_clk_2.
  • An optional reset port, if present, must be named ap_rst_n. This signal is driven by a synchronous reset in the ap_clk clock domain, and is active-Low.
  • An optional second reset port, if present, must be named ap_rst_n_2. This signal is driven by a synchronous reset in the ap_clk_2 clock domain.