RTL Kernels

As mentioned in FPGA Binary Build Process, each hardware kernel in the Vitis core development kit is independently compiled to a Xilinx object (.xo) file. These files can be combined into application projects for linking into the FPGA executable (xclbin). This includes the ability to package existing RTL IP from the Vivado Design Suite for use in the Vitis application acceleration development flow.

Many hardware engineers have existing RTL IP (including Vivado® IP integrator based designs), or prefer implementing a kernel in RTL and developing it using the Vivado tools. While the Vitis core development kit supports the use of packaged RTL designs, they must adhere to the software and hardware requirements to be used within the accelerated application development flow and runtime library.

Requirements of an RTL Kernel

An RTL design must meet both interface and software requirements to be used as an RTL kernel within the Vitis IDE.

It might be necessary to add to or modify the original RTL design to meet these requirements, which are outlined in the following sections.

Kernel Interface Requirements

To satisfy the Vitis core development kit execution model, an RTL kernel must adhere to the requirements described in Kernel Properties. The RTL kernel must have at least one clock interface port to supply a clock to the kernel logic. The various interface requirements are summarized in the following table.

IMPORTANT: In some cases, the port names must be written exactly as shown.

Table 1. RTL Kernel Interface and Port Requirements
Port or Interface	Description	Comment
ap_clk	Primary clock input port	Name must be exact. Required port.
ap_clk_2	Secondary optional clock input port	Name must be exact. Optional port.
ap_rst_n	Primary active-Low reset input port	Name must be exact. Optional port. This signal should be internally pipelined to improve timing. This signal is driven by a synchronous reset in the ap_clk clock domain.
ap_rst_n_2	Secondary optional active-Low reset input port	Name must be exact. Optional port. This signal should be internally pipelined to improve timing. This signal is driven by a synchronous reset in the ap_clk_2 clock domain.
interrupt	Active-High interrupt.	Name must be exact. Optional port.
s_axi_control	One (and only one) AXI4-Lite slave control interface	Name must be exact; case sensitive. Required port.
AXI4_MASTER	One or more AXI4 master interfaces for global memory access	All AXI4 master interfaces must have 64-bit addresses. The RTL kernel developer is responsible for partitioning global memory spaces. Each partition in the global memory becomes a kernel argument. The memory offset for each partition must be set by a control register programmable via the AXI4-Lite slave interface. AXI4 masters must not use Wrap or Fixed burst types and must not use narrow (sub-size) bursts, meaning AxSIZE should match the width of the AXI data bus. Any user logic or RTL code that does not conform to these requirements must be wrapped or bridged to satisfy them.
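The requirements in Table 1 can be turned into a quick pre-packaging checklist. The following is a minimal sketch only: the `check_interfaces` function, the dictionary layout, and the `kind` and `addr_width` keys are hypothetical names chosen for illustration and are not part of any Xilinx tool.

```python
# Sketch: check a kernel's top-level interface list against Table 1.
# The data layout and all names below are illustrative assumptions.

REQUIRED_EXACT = {"ap_clk", "s_axi_control"}
OPTIONAL_EXACT = {"ap_clk_2", "ap_rst_n", "ap_rst_n_2", "interrupt"}


def check_interfaces(interfaces):
    """Return a list of problems found (empty if none).

    `interfaces` maps an interface name to a description dict, for example
    {"kind": "axi4_master", "addr_width": 64}.
    """
    problems = []
    exact_names = REQUIRED_EXACT | OPTIONAL_EXACT

    # Required ports must be present with their exact names.
    for name in sorted(REQUIRED_EXACT):
        if name not in interfaces:
            problems.append(f"missing required interface: {name}")

    # Names are case sensitive: flag look-alikes such as AP_CLK.
    for name in interfaces:
        if name not in exact_names and name.lower() in exact_names:
            problems.append(f"name must be exact (case sensitive): {name}")

    # At least one AXI4 master, each with 64-bit addressing.
    masters = {n: d for n, d in interfaces.items()
               if d.get("kind") == "axi4_master"}
    if not masters:
        problems.append("at least one AXI4 master interface is expected")
    for name, desc in masters.items():
        if desc.get("addr_width") != 64:
            problems.append(f"{name}: AXI4 master must use 64-bit addresses")

    # The secondary reset belongs to the ap_clk_2 clock domain.
    if "ap_rst_n_2" in interfaces and "ap_clk_2" not in interfaces:
        problems.append("ap_rst_n_2 present without ap_clk_2")
    return problems


example = {
    "ap_clk": {"kind": "clock"},
    "ap_rst_n": {"kind": "reset"},
    "s_axi_control": {"kind": "axi4lite_slave"},
    "m00_axi": {"kind": "axi4_master", "addr_width": 64},
}
print(check_interfaces(example))
```

A check like this does not replace the design rule checks that the packaging tools run; it only catches naming and addressing mistakes early.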

Kernel Software Requirements

RTL kernels have the same software interface model as C/C++ and OpenCL kernels. They are seen by the host program as functions with a void return value, pointer arguments, and scalar arguments.

The Vitis core development kit execution model dictates the following:

Scalar arguments are directly written to the kernel through the AXI4-Lite slave interface.
Pointer arguments are transferred from the host program to/from memory, and the RTL kernel reads/writes the data in memory through one or more AXI4 memory mapped interfaces.
Kernels are controlled by the host program through the control register (shown below), which is accessed over the AXI4-Lite slave interface.

If the RTL design has a different execution model, it must be adapted to ensure that it will operate in this manner.

The following table outlines the required register map such that a kernel can be used within the Vitis IDE. The control register is required by all kernels while the interrupt related registers are only required for designs with interrupts. All user-defined registers must begin at location 0x10; locations below this are reserved.

Table 2. Address Map
Address	Name	Description
0x0	Control	Controls and provides kernel status.
0x4	Global Interrupt Enable	Used to enable interrupt to the host.
0x8	IP Interrupt Enable	Used to control which IP generated signals are used to generate an interrupt.
0xC	IP Interrupt Status	Provides interrupt status.
0x10	Kernel arguments	This would include scalars and global memory arguments for instance.

Table 3. Control (0x0)
Bit	Name	Description
0	`ap_start`	Asserted when kernel can start processing data. Cleared on handshake with `ap_done` being asserted.
1	`ap_done`	Asserted when kernel has completed operation. Cleared on read.
2	`ap_idle`	Asserted when kernel is idle.
31:3	Reserved	Reserved

Note: The host typically writes 0x00000001 to the Control register at offset 0x0, which sets bit 0 (ap_start) and clears bits 1 and 2, and then polls the register until the ap_done bit reads 1.
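The start-and-poll sequence in the note can be illustrated with a small software model of the Control register. This is a sketch under stated assumptions, not the runtime's actual implementation: `ControlRegisterModel`, `run_kernel`, and the fixed latency counted in register reads are all hypothetical, and a real host program normally relies on the runtime library rather than touching the register directly.

```python
# Sketch: toy model of the Control register (Table 3) and a host-style
# start/poll loop. All class and function names are illustrative.

AP_START, AP_DONE, AP_IDLE = 0x1, 0x2, 0x4
CTRL_OFFSET = 0x0


class ControlRegisterModel:
    """Models ap_start, ap_done, and ap_idle for a kernel that finishes
    after a fixed number of register reads (an assumption made so that
    the example is deterministic)."""

    def __init__(self, cycles_to_finish=3):
        self.ctrl = AP_IDLE            # kernel is idle after reset
        self._latency = cycles_to_finish
        self._remaining = 0

    def write(self, offset, value):
        if offset != CTRL_OFFSET:
            raise ValueError("only the Control register is modelled")
        if value & AP_START:
            self.ctrl = AP_START       # start set, done and idle cleared
            self._remaining = self._latency

    def read(self, offset):
        if offset != CTRL_OFFSET:
            raise ValueError("only the Control register is modelled")
        self._advance()
        value = self.ctrl
        self.ctrl &= ~AP_DONE          # ap_done is cleared on read
        return value

    def _advance(self):
        if self.ctrl & AP_START:
            self._remaining -= 1
            if self._remaining <= 0:
                # Handshake with ap_done clears ap_start; kernel goes idle.
                self.ctrl = AP_DONE | AP_IDLE


def run_kernel(regs, max_polls=1000):
    """Start the kernel and poll until ap_done; return the poll count."""
    regs.write(CTRL_OFFSET, 0x00000001)
    for polls in range(1, max_polls + 1):
        if regs.read(CTRL_OFFSET) & AP_DONE:
            return polls
    raise TimeoutError("kernel did not assert ap_done")


model = ControlRegisterModel(cycles_to_finish=3)
print(run_kernel(model))               # number of polls needed
print(hex(model.read(CTRL_OFFSET)))    # ap_done already cleared by the read
```

Because ap_done is cleared on read, the polling loop must act on the value it just read; a second read of the register would no longer show the done bit.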

The following interrupt related registers are only required if the kernel has an interrupt.

Table 4. Global Interrupt Enable (0x4)
Bit	Name	Description
0	Global Interrupt Enable	When asserted, along with the IP Interrupt Enable bit, the interrupt is enabled.
31:1	Reserved	Reserved

Table 5. IP Interrupt Enable (0x8)
Bit	Name	Description
0	Interrupt Enable	When asserted, along with the Global Interrupt Enable bit, the interrupt is enabled.
31:1	Reserved	Reserved

Table 6. IP Interrupt Status (0xC)
Bit	Name	Description
0	Interrupt Status	Toggle on write.
31:1	Reserved	Reserved

Interrupt

RTL kernels can optionally have an interrupt port containing a single interrupt. The port must be named interrupt and must be active-High. It is enabled when both the global interrupt enable (GIE) and interrupt enable register (IER) bits are asserted.

By default, the IER uses the internal ap_done signal to trigger an interrupt. Further, the interrupt is cleared only when writing a one to bit-0 of the IP Interrupt Status Register.
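The interaction between the three interrupt registers can be sketched as a small behavioural model. This is an illustration only: the `InterruptModel` class is hypothetical, and the choice to latch the status bit only while the IER bit is set is an assumption of this model rather than something stated in the tables above.

```python
# Sketch: behavioural model of GIE (0x4), IER (0x8), and ISR (0xC), bit 0.
# Names are illustrative; the latch condition is an assumption.

GIE_OFFSET, IER_OFFSET, ISR_OFFSET = 0x4, 0x8, 0xC


class InterruptModel:
    def __init__(self):
        self.gie = 0   # Global Interrupt Enable, bit 0
        self.ier = 0   # IP Interrupt Enable, bit 0
        self.isr = 0   # IP Interrupt Status, bit 0

    def write(self, offset, value):
        bit = value & 1
        if offset == GIE_OFFSET:
            self.gie = bit
        elif offset == IER_OFFSET:
            self.ier = bit
        elif offset == ISR_OFFSET:
            self.isr ^= bit            # toggle on write: writing 1 flips it
        else:
            raise ValueError(f"unmodelled offset {offset:#x}")

    def kernel_done(self):
        """Called when the kernel asserts its internal ap_done signal."""
        if self.ier:                   # assumption: latch only when enabled
            self.isr = 1

    @property
    def interrupt(self):
        """Level of the active-High interrupt port."""
        return self.gie & self.ier & self.isr


irq = InterruptModel()
irq.write(GIE_OFFSET, 1)
irq.write(IER_OFFSET, 1)
irq.kernel_done()
print(irq.interrupt)       # asserted after ap_done
irq.write(ISR_OFFSET, 1)   # host acknowledges by writing one to bit 0
print(irq.interrupt)       # deasserted again
```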

If adding an interrupt port to the RTL kernel, the kernel.xml file needs to include this information. The kernel.xml, located in the kernel.xo file, is generated automatically when using the package_xo command, or RTL Kernel Wizard. By default, the kernel uses a single interrupt port, interrupt, along with the interrupt logic in the Control Register block. This is reflected in the generated Verilog code for the RTL kernel, and the associated component.xml and kernel.xml files.

RTL Kernel Development Flow

This section explains the two-step process for creating RTL kernels for the Vitis core development kit, which includes:

Package the RTL block as a standard Vivado IP.
Package the RTL kernel into a Xilinx Object (.xo) file.

A packaged RTL kernel is delivered as a file with the .xo extension. This file is a container encapsulating the Vivado IP object (including source files) and the associated kernel XML file. The .xo file can be combined with other kernels, linked with the target platform, and built for hardware or hardware emulation flows.

TIP: An RTL kernel is not suited for software emulation unless you provide a C-model for the kernel.

Package the RTL Code as a Vivado IP

RTL kernels must be packaged as a Vivado IP that can be used with the IP integrator. For details on IP packaging in the Vivado tool, see the Vivado Design Suite User Guide: Creating and Packaging Custom IP (UG1118).

The following required interfaces for the RTL kernel must be packaged:

The AXI4-Lite interface name must be packaged as S_AXI_CONTROL, but the underlying AXI ports can be named differently.
The AXI4 interfaces must be packaged as AXI4 master endpoints with 64-bit address support.
Note: Xilinx strongly recommends that AXI4 interfaces be packaged with AXI meta data HAS_BURST=0 and SUPPORTS_NARROW_BURST=0. These properties can be set in an IP-level bd.tcl file. This indicates wrap and fixed burst type is not used, and narrow (sub-size burst) is not used.
ap_clk and ap_clk_2 must be packaged as clock interfaces (ap_clk_2 is only required when the RTL kernel has two clocks).
ap_rst_n and ap_rst_n_2 must be packaged as active-Low reset interfaces (when the RTL kernel has a reset).
ap_clk must be associated with all AXI4-Lite, AXI4, and AXI4-Stream interfaces and, if used, the ap_rst_n signal.

To package the IP, use the following steps:

Create and package a new IP.
1. From a Vivado project, with your RTL source files added, select Tools > Create and Package New IP.
2. Select Package your current project, and click Next.
  You can select the default location for your IP, or choose a different location.
3. To open the Package IP window, select Finish.
Associate the clock to the AXI interfaces.
In the Ports and Interfaces section of the Package IP window, you can associate the ap_clk with the AXI4 interfaces, and reset signal if needed.
1. Right-click an interface, and select Associate Clocks.
  This opens the Associate Clocks dialog box which lists the ap_clk, and perhaps ap_clk_2.
2. Select the ap_clk and click OK to associate it with the interface.
3. Make sure to repeat this step to associate ap_clk with each of the AXI interfaces, and the reset.
Add FREQ_HZ to ap_clk.
1. In the Ports and Interfaces section, right-click the ap_clk port and select Edit Interface to open the Edit Interface dialog box as shown in the following figure.
2. Select the FREQ_HZ parameter on the left-side of the dialog box, as shown, and select the arrow (→) to move it from left to right.
3. You can also define the value for the FREQ_HZ parameter by scrolling the right side of the dialog box, and entering 250000000 in the Value field, for example, because the parameter is specified in Hz.
4. Click OK to add the parameter.
5. The RTL kernel also requires the value_resolve_type property on the FREQ_HZ parameter to define how the tool should resolve value conflicts. You must specify a value of user for the property using the following Tcl command:
```
set_property value_resolve_type user [ipx::get_bus_parameters -of [::ipx::get_bus_interfaces -of [ipx::current_core] *clk*] "FREQ_HZ"]
```

Add the control registers and offsets.

The kernel requires control registers as discussed in Kernel Software Requirements. The following table shows a list of the required registers.

Table 7. Address Map
Register Name	Description	Address Offset	Size
CTRL	Control Signals. IMPORTANT: The CTRL register and <kernel_args> are required on all kernels. The interrupt related registers are only required for designs with interrupts.	0x000	32
GIER	Global Interrupt Enable Register. Used to enable interrupt to the host.	0x004	32
IP_IER	IP Interrupt Enable Register. Used to control which IP generated signals are used to generate an interrupt.	0x008	32
IP_ISR	IP Interrupt Status Register. Provides interrupt status.	0x00C	32
<kernel_args>	This includes a separate entry for each kernel argument as needed on the software function interface. All user-defined registers must begin at location `0x10`; locations below this are reserved.	0x010	32/64 Scalar arguments are 32 bits wide. `m_axi` interfaces are 64 bits wide.

To create the address map described in the table, select the Addressing and Memory section of the Package IP window. Right-click in the Address Blocks and select the Add Register command.
This opens the Add Register dialog box in which you can enter one of the register names from the table above.
Repeat as needed to add all required registers.
This creates a Registers table in the Addressing and Memory section. You can edit the table to add the Description, Address Offset, and Size to each register. The Registers table should look similar to the following example.
Finally, select the register for each of the pointer arguments from your table, right-click and select the Add Register Parameter command. Enter the name ASSOCIATED_BUSIF into the dialog box that opens, and click OK.
This lets you define an association between the register and the AXI4 Interface. In the value field of the added parameter, enter the name of the m_axi interface assigned to the specific argument you are defining. In the example above, the argument A uses the m00_axi interface, and the argument B uses the m01_axi interface.

Add required properties to the IP:
The IP requires a few standard properties that you can add to your core. The easiest way to do this is by using the following commands from the Vivado Tcl Console.
```
set core [ipx::current_core]
set_property xpm_libraries {XPM_CDC XPM_MEMORY XPM_FIFO} $core
set_property sdx_kernel true $core
set_property sdx_kernel_type rtl $core
```
At this point you are ready to package your IP.
1. Select the Review and Package section of the Package IP window, review the Summary and After Packaging sections, and make whatever changes are needed.
  IMPORTANT: You must enable the generation of an IP archive file. If the After Packaging section indicates An archive will not be generated. you must select the Edit packaging settings link and enable the Create archive of IP setting.
2. When you are ready, click Package IP.
  The Vivado tool packages your kernel IP and opens a dialog box to inform you of success. You can go on to package the kernel using the package_xo command, as described in Creating the .xo File from the RTL Kernel.
To test if the RTL kernel is packaged correctly for the IP integrator, try to instantiate the packaged kernel IP into a block design in the IP integrator. For information on the tool, refer to Vivado Design Suite User Guide: Designing IP Subsystems Using IP Integrator (UG994).
The kernel IP should show the various interfaces described above. Examine the IP in the canvas view. The properties of the AXI interface can be viewed by selecting the interface on the canvas. Then in the Block Interface Properties window, select the Properties tab and expand the CONFIG table entry. If an interface is to be read-only or write-only, the unused AXI channels can be removed and READ_WRITE_MODE set to read-only or write-only.
If the RTL kernel has constraints which refer to constraints in the static area such as clocks, then the RTL kernel constraint file needs to be marked as late processing order to ensure RTL kernel constraints are correctly applied.
There are two methods to mark constraints as late processing order:
1. If the constraints are given in a .ttcl file, add <: setFileProcessingOrder "late" :> to the .ttcl preamble section of the file as follows:
```
<: set ComponentName [getComponentNameString] :>
<: setOutputDirectory "./" :>
<: setFileName $ComponentName :>
<: setFileExtension ".xdc" :>
<: setFileProcessingOrder "late" :>
```
2. If constraints are defined in an .xdc file, then add the following four lines starting at <spirit:define> in the component.xml. The four lines in the component.xml need to be next to the area where the .xdc file is called. In the following example, my_ip_constraint.xdc file is being called with the subsequent late processing order defined.
```
<spirit:file>
        <spirit:name>ttcl/my_ip_constraint.xdc</spirit:name>
        <spirit:userFileType>ttcl</spirit:userFileType>
        <spirit:userFileType>USED_IN_implementation</spirit:userFileType>
        <spirit:userFileType>USED_IN_synthesis</spirit:userFileType>
        <spirit:define>
             <spirit:name>processing_order</spirit:name>
             <spirit:value>late</spirit:value>
        </spirit:define>
</spirit:file>
```

Creating the .xo File from the RTL Kernel

The final step is to package the RTL IP into a Xilinx object file (.xo), so the kernel can be used in the Vitis core development kit. This is done using the package_xo Tcl command in the Vivado Design Suite.

The package_xo command uses the component.xml file from the IP to create the necessary kernel.xml if possible. The Vivado tool runs design rule checks as a pre-processor for package_xo to determine that everything is available and either processes the IP to create the .xo file, or returns errors indicating any problems that might exist.

The following example packages an RTL kernel IP named test_sincos into an object file named test.xo. After packaging the IP, the package_xo command is run from within the Vivado tool.

package_xo -xo_path ./test.xo -kernel_name test_sincos -ip_directory ./ip/

The output of the package_xo command is the test.xo file, which can be added as a source file to the v++ --link command as discussed in Building and Running the Application, or added to an application project as discussed in Using the Vitis IDE.

In some cases, you might find it necessary to provide a kernel.xml file for your IP, as specified in the requirements described in Creating the Kernel Description XML File. You can use the -kernel_xml option to specify the file for the package_xo command. In this case, the package_xo command uses the kernel.xml as specified. The following example shows this command.

package_xo -xo_path ./export/test.xo -kernel_name test_sincos \
-kernel_xml ./src/kernel.xml -ip_directory ./ip/
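When several kernels are packaged from a build script, it can be convenient to generate the package_xo command line rather than typing it by hand. The helper below is a minimal sketch that uses only the options shown in the examples above; the function name `package_xo_command` is hypothetical, and the sketch assumes paths without spaces, so no Tcl quoting is applied.

```python
# Sketch: compose a package_xo command line for use in a Vivado Tcl script.
# Only the options shown in this section are emitted. The helper name is
# illustrative, and paths are assumed not to contain spaces.

def package_xo_command(xo_path, kernel_name, ip_directory, kernel_xml=None):
    parts = ["package_xo", "-xo_path", xo_path, "-kernel_name", kernel_name]
    if kernel_xml is not None:
        # Optional: supply a hand-written kernel.xml instead of letting
        # package_xo derive one from component.xml.
        parts += ["-kernel_xml", kernel_xml]
    parts += ["-ip_directory", ip_directory]
    return " ".join(parts)


print(package_xo_command("./test.xo", "test_sincos", "./ip/"))
print(package_xo_command("./export/test.xo", "test_sincos", "./ip/",
                         kernel_xml="./src/kernel.xml"))
```

The resulting line can be written to a Tcl file and sourced from the Vivado Tcl Console after the IP has been packaged.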

Creating the Kernel Description XML File

TIP: The package_xo command will create a kernel.xml file from the component.xml of a packaged IP, so you do not need to manually provide one, or generate one using the RTL Kernel wizard.

An XML kernel description file, called kernel.xml, must be created for each RTL kernel, so that it can be used in the Vitis application acceleration development flow. The kernel.xml file specifies kernel attributes like the register map and ports needed by the runtime and Vitis tool flows. The following code is an example of a kernel.xml file.

<?xml version="1.0" encoding="UTF-8"?>
<root versionMajor="1" versionMinor="6">
  <kernel name="vitis_kernel_wizard_0" language="ip_c" 
      vlnv="mycompany.com:kernel:vitis_kernel_wizard_0:1.0" 
      attributes="" preferredWorkGroupSizeMultiple="0" workGroupSize="1" interrupt="true">
    <ports>
      <port name="s_axi_control" mode="slave" range="0x1000" dataWidth="32" portType="addressable" base="0x0"/>
      <port name="m00_axi" mode="master" range="0xFFFFFFFFFFFFFFFF" dataWidth="512" portType="addressable" 
         base="0x0"/>
    </ports>
    <args>
      <arg name="axi00_ptr0" addressQualifier="1" id="0" port="m00_axi" size="0x8" offset="0x010" type="int*" 
         hostOffset="0x0" hostSize="0x8"/> 
    </args>
  </kernel>
</root>

Note: The kernel.xml file can be created automatically using the RTL Kernel Wizard to specify the interface specification of your RTL kernel. For more information, refer to RTL Kernel Wizard.

The following table describes the format of the kernel.xml in detail:

Table 8. Kernel XML File Content
Tag	Attribute	Description
<root>	versionMajor	For the current release of Vitis software platform, set to 1.
<root>	versionMinor	For the current release of Vitis software platform, set to 6.
<kernel>	name	Kernel name
	language	Always set to `ip_c` for RTL kernels.
	vlnv	Must match the vendor, library, name, and version attributes in the component.xml of an IP. For example, if component.xml has the following tags: `<spirit:vendor>xilinx.com</spirit:vendor>` `<spirit:library>hls</spirit:library>` `<spirit:name>test_sincos</spirit:name>` `<spirit:version>1.0</spirit:version>` The vlnv attribute in kernel XML must be set to:`xilinx.com:hls:test_sincos:1.0`
	attributes	Reserved. Set it to empty string: ""
	preferredWorkGroupSizeMultiple	Reserved. Set it to 0.
	workGroupSize	Reserved. Set it to 1.
	interrupt	Set to "true" (interrupt="true") if the RTL kernel has an interrupt, otherwise omit.
	hwControlProtocol	Specifies the control protocol for the RTL kernel. `ap_ctrl_hs`: Default control protocol for RTL kernels. `ap_ctrl_chain`: Control protocol for chained kernels that support dataflow. Adds `ap_continue` to the control registers to enable `ap_done`/`ap_continue` completion acknowledgment. `ap_ctrl_none`: Control protocol (none) applied for continuously operating kernels that have no need for start or done. For details, refer to Free-Running Kernel.
<port>	name	Specifies the port name. IMPORTANT: The AXI4-Lite interface must be named S_AXI_CONTROL.
	mode	At least one AXI4 master port and one AXI4-Lite slave control port are required. AXI4-Stream ports can be specified to stream data between kernels. For AXI4 master port, set to "master." For AXI4 slave port, set to "slave." For AXI4-Stream master port, set to "write_only." For AXI4-Stream slave port, set to "read_only."
	range	The range of the address space for the port.
	dataWidth	The width of the data that goes through the port, default is 32-bits.
	portType	Indicate whether or not the port is addressable or streaming. For AXI4 master and slave ports, set it to "addressable." For AXI4-Stream ports, set it to "stream."
	base	For AXI4 master and slave ports, set to `0x0`. This tag is not applicable to AXI4-Stream ports.
<arg>	name	Specifies the kernel software argument name.
	addressQualifier	Valid values: 0: Scalar kernel input argument 1: global memory 2: local memory 3: constant memory 4: pipe
	id	Only applicable for AXI4 master and slave ports. The ID needs to be sequential. It is used to determine the order of kernel arguments. Not applicable for AXI4-Stream ports.
	port	Specifies the <port> name to which the `arg` is connected.
	size	Size of the argument in bytes. The default is 4 bytes.
	offset	Indicates the register memory address.
	type	The C data type of the argument. For example, `uint`, `int`, or `float*`.
	hostOffset	Reserved. Set to `0x0`.
	hostSize	Size of the argument. The default is 4 bytes.
	memSize	For AXI4-Stream ports, `memSize` sets the depth of the created FIFO. TIP: Not applicable to AXI4 ports.
The following tags specify additional tags for AXI4-Stream ports. They do not apply to AXI4 ports.
<pipe>	For each pipe in the compute unit, the compiler inserts a FIFO for buffering the data. The pipe tag describes configuration of the FIFO.
	name	Specifies the name for the FIFO inserted for the AXI4-Stream port. This name must be unique to all pipes used in the same compute unit.
	width	Specifies the width of FIFO in bytes. For example, `0x4` for 32-bit FIFO.
	depth	Specifies the depth of the FIFO in number of words.
	linkage	Always set to internal.
<connection>	The connection tag describes the actual connection in hardware, either from the kernel to the FIFO inserted for the PIPE, or from the FIFO to the kernel.
	srcInst	Specifies the source instance of the connection.
	srcPort	Specifies the port on the source instance for the connection.
	dstInst	Specifies the destination instance of the connection.
	dstPort	Specifies the port on the destination instance of the connection.
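A hand-written kernel.xml can be sanity-checked against a few of the rules in Table 8 before it is passed to package_xo. The following is a sketch, not a complete validator: `check_kernel_xml` is a hypothetical helper, it covers only addressable (AXI4) ports, and it assumes that argument IDs start at 0, as they do in the example file above.

```python
# Sketch: check a kernel.xml string against a subset of the Table 8 rules.
# The helper name is illustrative; IDs starting at 0 is an assumption.
import xml.etree.ElementTree as ET

SAMPLE = """\
<root versionMajor="1" versionMinor="6">
  <kernel name="vitis_kernel_wizard_0" language="ip_c"
      vlnv="mycompany.com:kernel:vitis_kernel_wizard_0:1.0"
      attributes="" preferredWorkGroupSizeMultiple="0" workGroupSize="1">
    <ports>
      <port name="s_axi_control" mode="slave" range="0x1000"
          dataWidth="32" portType="addressable" base="0x0"/>
      <port name="m00_axi" mode="master" range="0xFFFFFFFFFFFFFFFF"
          dataWidth="512" portType="addressable" base="0x0"/>
    </ports>
    <args>
      <arg name="axi00_ptr0" addressQualifier="1" id="0" port="m00_axi"
          size="0x8" offset="0x010" type="int*" hostOffset="0x0"
          hostSize="0x8"/>
    </args>
  </kernel>
</root>
"""


def check_kernel_xml(text):
    """Return a list of problems found (empty if none)."""
    kernel = ET.fromstring(text).find("kernel")
    if kernel is None:
        return ["no <kernel> element found"]
    problems = []
    if kernel.get("language") != "ip_c":
        problems.append("language must be ip_c for RTL kernels")
    vlnv = (kernel.get("vlnv") or "").split(":")
    if len(vlnv) != 4 or not all(vlnv):
        problems.append("vlnv must be vendor:library:name:version")

    ports = {p.get("name"): p.get("mode") for p in kernel.iter("port")}
    if list(ports.values()).count("slave") != 1:
        problems.append("exactly one AXI4-Lite slave port is expected")
    if "master" not in ports.values():
        problems.append("at least one AXI4 master port is required")

    ids = []
    for arg in kernel.iter("arg"):
        name = arg.get("name")
        if arg.get("port") not in ports:
            problems.append(f"arg {name}: unknown port {arg.get('port')}")
        if int(arg.get("offset", "0x10"), 16) < 0x10:
            problems.append(f"arg {name}: offsets below 0x10 are reserved")
        if arg.get("id"):
            ids.append(int(arg.get("id")))
    if sorted(ids) != list(range(len(ids))):
        problems.append("arg ids must be sequential starting at 0")
    return problems


print(check_kernel_xml(SAMPLE))
```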

RTL Kernel Wizard

The RTL kernel wizard automates some of the steps you need to take to ensure that the RTL IP is packaged into a kernel object (.xo) that can be used by the Vitis compiler. The RTL Kernel wizard:

Steps you through the process of specifying the interface requirements for your RTL kernel, and generates a top-level RTL wrapper based on the provided information.
Automatically generates an AXI4-Lite interface module including the control logic and register file, included in the top level wrapper.
Includes an example kernel IP module in the top-level wrapper that you can replace with your own RTL IP design, after ensuring correct connectivity between your RTL IP and the wrapper.
Automatically generates a kernel.xml file to match the kernel specification from the wizard.
Generates a simple simulation test bench for the generated RTL kernel wrapper.
Generates an example host program to run and debug the RTL kernel.

The RTL Kernel wizard can be accessed from the Vitis IDE, or from the Vivado IP catalog. In either case it creates a Vivado project containing an example design to act as a template for defining your own RTL kernel.

The example design consists of a simple RTL IP adder, called VADD, that you can use to guide you through the process of mapping your own RTL IP into the generated top-level wrapper. The connections include clock(s), reset(s), s_axilite control interface, m_axi interfaces, and optionally axis streaming interfaces.

The wizard also generates a simple test bench for the generated RTL kernel wrapper, and sample host code to exercise the example RTL kernel. This example test bench and host code must be modified to test your RTL IP design accordingly.

Launch the RTL Kernel Wizard

The RTL Kernel Wizard can be launched from the Vitis IDE, or from the Vivado IDE.

TIP: Running the wizard from the Vitis IDE automatically imports the generated RTL kernel, and example host code, into the current application project when the process is complete.

To launch the RTL Kernel Wizard from within the Vitis IDE, select the Xilinx > RTL Kernel Wizard menu item from an open application project. For details on working with the GUI, refer to Using the Vitis IDE.

To launch the RTL Kernel Wizard from the Vivado IDE:

Create a new Vivado project, and select the target platform when choosing a board for the project.
In the Flow Navigator, click the IP catalog command.
Type RTL Kernel in the IP catalog search box.
Double-click RTL Kernel Wizard to launch the wizard.

Using the RTL Kernel Wizard

The RTL Kernel wizard is organized into multiple pages that break down the process of defining an RTL kernel. The pages of the wizard include:

General Settings
Scalars
Global Memory
Streaming Interfaces
Summary

To navigate between pages, click Next and Back as needed.

To finalize the kernel and build a project based on the kernel specification, click OK on the Summary page.

General Settings

The following figure shows the three settings in the General Settings tab.

The following are three settings in the General Settings tab.

Kernel Identification

Kernel name: The kernel name. This will be the name of the IP, top-level module name, kernel, and C/C++ functional model. This identifier must conform to C and Verilog identifier naming rules. It must also conform to Vivado IP integrator naming rules, which prohibit underscores except when placed between alphanumeric characters.
Kernel vendor: The name of the vendor. Used in the Vendor/Library/Name/Version (VLNV) format described in the Vivado Design Suite User Guide: Designing with IP (UG896).
Kernel library: The name of the library. Used in the VLNV. Must conform to the same identifier rules.
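The identifier rule for the kernel name can be approximated with a regular expression. This is a sketch of one reading of the rule (a leading letter, then letters and digits, with single underscores allowed only between alphanumeric characters); `is_valid_kernel_name` is a hypothetical helper, and it does not check for reserved C or Verilog keywords.

```python
# Sketch: approximate the kernel-name rule with a regular expression.
# The helper name is illustrative; reserved keywords are not checked.
import re

# A leading letter, then alphanumerics, with each underscore required to
# sit between two alphanumeric characters.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*")


def is_valid_kernel_name(name):
    return _NAME_RE.fullmatch(name) is not None


for candidate in ["test_sincos", "vadd", "_vadd", "vadd_", "my__kernel"]:
    print(candidate, is_valid_kernel_name(candidate))
```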

Kernel options

Kernel type

The RTL Kernel wizard currently supports two types of kernels: RTL, and Block Design.

RTL: The RTL type kernel consists of a Verilog RTL top-level module with a Verilog control register module and a Verilog kernel example inside the top-level module.
Block Design: The block design type kernel also delivers a Verilog top-level module, but instead it instantiates an IP integrator block diagram inside of the top-level. The block design consists of a MicroBlaze™ subsystem that uses a block RAM exchange memory to emulate the control registers. Example MicroBlaze software is delivered with the project to demonstrate using the MicroBlaze to control the kernel.

Kernel control interface

There are three types of control interfaces available for the RTL kernel: ap_ctrl_hs, ap_ctrl_chain, and ap_ctrl_none. This defines the hwControlProtocol for the <kernel> tag as described in Creating the Kernel Description XML File.

Clock and Reset Options

Number of clocks: Sets the number of clocks used by the kernel. Every RTL kernel has one primary clock called ap_clk and an optional reset called ap_rst_n. All AXI interfaces on the kernel are driven with this clock.
When setting Number of clocks to 2, a secondary clock and optional reset are provided to be used by the kernel internally. The secondary clock and reset are called ap_clk_2 and ap_rst_n_2. This secondary clock supports independent frequency scaling and is independent from the primary clock. The secondary clock is useful if the kernel clock needs to run at a faster or slower rate than the AXI4 interfaces, which must be clocked on the primary clock.
IMPORTANT: When designing with multiple clocks, proper clock domain crossing techniques must be used to ensure data integrity across all clock frequency scenarios. Refer to UltraFast Design Methodology Guide for the Vivado Design Suite (UG949) for more information.
Has reset: Specifies whether to include a top-level reset input port to the kernel. Omitting a reset can be useful to improve routing congestion of large designs. Any registers that would normally have a reset in the design should have proper initial values to ensure correctness. If enabled, there is a reset port included with each clock. Block Design type kernels must have a reset input.

Scalars

Scalar arguments are used to pass control type information to the kernels. Scalar arguments cannot be read back from the host. For each argument that is specified, a corresponding register is created to facilitate passing the argument from software to hardware. See the following figure.

Number of scalar kernel input arguments: Specifies the number of scalar input arguments to pass to the kernel. For each number specified, a table row is generated that allows customization of the argument name and argument type. There is no required minimum number of scalars and the maximum allowed by the wizard is 64.

The following is the scalar input argument definition:

Argument name: The argument name is used in the generated Verilog control register module as an output signal. Each argument is assigned an ID value. This ID value is used to access the argument from the host software. The ID value assignments can be found on the summary page of this wizard. To ensure maximum compatibility, the argument name follows the same identifier rules as the kernel name.

Argument type: Specifies the data type, and hence bit-width, of the argument. This affects the register width in the generated RTL kernel module. The data types available are limited to the ones specified by the OpenCL C Specification Version 2.0 in "6.1.1 Built-in Scalar Data Types" section. The specification provides the associated bit-widths for each data type. The RTL wizard reserves 64 bits for all scalars in the register map regardless of their argument type. If the argument type is 32 bits or less, the RTL Wizard sets the upper 32 bits (of the 64 bits allocated) as a reserved address location. Data types that represent a bit width greater than 32 bits require two write operations to the control registers.
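The 64-bit reservation and the two-write rule for wide scalars can be sketched as follows. Several points are assumptions made for illustration and are not stated above: that scalar arguments are packed contiguously from 0x10, and that the low 32-bit word sits at the lower address. The function names are hypothetical; the Summary page of the wizard remains the authority for the actual offsets.

```python
# Sketch: scalar register layout and the register writes needed per
# argument. Contiguous packing from 0x10 and low-word-first ordering are
# assumptions; function names are illustrative.

FIRST_ARG_OFFSET = 0x10   # locations below this are reserved
SLOT_BYTES = 8            # 64 bits reserved for every scalar


def layout_scalars(args):
    """args is a list of (name, bit_width); returns {name: offset}."""
    offsets = {}
    offset = FIRST_ARG_OFFSET
    for name, bits in args:
        if not 0 < bits <= 64:
            raise ValueError(f"{name}: unsupported width {bits}")
        offsets[name] = offset
        offset += SLOT_BYTES
    return offsets


def register_writes(offset, value, bits):
    """Return the (address, 32-bit word) writes needed for one scalar."""
    value &= (1 << bits) - 1
    writes = [(offset, value & 0xFFFFFFFF)]
    if bits > 32:
        # Wider than 32 bits: a second write carries the upper word.
        writes.append((offset + 4, value >> 32))
    return writes


offsets = layout_scalars([("count", 32), ("threshold", 64)])
for addr, word in register_writes(offsets["threshold"],
                                  0x1122334455667788, 64):
    print(hex(addr), hex(word))
```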

Global Memory

Global memory is accessed by the kernel through AXI4 master interfaces. Each AXI4 interface operates independently of the others, and each AXI4 interface can be connected to one or more memory controllers for off-chip memory such as DDR4. Global memory is primarily used to pass large data sets between the host and the kernel. It can also be used to pass data between kernels. For recommendations on how to design these interfaces for optimal performance, see Memory Performance Optimizations for AXI4 Interface.

TIP: For each interface, the RTL Kernel wizard generates example AXI master logic in the top-level wrapper to provide a starting point that can be discarded if not needed.

Number of AXI master interfaces: Specify the number of interfaces present on the kernel. The maximum is 16 interfaces. For each interface, you can customize an interface name, data width, and the number of associated arguments. Each interface contains all read and write channels. The default names proposed by the RTL kernel wizard are m00_axi and m01_axi. If not changed, these names will have to be used when assigning an interface to global memory as described in Mapping Kernel Ports to Global Memory.

AXI master definition (table columns)

Interface name: Specifies the name of the interface. To ensure maximum compatibility, the interface name follows the same identifier rules as the kernel name.
Width (in bytes): Specifies the data width of the AXI data channels. Xilinx recommends matching to the native data width of the memory controller AXI4 slave interface. The memory controller slave interface is typically 64 bytes (512 bits) wide.
Number of arguments: Specifies the number of arguments to associate with this interface. Each argument represents a data pointer to global memory that the kernel can access.

Argument definition

Interface: Specifies the name of the AXI Interface. This value is copied from the interface name defined in the table, and cannot be modified here.
Argument name: Specifies the name of the pointer argument as it appears on the function prototype signature. Each argument is assigned an ID value. This ID value is used to access the argument from the host software as described in Host Application. The ID value assignments can be found on the summary page of this wizard. To ensure maximum compatibility, the argument name follows the same identifier rules as the kernel name. The argument name is used in the generated RTL kernel control register module as an output signal.

Streaming Interfaces

The streaming interfaces page allows configuration of AXI4-Stream interfaces on the kernel. Streaming interfaces are only available on select platforms and if the chosen platform does not support streaming, then the page does not appear. Streaming interfaces are used for direct host-to-kernel and kernel-to-host communication, as well as continuously operating kernels as described in Streaming Data Transfers.

Number of AXI4-Stream interfaces: Specifies the number of AXI4-Stream interfaces that exist on the kernel. A maximum of 32 interfaces can be enabled per kernel. Xilinx recommends keeping the number of interfaces as low as possible to reduce the amount of area consumed.

Name: Specifies the name of the interface. To ensure maximum compatibility, the interface name follows the same identifier rules as the kernel name.

Mode: Specifies whether the interface is a master or slave interface. An AXI4-Stream slave interface is a read-only interface, and the RTL kernel can be sent data with the clWriteStream API from the host program. An AXI4-Stream master interface is a write-only interface, and the host program can receive data through the interface with the clReadStream API.

Width (bytes): Specifies the TDATA width (in bytes) of the AXI4-Stream interface. This interface width is limited to 1 to 64 bytes in powers of 2.

The streaming interface uses the TDATA/TKEEP/TLAST signals of the AXI4-Stream protocol. A stream transaction consists of a series of transfers, where the final transfer is terminated by the assertion of the TLAST signal. Stream transfers must adhere to the following:

An AXI4-Stream transfer occurs when TVALID and TREADY are both asserted.
TDATA must be 8, 16, 32, 64, 128, 256, or 512 bits wide.
TKEEP (per byte) must be all 1s when TLAST is 0.
TKEEP can be used to signal a ragged tail when TLAST is 1. For example, on a 4-byte interface, TKEEP can only be 0b0001, 0b0011, 0b0111, or 0b1111 to specify the last transfer is 1-byte, 2 bytes, 3 bytes, or 4 bytes in size, respectively.
TKEEP cannot be all zeros (even if TLAST is 1).
TLAST must be asserted at the end of a packet.
The TREADY input and TVALID output should be low if the kernel is not started, to avoid lost transfers.
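
The TKEEP and TLAST rules above can be expressed as a small checker, which can be a useful reference when writing test bench assertions. The following is a minimal sketch: the function names and the (tkeep, tlast) tuple representation are hypothetical, and TKEEP is modeled as an integer with one bit per byte lane.

```python
# Sketch of the transfer rules listed above, written as a checker.
# ASSUMPTION: the function names and the (tkeep, tlast) tuple format are
# hypothetical; TKEEP is modeled as an integer with one bit per byte lane.

VALID_WIDTH_BYTES = (1, 2, 4, 8, 16, 32, 64)  # TDATA of 8 to 512 bits


def tkeep_is_valid(tkeep, tlast, width_bytes):
    """Check one transfer's TKEEP against the rules for the given width."""
    if width_bytes not in VALID_WIDTH_BYTES:
        raise ValueError("width must be a power of 2 from 1 to 64 bytes")
    all_ones = (1 << width_bytes) - 1
    if tkeep <= 0 or tkeep > all_ones:
        return False                      # all zeros is never allowed
    if not tlast:
        return tkeep == all_ones          # full transfers until the last one
    # Ragged tail: valid bytes must be contiguous from lane 0 (0b0001, 0b0011, ...).
    return (tkeep & (tkeep + 1)) == 0


def packet_is_valid(transfers, width_bytes):
    """Check a packet given as a list of (tkeep, tlast) transfers."""
    if not transfers:
        return False
    *body, last = transfers
    if any(tlast for _, tlast in body) or not last[1]:
        return False                      # TLAST marks only the final transfer
    return all(tkeep_is_valid(k, t, width_bytes) for k, t in transfers)


if __name__ == "__main__":
    # A 10-byte packet on a 4-byte interface: two full transfers and a 2-byte tail.
    print(packet_is_valid([(0b1111, False), (0b1111, False), (0b0011, True)], 4))
```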

Summary

This section summarizes the VLNV for the RTL kernel IP, the software function prototype, and the hardware control registers created from the options selected on the previous pages. The function prototype conveys what a kernel call would look like if it were a C function. See the generated host code example for how to set the kernel arguments for the kernel call. The register map shows the relationship between the host software ID, argument name, hardware register offset, type, and associated interface. Review this section for correctness before proceeding to generate the kernel.

Click OK to generate the top-level wrapper for the RTL kernel, the VADD temporary RTL kernel IP, the kernel.xml file, the simulation test bench, and the example host.cpp code. After these files are created, the RTL Kernel wizard opens a project in the Vivado Design Suite to let you complete kernel development.

Using the RTL Kernel Project in Vivado IDE

If you launched the RTL Kernel wizard from the Vitis IDE, after clicking OK on the Summary page, the Vivado Design Suite opens with an example IP project to let you complete your RTL kernel code.

If you launched the RTL Kernel wizard from within the Vivado IP catalog, after clicking OK on the Summary page, an RTL Kernel Wizard IP is instantiated into your current project. From there you must take the following steps:

When the Generate Output Products dialog box appears, click Skip to close it.
Right-click the <kernel_name>.xci file that is added to the Sources view, and select Open IP Example Design.
In the Open Example Design dialog box, specify the Example project directory, or accept the default value, and click OK.
TIP: An example project is created for the RTL kernel IP. This example IP project is the same as the example project created if you launch the RTL Kernel wizard from the Vitis IDE, and is where you will complete the development work for your kernel.
You can now close the original Vivado project from which you launched the RTL Kernel wizard.

Depending on the Kernel Type you selected for the kernel options, the example IP project is populated with a top-level RTL kernel file that contains either a Verilog example and control registers as described in RTL Type Kernel Project, or an instantiated IP integrator block design as described in Block Design Type Kernel Project. The top-level Verilog file contains the expected input/output signals and parameters. These top-level ports are matched to the kernel specification file (kernel.xml) and can be combined with your RTL code, or block design, to complete the RTL kernel.

The AXI4 interfaces defined in the top-level file contain a minimum subset of AXI4 signals required to generate an efficient, high throughput interface. Signals that are not present inherit optimized defaults when connected to the rest of the AXI system. These optimized defaults allow the system to omit AXI features that are not required, saving area and reducing complexity. If your RTL code or block design contains AXI signals that were omitted, you can add these signals to the ports in the top-level RTL kernel file, and the IP packager will adapt to them appropriately.

The next step in the process customizes the contents of the kernel and then packages those contents into a Xilinx Object (xo) file.

RTL Type Kernel Project

The RTL type kernel delivers a top-level Verilog design consisting of the control register module and the Vadd example sub-modules. The following figure illustrates the top-level design configured with two AXI4 master interfaces. If you modify the control register module, take care that it still aligns with the kernel.xml file located in the imports directory of the Vivado kernel project. The example block can be replaced with your custom logic or used as a starting point for your design.

The Vadd example block, shown in the following figure, consists of a simple adder function, an AXI4 read master, and an AXI4 write master. Each defined AXI4 interface has independent example adder code. The first associated argument of each interface is used as the data pointer for the example. Each example reads 16 KB of data, performs a 32-bit add one operation, and then writes out 16 KB of data back in place (the read and write address are the same).
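
The behavior of the example can be modeled in a few lines of software, which is handy for computing expected results when checking the kernel output. The following is a minimal sketch and not the generated C-model: the little-endian word layout, the wrap-around on overflow, and the function name are assumptions.

```python
# Software sketch of the example's behavior: read 16 KB, add one to each
# 32-bit word, and write the result back to the same addresses.
# ASSUMPTION: little-endian word layout and wrap-around on overflow are
# assumptions of this sketch; the names are hypothetical.
import struct

EXAMPLE_BYTES = 16 * 1024  # 16 KB processed per interface


def add_one_in_place(memory, base=0, length=EXAMPLE_BYTES):
    """Add one to each 32-bit word in memory[base:base+length], in place."""
    if length % 4 or base + length > len(memory):
        raise ValueError("region must be word-sized and inside the buffer")
    count = length // 4
    words = struct.unpack_from(f"<{count}I", memory, base)
    # Read and write addresses are the same, so results overwrite the inputs.
    struct.pack_into(f"<{count}I", memory, base,
                     *((w + 1) & 0xFFFFFFFF for w in words))


if __name__ == "__main__":
    buf = bytearray(EXAMPLE_BYTES)   # stands in for the global memory buffer
    add_one_in_place(buf)
    print(struct.unpack_from("<4I", buf, 0))
```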

The following table describes some important files in the example IP project, relative to the root of the Vivado project for the kernel, where <kernel_name> is the name of the kernel you specified in the RTL Kernel wizard.

Table 9. RTL Kernel Wizard Source and Test Bench File
Filename	Description	Delivered with Kernel Type
<kernel_name>_ex.xpr	Vivado project file	All
imports directory
<kernel_name>.v	Kernel top-level module	All
<kernel_name>_control_s_axi.v	RTL control register module	RTL
<kernel_name>_example.sv	RTL example block	RTL
<kernel_name>_example_vadd.sv	RTL example AXI4 vector add block	RTL
<kernel_name>_example_axi_read_master.sv	RTL example AXI4 read master	RTL
<kernel_name>_example_axi_write_master.sv	RTL example AXI4 write master	RTL
<kernel_name>_example_adder.sv	RTL example AXI4-Stream adder block	RTL
<kernel_name>_example_counter.sv	RTL example counter	RTL
<kernel_name>_exdes_tb_basic.sv	Simulation test bench	All
<kernel_name>_cmodel.cpp	Software C-Model example for software emulation.	All
<kernel_name>_ooc.xdc	Out-of-context Xilinx constraints file	All
<kernel_name>_user.xdc	Xilinx constraints file for kernel user constraints.	All
kernel.xml	Kernel description file	All
package_kernel.tcl	Kernel packaging script proc definitions	All
post_synth_impl.tcl	Tcl post-implementation file	All
exports directory
src/host_example.cpp	Host code example	All
makefile	Makefile example	All

Block Design Type Kernel Project

The block design type kernel delivers an IP integrator block design (.bd) at the top-level of the example project. A MicroBlaze processor subsystem is used to sample the control registers and to control the flow of the kernel. The MicroBlaze processor system uses a block RAM as an exchange memory between the host and the kernel instead of a register file.

For each AXI interface, DMA and math operation sub-blocks are created to provide an example of how to control the kernel execution. The example uses the MicroBlaze AXI4-Stream interfaces to control the AXI Data Mover IP to create an example identical to the one in the RTL kernel type. Also included is a Vitis IDE project to compile and link an ELF file for the MicroBlaze core. This ELF file is loaded into the Vivado kernel project and initialized directly into the MicroBlaze instruction memory.

The following steps can be used to modify the MicroBlaze processor program:

If the design has been updated, you might need to run the Export Hardware option. The option can be found in the File > Export > Export Hardware menu location. When the Export Hardware dialog opens, click OK.
The core development kit application can now be invoked. Select Tools > Launch Vitis from the main menu.
When the Vitis IDE opens, click X just to the right of the text on the Welcome tab to close the welcome dialog box. This shows an already loaded Vitis IDE project underneath.
From the Project Explorer, the source files are under the <Kernel Name>_control/src section. Modify these as appropriate.
When updates are complete, compile the source by selecting Project > Build All. Check for errors and warnings, and resolve them if necessary. The ELF file is automatically updated in the IDE.
Run simulation to test the updated program and debug if necessary.

Simulation Test Bench

A SystemVerilog test bench is generated for simulating the example IP project. This test bench exercises the RTL kernel to ensure its operation is correct. It is populated with the checker function to verify the add one operation.

This generated test bench can be used as a starting point in verifying the kernel functionality. It writes/reads from the control registers and executes the kernel multiple times while also including a simple reset test. It is also useful for debugging AXI issues, reset issues, bugs during multiple iterations, and kernel functionality. Compared to hardware emulation, it executes a more rigorous test of the hardware corner cases, but does not test the interaction between host code and kernel.

To run a simulation, click Vivado Flow Navigator > Run Simulation located on the left hand side of the GUI and select Run Behavioral Simulation. If behavioral simulation is working as expected, a post-synthesis functional simulation can be run to ensure that synthesis results are matched with the behavioral model.

Out-of-Context Synthesis

The Vivado kernel project is configured to run synthesis and implementation in out-of-context (OOC) mode. A Xilinx Design Constraints (XDC) file is populated in the design to provide default clock frequencies for this purpose.

You should always synthesize the RTL kernel before packaging it with the package_xo command. Running synthesis is useful to determine whether the kernel synthesizes without errors. It also provides estimates of resource utilization and operating frequency. Without pre-synthesizing the RTL kernel you could encounter errors during the v++ linking process, and it could be much harder to debug the cause.

To run OOC synthesis, click Run Synthesis from the Vivado Flow Navigator > Synthesis menu.

The synthesized outputs can also be used to package the RTL kernel with a netlist source, instead of RTL source.

IMPORTANT: A block design type kernel must be packaged as a netlist using the package_xo command.

Software Model and Host Code Example

A C++ software model of the add one example operation, <kernel_name>_cmodel.cpp, is provided in the ./imports directory. This software model can also be modified to model the function of your kernel. When running package_xo, this model can be included with the kernel source files to enable software emulation for the kernel. The hardware emulation and system builds always use the RTL description of the kernel.

In the ./exports/src directory, an example host program is provided and is called host_example.cpp. The host program takes the binary container as an argument to the program. The host code loads the binary as part of the init function. The host code instantiates the kernel, allocates the buffers, sets the kernel arguments, executes the kernel, and then collects and checks the results for the example add one function.

For information on using the host program and kernel code in an application, refer to Creating a Vitis IDE Project.

Generate RTL Kernel

After the kernel is designed and tested in the example IP project in the Vivado IDE, the final step is to generate the RTL kernel object file (.xo) for use by the Vitis compiler.

Click the Generate RTL Kernel command from the Vivado Flow Navigator > Project Manager menu. The Generate RTL Kernel dialog box opens with three main packaging options:

A source-only kernel packages the kernel using the RTL design sources directly.
The pre-synthesized kernel packages the kernel with the RTL design sources with a synthesized cached output that can be used later on in the flow to avoid re-synthesizing. If the target platform changes, the packaged kernel might fall back to the RTL design sources instead of using the cached output.
The netlist-based kernel, which uses a design checkpoint (DCP), packages the kernel as a black box, using the netlist generated by the synthesized output of the kernel. This output can be optionally encrypted if necessary. If the target platform changes, the kernel might not be able to re-target the new device and it must be regenerated from the source. If the design contains a block design, the netlist (DCP) based kernel is the only packaging option available.

Optionally, all kernel packaging types can be packaged with the software model that can be used in software emulation. If the software model contains multiple files, provide a space in between each file in the Source files list, or use the GUI to select multiple files using the CTRL key when selecting the file.

After you click OK, the kernel output products are generated. If the pre-synthesized kernel or netlist kernel option is chosen, then synthesis can run. If synthesis has previously run, the flow uses those outputs, regardless of whether they are stale. The kernel Xilinx Object (.xo) file is generated in the exports directory of the Vivado kernel project.

At this point, you can close the Vivado kernel project. If the Vivado kernel project was invoked from the Vitis IDE, the example host code called host_example.cpp and kernel Xilinx Object (.xo) files are automatically imported into the ./src folder of the application project in the Vitis IDE.

Modifying an Existing RTL Kernel Generated from the Wizard

From the Vitis IDE, you can modify an existing RTL kernel by selecting it from the ./src folder of an application project where it is in use. Right-click the .xo file in the Project Explorer view, and select RTL Kernel Wizard. The Vitis IDE attempts to open the Vivado project for the selected RTL kernel.

TIP: If the Vitis IDE is unable to find the Vivado project, it returns an error and does not let you edit the RTL kernel.

A dialog box opens displaying two options to edit an existing RTL kernel. Selecting Edit Existing Kernel Contents re-opens the Vivado Project, letting you modify and regenerate the kernel contents. Selecting Re-customize Existing Kernel Interfaces opens the RTL Kernel wizard. Options other than the Kernel Name can be modified, and the previous Vivado project is replaced.

IMPORTANT: All files and changes in the previous Vivado project are lost when the updated RTL kernel project is created.

Design Recommendations for RTL Kernels

While the RTL Kernel Wizard assists in packaging RTL designs for use within the Vitis core development kit, the underlying RTL kernels should be designed with recommendations from the UltraFast Design Methodology Guide for the Vivado Design Suite (UG949).

In addition to adhering to the interface and packaging requirements, the kernels should be designed with the following performance goals in mind:

Memory Performance Optimizations for AXI4 Interface
Managing Clocks in an RTL Kernel
Quality of Results Considerations
Debug and Verification Considerations

Memory Performance Optimizations for AXI4 Interface

The AXI4 interfaces typically connect to DDR memory controllers in the platform.

Note: For optimal frequency and resource usage, it is recommended that one interface is used per memory controller.

For best performance from the memory controller, the following is the recommended AXI interface behavior:

Use an AXI data width that matches the native memory controller AXI data width, typically 512 bits.
Do not use WRAP, FIXED, or sub-sized bursts.
Use burst transfers that are as large as possible (up to the 4 KB AXI4 protocol limit).
Avoid use of deasserted write strobes. Deasserted write strobes can cause error-correction code (ECC) logic in the DDR memory controller to perform read-modify-write operations.
Use pipelined AXI transactions.
Avoid using threads if an AXI interface is only connected to one DDR controller.
Avoid generating write address commands if the kernel does not have the ability to deliver the full write transaction (non-blocking write requests).
Avoid generating read address commands if the kernel does not have the capacity to accept all the read data without back pressure (non-blocking read requests).
If read-only or write-only interfaces are desired, the ports of the unused channels can be commented out in the top-level RTL file before the project is packaged into a kernel.
Using multiple threads can cause larger resource requirements in the infrastructure IP between the kernel and the memory controllers.
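
The burst sizing recommendation above can be sketched as a helper that splits a transfer into the largest full-width INCR bursts. This is an illustration only, with hypothetical names. It additionally assumes the AXI4 protocol rules that a burst must not cross a 4 KB address boundary and that an INCR burst is limited to 256 beats.

```python
# Sketch: split a transfer into the largest full-width INCR bursts that
# stay inside a 4 KB-aligned window.
# ASSUMPTION: names are hypothetical; the 4 KB boundary rule and the
# 256-beat INCR limit are taken from the AXI4 protocol.

BOUNDARY = 4096       # a burst must not cross a 4 KB address boundary
MAX_BEATS = 256       # AXI4 INCR burst length limit


def split_into_bursts(addr, nbytes, width_bytes=64):
    """Return (address, beats) pairs covering [addr, addr + nbytes)."""
    if addr % width_bytes or nbytes % width_bytes:
        # Full-width, aligned transfers avoid sub-sized bursts and
        # deasserted write strobes.
        raise ValueError("sketch handles full-width, aligned transfers only")
    bursts = []
    while nbytes > 0:
        to_boundary = BOUNDARY - (addr % BOUNDARY)
        size = min(nbytes, to_boundary, MAX_BEATS * width_bytes)
        bursts.append((addr, size // width_bytes))
        addr += size
        nbytes -= size
    return bursts


if __name__ == "__main__":
    # 16 KB on a 64-byte (512-bit) interface, starting at a 4 KB boundary.
    print(split_into_bursts(0x0000, 16 * 1024))
```

On a 64-byte interface, the 4 KB limit is reached at 64 beats, so the 256-beat cap only comes into play on narrower interfaces.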

Managing Clocks in an RTL Kernel

An RTL kernel can have up to two external clock interfaces: a primary clock, ap_clk, and an optional secondary clock, ap_clk_2. Both clocks can be used for clocking internal logic. However, all external RTL kernel interfaces must be clocked on the primary clock. Both primary and secondary clocks support independent automatic frequency scaling.

If you require additional clocks within the RTL kernel, a frequency synthesizer such as the Clocking Wizard IP or MMCM/PLL primitive can be instantiated within the RTL kernel. Therefore, your RTL kernel can use just the primary clock, both primary and secondary clock, or primary and secondary clock along with an internal frequency synthesizer. The following shows the advantages and disadvantages of using these three RTL kernel clocking methods:

Single input clock: ap_clk
- External interfaces and internal kernel logic run at the same frequency.
- No clock-domain-crossing (CDC) issues.
- Frequency of ap_clk can automatically be scaled to allow kernel to meet timing.
Two input clocks: ap_clk and ap_clk_2
- Kernel logic can run at either clock frequency.
- Need proper CDC technique to move from one frequency to another.
- Both ap_clk and ap_clk_2 can automatically scale their frequencies independently to allow the kernel to meet timing.
Using a frequency synthesizer inside the kernel:
- Additional device resources required to generate clocks.
- Must have ap_clk and optionally ap_clk_2 interfaces.
- Generated clocks can have different frequencies for different CUs.
- Kernel logic can run at any available clock frequency.
- Need proper CDC technique to move from one frequency to another.

When using a frequency synthesizer in the RTL kernel there are some constraints you should be aware of:

RTL external interfaces are clocked at ap_clk.
The frequency synthesizer can have multiple output clocks that are used as internal clocks to the RTL kernel.
You must provide a Tcl script to downgrade DRCs related to clock resource placement in Vivado placement to prevent a DRC error from occurring. Refer to CLOCK_DEDICATED_ROUTE in the Vivado Design Suite Properties Reference Guide (UG912) for more information. The following is an example of the needed Tcl command that you will add to your Tcl script:
```
set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets pfm_top_i/static_region/base_clocking/clkwiz_kernel/inst/CLK_CORE_DRP_I/clk_inst/clk_out1]
```
Note: This constraint should be edited to reflect the clock structure of your target platform.
Specify the Tcl script from the previous step for use by Vivado implementation, after optimization, by using the v++ --vivado.prop option as described in --vivado Options. The following option specifies the script to run after the optimization step completes:
```
--vivado.prop:run.impl_1.STEPS.OPT_DESIGN.TCL.POST={<PATH>/<Script_Name>.tcl}
```
Specify the two global clock input frequencies that can be used by the kernels (RTL or HLS-based). Use the v++ --kernel_frequency option to ensure the kernel input clock frequency is as expected. For example, to specify one clock, use:
```
v++ --kernel_frequency 250
```
For two clocks, you can specify multiple frequencies based on the clock ID. The primary clock has clock ID 0 and the secondary has clock ID 1.
```
v++ --kernel_frequency "0:250|1:500"
```
TIP: Ensure that the PLL or MMCM output clock is locked before RTL kernel operations. Use the locked signal in the RTL kernel to ensure the clock is operating correctly.
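
When the v++ command line is assembled by a build script, the frequency value can be generated rather than typed by hand. The following is a minimal sketch with hypothetical helper names. It follows the clock ID convention above (0 for the primary clock, 1 for the secondary) and quotes the value, because a shell would otherwise interpret the | character as a pipe.

```python
# Sketch: build the --kernel_frequency option for one or two kernel clocks.
# ASSUMPTION: the helper names are hypothetical; the value format follows
# the examples above (a single number, or "0:<MHz>|1:<MHz>").
import shlex


def kernel_frequency_value(primary_mhz, secondary_mhz=None):
    """Return the option value for the primary and optional secondary clock."""
    if secondary_mhz is None:
        return str(primary_mhz)
    return f"0:{primary_mhz}|1:{secondary_mhz}"


def kernel_frequency_option(primary_mhz, secondary_mhz=None):
    """Return the option as it would be typed on a shell command line."""
    value = kernel_frequency_value(primary_mhz, secondary_mhz)
    # shlex.quote adds quotes only when the value needs them.
    return "--kernel_frequency " + shlex.quote(value)


if __name__ == "__main__":
    print(kernel_frequency_option(250))
    print(kernel_frequency_option(250, 500))
```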

After adding the frequency synthesizer to an RTL kernel, the generated clocks are not automatically scalable. Ensure the RTL kernel passes timing requirements, or v++ will return an error like the following:

ERROR: [VPL-1] design did not meet timing - Design did not meet timing. One 
or more unscalable system clocks did not meet their required target 
frequency. Please try specifying a clock frequency lower than 300 MHz using 
the '--kernel_frequency' switch for the next compilation. For all system 
clocks, this design is using 0 nanoseconds as the threshold worst negative 
slack (WNS) value. List of system clocks with timing failure.

In this case you will need to change the internal clock frequency, or optimize the kernel logic to meet timing.
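
As a starting point for choosing a lower internal clock frequency, the target period and the reported worst negative slack can be combined into a first-order estimate. The sketch below uses the common static timing approximation that the achievable period is the target period minus the WNS. It is only a rough guide, because results change when the design is re-implemented, and the function names are hypothetical.

```python
# Sketch: first-order estimate of a clock frequency that could meet timing.
# ASSUMPTION: achievable period ~= target period - WNS (WNS is negative when
# timing fails). This is an approximation; the names are hypothetical.


def period_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / freq_mhz


def estimated_fmax_mhz(target_mhz, wns_ns):
    """Estimate the highest frequency that would have met timing."""
    return 1000.0 / (period_ns(target_mhz) - wns_ns)


if __name__ == "__main__":
    # A 250 MHz internal clock that fails timing by 1 ns.
    print(round(estimated_fmax_mhz(250, -1.0), 1), "MHz")
```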

Quality of Results Considerations

The following recommendations help improve results for timing and area:

Pipeline all reset inputs and internally distribute resets avoiding high fanout nets.
Reset only essential control logic flip-flops.
Consider registering input and output signals to the extent possible.
Understand the size of the kernel relative to the capacity of the target platforms to ensure fit, especially if multiple kernels will be instantiated.
Recognize platforms that use stacked silicon interconnect (SSI) technology. These devices have multiple dies, and any logic paths that cross between dies should be flip-flop to flip-flop timing paths.

Debug and Verification Considerations

RTL kernels should be verified in their own test bench using advanced verification techniques including verification components, randomization, and protocol checkers. The AXI Verification IP (VIP) is available in the Vivado IP catalog and can help with the verification of AXI interfaces. The RTL kernel example designs contain an AXI VIP-based test bench with sample stimulus files.
The hardware emulation flow should not be used for functional verification because it does not accurately represent the range of possible protocol signaling conditions that real AXI traffic in hardware can incur. Hardware emulation should be used to test the host code software integration or to view the interaction between multiple kernels.