Integrating the Application Using the Vitis Tools Flow

While developing an AI Engine design graph, many design iterations are typically performed using the AI Engine compiler or AI Engine simulator tools. This method provides quick design iterations when focused on developing the AI Engine application. When ready, the AI Engine design can be integrated into a larger system design using the flow described in this chapter.

The Vitis™ tools flow simplifies hardware design and integration with a software-like compilation and linking flow, integrating the three domains of the Versal™ device: the AI Engine array, the programmable logic (PL) region, and the processing system (PS). The Vitis compiler flow lets you integrate your compiled AI Engine design graph (libadf.a) with additional kernels implemented in the PL region of the device, including HLS and RTL kernels, and link them for use on a target platform. You can call these compiled hardware functions from a host program running in the Arm® processor in the Versal device.

The following figure shows the high-level steps required to use the Vitis tools flow to integrate your application. The command-line process to run this flow is described here.
Note: You can also use this flow from within the Vitis IDE as explained in Using the Vitis IDE.
Figure 1: Vitis Tools Flow


IMPORTANT: Using the Vitis tools and AI Engine tools requires the setup described in Setting Up the Vitis Tool Environment.

The following steps can be adapted to any AI Engine design in a Versal device.

  1. As described in Compiling an AI Engine Graph Application, the first step is to create and compile the AI Engine graph into a libadf.a file using the AI Engine compiler. You can iterate between the AI Engine compiler and the AI Engine simulator to develop the graph until you are ready to proceed.
  2. Compiling PL Kernels: PL kernels are compiled for implementation in the PL region of the target platform using the v++ --compile command. These kernels can be C/C++ or OpenCL kernels, or RTL kernels, in compiled Xilinx object (xo) form.
  3. Linking the System: Link the compiled AI Engine graph with C/C++, OpenCL kernels, and RTL kernels onto a target platform. The process creates an XCLBIN file to load and run the AI Engine graph and PL kernel code on the target platform.
  4. Compile the Embedded Application for the Cortex-A72 Processor: Optionally compile a host application to run on the Cortex®-A72 core processor using the GNU Arm cross-compiler to create an ELF file. The host program interacts with the AI Engine kernels and kernels in the PL region. This compilation step is optional because there are several ways to deploy and interact with the AI Engine kernels, and the host program running in the PS is one way.
  5. Packaging the system: Use the v++ --package process to gather the required files to configure and boot the system, and to load and run the application, including the AI Engine graph and PL kernels. This builds the necessary package to run emulation and debug, or to run your application on hardware.

Platforms

A platform is a fully contained image that defines both the hardware (XSA) as well as the software (bare metal, Linux, or both). The XSA contains the hardware description of the platform, which is defined in the Vivado Design Suite, and the software is defined with the use of a bare-metal setup, or a Linux image defined through PetaLinux.

Types of Platforms

There are two types of platforms: base platforms and custom platforms. A base platform is one that is provided by Xilinx (for example, xilinx_vck190_base_202020_1), and a custom platform is one that you create. Which platform is best to use depends on the design, so it is generally best to start with a base platform and develop the custom platform in parallel.

Custom Platforms

When the base platform does not contain the features needed for a design, you can create a custom platform. Creating a platform allows you to provide your own IP or subsystems to meet your needs. The process to create a platform is described in Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393).

Platform Clocking

Platforms have a variety of clocking: processor, programmable logic (PL), and AI Engine clocking. The following table explains the clocking for each.

Table 1. Platform Clocks
Clock Description
AI Engine Can be configured in the platform in the AI Engine IP.
Processor Can be configured in the platform in the CIPS IP.
Programmable Logic Can have multiple clocks and can be configured in the platform.
NoC Device dependent and can be configured in the platform in the CIPS and NoC IP.
  1. These clocks are derived from the platform and are affected by the device, speed grade and operating voltage.

For more information related to platform clocking, see Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393). For information on Versal device clocks, see Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957).

PL Kernels

Kernels implemented in the programmable logic (PL) region of the Versal device can be integrated into an AI Engine graph application, or can work alongside it. PL kernels can take the form of HLS kernels, written in C/C++ or OpenCL, or RTL kernels packaged in the Vivado Design Suite. These kernels must be separately compiled to produce the Xilinx object files (XO) used in integrating the system design on the target platform.

HLS kernels, written in C/C++ or OpenCL, can be written and compiled from within the Vitis HLS tool directly, or as part of the Vitis application acceleration development flow.
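
For reference, the following is a minimal sketch of what such an HLS kernel might look like. It simply moves data from global memory onto an AXI4-Stream that can later be connected to the AI Engine array. The mm2s name, the 32-bit data width, and the interface pragmas are illustrative assumptions, not code taken from this document.

#include <ap_axi_sdata.h>
#include <ap_int.h>
#include <hls_stream.h>

extern "C" void mm2s(ap_int<32>* mem, hls::stream<ap_axis<32, 0, 0, 0> >& s, int size) {
#pragma HLS INTERFACE m_axi port=mem offset=slave bundle=gmem
#pragma HLS INTERFACE axis port=s
#pragma HLS INTERFACE s_axilite port=mem bundle=control
#pragma HLS INTERFACE s_axilite port=size bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE II=1
        ap_axis<32, 0, 0, 0> x;
        x.data = mem[i];
        x.keep = -1;                 // all bytes valid
        x.last = (i == size - 1);    // assert TLAST on the final word
        s.write(x);
    }
}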

For information on creating and building RTL kernels, see RTL Kernels in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

Compiling PL Kernels

To compile kernels using the Vitis compiler command as described in the Compiling Kernels with Vitis Compiler in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416), use the following command syntax:

v++ --compile -t hw_emu --platform xilinx_vck190_base_202020_1 -g \
-k <kernel_name> <kernel>.cpp -o <kernel_name>.xo --save-temps

The v++ command uses the options described in the following table.

Table 2. Vitis Compiler Options
Option Description
--compile Specifies compilation mode.
-t hw_emu Specifies the build target for the compilation process. For more information, see the Build Targets section in Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393).
--platform Specifies the path and name of the target platform. This example command line assumes that PLATFORM_REPO_PATHS is set to the directory containing the platform.
-g Enables the debug features. This is required for emulation modes.
-k Specifies the kernel name. This must match the function name in the specified kernel source file.
-o Specifies the output file name of the compiled Xilinx object file (.xo).
--save-temps Saves the temporary files generated during the compilation process. This is optional.

Clocking the PL Kernels

A design can use a variety of combinations of PL kernels, inside or outside the AI Engine graph. Depending on how the graph is designed and which kernel type is being used, specific clocking is required during the build process. PL kernel clocking is versatile and must be set up at certain points in the flow; this allows you to set the exact frequency at which each kernel runs. To set the exact frequency of a PL kernel in the graph, you must specify the clocking in three locations:

  • ADF graph
  • Vitis compilation (v++ -c)
  • Vitis linking (v++ -l)

For PL kernels that are outside the graph, specify the clocking in two locations:

  • Vitis compilation (v++ -c)
  • Vitis linking (v++ -l)

You must specify the clocking depending on where the kernels are located. The following table describes the default clocks based on the kernel location.

Table 3. Default Kernel Clocks
Kernel Location Description
AI Engine Kernels Clocked per the AI Engine clock frequency. All cores run with the same clock frequency.
PL Kernels in AI Engine Graph Default frequency for all PL kernels is a quarter of the AI Engine clock frequency, derived from the platform.

Clock frequency for all PL kernels can be specified using AI Engine compiler option aiecompiler --pl-freq=100.

PL Kernels Outside AI Engine Graph HLS: Default frequency for all HLS kernels - 150 MHz

RTL: Frequency is set to the frequency that the XO file was compiled with.

PL Kernels Added to Platform Using the Vitis Linker Platforms have a default clock. If no clocking option is set on the command line or in the configuration file, the default clock is used. This default can be overridden depending on the design and the required clock value, as shown in the following table.

Setting the clocks at the v++ link step allows you to choose a frequency based on the platform. The following table describes the Vitis compiler clocking options during the link step v++ --link.

Table 4. Vitis Linking Clock Options
[clock] Options Description
--clock.defaultFreqHz arg Specify a default clock frequency to use in Hz.
--clock.defaultId arg Specify a default clock reference ID to use.
--clock.defaultTolerance arg Specify a default clock tolerance to use.
--clock.freqHz arg <frequency_in_Hz>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]]

Specify a clock frequency in Hz and a list of associated compute unit names and optionally their clock pins.

--clock.id arg <reference_ID>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]]

Specify a clock reference ID and a list of associated compute unit names and optionally their clock pins.

--clock.tolerance arg <tolerance>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]]

Specify a clock tolerance and a list of associated compute unit names and optionally their clock pins.

The following table describes the steps to set clock frequencies for kernels, inside and outside the ADF graph, for the relevant PL kernel type.

Table 5. Compiling PL Kernels with Non-default Clocking
PL Kernel Location Clock Specification
Inside ADF Graph
  1. Specify the clock frequency per PL kernel in the graph
    • For a PL kernel:
      adf::pl_frequency(<pl_kernel>) = <FreqMHz>;
    • For a PLIO port:
      adf::PLIO *<input> = new adf::PLIO(<logical_name>, <plio_width>, <file>, <FreqMHz>);
  2. Compile the HLS code using the Vitis compiler. For RTL kernels, go to step 3.
    v++ -c -k kernelName kernel.cpp --hls.clock <freqHz>:kernelName
    Note: To change the frequency at which HLS kernels are compiled, use --hls.clock <arg>:<kernelName>, where <arg> is specified in Hz (for example, 250000000 for 250 MHz).
  3. Per kernel, specify the clock it uses in the Vitis linker.
    v++ -l ... --clock.freqHz <freqHz>:kernelName.ap_clk
Outside ADF Graph
  1. Compile the HLS code using the Vitis compiler. For RTL kernels, go to step 2.
    v++ -c -k kernelName kernel.cpp --hls.clock <freqHz>:kernelName
    Note: To change the frequency at which HLS kernels are compiled, use --hls.clock <arg>:<kernelName>, where <arg> is specified in Hz (for example, 250000000 for 250 MHz).
  2. Per kernel, specify the clock it uses in the Vitis linker.
    v++ -l ... --clock.freqHz <freqHz>:kernelName.ap_clk
Note: Clocking changes at the linker stage take precedence.

See Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393) for more detailed information on how to compile kernels for specific platform clocks and clocking information.

Linking the System

After the AI Engine graph and the C/C++ or OpenCL kernels are compiled, and any RTL kernels are packaged, the Vitis v++ --link command links them with the target platform to build the device binary (XCLBIN), used to program the hardware. For more information, see Linking the Kernels in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

The following is an example of the linking command for the Vitis compiler in the AI Engine design flow.

v++ --link -t hw_emu --platform xilinx_vck190_base_202020_1 -g \
<pl_kernel1>.xo <pl_kernel2>.xo ../libadf.a -o vck190_aie_graph.xclbin \
--config ../system.cfg --save-temps

The v++ command uses the options in the following table.

Table 6. Vitis Compiler Link Options
Option Description
--link Specifies the linking process.
-t hw_emu Specifies the build target of the link process. For the AI Engine kernel flow, the target can be either hw_emu for emulation and test, or hw to build the system hardware.
IMPORTANT: The v++ compilation and linking commands must use both the same build target (-t) and the same target platform (--platform).
--platform Specifies the path to the target platform.
-g Specifies the addition of debugging logic required to enable debug (for hardware emulation) and to capture waveform data.
<pl_kernel1>.xo <pl_kernel2>.xo Specifies the input compiled PL kernel object files (.xo) to link with the AI Engine graph and the target platform.
../libadf.a Specifies the input compiled AI Engine graph application to link with the PL kernels and the target platform.
-o Specifies the device binary (XCLBIN) file that is the output of the linking process.
--config Specifies a configuration file to define some of the compilation or linking options.1
--save-temps Indicates that the temporary files created during the build process should be preserved for later examination or use. This includes output files created by Vitis HLS and the Vivado Design Suite.
  1. The --config option is used to simplify the v++ command line by moving many commands with extended syntax into a file that can be specified from the command line. For more information, see the Vitis Compiler Configuration file in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
TIP: The config file requirements for the command line are different from the requirements of the Vitis IDE, as discussed in Configuring the HW-Link Project.

For the AI Engine kernel flow, the Vitis compiler requires two specific sections in the configuration file: [connectivity] and [advanced]. The following is an example configuration file.

[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
stream_connect=mm2s.s:ai_engine_0.DataIn1
stream_connect=ai_engine_0.DataOut1:s2mm.s
[advanced]
param=compiler.addOutputTypes=hw_export

The [connectivity] section of the configuration file has options described in the following table.

Table 7. Connectivity Section Options
Option Description
nk Specifies the number of kernels instances or CUs the v++ command adds to the device binary (XCLBIN).
IMPORTANT: This applies to PL kernels that are not included in the AI Engine graph, because those kernels are specified in the graph code.

The nk option specifies the kernel name, the number of instances (or CUs) of that kernel, and the CU name for each instance. In the example, nk=mm2s:1:mm2s specifies that the kernel mm2s should have only one instance, and that instance should be called mm2s.

Multiple instances of the kernels are specified as nk=mm2s:2:mm2s_1.mm2s_2. This indicates that mm2s should have two CUs called mm2s_1 and mm2s_2. For more information, see Creating Multiple Instances of a Kernel in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

stream_connect (sc) Defines connections between the ports of the AI Engine graph and streaming ports of PL kernels that are not included in the graph. Connections can be defined as the streaming output of one kernel connecting to the streaming input of a second kernel, or to a streaming input port on an IP implemented in the target platform. For more information, see --connectivity Options in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

The example stream_connect=mm2s.s:ai_engine_0.DataIn1 from the config file defines a connection between the streaming output of the mm2s PL kernel and the DataIn1 input port of the AI Engine graph.

The example stream_connect=ai_engine_0.DataOut1:s2mm.s defines a connection between the DataOut1 output port of the AI Engine graph and the streaming input port of the PL kernel s2mm. For more information, see Specify Streaming Connections Between Compute Units in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
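
For reference, the logical names used in these connections (DataIn1, DataOut1) come from the PLIO objects declared in the graph code. The following is a minimal sketch; the file names and PLIO width are assumptions, not taken from this document.

#include <adf.h>

// The logical names "DataIn1" and "DataOut1" are what appear as
// ai_engine_0.DataIn1 and ai_engine_0.DataOut1 in the [connectivity] section.
adf::PLIO *in0  = new adf::PLIO("DataIn1",  adf::plio_64_bits, "data/input.txt");
adf::PLIO *out0 = new adf::PLIO("DataOut1", adf::plio_64_bits, "data/output.txt");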

In the [advanced] section, param=compiler.addOutputTypes=hw_export specifies the creation of a new XSA for the target platform. The exported XSA has the name of the output file specified by the -o option, with the file extension .xsa.

TIP: The exported XSA is required for building the fixed platform in the bare-metal flow as described in Building a Bare-metal System.

During the linking process, the Vitis compiler invokes the Vivado Design Suite to generate the device binary (XCLBIN) for the target platform. The XCLBIN file is used to program the device and includes the following information.

PDI: Programming information for the AI Engine array.
Debug data: Debug information, when included in the build.
Memory topology: Defines the memory resources and structure for the target platform.
IP layout: Defines layout information for the implemented hardware design.
Metadata: Various elements of platform metadata that let the tool load and run the XCLBIN file on the target platform.

For more information on the XRT use of the XCLBIN file, see XRT.

Compile the Embedded Application for the Cortex-A72 Processor

After linking the AI Engine graph and PL kernels, the focus moves to the embedded application running in the PS that interacts with the AI Engine graph and kernels. The PS application is written in C/C++, using API calls to control the initialization, running, and closing of the AI Engine graph as described in Run-Time Graph Control API.
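
For illustration, the following is a minimal sketch of such a host application, assuming a graph object named gr defined in the project graph code (graph.cpp) and a device binary named aie_graph.xclbin. It is not the complete host code for any particular design, and control of PL kernels is omitted.

#include <cstdio>
#include "adf/adf_api/XRTConfig.h"
#include "experimental/xrt_kernel.h"
#include "graph.cpp"                  // assumption: defines the ADF graph object "gr"

int main(int argc, char* argv[]) {
    const char* xclbin = (argc > 1) ? argv[1] : "aie_graph.xclbin";

    // Open the device and load the device binary produced by v++ --link/--package.
    xrtDeviceHandle dhdl = xrtDeviceOpen(0);
    xrtDeviceLoadXclbinFile(dhdl, xclbin);
    xuid_t uuid;
    xrtDeviceGetXclbinUUID(dhdl, uuid);

    // Register the device and XCLBIN with the ADF API so graph control goes through XRT.
    adf::registerXRT(dhdl, uuid);

    // Initialize, run, and close the AI Engine graph.
    gr.init();
    gr.run(4);                        // example iteration count
    gr.end();

    xrtDeviceClose(dhdl);
    printf("AI Engine graph run complete\n");
    return 0;
}

The XRT include path and the -ladf_api_xrt and -lxrt_coreutil libraries used in the compile and link commands below correspond to the headers and API calls in this sketch.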

You compile the embedded application by following the typical cross-compilation flow for the Arm Cortex-A72 processor. The following are example commands for compiling and linking the PS application:

aarch64-linux-gnu-g++ -std=c++14 -O0 -g -Wall -c \
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ \
-I./ -I./src -I${XILINX_HLS}/include/ -I${XILINX_VITIS}/aietools/include -o sw/host.o sw/host.cpp

aarch64-linux-gnu-g++ -std=c++14 -O0 -g -Wall -c \
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ \
-I./ -I./src -I${XILINX_HLS}/include/ -I${XILINX_VITIS}/aietools/include -o sw/aie_control_xrt.o Work/ps/c_rts/aie_control_xrt.cpp

Many of the options in the preceding command are standard and can be found in a description of the g++ command. The more important options are listed in the following table:

Table 8. Command Options
Option Description
-std=c++14 Compiles the code against the C++14 standard.
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt Adds the XRT headers from the platform sysroot to the include path.
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ Specifies the sysroot of the target platform for cross-compilation.
-I./ -I./src Adds the local project and source directories to the include path.
-I${XILINX_HLS}/include/ Adds the Vitis HLS include directory to the include path.
-I${XILINX_VITIS}/aietools/include Adds the AI Engine tools (ADF API) include directory to the include path.
-o sw/host.o sw/host.cpp Specifies the output object file and the input source file to compile.

The cross compiler aarch64-linux-gnu-g++ is used to compile the Linux host code. aie_control_xrt.cpp is copied from the directory Work/ps/c_rts.

aarch64-linux-gnu-g++ -ladf_api_xrt -lgcc -lc -lxilinxopencl -lpthread -lrt -ldl \
-lcrypt -lstdc++ -lxrt_coreutil \
-L<platform_path>/sysroots/aarch64-xilinx-linux/usr/lib \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux \
-L${XILINX_VITIS}/aietools/lib/aarch64.o -o sw/host.exe sw/host.o sw/aie_control_xrt.o

Note that the preceding link command includes the adf_api_xrt library, which is necessary for the ADF API to work with the XRT API.

The xilinxopencl and xrt_coreutil libraries are required for XRT, and for both the OpenCL API and the XRT API.

While many of the options can be found in a description of the g++ command, some of the more important options are listed in the following table.

Table 9. Command Options
Option Description
-ladf_api_xrt Required for the ADF API. For more information, see Host Programming on Linux.

This is used to control the AI Engine through XRT. If not controlling with XRT, use -ladf_api with the library path -L${XILINX_VITIS}/aietools/lib/aarch64none.o. For more information, see Host Programming for Bare-metal Systems.

-lxilinxopencl Required for the OpenCL API. For more information see Controlling PL Kernels with the OpenCL API.
-lxrt_coreutil Required for the XRT API.
-L<platform_path>/sysroots/aarch64-xilinx-linux/usr/lib Adds the platform sysroot library directory to the library search path.
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux Specifies the sysroot of the target platform for cross-linking.
-L${XILINX_VITIS}/aietools/lib/aarch64.o Adds the AI Engine tools library directory, which contains the ADF API libraries, to the library search path.
-o sw/host.exe Specifies the name of the output executable.

Packaging

After the AI Engine graph is compiled and linked with the PL kernels, the PS application is compiled, and all the required outputs are generated, the next step in the build process is to package the required files to configure and boot the Versal device. This requires the use of the v++ --package command as described in Vitis Compiler Command in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

For Versal ACAPs, the programmable device image (PDI) file is used to boot and program the hardware device. For hardware emulation the --package command adds the PDI and EMULATION_DATA sections to the XCLBIN file, and outputs a new XCLBIN file. For hardware builds, the package process creates an XCLBIN file containing ELF files and graph configuration data objects (CDOs) for the AI Engine application.

In the Vitis IDE, the package process is automated and the tool creates the required files based on the build target, platform, and OS. However, in the command line flow, you must specify the v++ --package command with the correct options for the job.

Packaging the System

For both hardware and hardware emulation, the v++ --package command takes the XCLBIN file and libadf.a as input and writes the required support files; for hardware emulation, it also produces a script (launch_hw_emu.sh) to launch emulation. An example command line follows:

v++ --package --config package.cfg ./aie_graph/libadf.a \
./project.xclbin -o aie_graph.xclbin

where the --config package.cfg option specifies a configuration file with the following options:

platform=xilinx_vck190_base_202020_1
target=hw_emu
save-temps=1

[package]
boot_mode=sd
out_dir=./emulation
enable_aie_debug=1
rootfs=<path_to_platform>/sw/versal/xilinx-versal-common-v2020.2/rootfs.ext4
image_format=ext4
kernel_image=<path_to_platform>/sw/versal/xilinx-versal-common-v2020.2/Image
sd_file=host.exe

The following table explains the options for both hardware and hardware emulation.

Table 10. Hardware and Hardware Emulation Options
Command-line Flag Hardware Hardware Emulation Details
platform Target platform Target platform Either a base platform, or a custom platform that meets AI Engine flow requirements.
target hw hw_emu Specifies the build target. Specifying hw_emu as the target causes a number of files to be generated, including the PDI to boot the device, as well as files required for emulation. Specifying hw only generates the PDI file required to configure and boot the hardware.
save-temps Causes the Vitis compiler to save intermediate files created during the build and package process.
Package Options
boot_mode1 sd sd Indicates the device boots from an SD card or from a QSPI image in flash memory. Values can be: sd or qspi.
out_dir <path> <path> Specifies a directory where output files should be created. If out_dir is not specified, the files are written to the current working directory.
kernel_image <path>/Image <path>/Image Specifies the path to the Linux kernel image file. The file should be the same for both targets.
rootfs <path>/rootfs.cpio <path>/rootfs.cpio Specifies the path to the root file system (rootfs) file required as part of the packaging command. The file should be the same for both targets.
enable_aie_debug Generates debug features for the AI Engine kernels. This can be used in both hardware and emulation builds.
defer_aie_run Defers running the AI Engine graph so that it is enabled by the PS application. When this option is not set, the CDO commands to enable the AI Engine array during PDI load are generated instead. Only valid if libadf.a is an input file and the platform is a Versal platform.
ps_elf <file>,core <file>,core Used only for bare-metal designs. Automatically programs the PS core to run. Example: host.elf, a72-0
domain aiengine aiengine Specifies the domain to be run. For AI Engine designs, this should always be aiengine.
sd_file <file> <file> Copies the ELF for the main application that will run on the Cortex-A72 processor for bare metal, and any files needed to run on Linux. The XCLBIN file is automatically copied to the out-dir or sd_card folder. To have more files copied to the sd_card folder, you must specify this option multiple times.
  1. The xilinx_vck190_base_202020_1 platform does not support the qspi option. Custom platforms that are configured to support it will work.

The following table shows the output, written to the directory specified by out_dir, produced when building for both hardware and hardware emulation.

Table 11. Table of Outputs
Hardware
|-- BOOT.BIN
|-- boot_image.bif
|-- sd_card
|   |-- BOOT.BIN
|   |-- boot.scr
|   |-- aie_graph.xclbin
|   |-- host.exe
|   |-- Image
|   |-- init.sh
|   `-- platform_desc.txt
|-- sd_card.img
Hardware Emulation
|-- BOOT_bh.bin	//Boot header
|-- BOOT.BIN			 //Boot File
|-- boot_image.bif
|-- launch_hw_emu.sh	   //Hardware emulation launch script
|-- libadf                  //AIE emulation data folder
|   `-- cfg
|       |-- aie.control.config.json
|       |-- aie.partial.aiecompile_summary
|       |-- aie.shim.solution.aiesol
|       |-- aie.sim.config.txt
|       `-- aie.xpe
|-- plm.bin                 //PLM boot file
|-- pmc_args.txt            //PMC command argument specification file
|-- pmc_cdo.bin             //PMC boot file
|-- qemu_args.txt           //QEMU command argument specification file
|-- sd_card
|   |-- BOOT.BIN
|   |-- boot.scr
|   |-- aie_graph.xclbin
|   |-- host.exe
|   |-- Image
|   |-- init.sh
|   `-- platform_desc.txt
|-- sd_card.img
`-- sim                      //Vivado simulation folder 

For hardware emulation, the key output file is the launch_hw_emu.sh script used to launch emulation. The sd_card.img image includes BOOT.BIN (U-Boot to boot Linux, PDI boot data, and so on), the kernel Image, the XCLBIN file, the user application (host.exe), and other files. In this example, all generated files are placed in a folder called emulation, as specified by the out_dir option.

To use the sd_card.img file on a Linux host, use the dd command to write the image to the SD card. If you are targeting Linux with package.image_format=fat32 instead, copy the sd_card folder to an SD card formatted as FAT32. This is not needed for hardware emulation.

TIP: The PS host application is included in the sd_card output; however, it is not incorporated into the rootfs. If you want to include the executable images in the rootfs, you must rebuild the rootfs before running the v++ --package command.

If the design needs to be programmed to a local flash memory, make sure --package.boot_mode qspi is used. This allows the use of the program_flash command, or the Vitis IDE, to program the device or the flash memory, as described in Using the Vitis IDE.

Building a Bare-metal System

Building a bare-metal system requires a few additional steps beyond the standard application flow previously described. The specific steps required are described here.
  1. Build the bare-metal platform.

    Building bare-metal applications requires a bare-metal domain in the platform. The base platform xilinx_vck190_base_202020_1 does not have a bare-metal domain, which means you must create a platform with one. Starting from the v++ linking process as described in Linking the System, you must create a custom platform because the PS application needs drivers for the PL kernels in the design.

    Use the XSA generated during the link process to create a new platform using the following command:

    generate-platform.sh -name vck190_baremetal -hw <filename>.xsa \
        -domain psv_cortexa72_0:standalone

    where:

    • -name vck190_baremetal: Specifies a name for the platform to be created. In this example, the platform is written to: ./vck190_baremetal/export/vck190_baremetal
    • -hw <filename>.xsa: Specifies the name of the input XSA file generated during the v++ --link command. The <filename> will be the same as the file name specified for the .xclbin output.
    • -domain psv_cortexa72_0:standalone: Specifies the processor domain and operating system to apply to the new platform.

    You can add the new platform to your platform repository by adding its location to your $PLATFORM_REPO_PATHS environment variable. This makes it accessible to the Vitis IDE, for instance, and allows you to specify the platform on the command line by name rather than by its full path.

    IMPORTANT: The generated platform is used only for building the bare-metal PS application and is not used anywhere else in the flow.
  2. Compile and link the PS application.

    To build the PS application for the bare-metal flow, use the platform generated in the prior step. You need the PS application (main.cpp) and the bare-metal AI Engine control file (aie_control.cpp), which is created by the aiecompiler command and can be found in the ./Work/ps/c_rts folder. A minimal sketch of such a main.cpp appears at the end of this section.

    Compile the main.cpp file using the following command:

    aarch64-none-elf-gcc -I.. -I. -I../src \
    -I./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include \
    -g -c -std=c++11 -o main.o main.cpp
    Note: You must include the BSP include files for the generated platform, located at: ./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include

    Compile the aie_control.cpp file using the following command:

    aarch64-none-elf-gcc -I.. -I. -I../src \
    -I./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include \
    -g -c -std=c++11 -o aie_control.o ../Work/ps/c_rts/aie_control.cpp

    Link the PS application using the two compiled object files:

    aarch64-none-elf-gcc main.o aie_control.o -g -mcpu=cortex-a72 -Wl,-T -Wl,./lscript.ld \
    -L./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bsplib/lib \
    -ladf_api -Wl,--start-group,-lxil,-lgcc,-lc,-lstdc++,--end-group -o main.elf
    Note: You also need the BSP libxil.a located at ./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bsplib/lib during linking. Here, the assumption is that the AI Engine graph is enabled during the PMC boot.
  3. Package the System

    Finally, you must run the package process to generate the final bootable image (PDI) for running the design on the bare-metal platform. This command produces the SD card content for booting the device and running the application. Refer to Packaging for more information. This requires the use of the v++ --package command as shown below:

    v++ -p -t hw \
        -f xilinx_vck190_base_202020_1 \
        libadf.a project.xclbin \
        --package.out_dir ./sd_card \
        --package.domain aiengine \
        --package.defer_aie_run \
        --package.boot_mode sd \
        --package.ps_elf main.elf,a72-0 \
        -o aie_graph.xclbin
    TIP: For bare-metal ELF files running on PS cores, you should also add the package.ps_elf option to the --package command.

    The use of --package.defer_aie_run is related to the way the AI Engine graph is run. If the application is loaded and launched at boot time, this option is not required. If your host application launches and controls the graph, then you need to use this option when compiling and packaging your system, as described in Deploying the System.

    The ./sd_card folder, specified by the --package.out_dir option, contains the following files produced for the hardware build:

    |-- BOOT.BIN	//BOOT.BIN file containing PDI and the application ELF
    |-- boot_image.bif	  //bootgen input file used to create BOOT.BIN
    `-- sd_card              //SD card folder
        |-- aie_graph.xclbin     //xclbin output file (not used)
        `-- BOOT.BIN         //BOOT.BIN file containing PDI and the application ELF

    Copy the contents of the sd_card folder to an SD card to create a boot device for your system.

Now that you have built the bare-metal system, you can run it or debug it as needed.
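
For reference, a minimal sketch of the bare-metal main.cpp compiled in step 2 above might look like the following. The graph object name gr and the iteration count are assumptions; a real application also adds whatever driver calls its PL kernels need.

#include "graph.cpp"     // assumption: defines the ADF graph object "gr"

int main(void) {
    gr.init();           // configure the AI Engine array for this graph
    gr.run(4);           // run the graph for an example number of iterations
    gr.end();            // wait for completion and disable the AI Engine tiles
    return 0;
}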

Running the System

Running the system depends on the build target. The process of running the hardware emulation build is different from running the hardware build.

For the hardware build, copy the contents of the sd_card folder produced by the package process to an actual SD card. That device becomes the boot device for your system. Boot your system and launch your application as designed. To capture event trace data when running the hardware, see Performance Analysis of AI Engine Graph Application. To debug the running hardware, see Debugging the AI Engine Application.

For hardware emulation the --package command generates the launch_hw_emu.sh script as part of the process of packaging the system. You can use this script to launch the emulation environment for the AI Engine application for test and debug purposes. Hardware emulation runs the AI Engine simulator for the graph application, runs the Vivado logic simulator for the PL kernels, and runs QEMU for the PS host application.

Use the following command to launch hardware emulation from the command line:

./launch_hw_emu.sh -graphic-xsim
Note: The -graphic-xsim option launches the Vivado logic simulator window where you can specify which signals from the design you want to view. It does not include internal AI Engine signals. You must click the Run All button in the window to continue execution.

The launch_hw_emu.sh script launches QEMU in system mode, and loads and runs the AI Engine application, running the PL kernels in the Vivado simulator. If the emulation flow completes successfully, at the end of the emulation you should see something like the following:

[LAUNCH_EMULATOR] INFO: 09:44:09 : PS-QEMU exited
[LAUNCH_EMULATOR] INFO: 09:44:09 : PMU/PMC-QEMU exited
[LAUNCH_EMULATOR] INFO: 09:44:09 : Simulation exited
pmu_path /scratch/aie_test1/hw_emu_pmu.log
pl-sim_dir /scratch/aie_test1/sim/behav_waveform/xsim
Please refer PS /simulate logs at /scratch/aie_test1 for more details.
DONE!
INFO: Emulation ran successfully

When launching hardware emulation, you can specify options for the AI Engine simulator that runs the graph application. The options can be specified from the launch_hw_emu.sh script using the -aie-sim-options option, as described in Simulator Options for Hardware Emulation.

When the emulation is fully booted and the Linux prompt is up, make sure to set the following environment variable:

export XILINX_XRT=/usr

This is required for the host application to run correctly. Note that this must also be done when running on hardware.

Deploying the System

The Vitis design execution model has multiple considerations that impact how the AI Engine graph is loaded onto the board, run, reset, and reloaded. Depending on the needs of the application, you can load the AI Engine graph at board boot time or from the PS host application. In addition, you can run the graph as soon as it is loaded, or defer the run to a later time. You also have the option of running the graph infinitely, or for a fixed number of iterations or cycles.

AI Engine Graph Load and Run

The AI Engine graph can be loaded and run immediately at boot, or it can be loaded and controlled by the PS host application. Additionally, you have the option of deferring the running of the graph until after it has been loaded, using the graph.run() XRT host API call. By default, the graph is enabled when the PDI is loaded and runs from boot. However, the v++ --package.defer_aie_run option lets you defer the graph run so that the graph starts only when the host application calls graph.run(). The following table lists the deployment options.

Table 12. Deploying the AI Engine Graph
Host Control: Specify v++ --package.defer_aie_run to stop the AI Engine graph from starting at boot-up, then enable the graph from the PS program using graph.run().
Run Forever: Enable the graph in the PDI and let it run forever.

AI Engine Run Iterations

The AI Engine graph can run for a limited number of iterations or infinitely. By default, the graph runs infinitely. You can use graph.run(run_iterations) or graph.end(cycles) to limit the graph to a specific number of iterations or a specific number of cycles. See Run-Time Graph Control API.
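
For illustration, a short sketch of these calls, assuming a graph object named gr (the iteration and cycle counts are example values only):

// Run for a fixed number of iterations, then wait for completion and disable the graph:
gr.run(16);
gr.end();

// Or run the graph freely (the default) and stop it after a given number of AI Engine cycles:
gr.run();
gr.end(10000);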