Profiling the Model
Vitis AI Profiler
The Vitis™ AI profiler is a set of tools that helps profile and visualize AI applications based on VART:
- Easy to use: it requires neither changes to the user code nor re-compilation of the program.
- Visualize system performance bottlenecks.
- Illustrate the execution state of different compute units (CPU/DPU).
The Vitis AI Profiler is an all-in-one profiling solution for Vitis AI. It is an application-level tool that profiles and visualizes AI applications based on VART. In an AI application, some components run on the hardware (for example, neural network computation usually runs on the DPU), while other components run on a CPU as functions implemented in C/C++ code, such as image pre-processing. This tool helps you put the running status of all these different components together.
Vitis AI Profiler Architecture
The Vitis AI Profiler architecture is shown in the following figure:
Vitis AI Profiler GUI Overview
- DPU Summary
  - A table of the number of runs and minimum/average/maximum times (ms) for each kernel.
- DPU Throughput and DDR Transfer Rates
  - Line graphs of achieved FPS and read/write transfer rates (in MB/s) as sampled during the application.
- Timeline Trace
  - This includes timed events from VART, HAL APIs, and the DPUs.
Getting Started with the Vitis AI Profiler
System Requirements
- Hardware
  - Supports Zynq® UltraScale+™ MPSoC (DPUCZDX8G)
  - Supports Versal™ ACAP (DPUCVDX8G/DPUCVDX8H)
  - Supports Alveo™ Data Center accelerator cards (DPUCAHX8H/DPUCAHX8L)
- Software
  - Supports VART v1.2+
Installing the Vitis AI Profiler
- Prepare the debug environment for vaitrace in the Zynq UltraScale+ MPSoC PetaLinux platform. Configure and build PetaLinux as follows (the commands are summarized in the sketch after this list):
  - Run petalinux-config -c kernel and enable the following settings for the Linux kernel:
    - General architecture-dependent options ---> [*] Kprobes
    - Kernel hacking ---> [*] Tracers --->
      [*] Kernel Function Tracer
      [*] Enable kprobes-based dynamic events
      [*] Enable uprobes-based dynamic events
  - Run petalinux-config -c rootfs and enable the following setting for the root-fs: user-packages ---> modules ---> [*] packagegroup-petalinux-self-hosted
  - Run petalinux-build.
- Install vaitrace. vaitrace is integrated into the VART runtime. If the VART runtime is installed, vaitrace is already installed at /usr/bin/vaitrace.
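The PetaLinux preparation steps above can be summarized as the following command sequence. This is a minimal sketch only: it assumes an existing PetaLinux project and that the kernel and root-fs options listed above are selected interactively in the configuration menus.

```
# Run from inside your PetaLinux project directory (project creation not shown)
petalinux-config -c kernel    # enable Kprobes, Kernel Function Tracer, kprobes/uprobes dynamic events
petalinux-config -c rootfs    # enable packagegroup-petalinux-self-hosted under user-packages ---> modules
petalinux-build               # rebuild the image with tracing support
```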
Starting a Simple Trace with vaitrace
- Download and set up Vitis AI.
- Start testing and tracing.
  - For C++ programs, add vaitrace in front of the test command as follows:
    # cd ~/Vitis_AI/examples/VART/samples/resnet50
    # vaitrace ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
  - For Python programs, add -m vaitrace_py to the Python interpreter command as follows:
    # cd ~/Vitis_AI/examples/VART/samples/resnet50_mt_py
    # python3 -m vaitrace_py ./resnet50.py 2 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
  vaitrace and XRT generate some files in the working directory.
- Copy all .csv files and xclbin.ex.run_summary to your system. You can open the xclbin.ex.run_summary using vitis_analyzer 2020.2 and above:
  - If using the command line, run: # vitis_analyzer xclbin.ex.run_summary
  - If using the GUI, select .
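As a sketch of this copy-and-open step, the commands below transfer the generated files from the target board to a host machine and open the run summary there. The host address and destination path are placeholders for your own setup.

```
# On the target board: copy the trace files to the host (address and path are placeholders)
scp *.csv xclbin.ex.run_summary user@192.168.0.10:/home/user/resnet50_trace/

# On the host: open the run summary with Vitis Analyzer (2020.2 or later)
vitis_analyzer /home/user/resnet50_trace/xclbin.ex.run_summary
```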
To know more about the Vitis Analyzer, see Using the Vitis Analyzer.
VAI Trace Usage
Command Line Usage
# vaitrace --help
usage: Xilinx Vitis AI Trace [-h] [-c [CONFIG]] [-d] [-o [TRACESAVETO]] [-t [TIMEOUT]] [-v]
cmd Command to be traced
-b Bypass mode, just run command and by pass vaitrace, for debug use
-c [CONFIG] Specify the configuration file
-o [TRACESAVETO] Save trace file to
-t [TIMEOUT] Tracing time limitation, default value is 30 for vitis analyzer format, and 5 for .xat format
--txt Display txt summary
--fine_grained Fine-grained mode
The following are some important and frequently used arguments:
- cmd
  - cmd is the executable Vitis AI program that you want to trace.
- -t
  - Controls the tracing time (in seconds), starting from when [cmd] is launched; the default value is 30. In other words, if -t is not specified for vaitrace, tracing stops after [cmd] has run for 30 seconds. [cmd] continues to run as normal, but vaitrace stops collecting tracing data. Note: Trace about 50~100 images at a time, because fewer than 50 may not be enough for some statistical information and more than 100 will slow down the system significantly.
- -c
  - You can start a trace with more custom options by writing these options in a JSON configuration file and specifying it with -c. Details of the configuration file are explained in the next section.
- -o
  - Location of the report. This is only available for the text summary mode. By default, the text summary is output to STDOUT.
- --txt
  - Output a text summary. vaitrace does not generate a report for the Vitis Analyzer in this mode.
- --fine_grained
  - Start the trace in fine-grained mode. This mode generates a large amount of trace data, and the trace time is limited to 10 seconds.
Other arguments are used for debugging.
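For example, the options above can be combined as follows. This is a sketch based on the resnet50 sample used earlier; the paths and the summary file name are examples to adjust for your own application and model.

```
# Trace the resnet50 sample for 10 seconds and print a text summary to STDOUT
vaitrace -t 10 --txt ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel

# Same trace, but save the text summary to a file instead of STDOUT
vaitrace -t 10 --txt -o ./summary.txt ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
```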
Configuration
It is recommended to use a configuration file to record trace options for vaitrace. You can start a trace with a configuration file by using vaitrace -c trace_cfg.json.
Configuration priority:
Here is an example of a vaitrace configuration file:
"trace": {
"enable_trace_list": ["vitis-ai-library", "vart", "custom"],
"trace_custom": []
}
}
Key Name | Value Type | Description
---|---|---
options | object | Vaitrace options
trace | object | 
enable_trace_list | list | Built-in trace function lists to be enabled. Available values: "vitis-ai-library", "vart", "opencv", "custom"; "custom" enables the functions in the trace_custom list
trace_custom | list | The list of user-implemented functions to be traced. Namespaces are supported in function names. You can see an example of using a custom trace function later in this document
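As a sketch of the custom-trace option, the snippet below writes a configuration file that enables the built-in trace lists and adds one user function to trace_custom, then starts a trace with it. The function name my_app::preprocess and the model path are hypothetical placeholders; the name must match a function implemented in your own application.

```
# Create a configuration file that also traces a user-defined function
# ("my_app::preprocess" is a hypothetical placeholder for one of your own functions)
cat > trace_cfg.json << 'EOF'
{
    "trace": {
        "enable_trace_list": ["vitis-ai-library", "vart", "custom"],
        "trace_custom": ["my_app::preprocess"]
    }
}
EOF

# Start tracing with the configuration file
vaitrace -c trace_cfg.json ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
```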
Text Summary
When the --txt option is used, vaitrace prints an ASCII table as shown in the following figure:
The fields are defined in the following list:
- DPU
  - Name of the DPU.
- Batch
  - Batch size of the DPU.
- SubGraph
  - Name of the subgraph in the xmodel.
- Workload
  - Computation workload (MAC indicates two operations; only available for conv subgraphs now).
- Run Time (ms)
  - The execution time in milliseconds.
- Perf (GOP/s)
  - The DPU performance in units of GOP per second.
- Mem (MB)
  - Total load/store size of this subgraph.
- MB/s
  - Average DDR memory access bandwidth.
    MB/s = (total load size of the subgraph (including feature map and weight/bias, from DDR/HBM to DPU bank mem) + total store size of the subgraph (from DPU bank mem to DDR/HBM)) / subgraph runtime
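For example, with hypothetical numbers: if a subgraph loads 30 MB of feature maps and weights, stores 10 MB of results, and runs in 8 ms, the reported bandwidth is (30 MB + 10 MB) / 0.008 s = 5000 MB/s.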
DPU Profiling Examples
You can find advanced DPU profiling examples with the Vitis AI Profiler on the Vitis AI Profiler GitHub page.