Profiling the Model
Vitis AI Profiler
The Vitis™ AI profiler is a set of tools that helps profile and visualize AI applications based on VART:
- Easy to use: it requires neither changes to the user code nor re-compilation of the program.
- Visualize system performance bottlenecks.
- Illustrate the execution state of different compute units (CPU/DPU).
The Vitis AI Profiler is an all-in-one profiling solution for Vitis AI. It is an application-level tool that profiles and visualizes AI applications based on VART. In an AI application, some components run on the hardware (for example, neural network computation usually runs on the DPU), while other components run on a CPU as functions implemented in C/C++ code, such as image pre-processing. This tool helps you put the running status of all these different components together.
Vitis AI Profiler Architecture
The Vitis AI Profiler architecture is shown in the following figure:
Vitis AI Profiler GUI Overview
- DPU Summary
  - A table of the number of runs and minimum/average/maximum times (ms) for each kernel.
- DPU Throughput and DDR Transfer Rates
  - Line graphs of achieved FPS and read/write transfer rates (in MB/s) as sampled during the application.
- Timeline Trace
  - This includes timed events from VART, HAL APIs, and the DPUs.
Getting Started with the Vitis AI Profiler
System Requirements
- Hardware
  - Supports Zynq® UltraScale+™ MPSoC (DPUCZDX8G)
  - Supports Versal™ ACAP (DPUCVDX8G/DPUCVDX8H)
  - Supports Alveo™ Data Center accelerator cards (DPUCAHX8H/DPUCAHX8L)
- Software
  - Supports VART v1.2+
Installing the Vitis AI Profiler
- Prepare the debug environment for vaitrace in the Zynq UltraScale+ MPSoC PetaLinux platform. Configure and build PetaLinux as follows (the commands are summarized in the sketch after this list):
  - Run petalinux-config -c kernel and enable the following settings for the Linux kernel:
    - General architecture-dependent options ---> [*] Kprobes
    - Kernel hacking ---> [*] Tracers --->
      [*] Kernel Function Tracer
      [*] Enable kprobes-based dynamic events
      [*] Enable uprobes-based dynamic events
  - Run petalinux-config -c rootfs and enable the following setting for the root-fs: user-packages ---> modules ---> [*] packagegroup-petalinux-self-hosted
  - Run petalinux-build.
- Install vaitrace. vaitrace is integrated into the VART runtime. If the VART runtime is installed, vaitrace is already installed at /usr/bin/vaitrace.
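The PetaLinux preparation steps above can be summarized as the following command sequence. This is a minimal sketch only: it assumes an existing PetaLinux project and that the kernel and root-fs options listed above are selected interactively in the configuration menus.

```
# Run from inside your PetaLinux project directory (project creation not shown)
petalinux-config -c kernel    # enable Kprobes, Kernel Function Tracer, kprobes/uprobes dynamic events
petalinux-config -c rootfs    # enable packagegroup-petalinux-self-hosted under user-packages ---> modules
petalinux-build               # rebuild the image with tracing support
```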
Starting a Simple Trace with vaitrace
- Download and set up Vitis AI.
- Start testing and tracing.
  - For C++ programs, add vaitrace in front of the test command as follows:
    # cd ~/Vitis_AI/examples/VART/samples/resnet50
    # vaitrace ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
  - For Python programs, add -m vaitrace_py to the Python interpreter command as follows:
    # cd ~/Vitis_AI/examples/VART/samples/resnet50_mt_py
    # python3 -m vaitrace_py ./resnet50.py 2 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
  vaitrace and XRT generate some files in the working directory.
- Copy all .csv files and xclbin.ex.run_summary to your system. You can open the xclbin.ex.run_summary using vitis_analyzer 2020.2 and above:
  - If using the command line, run: # vitis_analyzer xclbin.ex.run_summary
  - If using the GUI, select .
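As a sketch of this copy-and-open step, the commands below transfer the generated files from the target board to a host machine and open the run summary there. The host address and destination path are placeholders for your own setup.

```
# On the target board: copy the trace files to the host (address and path are placeholders)
scp *.csv xclbin.ex.run_summary user@192.168.0.10:/home/user/resnet50_trace/

# On the host: open the run summary with Vitis Analyzer (2020.2 or later)
vitis_analyzer /home/user/resnet50_trace/xclbin.ex.run_summary
```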
To know more about the Vitis Analyzer, see Using the Vitis Analyzer.
VAI Trace Usage
Command Line Usage
# vaitrace --help
usage: Xilinx Vitis AI Trace [-h] [-c [CONFIG]] [-d] [-o [TRACESAVETO]] [-t [TIMEOUT]] [-v]
cmd Command to be traced
-b Bypass mode, just run command and by pass vaitrace, for debug use
-c [CONFIG] Specify the configuration file
-o [TRACESAVETO] Save trace file to
-t [TIMEOUT] Tracing time limitation, default value is 30 for vitis analyzer format, and 5 for .xat format
--txt Display txt summary
--fine_grained Fine-grained mode
The following are some important and frequently used arguments:
- cmd
  - cmd is the executable Vitis AI program that you want to trace.
- -t
  - Controls the tracing time (in seconds), starting from when [cmd] is launched; the default value is 30. In other words, if -t is not specified for vaitrace, tracing stops after [cmd] has run for 30 seconds. [cmd] continues to run as normal, but vaitrace stops collecting tracing data. Note: Trace about 50~100 images at a time, because fewer than 50 may not be enough for some statistical information and more than 100 will slow down the system significantly.
- -c
  - You can start a trace with more custom options by writing these options in a JSON configuration file and specifying it with -c. Details of the configuration file are explained in the next section.
- -o
  - Location of the report. This is only available for the text summary mode. By default, the text summary is output to STDOUT.
- --txt
  - Output a text summary. vaitrace does not generate a report for the Vitis Analyzer in this mode.
- --fine_grained
  - Start the trace in fine-grained mode. This mode generates a large amount of trace data, and the trace time is limited to 10 seconds.
Other arguments are used for debugging.
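For example, the options above can be combined as follows. This is a sketch based on the resnet50 sample used earlier; the paths and the summary file name are examples to adjust for your own application and model.

```
# Trace the resnet50 sample for 10 seconds and print a text summary to STDOUT
vaitrace -t 10 --txt ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel

# Same trace, but save the text summary to a file instead of STDOUT
vaitrace -t 10 --txt -o ./summary.txt ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
```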
Configuration
It is recommended to use a configuration file to record trace options for vaitrace. You can start a trace with a configuration file by using vaitrace -c trace_cfg.json.
Configuration priority:
Here is an example of a vaitrace configuration file:
"trace": {
"enable_trace_list": ["vitis-ai-library", "vart", "custom"],
"trace_custom": []
}
}
Key Name | Value Type | Description
---|---|---
options | object | Vaitrace options
trace | object | 
enable_trace_list | list | Built-in trace function lists to be enabled. Available values: "vitis-ai-library", "vart", "opencv", "custom"; "custom" enables the functions in the trace_custom list
trace_custom | list | The list of user-implemented functions to be traced. Namespaces are supported in function names. You can see an example of using a custom trace function later in this document
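As a sketch of the custom-trace option, the snippet below writes a configuration file that enables the built-in trace lists and adds one user function to trace_custom, then starts a trace with it. The function name my_app::preprocess and the model path are hypothetical placeholders; the name must match a function implemented in your own application.

```
# Create a configuration file that also traces a user-defined function
# ("my_app::preprocess" is a hypothetical placeholder for one of your own functions)
cat > trace_cfg.json << 'EOF'
{
    "trace": {
        "enable_trace_list": ["vitis-ai-library", "vart", "custom"],
        "trace_custom": ["my_app::preprocess"]
    }
}
EOF

# Start tracing with the configuration file
vaitrace -c trace_cfg.json ./resnet50 /usr/share/vitis_ai_library/models/resnet50/resnet50.xmodel
```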
Text Summary
When the --txt option is used, vaitrace prints an ASCII table as shown in the following figure:
The fields are defined in the following list:
- DPU
  - Name of the DPU.
- Batch
  - Batch size of the DPU.
- SubGraph
  - Name of the subgraph in the xmodel.
- Workload
  - Computation workload (MAC indicates two operations; only available for conv subgraphs now).
- Run Time (ms)
  - The execution time in milliseconds.
- Perf (GOP/s)
  - The DPU performance in units of GOP per second.
- Mem (MB)
  - Total load/store size of this subgraph.
- MB/s
  - Average DDR memory access bandwidth.
    MB/s = (total load size of the subgraph (including feature map and weight/bias, from DDR/HBM to DPU bank mem) + total store size of the subgraph (from DPU bank mem to DDR/HBM)) / subgraph runtime
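For example, with hypothetical numbers: if a subgraph loads 30 MB of feature maps and weights, stores 10 MB of results, and runs in 8 ms, the reported bandwidth is (30 MB + 10 MB) / 0.008 s = 5000 MB/s.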
DPU Profiling Examples
You can find advanced DPU profiling examples with the Vitis AI Profiler on the Vitis AI Profiler GitHub page.