C/RTL Co-Simulation in Vitis HLS

If you added a C test bench to the project for simulation purposes, you can also use it for C/RTL co-simulation to verify that the RTL is functionally identical to the C source code. Select the Run Cosimulation command from the Flow Navigator to verify the RTL results of synthesis. The Co-simulation dialog box opens, as shown in the following figure, and lets you select which type of RTL output to use for verification (Verilog or VHDL) and which HDL simulator to use for the simulation.

Figure 1: Co-Simulation Dialog Box

The dialog box features the following settings:

Simulator
Choose one of the supported HDL simulators in the Vivado Design Suite. The Vivado simulator is the default simulator.
Language
Specify the use of Verilog or VHDL as the output language for simulation.
Setup Only
Create the required simulation files, but do not run the simulation. The simulation executable can be run from a command shell at a later time.
Optimizing Compile
Enable optimization to improve the runtime performance, if possible, at the expense of compilation time.
Input Arguments
Specify any command-line arguments to the C test bench.
Dump Trace
Specifies the level of trace file output written to the sim/Verilog or sim/VHDL directory of the current solution when the simulation executes. Options include:
all
Output all port and signal waveform data to the trace file.
port
Output waveform trace data for the top-level ports only.
none
Do not output trace data.
Random Stall
Applies a randomized stall for each data transmission.
Compiled Library Location
Specifies the directory for the compiled simulation library to use with third-party simulators.
Extra Options for DATAFLOW
Wave Debug
Enables waveform visualization of all processes in the RTL simulation. This option is only supported when using the Vivado logic simulator. Enabling this option launches the simulator GUI so that you can examine dataflow activity in the waveforms generated by simulation. Refer to the Vivado Design Suite User Guide: Logic Simulation (UG900) for more information on that tool.
Disable Deadlock Detection
Disables deadlock detection, and prevents the Cosim Deadlock Viewer from opening in co-simulation.
Channel (PIPO/FIFO) Profiling
Enables capturing profile data for display in the Dataflow Viewer.
Dynamic Deadlock Prevention
Prevent deadlocks by enabling automatic FIFO channel size tuning for dataflow profiling during co-simulation.
TIP: You can pre-configure C/RTL co-simulation by right-clicking a solution in the Explorer view, selecting the Solution Settings command to open the Solution Settings dialog box, and editing the Co-simulation settings. The settings are the same as described above, but can be configured prior to running the simulation.

After the C/RTL co-simulation completes, the console displays the following messages to confirm the verification was successful:

INFO: [Common 17-206] Exiting xsim ...
INFO: [COSIM 212-316] Starting C post checking ...
...
Test passed !
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***

Finished C/RTL cosimulation.

Any printf commands in the C test bench are also echoed to the console during simulation.

As described in Writing a Test Bench, the test bench verifies output from the top-level function for synthesis, and returns zero to the main() function of the test bench if the output is correct. Vitis HLS uses the same return value for both C simulation and C/RTL co-simulation to determine if the results are correct. If the C test bench returns a non-zero value, Vitis HLS reports that the simulation failed.
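The return-value convention described above can be illustrated with a minimal, self-checking test bench sketch. The names dut_scale, golden_scale, and run_test_bench are hypothetical and are not part of any Vitis HLS API; in a real project, main() would call the top-level function for synthesis and return the same status value.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical top-level function for synthesis (the DUT).
int dut_scale(int x) { return 3 * x + 1; }

// Hypothetical reference model that exists only in the test bench.
int golden_scale(int x) { return x + x + x + 1; }

// Body of the test bench. In a real test bench, main() would simply
// "return run_test_bench();" so that 0 means pass and non-zero means fail.
int run_test_bench() {
    int errors = 0;
    for (int x = -4; x <= 4; ++x) {
        const int got = dut_scale(x);
        const int want = golden_scale(x);
        if (got != want) {
            std::printf("Mismatch at x=%d: got %d, expected %d\n", x, got, want);
            ++errors;
        }
    }
    std::puts(errors == 0 ? "Test passed !" : "Test failed !");
    // Zero only when every output matched the reference.
    return errors == 0 ? 0 : 1;
}
```

Because the same return value is used for C simulation and C/RTL co-simulation, a test bench that prints a failure message but still returns 0 would be reported as passing.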

The Vitis HLS GUI automatically switches to the Analysis perspective after simulation and opens the Cosimulation Report showing the pass or fail status and the measured statistics on latency and II. Any additional reports that are generated, such as the Dataflow report, are also opened in the Analysis perspective.

Figure 2: Cosimulation Report

The Cosimulation Report displays the full design hierarchy, and if Channel (PIPO/FIFO) Profiling is enabled, you will be able to see details of the dataflow regions as well.

IMPORTANT: II is marked as NA in the Cosimulation Report unless the transaction number in the RTL simulation is greater than 1. If you want to calculate II, you must ensure there are at least two transactions in the RTL simulation as described in Writing a Test Bench.
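One way to obtain more than one transaction is to call the top-level function several times from the test bench. The following sketch assumes a hypothetical DUT named dut_accumulate and a hypothetical helper run_transactions; each call to the DUT corresponds to one transaction.

```cpp
#include <cassert>

// Hypothetical DUT: one call corresponds to one transaction.
int dut_accumulate(const int in[4]) {
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += in[i];
    }
    return sum;
}

// Calls the DUT num_transactions times with fresh inputs and returns
// the number of mismatches against the expected sums.
int run_transactions(int num_transactions) {
    assert(num_transactions >= 1);
    int errors = 0;
    for (int t = 0; t < num_transactions; ++t) {
        const int in[4] = {t, t + 1, t + 2, t + 3};
        const int expected = 4 * t + 6;  // t + (t+1) + (t+2) + (t+3)
        if (dut_accumulate(in) != expected) {
            ++errors;
        }
    }
    return errors;
}
```

Calling run_transactions(2) or more from main() would give the RTL simulation the two or more transactions needed for an II measurement.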

Output of C/RTL Co-Simulation

When C/RTL Cosimulation completes, the sim folder is created inside the solution folder. This folder contains the following elements:
  • The sim/report folder contains the report and log file for each type of RTL simulated.
  • A verification folder named sim/verilog or sim/vhdl is created for each RTL language that is verified.
    • The RTL files used for simulation are stored in the verilog or vhdl folder.
    • The RTL simulation is executed in the verification folder.
    • Any outputs, such as trace files and waveform files, are written to the verilog or vhdl folder.
  • Additional folders sim/autowrap, tv, wrap and wrap_pc are work folders used by Vitis HLS. There are no user files in these folders.
TIP: If the Setup Only option was selected in the C/RTL Co-Simulation dialog box, an executable is created in the verification folder but the simulation is not run. The simulation can be manually run by executing the simulation .exe at the command prompt.

Automatically Verifying the RTL

Figure 3: C/RTL Verification Flow

C/RTL co-simulation uses a C test bench, running the main() function, to automatically verify the RTL design running in behavioral simulation. The C/RTL verification process consists of three phases:

  1. The C simulation is executed and the inputs to the top-level function, or the Design-Under-Test (DUT), are saved as “input vectors.”
  2. The “input vectors” are used in an RTL simulation using the RTL created by Vitis HLS in Vivado simulator, or a supported third-party HDL simulator. The outputs from the RTL, or results of simulation, are saved as “output vectors.”
  3. The “output vectors” from the RTL simulation are returned to the main() function of the C test bench to verify the results are correct. The C test bench performs verification of the results, in some cases by comparing to known good results.
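The three phases above can be pictured with the following conceptual sketch. It is only a software analogy under stated assumptions: rtl_stand_in is a hypothetical placeholder for the HDL simulation, and the names dut_c_model, CosimResult, and run_three_phases are invented for illustration. It does not show how Vitis HLS stores or exchanges vectors internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C model of the DUT.
int dut_c_model(int x) { return x * x; }

// Hypothetical placeholder for the RTL; a real flow runs an HDL simulator here.
int rtl_stand_in(int x) { return x * x; }

struct CosimResult {
    std::vector<int> input_vectors;   // captured in phase 1
    std::vector<int> output_vectors;  // produced in phase 2
    int status;                       // 0 = pass, non-zero = fail
};

CosimResult run_three_phases(const std::vector<int>& stimuli) {
    CosimResult result;
    result.status = 0;
    std::vector<int> expected;

    // Phase 1: run the C simulation and save the DUT inputs.
    for (int x : stimuli) {
        result.input_vectors.push_back(x);
        expected.push_back(dut_c_model(x));
    }

    // Phase 2: replay the saved inputs through the RTL stand-in.
    for (int x : result.input_vectors) {
        result.output_vectors.push_back(rtl_stand_in(x));
    }

    // Phase 3: check the returned outputs in the test bench.
    assert(result.output_vectors.size() == expected.size());
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (result.output_vectors[i] != expected[i]) {
            result.status = 1;
        }
    }
    return result;
}
```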

The following messages are output by Vitis HLS as verification progresses:

While running C simulation:

INFO: [COSIM 212-14] Instrumenting C test bench ...
   Build using ".../bin/g++"
   Compiling dct_test.cpp_pre.cpp.tb.cpp
   Compiling dct_inline.cpp_pre.cpp.tb.cpp
   Compiling apatb_dct.cpp
   Generating cosim.tv.exe
INFO: [COSIM 212-302] Starting C TB testing ... 
Test passed !

At this stage, because the C simulation was executed, any messages written by the C test bench will be output to the Console window and log file.

While running RTL simulation:

INFO: [COSIM 212-333] Generating C post check test bench ...
INFO: [COSIM 212-12] Generating RTL test bench ...
INFO: [COSIM 212-1] *** C/RTL co-simulation file generation completed. ***
INFO: [COSIM 212-323] Starting verilog/vhdl simulation. 
INFO: [COSIM 212-15] Starting XSIM ...

At this stage, any messages from the RTL simulation are output to the Console window and log file.

While checking results back in the C test bench:

INFO: [COSIM 212-316] Starting C post checking ...
Test passed !
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***

The following are requirements of C/RTL co-simulation:

Interface Synthesis Requirements

To use the C/RTL co-simulation feature to verify the RTL design, at least one of the following conditions must be true:

  • Top-level function must be synthesized using an ap_ctrl_chain or ap_ctrl_hs block-level protocol
  • Design must be purely combinational
  • Top-level function must have an initiation interval of 1
  • Interfaces must be all arrays that are streaming and implemented with axis or ap_hs interface modes
    Note: The hls::stream variables are automatically implemented as ap_fifo interfaces.

If none of these conditions is met, C/RTL co-simulation halts with the following message:

@E [SIM-345] Cosim only supports the following 'ap_ctrl_none' designs: (1) 
combinational designs; (2) pipelined design with task interval of 1; (3) designs with 
array streaming or hls_stream ports.
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
IMPORTANT: If the design is specified to use the block-level IO protocol ap_ctrl_none and the design contains any hls::stream variables which employ non-blocking behavior, C/RTL co-simulation is not guaranteed to complete.

If any top-level function argument is specified as an AXI4-Lite interface, the function return must also be specified as an AXI4-Lite interface.

Verification of DATAFLOW and DEPENDENCE

C/RTL co-simulation automatically verifies aspects of the DATAFLOW and DEPENDENCE directives.

If the DATAFLOW directive is used to pipeline tasks, it inserts channels between the tasks to facilitate the flow of data between them. It is typical for the channels to be implemented with FIFOs and the FIFO depth specified using the STREAM directive, or the config_dataflow command. If a FIFO depth is too small, the RTL simulation can stall. For example, if a FIFO is specified with a depth of 2 but the producer task writes three values before any data values are read by the consumer task, the FIFO blocks the producer. In some conditions this can cause the entire design to stall as described in Cosim Deadlock Viewer.

In this case, C/RTL co-simulation issues a message as shown below, indicating the channel in the DATAFLOW region is causing the RTL simulation to stall.


//////////////////////////////////////////////////////////////////////////////
// ERROR!!! DEADLOCK DETECTED at 1292000 ns! SIMULATION WILL BE STOPPED! //
//////////////////////////////////////////////////////////////////////////////
/////////////////////////
// Dependence cycle 1:
// (1): Process: hls_fft_1kxburst.fft_rank_rad2_nr_man_9_U0
//      Channel: hls_fft_1kxburst.stage_chan_in1_0_V_s_U, FULL
//      Channel: hls_fft_1kxburst.stage_chan_in1_1_V_s_U, FULL
//      Channel: hls_fft_1kxburst.stage_chan_in1_0_V_1_U, FULL
//      Channel: hls_fft_1kxburst.stage_chan_in1_1_V_1_U, FULL
// (2): Process: hls_fft_1kxburst.fft_rank_rad2_nr_man_6_U0
//      Channel: hls_fft_1kxburst.stage_chan_in1_2_V_s_U, EMPTY
//      Channel: hls_fft_1kxburst.stage_chan_in1_2_V_1_U, EMPTY
/////////////////////////////////
// Total 1 cycles detected!
/////////////////////////////////////////////////////////////

If co-simulation is attempted from the Vitis HLS IDE and the simulation results in a deadlock, the Vitis HLS IDE will automatically launch the Dataflow Viewer and show the processes involved in the deadlock (displayed in red). It will also show which channels are full (in red) versus empty (in white). In this case, review the implementation of the channels between the tasks and ensure any FIFOs are large enough to hold the data being generated.
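The FIFO depth example given earlier (a depth of 2 with three writes before any read) can be reproduced with a small software model. This is a sketch only: BoundedFifo is a hypothetical class written for illustration and is not the hls::stream class, and it reports a rejected write instead of blocking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Simplified software model of a bounded FIFO channel (not hls::stream).
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }

    bool full() const { return data_.size() == depth_; }
    bool empty() const { return data_.empty(); }

    // Returns false when the FIFO is full; hardware would block the producer here.
    bool try_write(int value) {
        if (full()) return false;
        data_.push_back(value);
        return true;
    }

    // Returns false when the FIFO is empty; hardware would block the consumer here.
    bool try_read(int& value) {
        if (empty()) return false;
        value = data_.front();
        data_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<int> data_;
};

// Counts how many of n back-to-back writes are accepted before any read occurs.
std::size_t writes_accepted_before_read(std::size_t depth, std::size_t n) {
    BoundedFifo fifo(depth);
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!fifo.try_write(static_cast<int>(i))) break;
        ++accepted;
    }
    return accepted;
}
```

With a depth of 2, only two of three writes are accepted, so the producer would be blocked on the third; with a depth of 3, all three are accepted.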

In a similar manner, the RTL test bench is also configured to automatically check the validity of false dependencies specified using the DEPENDENCE directive. A warning message during co-simulation indicates the dependency is not false, and the corresponding directive must be removed to achieve a functionally valid design.
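To illustrate what it means for a dependence to be "not false", the following sketch checks a recorded access pattern for a read of an element that an earlier loop iteration wrote. It is a conceptual aid only; the Access type and has_inter_iteration_raw function are hypothetical, and this is not how the RTL test bench performs its check.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Array indices read and written by one loop iteration (hypothetical record).
struct Access {
    int read_index;
    int write_index;
};

// Returns true if some iteration reads an element that an earlier iteration
// wrote, that is, a genuine inter-iteration read-after-write dependence.
bool has_inter_iteration_raw(const std::vector<Access>& iterations) {
    std::set<int> written;
    for (const Access& a : iterations) {
        if (written.count(a.read_index) != 0) {
            return true;
        }
        written.insert(a.write_index);
    }
    return false;
}
```

A pattern such as a[i] = a[i] + 1 has no such dependence, whereas a[i + 1] = a[i] + 1 does, so declaring the latter false would be incorrect.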

TIP: The -disable_deadlock_detection option of the cosim_design command disables these checks.

Unsupported Optimizations for Co-Simulation

For Vivado IP mode, automatic RTL verification does not support cases where multiple transformations are performed on arrays on the interface, or arrays within structs.

IMPORTANT: This feature is not supported for the Vitis kernel flow.

For automatic verification to be performed, arrays on the function interface, or arrays inside structs on the function interface, can use any one of the following optimizations, but not two or more:

  • Vertical mapping on arrays of the same size
  • Reshape
  • Partition, for dimension 1 of the array

Automatic RTL verification does not support any of the following optimizations used on a top-level function interface:

  • Horizontal mapping.
  • Vertical mapping of arrays of different sizes.
  • Conditional access on the AXI4-Stream with register slice enabled.
  • Mapping arrays to streams.

Simulating IP Cores

When the design is implemented with floating-point cores, bit-accurate models of the floating-point cores must be made available to the RTL simulator. This is automatically accomplished if the RTL simulation is performed using the Vivado logic simulator. However, for supported third-party HDL simulators, the Xilinx floating-point library must be pre-compiled and added to the simulator libraries.

For example, to compile the Xilinx floating-point library in Verilog for use with the VCS simulator, open the Vivado IDE and enter the following command in the Tcl Console window:

compile_simlib -simulator vcs_mx -family all -language verilog

This creates the floating-point library in the current directory for VCS. See the Vivado Tcl Console window for the directory name. In this example, it is ./rev3_1.

You must refer to this library from within the Vitis HLS IDE by specifying the Compiled Library Location field in the Co-simulation dialog box as described in C/RTL Co-Simulation in Vitis HLS, or by running C/RTL co-simulation using the following command:

cosim_design -tool vcs -compiled_library_dir <path_to_library>/rev3_1

Analyzing RTL Simulations

When the C/RTL co-simulation completes, the simulation report opens and shows the measured latency and II. These results may differ from values reported after HLS synthesis, which are based on the absolute shortest and longest paths through the design. The results provided after C/RTL co-simulation show the actual values of latency and II for the given simulation data set (and may change if different input stimuli are used).

In non-pipelined designs, C/RTL co-simulation measures latency between ap_start and ap_done signals. The II is 1 more than the latency, because the design reads new inputs 1 cycle after all operations are complete. The design only starts the next transaction after the current transaction is complete.

In pipelined designs, the design might read new inputs before the first transaction completes, and there might be multiple ap_start and ap_ready signals before a transaction completes. In this case, C/RTL co-simulation measures the latency as the number of cycles between data input values and data output values. The II is the number of cycles between ap_ready signals, which the design uses to request new inputs.

Note: For pipelined designs, the II value for C/RTL co-simulation is only determined if the design is simulated for multiple transactions.
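The arithmetic behind these measurements can be sketched as follows. The functions are hypothetical helpers that operate on cycle numbers taken from a waveform; the cycle-counting convention (latency as the difference between the ap_done and ap_start cycle numbers) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-pipelined design: latency measured between ap_start and ap_done.
long latency_cycles(long ap_start_cycle, long ap_done_cycle) {
    assert(ap_done_cycle >= ap_start_cycle);
    return ap_done_cycle - ap_start_cycle;
}

// Non-pipelined design: the II is 1 more than the latency.
long non_pipelined_ii(long latency) { return latency + 1; }

// Pipelined design: II is the number of cycles between consecutive ap_ready
// assertions. With fewer than two transactions the result is empty, which
// corresponds to the II being reported as NA.
std::vector<long> measured_ii(const std::vector<long>& ap_ready_cycles) {
    std::vector<long> ii;
    for (std::size_t i = 1; i < ap_ready_cycles.size(); ++i) {
        ii.push_back(ap_ready_cycles[i] - ap_ready_cycles[i - 1]);
    }
    return ii;
}
```

For example, ap_ready assertions at cycles 10, 14, and 18 give an II of 4 for both intervals, while a single assertion gives no II value at all.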

Viewing Simulation Waveforms

To view waveform data during RTL co-simulation, you must enable the following in the Co-simulation Dialog box:

  • Select Vivado XSIM as the RTL simulator.
  • Enable Dump Trace with either the port or all options.

The Vivado simulator GUI opens and displays all the processes in the RTL design. Visualizing the active processes within the HLS design allows detailed profiling of process activity and duration within each activation of the top module. The visualization helps you to analyze individual process performance, as well as the overall concurrent execution of independent processes. Processes dominating the overall execution have the highest potential to improve performance, provided process execution time can be reduced.

This visualization is divided into two sections:

  • HLS process summary contains a hierarchical representation of the activity report for all processes.
    DUT name
    <name>
    Function
    <function name>
  • Dataflow analysis provides detailed activity information about the tasks inside the dataflow region.
    DUT name
    <name>
    Function
    <function name>
    Dataflow/Pipeline Activity
    Shows the number of parallel executions of the function when implemented as a dataflow process.
    Active Iterations
    Shows the currently active iterations of the dataflow. The number of rows is dynamically incremented to accommodate for the visualization of any concurrent execution.
    StallNoContinue
    A stall signal that tells if there were any output stalls experienced by the dataflow processes (the function is done, but it has not received a continue from the adjacent dataflow process).
    RTL Signals
    The underlying RTL control signals that interpret the transaction view of the dataflow process.
Figure 4: Waveform Viewer

After C/RTL co-simulation completes, you can reopen the RTL waveforms in the Vivado IDE by clicking the Open Wave Viewer toolbar button, or selecting Solution > Open Wave Viewer.

IMPORTANT: When you open the Vivado IDE using this method, you can only use the waveform analysis features, such as zoom, pan, and waveform radix.

Cosim Deadlock Viewer

A deadlock is a situation in which processes inside a DATAFLOW region share the same channels and effectively prevent each other from writing to or reading from them, resulting in both processes getting stuck. This scenario is common when the channels inside the DATAFLOW region are either FIFOs or a mix of PIPOs and FIFOs.

The deadlock viewer visualizes this deadlock scenario on the static dataflow viewer. It highlights the problematic processes and channels. The viewer also provides a cross-probing capability to link between the problematic dataflow channels and the associated source code. You can use this information to solve the issue with less time and effort. The viewer opens automatically only after the co-simulation detects the deadlock situation and the co-simulation run has finished.

A small example is shown below. The dataflow region consists of two processes that communicate through a FIFO (data_channel) and a PIPO (data_array). The first loop in proc_1 writes 10 data items to data_channel before writing anything to data_array. Because the FIFO depth of 8 is insufficient, this first loop does not complete, which blocks the rest of the process. Then proc_2 blocks because it cannot read from data_array (which is still empty), and therefore cannot remove data from data_channel. This creates a deadlock that requires increasing the depth of data_channel to at least 10.

void example(hls::stream<data_t>& A, hls::stream<data_t>& B){
#pragma HLS dataflow
..
..
hls::stream<int> data_channel;
int data_array[10];
#pragma HLS STREAM variable=data_channel depth=8 dim=1
    proc_1(A, data_channel, data_array);
    proc_2(B, data_channel, data_array);
}

void proc_1(hls::stream<data_t>& A, hls::stream<int>& data_channel, int data_array[10]){
  …
  for(i = 0; i < 10; i++){
    tmp = A.read();
    tmp.data = tmp.data.to_int();
    data_channel.write(tmp.data);
  }
  for(i = 0; i < 10; i++){
      data_array[i] = i + tmp.data.to_int();
  }
}

void proc_2(hls::stream<data_t>& B, hls::stream<int>& data_channel, int data_array[10]){
  int i;
  ..
  ..
  for(i = 0; i < 10; i++){
    if (i == 0){
      tmp.data = data_channel.read() + data_array[5];
    }
    else {
      tmp.data = data_channel.read();
    }
    B.write(tmp);
  }
}
Co-sim Log:
///////////////////////////////////////////////////////////////////////////////////
// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
//
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"
////////////////////////////////////////////////////////////////////////////////////
// RTL Simulation : 0 / 1 [0.00%] @ "105000"
//////////////////////////////////////////////////////////////////////////////
// ERROR!!! DEADLOCK DETECTED at 132000 ns! SIMULATION WILL BE STOPPED! //
//////////////////////////////////////////////////////////////////////////////
/////////////////////////
// Dependence cycle 1:
// (1): Process: example_example.proc_1_U0
//      Channel: example_example.data_channel1_U, FULL
// (2): Process: example_example.proc_2_U0
//      Channel: example_example.data_array_U, EMPTY
////////////////////////////////////////////////////////////////////////
// Totally 1 cycles detected!
////////////////////////////////////////////////////////////////////////
Figure 5: Deadlock Viewer
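The deadlock in this example can be reproduced with a simplified software model of the two processes. This is a sketch under stated assumptions: the function deadlocks is hypothetical, each loop pass models one step of both processes, and the PIPO is modeled as a flag that becomes ready only after proc_1 has finished its first loop. It is not generated by or taken from Vitis HLS.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Returns true if the simplified two-process model deadlocks for the
// given FIFO depth, and false if all 10 items reach the consumer.
bool deadlocks(std::size_t fifo_depth) {
    const int kItems = 10;
    std::deque<int> fifo;      // models data_channel
    bool array_ready = false;  // models data_array (PIPO) being available
    int written = 0;           // items written by proc_1
    int read = 0;              // items read by proc_2

    while (read < kItems) {
        bool progress = false;

        // proc_1: the first loop must write all items before data_array is filled.
        if (written < kItems) {
            if (fifo.size() < fifo_depth) {
                fifo.push_back(written++);
                progress = true;
            }
        } else if (!array_ready) {
            array_ready = true;  // second loop done, PIPO handed to proc_2
            progress = true;
        }

        // proc_2: needs data_array before it can start draining the FIFO.
        if (array_ready && !fifo.empty()) {
            fifo.pop_front();
            ++read;
            progress = true;
        }

        // Neither process could advance: the model is deadlocked.
        if (!progress) return true;
    }
    return false;
}
```

In this model, depths of 8 and 9 deadlock and a depth of 10 completes, which matches the minimum size stated above.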

Debugging C/RTL Co-Simulation

When C/RTL co-simulation completes, Vitis HLS typically indicates that the simulations passed and the functionality of the RTL design matches the initial C code. When the C/RTL co-simulation fails, Vitis HLS issues the following message:

@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***

Following are the primary reasons for a C/RTL co-simulation failure:

  • Incorrect environment setup
  • Unsupported or incorrectly applied optimization directives
  • Issues with the C test bench or the C source code

To debug a C/RTL co-simulation failure, run the checks described in the following sections. If you are unable to resolve the C/RTL co-simulation failure, see Xilinx Support for support resources, such as answers, documentation, downloads, and forums.

Setting Up the Environment

Check the environment setup as shown in the following table.

Table 1. Debugging Environment Setup
Questions Actions to Take
Are you using a third-party simulator?

Ensure the path to the simulator executable is specified in the system search path.

When using the Vivado simulator, you do not need to specify a search path.

Ensure that you have compiled the simulation libraries as discussed in Simulating IP Cores.

Are you running Linux?

Ensure that your setup files (for example .cshrc or .bashrc) do not have a change directory command. When C/RTL co-simulation starts, it spawns a new shell process. If there is a cd command in your setup files, it causes the shell to run in a different location and eventually C/RTL co-simulation fails.

Optimization Directives

Check the optimization directives as shown in the following table.

Table 2. Debugging Optimization Directives
Questions Actions to Take
Are you using the DEPENDENCE directive?

Remove the DEPENDENCE directives from the design to see if C/RTL co-simulation passes.

If co-simulation passes, it likely indicates that the TRUE or FALSE setting for the DEPENDENCE directive is incorrect as discussed in Verification of DATAFLOW and DEPENDENCE.

Does the design use volatile pointers on the top-level interface?

Ensure the DEPTH option is specified on the INTERFACE directive.

When volatile pointers are used on the interface, you must specify the number of reads/writes performed on the port in each transaction or each execution of the C function.

Are you using FIFOs with the DATAFLOW optimization?

Check to see if C/RTL co-simulation passes with the standard ping-pong buffers.

Check to see if C/RTL co-simulation passes without specifying the size for the FIFO channels. This ensures that the channel defaults to the size of the array in the C code.

Reduce the size of the FIFO channels until C/RTL co-simulation stalls. Stalling indicates a channel size that is too small. Review your design to determine the optimal size for the FIFOs. You can use the STREAM directive to specify the size of individual FIFOs.

Are you using supported interfaces?

Ensure you are using supported interface modes. For details, see Interface Synthesis Requirements.

Are you applying multiple optimization directives to arrays on the interface?

Ensure you are using optimizations that are designed to work together. For details, see Unsupported Optimizations for Co-Simulation.

Are you using arrays on the interface that are mapped to streams?

To use interface-level streaming (the top-level function of the DUT), use hls::stream.
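For the volatile pointer check in the table above, the following sketch shows a function whose pointer arguments are accessed several times in one transaction. The function name pointer_stream is hypothetical. In this access pattern d_i is read four times and d_o is written twice per call, and these are the access counts that the DEPTH option is meant to describe; the directive itself is intentionally not shown.

```cpp
#include <cassert>

// Hypothetical top-level function: reads *d_i four times and writes *d_o
// twice per call. The volatile qualifier keeps every access in the code;
// without it, a compiler is free to merge the repeated reads and writes.
void pointer_stream(volatile int* d_o, volatile int* d_i) {
    int acc = 0;
    acc += *d_i;  // read 1
    acc += *d_i;  // read 2
    *d_o = acc;   // write 1
    acc += *d_i;  // read 3
    acc += *d_i;  // read 4
    *d_o = acc;   // write 2
}
```

In a plain C++ test the input is a single variable, so every read returns the same value; in hardware, each read could observe a new value on the port.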

C Test Bench and C Source Code

Check the C test bench and C source code as shown in the following table.

Table 3. Debugging the C Test Bench and C Source Code
Questions Actions to Take
Does the C test bench check the results and return the value 0 (zero) if the results are correct?

Ensure the C test bench returns the value 0 for C/RTL co-simulation. Even if the results are correct, the C/RTL co-simulation feature reports a failure if the C test bench fails to return the value 0.

Is the C test bench creating input data based on a random number?

Change the test bench to use a fixed seed for any random number generation. If the seed for random number generation is based on a variable, such as a time-based seed, the data used for simulation is different each time the test bench is executed, and the results can vary.

Are you using pointers on the top-level interface that are accessed multiple times?

Use a volatile pointer for any pointer that is accessed multiple times within a single transaction (one execution of the C function). If you do not use a volatile pointer, everything except the first read and last write is optimized out to adhere to the C standard.
Does the C code contain undefined values or perform out-of-bounds array accesses?

Confirm all arrays are correctly sized to match all accesses. Loop bounds that exceed the size of the array are a common source of issues (for example, N accesses for an array sized at N-1).

Confirm that the results of the C simulation are as expected and that output values were not assigned random data values.

Consider using the industry-standard Valgrind application outside of the HLS design environment to confirm that the C code does not have undefined or out-of-bounds issues.

It is possible for a C function to execute and complete even if some variables are undefined or are out-of-bounds. In the C simulation, undefined values are assigned a random number. In the RTL simulation, undefined values are assigned an unknown or X value.

Are you using floating-point math operations in the design?

Check that the C test bench results are within an acceptable error range instead of performing an exact comparison. For some of the floating-point math operations, the RTL implementation is not identical to the C implementation. For details, see Verification and Math Functions.

Ensure that the RTL simulation models for the floating-point cores are provided to the third-party simulator. For details, see Simulating IP Cores.

Are you using Xilinx IP blocks and a third-party simulator?

Ensure that the path to the Xilinx IP simulation models is provided to the third-party simulator.
Are you using the hls::stream construct in the design that changes the data rate (for example, decimation or interpolation)?

Analyze the design and use the STREAM directive to increase the size of the FIFOs used to implement the hls::stream.

By default, an hls::stream is implemented as a FIFO with a depth of 2. If the design results in an increase in the data rate (for example, an interpolation operation), a default FIFO size of 2 might be too small and cause the C/RTL co-simulation to stall.

Are you using very large data sets in the simulation?

Use the reduce_diskspace option when executing C/RTL co-simulation. In this mode, HLS only executes 1 transaction at a time. The simulation might run marginally slower, but this limits storage and system capacity issues.

The C/RTL co-simulation feature verifies all transactions at one time. If the top-level function is called multiple times (for example, to simulate multiple frames of video), the data for the entire simulation input and output is stored on disk. Depending on the machine setup and OS, this might cause performance or execution issues.
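Two of the test bench checks in the table above, using a fixed seed for random input data and comparing floating-point results within an error range, can be sketched with standard C++ only. The helper names make_stimulus and within_tolerance, the value range, and the tolerance policy are assumptions chosen for illustration; an appropriate tolerance depends on the design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Generates reproducible stimulus: the same seed gives the same data on
// every run, unlike a time-based seed.
std::vector<float> make_stimulus(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> values(n);
    for (float& v : values) {
        v = dist(gen);
    }
    return values;
}

// Returns true when a and b differ by no more than tol, either as an
// absolute difference or relative to the larger magnitude.
bool within_tolerance(float a, float b, float tol) {
    assert(tol >= 0.0f);
    const float diff = std::fabs(a - b);
    const float scale = std::fmax(std::fabs(a), std::fabs(b));
    return diff <= tol || diff <= tol * scale;
}
```

A test bench could call make_stimulus with a constant seed to build its inputs and use within_tolerance in place of an exact == comparison when checking floating-point outputs.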