Analyzing the Results of Synthesis

After synthesis completes, Vitis HLS automatically creates synthesis reports to help you understand and analyze the performance of the implementation. Examples of these reports include the Synthesis Summary report, Schedule Viewer, Function Call Graph, and Dataflow Viewer. You can view these reports from the Flow Navigator in the Vitis HLS IDE.

Schedule Viewer
Shows each operation and control step of the function, and the clock cycle that it executes in.
Dataflow Viewer
Shows the dataflow structure inferred by the tool and lets you inspect the channels (FIFO/PIPO) and examine the effect of channel depth on performance.
Function Call Graph Viewer
Displays your full design after C Synthesis or C/RTL Co-simulation to show the throughput of the design in terms of latency and II.

In addition to the graphs and viewers described above, Vitis HLS provides the following views to expand on the information available for analyzing your design.

Module Hierarchy
Shows the resources and latency contribution of each block in the RTL hierarchy. It also indicates any II or timing violations. For timing violations, the hierarchy window also shows the total negative slack observed in a specific module.
Performance Profile
Shows details on the performance of the block currently selected in the Module Hierarchy view. Performance is measured in terms of latency and the initiation interval, and includes details on whether the block was pipelined or not.
Resource Profile
Shows the resources used at the selected level of hierarchy and the control state of the operations used.
Properties View
Shows the properties of the currently selected control step or operation in the Schedule Viewer.

Schedule Viewer

The Schedule Viewer provides a detailed view of the synthesized RTL, showing each operation and control step of the function, and the clock cycle in which it executes. It helps you identify loop dependencies that prevent parallelism, as well as timing violations and data dependencies.

The Schedule Viewer is displayed by default in the Analysis perspective. You can open it from the Module Hierarchy window by right-clicking a module and selecting Open Schedule Viewer from the menu.

In the Schedule Viewer,

  • The left vertical axis shows the names of operations and loops in the RTL hierarchy. Operations are in topological order, meaning that an operation on line n can only be driven by operations on previous lines and can only drive operations on later lines. Depending on the type of violation found, the Schedule Viewer shows additional information for each operation:
    • Resource limitation: displays the type of operation (read/write) and the type of memory used (RAM_1p or RAM_2p). In the image below, vecIn is a memory implemented as a dual-port RAM, and the loop tries to perform three reads from it in a single iteration. Because a dual-port RAM provides only two ports, this causes an II violation due to a resource limitation, and the tool highlights the operation that is scheduled in the cycle after the load operation.
    • Dependency: displays information about iterations that have a loop-carried dependency. For example, a read in one iteration could depend on a value written in a prior iteration.
  • The top horizontal axis shows the clock cycles in consecutive order.
  • The vertical dashed line in each clock cycle shows the reserved portion of the clock period due to clock uncertainty. This time is left by the tool for the Vivado back-end processes, like place and route.
  • Each operation is shown as a gray box in the table. The width of the box reflects the delay of the operation as a percentage of the total clock cycle. For function calls, the cycle information shown is equivalent to the operation latency.
  • Multi-cycle operations are shown as gray boxes with a horizontal line through the center of the box.
  • The Schedule Viewer also displays general operator data dependencies as solid blue lines. As shown in the figure below, when you select an operation, solid blue arrows highlight its specific operator dependencies, which lets you perform a detailed analysis of data dependencies. The green dotted line indicates an inter-iteration data dependency.
  • Memory dependencies are displayed using golden lines.
  • In addition, lines of source code are associated with each operation in the Schedule Viewer report. Right-click the operation to use the Goto Source command to open the input source code associated with the operation.
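
The two kinds of violation described above can be reproduced with small kernels. The following is a minimal C++ sketch, not taken from the figure: it assumes that vecIn is implemented as a dual-port RAM and that each loop requests II=1, and the names window_sum, histogram, and min_ii_for_ports, as well as the sizes, are hypothetical. Standard C++ compilers ignore unknown pragmas such as #pragma HLS (GCC may warn about them), so the sketch also runs as ordinary C++.

```cpp
#include <cassert>

// Hypothetical sizes chosen for illustration.
constexpr int N = 64;
constexpr int NUM_BINS = 8;

// Resource limitation (hypothetical kernel): each iteration reads vecIn three
// times. Assuming vecIn is a dual-port RAM (at most two accesses per cycle),
// the third read cannot share the cycle, so II=1 cannot be met.
int window_sum(const int vecIn[N + 2], int out[N]) {
    int total = 0;
WIN_LOOP:
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        int s = vecIn[i] + vecIn[i + 1] + vecIn[i + 2];  // three reads per iteration
        out[i] = s;
        total += s;
    }
    return total;
}

// Loop-carried dependency (hypothetical kernel): iteration i+1 may read the
// histogram entry that iteration i has just written, which typically keeps
// the tool from reaching II=1.
void histogram(const int bins[N], int hist[NUM_BINS]) {
HIST_LOOP:
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        assert(bins[i] >= 0 && bins[i] < NUM_BINS);
        hist[bins[i]] += 1;  // read-modify-write of the same memory
    }
}

// Lower bound on II imposed by memory ports: ceil(accesses / ports).
constexpr int min_ii_for_ports(int accesses_per_iter, int ports) {
    return (accesses_per_iter + ports - 1) / ports;
}

static_assert(min_ii_for_ports(3, 2) == 2, "3 reads on a dual-port RAM need II >= 2");
```

The port arithmetic explains why three reads per iteration on a dual-port memory typically settle at II=2; the usual remedies are to reduce the accesses per iteration or to partition the array so that more ports are available.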

In the figure below, the loop called RD_Loop_Row is selected. This is a pipelined loop, and its initiation interval (II) is explicitly stated in the loop bar. A pipelined loop is visualized unfolded, meaning that one full iteration is shown in the Schedule Viewer. Overlap, as defined by the II, is marked by a thick clock boundary on the loop marker.

The total latency of a single iteration is equivalent to the number of cycles covered by the loop marker. In this case, it is three cycles.
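
The relationship between II, iteration latency, and the latency of the whole loop can be written as simple arithmetic. The sketch below uses a common first-order model, which is an assumption: the tool can add a few cycles for loop entry and exit, so treat the results as estimates. A pipelined loop starts a new iteration every II cycles, so its latency is roughly the iteration latency plus II * (tripcount - 1); a loop that is not pipelined runs its iterations back to back. The function names and the trip count of 100 are hypothetical.

```cpp
#include <cassert>

// First-order latency model for a loop (hypothetical helpers; loop entry and
// exit overhead is ignored).
// Pipelined: a new iteration starts every `ii` cycles, and the last iteration
// still needs the full iteration latency to finish.
constexpr long pipelined_loop_latency(long iter_latency, long ii, long tripcount) {
    return tripcount == 0 ? 0 : iter_latency + ii * (tripcount - 1);
}

// Not pipelined: each iteration must finish before the next one begins.
constexpr long sequential_loop_latency(long iter_latency, long tripcount) {
    return iter_latency * tripcount;
}

// RD_Loop_Row-style numbers: iteration latency of 3 cycles and II = 1,
// with a hypothetical trip count of 100.
static_assert(pipelined_loop_latency(3, 1, 100) == 102, "3 + 1 * 99");
static_assert(sequential_loop_latency(3, 100) == 300, "3 * 100");
```

Under this model, raising the II from 1 to 2 for the same loop nearly doubles its latency (201 cycles instead of 102), which is why II violations reported in the Schedule Viewer matter for throughput.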

Figure 1: Schedule Viewer

The Schedule Viewer displays a menu bar at the top right of the report that includes the following features:

  • A drop-down menu, initially labeled Focus Off, that lets you specify operations or events in the report to select.
  • A text search field to search for specific operations or steps, and commands to Scroll Up or Scroll Down through the list of objects that match your search text.
  • Zoom In, Zoom Out, and Zoom Fit commands.
  • The Filter command lets you dynamically filter the operations that are displayed in the viewer. You can filter operations by type, or by clustered operations.
    • Filtering by type limits the displayed operations based on their functionality. For example, showing only adders, multipliers, and function calls hides all of the small operations, such as AND and OR.
    • Filtering by clusters exploits the fact that the scheduler is able to group basic operations and then schedule them as one component. The cluster filter setting can be enabled to color the clusters or even collapse them into one large operation in the viewer. This allows a more concise view of the schedule.
Figure 2: Operation Causing Violation

You can quickly locate II violations using the drop-down menu in the Schedule Viewer, as shown in the figure above. You can also locate them through the context menu in the Module Hierarchy view.

To locate the operations causing the violation in the source code, right-click the operation and use the Goto Source command, or double-click the operation to open the source viewer, which identifies where the operation originates in the source.

Timing violations can also be found quickly from the Module Hierarchy view context menu or from the drop-down menu in the Schedule Viewer. A timing violation is a path of operations that requires more time than is available in one clock cycle. The Schedule Viewer shows the problematic operation in a red box.

By default, all dependencies (blue lines) between the operations in the critical timing path are shown.
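
The check behind a red box can be expressed as arithmetic: the delays of operations chained within one cycle must fit into the clock period minus the clock uncertainty that the tool reserves for the Vivado back-end. The sketch below assumes a 10 ns clock with 2.7 ns of uncertainty and made-up operation delays; the helper names are hypothetical, and real delays come from the tool's characterization of the target device.

```cpp
#include <cassert>
#include <vector>

// Time left for scheduling in one clock cycle after reserving the
// clock uncertainty (hypothetical helper).
double available_time_ns(double clock_period_ns, double uncertainty_ns) {
    return clock_period_ns - uncertainty_ns;
}

// Slack of a chain of operations scheduled in the same cycle (hypothetical
// helper). Negative slack means the path needs more time than the cycle
// provides, which is a timing violation.
double path_slack_ns(double clock_period_ns, double uncertainty_ns,
                     const std::vector<double>& op_delays_ns) {
    double path_delay = 0.0;
    for (double d : op_delays_ns) {
        path_delay += d;  // chained operations add their delays
    }
    return available_time_ns(clock_period_ns, uncertainty_ns) - path_delay;
}
```

With the assumed numbers, a 3.0 ns + 4.0 ns chain leaves about 0.3 ns of slack, while a 3.5 ns + 4.2 ns chain gives about -0.4 ns, which is the kind of path the Schedule Viewer flags.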

Properties View

At the bottom of the Schedule Viewer, as shown in the top figure, is the Properties view, which displays the properties of the currently selected object in the Schedule Viewer. This lets you see details of the specific function, loop, or operation that is selected. The elements you can select, and the properties displayed for each, include:

  • Functions or Loops
    Initiation Interval (II)
    The number of clock cycles before the function or loop can accept new input data.
    Loop Iteration Latency
    The number of clock cycles it takes to complete one iteration of the loop.
    Latency
    The number of clock cycles required for the function to compute all output values, or for the loop to complete all iterations.
    Pipelined
    Indicates whether the function or loop is pipelined in the RTL design.
    Slack
    The timing slack for the function or loop.
    Tripcount
    The number of iterations a loop completes.
    Resource Utilization
    Displays the number of BRAM, DSP, LUT, and FF resources used to implement the function or loop.
  • Operation and Storage Mapping
    Name
    Location which contains the code.
    Op Code
    Operation which has been scheduled, for example, add, sub, and mult. For more information, refer to the BIND_OP or BIND_STORAGE pragmas or directives.
    Op Latency
    Displays the default or specified latency for the binding of the operation or storage.
    Bitwidth
    Bitwidth of the operation.
    Impl
    Defines the implementation used for the specified operation or storage.
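
Several of these properties can be influenced from the source code. The following C++ sketch (the function name, sizes, and latency value are hypothetical) uses BIND_OP to implement a multiplication on a DSP with a latency of 2, BIND_STORAGE to map a local buffer to a dual-port block RAM, and LOOP_TRIPCOUNT to give the tool a trip-count range for loops whose bound is only known at run time. After synthesis, the Op Code, Op Latency, Impl, and Tripcount fields in the Properties view would be expected to reflect these choices; LOOP_TRIPCOUNT is used for analysis and reporting only and does not change the generated hardware.

```cpp
#include <cassert>

constexpr int MAX_LEN = 16;  // hypothetical maximum length

// Hypothetical kernel: dot product over the first n elements (n <= MAX_LEN).
int scaled_dot(const int a[MAX_LEN], const int b[MAX_LEN], int n) {
    assert(n >= 0 && n <= MAX_LEN);
    int buf[MAX_LEN];
    // Implement the local buffer as a dual-port block RAM.
#pragma HLS BIND_STORAGE variable=buf type=ram_2p impl=bram
COPY_LOOP:
    for (int i = 0; i < n; ++i) {
        // n is variable, so give the tool a range for latency reporting.
#pragma HLS LOOP_TRIPCOUNT min=1 max=16
        buf[i] = b[i];
    }
    int acc = 0;
MAC_LOOP:
    for (int i = 0; i < n; ++i) {
#pragma HLS LOOP_TRIPCOUNT min=1 max=16
        int prod;
        // Implement this multiplication on a DSP with a 2-cycle latency.
#pragma HLS BIND_OP variable=prod op=mul impl=dsp latency=2
        prod = a[i] * buf[i];
        acc += prod;
    }
    return acc;
}
```

Without LOOP_TRIPCOUNT, a loop with a variable bound typically has no known trip count, so its latency cannot be fully reported.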

Function Call Graph Viewer

The new Function Call Graph Viewer, which can be opened from the Flow Navigator, illustrates your full design after C Synthesis or C/RTL Co-simulation. The goal of this viewer is to show the throughput of the design in terms of latency and II. It helps you identify the critical path in your design and the bottlenecks to focus on to improve throughput. It can also show paths through the design where throughput may be imbalanced, leading to FIFO stalls or deadlock.

Figure 3: Performance Metrics Synthesis

In some cases, the displayed hierarchy of the design might differ from your source code as a result of HLS optimizations, such as converting loops into function pipelines. Inlined functions are no longer visible in the call graph, because they are no longer separate functions in the synthesized code. If multiple instances of a function are created, each unique instance is shown in the call graph. This lets you see which functions contribute to a calling function's latency and II.
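
Whether a function appears in the call graph can be controlled with the INLINE pragma. In this hypothetical sketch, add_one is explicitly inlined, so it would not be expected to appear as a separate box in the Function Call Graph, while INLINE off keeps scale as its own module. Without a pragma, the tool decides on its own whether to inline a function, and small functions can be inlined automatically, so the graph may omit functions you did not mark.

```cpp
#include <cassert>

// Hypothetical helper: explicitly inlined, so its logic is merged into the
// caller and it should not appear as its own node in the call graph.
static int add_one(int x) {
#pragma HLS INLINE
    return x + 1;
}

// Hypothetical helper: inlining disabled, so it stays a separate RTL module
// and remains visible in the call graph.
static int scale(int x) {
#pragma HLS INLINE off
    return 3 * x;
}

// Hypothetical top-level function.
int call_graph_top(int x) {
    return scale(add_one(x));
}
```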

As shown above, the graph displays functions as rectangular boxes and loops as oval boxes, each annotated with II, latency, and resource or timing data, depending on the specific view. Before C/RTL co-simulation is completed, the performance and resource metrics shown in the graph come from the C Synthesis phase and are therefore estimates from the HLS tool.

Note: For more accurate resource and timing estimates, logic synthesis or implementation can be performed as part of Exporting the RTL Design.

After co-simulation, actual II and latency numbers are reported along with stalling percentages; this information is back-annotated from data collected during co-simulation. You can toggle between the synthesis performance metrics and co-simulation metrics using the drop-down menu at the upper-left of the Function Call Graph viewer.

You can also use the Heat Map feature to highlight several metrics of interest:

  • II (min, max, avg)
  • Latency (min, max, avg)
  • Stalling Time Percentage
Figure 4: Performance Metrics

The heat map uses color coding to highlight problematic modules, on a scale from red to green: red indicates a high value of the metric (for example, the highest II or highest latency), and green indicates a low value. Colors that are neither red nor green represent values between the highest and lowest. As shown above, this helps you quickly identify the modules that need attention. The example above shows a heat map for LATENCY MAX, and the path of red modules indicates where high latency values are observed.

As mentioned before, the Function Call Graph illustrates the throughput numbers of your design at a high level. You can treat it as a cockpit from which to carry out further investigations. Right-click any of the displayed modules to display a menu of options for viewing additional information. This lets you see the overall design and then jump into specific parts of the design that need extra attention. Additional reports include the Schedule Viewer, Synthesis Summary report, Dataflow Viewer, and source files. The Function Call Graph is the one viewer in Vitis HLS where you can see the full picture of your design, with the latency and II of each module available for analysis. This includes dataflow modules, whose performance information can only be obtained after co-simulation.

TIP: Additional performance and resource metrics are displayed for each function/loop in the Modules/Loops table under the report.

Dataflow Viewer

The DATAFLOW optimization is dynamic, so its behavior can only be fully understood after RTL co-simulation is complete. The Dataflow viewer therefore lets you see the dataflow structure inferred by the tool, inspect the channels (FIFO/PIPO), and examine the effect of channel depth on performance, with performance data back-annotated from the co-simulation results.
IMPORTANT: You can open the Dataflow viewer without running RTL co-simulation, but your view will not contain important performance information such as read/write block times, co-sim depth, and stall times.

You must apply the DATAFLOW pragma or directive to your design for the Dataflow viewer to be populated. You can apply dataflow to the top-level function, to specific regions of a function, or to loops. The Dataflow viewer displays a representation of the dataflow graph structure, showing the different processes and the underlying producer-consumer connections.
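
As an illustration, the following sketch applies DATAFLOW to a hypothetical top-level function made of three processes connected by local arrays. With the pragma, the tool can overlap read_in, compute, and write_out, and the arrays tmp1 and tmp2 become channels between them (arrays are typically implemented as PIPO buffers unless configured otherwise). These three processes and two producer-consumer connections are what the Dataflow viewer would be expected to display. All names and sizes are hypothetical.

```cpp
#include <cassert>

constexpr int LEN = 8;  // hypothetical block size

// Producer (hypothetical): copies the input into the first channel.
static void read_in(const int in[LEN], int tmp1[LEN]) {
    for (int i = 0; i < LEN; ++i) tmp1[i] = in[i];
}

// Intermediate process (hypothetical): consumes tmp1, produces tmp2.
static void compute(const int tmp1[LEN], int tmp2[LEN]) {
    for (int i = 0; i < LEN; ++i) tmp2[i] = 2 * tmp1[i];
}

// Consumer (hypothetical): drains the second channel into the output.
static void write_out(const int tmp2[LEN], int out[LEN]) {
    for (int i = 0; i < LEN; ++i) out[i] = tmp2[i] + 1;
}

// Hypothetical top-level function with a dataflow region.
void dataflow_top(const int in[LEN], int out[LEN]) {
#pragma HLS DATAFLOW
    int tmp1[LEN];  // channel: read_in -> compute
    int tmp2[LEN];  // channel: compute -> write_out
    read_in(in, tmp1);
    compute(tmp1, tmp2);
    write_out(tmp2, out);
}
```

Each array is written by exactly one process and read by exactly one later process, which is the single-producer, single-consumer pattern that dataflow expects.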

In the Module Hierarchy view, the icon beside the function indicates that a Dataflow Viewer report is available. When you see this icon, you can right-click the function and use the Open Dataflow Viewer command.

Figure 5: Dataflow Viewer

Features of the Dataflow viewer include the following:

  • Source Code browser.
  • Automatic cross-probing from process/channel to source code.
  • Filtering of ports and channel types.
  • Process and Channel table details the characteristics of the design:
    • Channel Profiling (FIFO sizes, etc.), enabled from the Solution Settings dialog box.
    • Process Read Blocking/Write Blocking/Stalling Time reported after RTL co-simulation.
      IMPORTANT: You must use cosim_design -enable_dataflow_profiling to capture data for the Dataflow viewer, and your test bench must run at least two iterations of the top-level function.
    • Process Latency and II displayed.
    • Channel type and widths are displayed in the Channel table.
    • Automatic cross-probing from Process and Channel table to the Graph and Source browser.
    • Hover over channel or process to display tooltips with design information.
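
Because dataflow profiling requires the test bench to run the top-level function at least twice, the test bench should call it more than once with checked results. The sketch below shows one way to structure this. dut_top is a hypothetical stand-in for your DATAFLOW top-level function, and run_testbench is a hypothetical helper: in an actual Vitis HLS test bench its body would typically live in main(), where a return value of 0 signals that the test passed.

```cpp
#include <cassert>

constexpr int LEN = 4;  // hypothetical block size

// Hypothetical stand-in for a DATAFLOW top-level function: two loop
// processes connected by a local array.
void dut_top(const int in[LEN], int out[LEN]) {
#pragma HLS DATAFLOW
    int tmp[LEN];
    for (int i = 0; i < LEN; ++i) tmp[i] = in[i] * 2;   // first process
    for (int i = 0; i < LEN; ++i) out[i] = tmp[i] + 1;  // second process
}

// Hypothetical test bench body: calls the top-level function several times
// with different data and checks every result.
// Returns 0 on success, 1 on the first mismatch.
int run_testbench(int iterations) {
    assert(iterations >= 2 && "dataflow profiling needs at least two calls");
    for (int iter = 0; iter < iterations; ++iter) {
        int in[LEN];
        int out[LEN];
        for (int i = 0; i < LEN; ++i) in[i] = iter * LEN + i;  // new data per call
        dut_top(in, out);
        for (int i = 0; i < LEN; ++i) {
            if (out[i] != 2 * in[i] + 1) return 1;  // mismatch
        }
    }
    return 0;
}
```

Feeding different data on each call also makes it more likely that a mismatch caused by processes overlapping incorrectly is caught during co-simulation.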

The Dataflow viewer can help you debug the performance of your designs. When your design deadlocks during RTL co-simulation, the GUI opens the Dataflow viewer and highlights the channels and processes involved in the deadlock, so you can determine whether the cause is, for instance, insufficient FIFO depth.

When your design does not perform as expected, the Process and Channel table can help you understand why. A process can stall waiting to read input, or because it cannot write output. The Channel table provides stalling percentages and identifies whether a process is "read blocked" or "write blocked."
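
To see how an undersized FIFO produces read-blocked and write-blocked stalls, and eventually deadlock, it can help to model the channels by hand. The following is a simplified, hypothetical step-by-step model, not the algorithm Vitis HLS uses: a producer writes n items to FIFO A and then n items to FIFO B, while the consumer alternates reads from A and B. The consumer blocks on B while the producer is still filling A, so A must buffer almost all of its items. In this model, with n = 4, a depth of 2 on A deadlocks, whereas a depth of 3 completes. The names simulate and StallReport are hypothetical.

```cpp
#include <cassert>

// Hypothetical step-by-step model of one producer and one consumer connected
// by two bounded FIFOs, A and B. Illustration only, not how Vitis HLS
// simulates channels.
struct StallReport {
    bool deadlock = false;
    int steps = 0;
    int write_blocked = 0;  // steps in which the producer could not write
    int read_blocked = 0;   // steps in which the consumer could not read
};

// Producer: writes n items to A, then n items to B.
// Consumer: reads A, B, A, B, ... (interleaved), 2n reads in total.
// In each step the producer acts first, then the consumer.
StallReport simulate(int n, int depth_a, int depth_b) {
    assert(n > 0 && depth_a > 0 && depth_b > 0);
    StallReport r;
    int fill_a = 0, fill_b = 0;  // current FIFO occupancy
    int produced = 0;            // writes completed (0..2n)
    int consumed = 0;            // reads completed (0..2n)
    while (produced < 2 * n || consumed < 2 * n) {
        bool progress = false;
        if (produced < 2 * n) {
            bool to_a = produced < n;
            int& fill = to_a ? fill_a : fill_b;
            int depth = to_a ? depth_a : depth_b;
            if (fill < depth) { ++fill; ++produced; progress = true; }
            else { ++r.write_blocked; }  // FIFO full: write blocked
        }
        if (consumed < 2 * n) {
            int& fill = (consumed % 2 == 0) ? fill_a : fill_b;
            if (fill > 0) { --fill; ++consumed; progress = true; }
            else { ++r.read_blocked; }   // FIFO empty: read blocked
        }
        ++r.steps;
        if (!progress) { r.deadlock = true; break; }  // neither side can move
    }
    return r;
}
```

For example, simulate(4, 3, 1) completes but records two write-blocked steps on B, the kind of imbalance that stalling percentages expose, while simulate(4, 2, 1) stops with both sides blocked, which corresponds to the deadlock the Dataflow viewer highlights.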

TIP: If you use a Tcl script to create the Vitis HLS project, you can still open it in the GUI to analyze the design.