Clocking and Resets

Clocking

There are three clock domains in the DPU IP: the register configuration, the data controller, and the computation unit. Each of the three input clocks can be configured to suit the design requirements, and the reset that corresponds to each clock must be configured to match.

Clock Domain

The following figure shows the three clock domains.

Figure 1: Clock Domains in the DPU

Register Clock

s_axi_aclk is used for the register configuration module. This module receives the DPU configuration through the S_AXI interface. The S_AXI clock can be configured as common with the M-AXI clock or as an independent clock. The DPU configuration registers are updated at a very low frequency, and most of them are set at the start of a task. Because the M-AXI clock is a high-frequency clock, Xilinx recommends setting the S_AXI clock as an independent clock with a frequency of 100 MHz.

In the Vitis flow, the shell may provide only two clocks for the DPU IP. In this case, the S_AXI clock must be configured as common with the M-AXI clock.
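
The choice between the two S_AXI clocking options can be written as a small decision rule. The following Python sketch is illustrative only: the function name and the returned labels are hypothetical and do not correspond to settings in any Xilinx tool.

```python
def s_axi_clock_mode(shell_clock_count: int) -> dict:
    """Pick an S_AXI clocking option from the number of clocks the shell offers.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if shell_clock_count < 2:
        # m_axi_dpu_aclk and dpu_2x_clk run at different frequencies,
        # so fewer than two clocks cannot serve the DPU.
        raise ValueError("the DPU needs at least two clocks")
    if shell_clock_count == 2:
        # Only two clocks: s_axi_aclk has to share the M-AXI clock.
        return {"s_axi_aclk": "common_with_m_axi", "freq_mhz": None}
    # A third clock is available: use the recommended independent 100 MHz clock.
    return {"s_axi_aclk": "independent", "freq_mhz": 100.0}
```

With two shell clocks the helper returns the common configuration; with three or more it returns the independent 100 MHz recommendation.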

Data Controller Clock

The primary function of the data controller module is to schedule the data flow in the DPU IP. The data controller module works with m_axi_dpu_aclk. The data transfer between the DPU and external memory happens in the data controller clock domain, so m_axi_dpu_aclk is also the AXI clock for the AXI_MM master interface in the DPU IP. m_axi_dpu_aclk should be connected to the AXI_MM master clock.

Computation Clock

The DSP slices in the computation unit module are in the dpu_2x_clk domain, which runs at twice the clock frequency of the data controller module. The two related clocks must be edge-aligned.

Reference Clock Generation

There are three input clocks for the DPU and the frequency of dpu_2x_clk should be twice that of m_axi_dpu_aclk. m_axi_dpu_aclk and dpu_2x_clk must be synchronous. The recommended circuit design is shown here.

Figure 2: Reference Circuit

An MMCM and two BUFGCE_DIV blocks can be instantiated to build this circuit. The frequency of clk_in1 is arbitrary, and the MMCM output clock CLKOUT should run at the frequency of dpu_clk_2x (the dpu_2x_clk input of the DPU). BUFGCE_DIV_CLK1_INST divides the frequency of CLKOUT by two to produce dpu_clk (the m_axi_dpu_aclk input). Because dpu_clk and dpu_clk_2x are derived from the same clock, they are synchronous. Routing both clocks through BUFGCE_DIV blocks reduces the skew between them, which helps with timing closure.
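
The frequency arithmetic of the reference circuit can be sketched in Python as follows. This is a behavioral model of the frequencies only, not HDL. It assumes the second BUFGCE_DIV passes CLKOUT through with a divide value of 1, and that the divider accepts integer values from 1 to 8; both are assumptions of this sketch rather than statements from this section. The function names are hypothetical.

```python
from fractions import Fraction


def bufgce_div(freq_mhz, divide: int) -> Fraction:
    """Model a BUFGCE_DIV as an ideal integer frequency divider."""
    if not 1 <= divide <= 8:
        # Assumed divider range for this sketch.
        raise ValueError("divide must be an integer from 1 to 8")
    return Fraction(freq_mhz) / divide


def reference_circuit(clkout_mhz) -> dict:
    """Derive both DPU clocks from the single MMCM output CLKOUT."""
    clkout = Fraction(clkout_mhz)
    return {
        "dpu_clk_2x": bufgce_div(clkout, 1),  # assumed divide-by-1 path
        "dpu_clk": bufgce_div(clkout, 2),     # BUFGCE_DIV_CLK1_INST, divide by 2
    }


clocks = reference_circuit(650)
# Both outputs come from the same source, and the ratio is exactly 2.
assert clocks["dpu_clk_2x"] == 2 * clocks["dpu_clk"]
```

Exact fractions are used instead of floating-point values so that the 2:1 ratio can be checked with plain equality.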

Configuring Clock Wizard

The above circuit can be implemented by instantiating the Xilinx clock wizard IP. In this reference design, the frequency of s_axi_aclk is set to 100 MHz and m_axi_dpu_aclk is set to 325 MHz, so the frequency of dpu_2x_clk should be set to 650 MHz. The recommended configuration of the Clocking Options tab is shown in the following figure.
Note: The Primitive parameter must be set to Auto.
Figure 3: Recommended Clocking Options of Clock Wizard
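
Before entering frequencies in the clock wizard, the 2x relationship can be checked with a short script. The following Python sketch is one possible check; the function name and the message text are hypothetical.

```python
import math


def check_clock_plan(s_axi_aclk_mhz, m_axi_dpu_aclk_mhz, dpu_2x_clk_mhz) -> list:
    """Return a list of problems found in a DPU clock plan (empty if none)."""
    problems = []
    for name, mhz in (
        ("s_axi_aclk", s_axi_aclk_mhz),
        ("m_axi_dpu_aclk", m_axi_dpu_aclk_mhz),
        ("dpu_2x_clk", dpu_2x_clk_mhz),
    ):
        if mhz <= 0:
            problems.append(f"{name} must be a positive frequency")
    # dpu_2x_clk has to run at twice the data controller clock.
    if not math.isclose(dpu_2x_clk_mhz, 2 * m_axi_dpu_aclk_mhz, rel_tol=1e-9):
        problems.append("dpu_2x_clk must be twice m_axi_dpu_aclk")
    return problems


# Frequencies from this reference design: 100 MHz, 325 MHz, 650 MHz.
assert check_clock_plan(100.0, 325.0, 650.0) == []
```

The check covers only the frequency ratio. Synchronous, edge-aligned clocks still depend on deriving both outputs from the same source.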

In addition, Matched Routing must be selected for m_axi_dpu_aclk and dpu_2x_clk in the Output Clocks tab of the Clock Wizard IP. Matched Routing significantly reduces the skew between clocks generated through BUFGCE_DIV blocks. The related configuration is shown in the following figure.

Figure 4: Matched Routing in Clock Wizard
Note: List the clkout frequencies from high to low. In the following figure, (a) shows the correct sequence. The settings in (a) produce the dedicated clock design on the Summary page, while the settings in (b) do not. For more details, refer to the Clocking Wizard LogiCORE IP Product Guide (PG065).
Figure 5: Comparison of clkout Frequency Sequence
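
The high-to-low ordering rule can be applied mechanically before filling in the Output Clocks tab. The Python sketch below sorts a set of requested clocks by descending frequency. The clk_outN slot names and the inclusion of s_axi_aclk among the outputs are assumptions made for illustration.

```python
def order_output_clocks(requested_mhz: dict) -> list:
    """Return (name, MHz) pairs sorted from the highest frequency to the lowest."""
    return sorted(requested_mhz.items(), key=lambda item: item[1], reverse=True)


# Frequencies from this reference design, deliberately listed out of order.
requested = {"s_axi_aclk": 100.0, "dpu_2x_clk": 650.0, "m_axi_dpu_aclk": 325.0}

for slot, (name, mhz) in enumerate(order_output_clocks(requested), start=1):
    # clk_outN is a hypothetical slot label used only for this printout.
    print(f"clk_out{slot}: {name} @ {mhz} MHz")
```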

Adding CE for dpu_2x_clk

The dpu_2x clock gating option can reduce the power consumption of the DPU. When the option is enabled, the number of generated clk_dsp clocks should equal the number of DPU cores. Each clk_dsp should be set as a buffer with CE in the clock wizard IP. As shown in the following figure, three clk_dsp_ce ports appear when the output clocks are configured with CE. To enable the dpu_2x clock gating function, each clk_dsp_ce port should be connected to the corresponding dpu_2x_clk_ce port in the DPU.

Figure 6: Configure Clock Wizard with Buffer CE

After configuring the clock wizard, connect each clk_dsp_ce port to the corresponding port in the DPU. The connections are shown in the following figure.

Figure 7: Clock CE and DPU Connections
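
The one-clock-per-core rule and the CE pairing can be enumerated for any core count. The following Python sketch generates the pairs. The indexed port names (clk_dsp0, dpu0_2x_clk_ce, and so on) are hypothetical, because the actual names depend on how the clock wizard and the DPU are configured.

```python
def ce_connections(num_cores: int) -> list:
    """List (clock wizard port, DPU port) pairs for dpu_2x clock gating.

    Port suffixes are hypothetical and shown for illustration only.
    """
    if num_cores < 1:
        raise ValueError("at least one DPU core is required")
    pairs = []
    for core in range(num_cores):
        # One gated clk_dsp output per DPU core.
        pairs.append((f"clk_dsp{core}", f"dpu{core}_2x_clk"))
        # The CE port of that output pairs with the core's dpu_2x_clk_ce port.
        pairs.append((f"clk_dsp{core}_ce", f"dpu{core}_2x_clk_ce"))
    return pairs


for wizard_port, dpu_port in ce_connections(3):
    print(f"{wizard_port} <-> {dpu_port}")
```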

Reset

There are three input clocks for the DPU IP and each clock has a corresponding reset. Each reset must be synchronous to its corresponding clock. If the related clocks and resets are not synchronized, the DPU might not work properly. A Processor System Reset IP block is recommended to generate a synchronized reset signal. The reference design is shown here.

Figure 8: Reference Design for Resets
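
The requirement that each reset be synchronous to its own clock can be checked against a planned block design. The Python sketch below is a bookkeeping aid under stated assumptions: the reset names s_axi_aresetn, m_axi_dpu_aresetn, and dpu_2x_resetn are assumed here, and the instance names are hypothetical. It plans one reset generator per clock domain and reports any reset that is driven from the wrong domain.

```python
CLOCK_TO_RESET = {
    "s_axi_aclk": "s_axi_aresetn",          # assumed reset name
    "m_axi_dpu_aclk": "m_axi_dpu_aresetn",  # assumed reset name
    "dpu_2x_clk": "dpu_2x_resetn",          # assumed reset name
}


def reset_plan(clock_to_reset=CLOCK_TO_RESET) -> list:
    """Plan one reset generator per clock domain (instance names are hypothetical)."""
    return [
        {"instance": f"rst_gen_{clock}", "sync_clock": clock, "drives": reset}
        for clock, reset in clock_to_reset.items()
    ]


def unsynchronized_resets(plan, clock_to_reset=CLOCK_TO_RESET) -> list:
    """Return resets whose generator is not clocked by the matching DPU clock."""
    return [
        block["drives"]
        for block in plan
        if clock_to_reset.get(block["sync_clock"]) != block["drives"]
    ]


plan = reset_plan()
assert unsynchronized_resets(plan) == []
```

If a generator is moved to a different clock, for example by clocking the dpu_2x_clk reset from m_axi_dpu_aclk, the check returns that reset name.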