Product Specification
Hardware Architecture
The detailed hardware architecture of the DPU is shown in the following figure. After start-up, the DPU fetches instructions from the off-chip memory to control the operation of the computing engine. The instructions are generated by the Vitis™ AI compiler, where substantial optimizations are performed.
On-chip memory is used to buffer input, intermediate, and output data to achieve high throughput and efficiency. The data is reused as much as possible to reduce the external memory bandwidth. A deep pipelined design is used for the computing engine. The processing elements (PE) take full advantage of the fine-grained building blocks such as multipliers, adders, and accumulators in Xilinx devices.
DPU with Enhanced Usage of DSP
A DSP Double Data Rate (DDR) technique is used to improve the performance achieved with the device. Therefore, two input clocks for the DPU are needed: one for general logic and another at twice the frequency for DSP slices. The difference between a DPU that does not use the DSP DDR technique and a DPU with the enhanced DSP usage architecture is shown here.
Port Descriptions
The DPU top-level interfaces are shown in the following figure.
The DPU I/O signals are listed and described in the table below.
Signal Name | Interface Type | Width | I/O | Description |
---|---|---|---|---|
S_AXI | Memory mapped AXI slave interface | 32 | I/O | 32-bit memory mapped AXI interface for registers. |
s_axi_aclk | Clock | 1 | I | AXI clock input for S_AXI |
s_axi_aresetn | Reset | 1 | I | Active-Low reset for S_AXI |
dpu_2x_clk | Clock | 1 | I | Input clock used for DSP blocks in the DPU. The frequency is twice that of m_axi_dpu_aclk. |
dpu_2x_resetn | Reset | 1 | I | Active-Low reset for DSP blocks |
m_axi_dpu_aclk | Clock | 1 | I | Input clock used for DPU general logic. |
m_axi_dpu_aresetn | Reset | 1 | I | Active-Low reset for DPU general logic |
DPUx_M_AXI_INSTR | Memory mapped AXI master interface | 32 | I/O | 32-bit memory mapped AXI interface for DPU instructions. |
DPUx_M_AXI_DATA0 | Memory mapped AXI master interface | 128 | I/O | 128-bit memory mapped AXI interface for DPU data on the Zynq UltraScale+ MPSoC series. |
DPUx_M_AXI_DATA1 | Memory mapped AXI master interface | 128 | I/O | 128-bit memory mapped AXI interface for DPU data on the Zynq UltraScale+ MPSoC series. |
dpu_interrupt | Interrupt | 1~4 | O | Active-High interrupt output from DPU. The data width is determined by the number of DPU cores. |
SFM_M_AXI (optional) | Memory mapped AXI master interface | 128 | I/O | 128-bit memory mapped AXI interface for softmax data. |
sfm_interrupt (optional) | Interrupt | 1 | O | Active-High interrupt output from softmax module. |
dpu_2x_clk_ce (optional) | Clock enable | 1 | O | Clock enable signal for controlling the input DPU 2x clock when DPU 2x clock gating is enabled. |
Register Space
The DPU IP implements registers in programmable logic. The following tables show the DPU IP registers. These registers are accessible from the APU through the S_AXI interface.
reg_dpu_reset
The reg_dpu_reset register controls the resets of all DPU cores integrated in the DPU IP. The lower four bits of this register control the reset of up to four DPU cores. All the reset signals are active-High. The details of reg_dpu_reset are shown in the following table.
Register | Address Offset | Width | Type | Description |
---|---|---|---|---|
reg_dpu_reset | 0x004 | 32 | R/W | [n] – DPU core n reset |
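The bit-per-core layout described above can be sketched as a small Python helper. This is only an illustration: the names `REG_DPU_RESET_OFFSET`, `MAX_DPU_CORES`, and `reset_mask` are hypothetical and are not part of the DPU IP or its driver. Only the 0x004 offset, the four-bit layout, and the active-High polarity come from the text.

```python
REG_DPU_RESET_OFFSET = 0x004  # address offset from the table above
MAX_DPU_CORES = 4             # the lower four bits map to DPU cores 0..3


def reset_mask(cores):
    """Return a reg_dpu_reset value with bit n set for each core n in `cores`."""
    mask = 0
    for n in cores:
        if not 0 <= n < MAX_DPU_CORES:
            raise ValueError(f"DPU core index out of range: {n}")
        mask |= 1 << n  # reset is active-High, so a 1 asserts reset for core n
    return mask
```

For example, `reset_mask([0, 2])` yields a value with bits 0 and 2 set, which would assert reset for DPU core0 and core2 when written to the register.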
reg_dpu_isr
The reg_dpu_isr register represents the interrupt status of all cores in the DPU IP. The lower four bits of this register show the interrupt status of up to four DPU cores. The details of reg_dpu_isr are shown in the following table.
Register | Address Offset | Width | Type | Description |
---|---|---|---|---|
reg_dpu_isr | 0x608 | 32 | R | [n] – DPU core n interrupt status |
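As a rough sketch of how software might interpret a value read from reg_dpu_isr, the following Python function lists the cores whose status bit is set. The function name `cores_with_interrupt` and the constants are assumptions made for illustration. How the value is actually read over S_AXI is outside the scope of this sketch.

```python
REG_DPU_ISR_OFFSET = 0x608  # address offset from the table above
MAX_DPU_CORES = 4           # the lower four bits map to DPU cores 0..3


def cores_with_interrupt(isr_value):
    """Return the indices of DPU cores whose interrupt status bit is set."""
    return [n for n in range(MAX_DPU_CORES) if (isr_value >> n) & 1]
```

A read value of `0b10` would therefore report core1 only, and bits above bit 3 are ignored.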
reg_dpu_start
The reg_dpu_start register is the start signal for a DPU core. There is one start register for each DPU core. The details of reg_dpu_start are shown in the following table.
Register | Address Offset | Width | Type | Description |
---|---|---|---|---|
reg_dpu0_start | 0x220 | 32 | R/W | DPU core0 start signal. |
reg_dpu1_start | 0x320 | 32 | R/W | DPU core1 start signal. |
reg_dpu2_start | 0x420 | 32 | R/W | DPU core2 start signal. |
reg_dpu3_start | 0x520 | 32 | R/W | DPU core3 start signal. |
reg_dpu_instr_addr
The reg_dpu_instr_addr register is used to indicate the instruction address of a DPU core. Each DPU core has a reg_dpu_instr_addr register. Only the lower 28 bits are valid. In the DPU processor, the real instruction-fetch address is a 40-bit signal that consists of the lower 28 bits of reg_dpu_instr_addr followed by 12 zero bits. The available instruction address for the DPU therefore ranges from 0x1000 to 0xFF_FFFF_F000 in 4 KB steps. The details of reg_dpu_instr_addr are shown in the following table.
Register | Address Offset | Width | Type | Description |
---|---|---|---|---|
reg_dpu0_instr_addr | 0x20C | 32 | R/W | Start address in external memory for DPU core0 instructions. Only the lower 28 bits are valid. |
reg_dpu1_instr_addr | 0x30C | 32 | R/W | Start address in external memory for DPU core1 instructions. Only the lower 28 bits are valid. |
reg_dpu2_instr_addr | 0x40C | 32 | R/W | Start address in external memory for DPU core2 instructions. Only the lower 28 bits are valid. |
reg_dpu3_instr_addr | 0x50C | 32 | R/W | Start address in external memory for DPU core3 instructions. Only the lower 28 bits are valid. |
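The relationship between the 28-bit register field and the 40-bit fetch address can be sketched in Python as follows. The helper names are hypothetical. The encoding (register field followed by 12 zero bits) and the 0x1000 lower bound are taken from the description above, while rejecting unaligned addresses is an assumption about how software would want to guard the conversion.

```python
INSTR_ADDR_SHIFT = 12                 # 12 zero bits are appended by the DPU
INSTR_ADDR_FIELD_MASK = (1 << 28) - 1  # only the lower 28 register bits are valid


def encode_instr_addr(fetch_addr):
    """Convert a 40-bit instruction-fetch address to a reg_dpu_instr_addr value."""
    if fetch_addr % (1 << INSTR_ADDR_SHIFT) != 0:
        raise ValueError("instruction address must be 4 KB aligned")
    field = fetch_addr >> INSTR_ADDR_SHIFT
    # A field of 0 is rejected because the documented range starts at 0x1000.
    if not 1 <= field <= INSTR_ADDR_FIELD_MASK:
        raise ValueError("instruction address outside the supported range")
    return field


def decode_instr_addr(reg_value):
    """Return the 40-bit fetch address implied by a register value."""
    return (reg_value & INSTR_ADDR_FIELD_MASK) << INSTR_ADDR_SHIFT
```

Under these assumptions, an instruction buffer placed at 0x1000 corresponds to a register value of 1, and any upper register bits beyond bit 27 have no effect on the decoded address.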
reg_dpu_base_addr
The reg_dpu_base_addr register is used to indicate the address of the input image and parameters for each DPU core in external memory. The width of a DPU base address is 40 bits, so it can support an address space of up to 1 TB. All registers are 32 bits wide, so two registers are required to represent a 40-bit base address. For example, reg_dpu0_base_addr0_l represents the lower 32 bits of base_address0 in DPU core0, and reg_dpu0_base_addr0_h represents the upper eight bits of base_address0 in DPU core0.
There are eight groups of DPU base addresses for each DPU core and thus 32 groups of DPU base addresses for up to four DPU cores. The details of reg_dpu_base_addr are shown in the following table.
Register | Address Offset | Width | Type | Description |
---|---|---|---|---|
reg_dpu0_base_addr0_l | 0x224 | 32 | R/W | The lower 32 bits of base_address0 of DPU core0. |
reg_dpu0_base_addr0_h | 0x228 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core0. |
reg_dpu0_base_addr1_l | 0x22C | 32 | R/W | The lower 32 bits of base_address1 of DPU core0. |
reg_dpu0_base_addr1_h | 0x230 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core0. |
reg_dpu0_base_addr2_l | 0x234 | 32 | R/W | The lower 32 bits of base_address2 of DPU core0. |
reg_dpu0_base_addr2_h | 0x238 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core0. |
reg_dpu0_base_addr3_l | 0x23C | 32 | R/W | The lower 32 bits of base_address3 of DPU core0. |
reg_dpu0_base_addr3_h | 0x240 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core0. |
reg_dpu0_base_addr4_l | 0x244 | 32 | R/W | The lower 32 bits of base_address4 of DPU core0. |
reg_dpu0_base_addr4_h | 0x248 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core0. |
reg_dpu0_base_addr5_l | 0x24C | 32 | R/W | The lower 32 bits of base_address5 of DPU core0. |
reg_dpu0_base_addr5_h | 0x250 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core0. |
reg_dpu0_base_addr6_l | 0x254 | 32 | R/W | The lower 32 bits of base_address6 of DPU core0. |
reg_dpu0_base_addr6_h | 0x258 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core0. |
reg_dpu0_base_addr7_l | 0x25C | 32 | R/W | The lower 32 bits of base_address7 of DPU core0. |
reg_dpu0_base_addr7_h | 0x260 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core0. |
reg_dpu1_base_addr0_l | 0x324 | 32 | R/W | The lower 32 bits of base_address0 of DPU core1. |
reg_dpu1_base_addr0_h | 0x328 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core1. |
reg_dpu1_base_addr1_l | 0x32C | 32 | R/W | The lower 32 bits of base_address1 of DPU core1. |
reg_dpu1_base_addr1_h | 0x330 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core1. |
reg_dpu1_base_addr2_l | 0x334 | 32 | R/W | The lower 32 bits of base_address2 of DPU core1. |
reg_dpu1_base_addr2_h | 0x338 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core1. |
reg_dpu1_base_addr3_l | 0x33C | 32 | R/W | The lower 32 bits of base_address3 of DPU core1. |
reg_dpu1_base_addr3_h | 0x340 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core1. |
reg_dpu1_base_addr4_l | 0x344 | 32 | R/W | The lower 32 bits of base_address4 of DPU core1. |
reg_dpu1_base_addr4_h | 0x348 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core1. |
reg_dpu1_base_addr5_l | 0x34C | 32 | R/W | The lower 32 bits of base_address5 of DPU core1. |
reg_dpu1_base_addr5_h | 0x350 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core1. |
reg_dpu1_base_addr6_l | 0x354 | 32 | R/W | The lower 32 bits of base_address6 of DPU core1. |
reg_dpu1_base_addr6_h | 0x358 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core1. |
reg_dpu1_base_addr7_l | 0x35C | 32 | R/W | The lower 32 bits of base_address7 of DPU core1. |
reg_dpu1_base_addr7_h | 0x360 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core1. |
reg_dpu2_base_addr0_l | 0x424 | 32 | R/W | The lower 32 bits of base_address0 of DPU core2. |
reg_dpu2_base_addr0_h | 0x428 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core2. |
reg_dpu2_base_addr1_l | 0x42C | 32 | R/W | The lower 32 bits of base_address1 of DPU core2. |
reg_dpu2_base_addr1_h | 0x430 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core2. |
reg_dpu2_base_addr2_l | 0x434 | 32 | R/W | The lower 32 bits of base_address2 of DPU core2. |
reg_dpu2_base_addr2_h | 0x438 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core2. |
reg_dpu2_base_addr3_l | 0x43C | 32 | R/W | The lower 32 bits of base_address3 of DPU core2. |
reg_dpu2_base_addr3_h | 0x440 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core2. |
reg_dpu2_base_addr4_l | 0x444 | 32 | R/W | The lower 32 bits of base_address4 of DPU core2. |
reg_dpu2_base_addr4_h | 0x448 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core2. |
reg_dpu2_base_addr5_l | 0x44C | 32 | R/W | The lower 32 bits of base_address5 of DPU core2. |
reg_dpu2_base_addr5_h | 0x450 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core2. |
reg_dpu2_base_addr6_l | 0x454 | 32 | R/W | The lower 32 bits of base_address6 of DPU core2. |
reg_dpu2_base_addr6_h | 0x458 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core2. |
reg_dpu2_base_addr7_l | 0x45C | 32 | R/W | The lower 32 bits of base_address7 of DPU core2. |
reg_dpu2_base_addr7_h | 0x460 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core2. |
reg_dpu3_base_addr0_l | 0x524 | 32 | R/W | The lower 32 bits of base_address0 of DPU core3. |
reg_dpu3_base_addr0_h | 0x528 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core3. |
reg_dpu3_base_addr1_l | 0x52C | 32 | R/W | The lower 32 bits of base_address1 of DPU core3. |
reg_dpu3_base_addr1_h | 0x530 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core3. |
reg_dpu3_base_addr2_l | 0x534 | 32 | R/W | The lower 32 bits of base_address2 of DPU core3. |
reg_dpu3_base_addr2_h | 0x538 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core3. |
reg_dpu3_base_addr3_l | 0x53C | 32 | R/W | The lower 32 bits of base_address3 of DPU core3. |
reg_dpu3_base_addr3_h | 0x540 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core3. |
reg_dpu3_base_addr4_l | 0x544 | 32 | R/W | The lower 32 bits of base_address4 of DPU core3. |
reg_dpu3_base_addr4_h | 0x548 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core3. |
reg_dpu3_base_addr5_l | 0x54C | 32 | R/W | The lower 32 bits of base_address5 of DPU core3. |
reg_dpu3_base_addr5_h | 0x550 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core3. |
reg_dpu3_base_addr6_l | 0x554 | 32 | R/W | The lower 32 bits of base_address6 of DPU core3. |
reg_dpu3_base_addr6_h | 0x558 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core3. |
reg_dpu3_base_addr7_l | 0x55C | 32 | R/W | The lower 32 bits of base_address7 of DPU core3. |
reg_dpu3_base_addr7_h | 0x560 | 32 | R/W | The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core3. |
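The offsets in the table follow a regular pattern, which the Python sketch below makes explicit. The strides (0x100 per core, 0x8 per base-address group) are inferred from the listed offsets rather than stated by the specification, and the function names are hypothetical.

```python
BASE_ADDR_FIRST_OFFSET = 0x224  # reg_dpu0_base_addr0_l
CORE_STRIDE = 0x100             # inferred: core n registers start 0x100 after core n-1
GROUP_STRIDE = 0x8              # inferred: each group uses one _l and one _h register


def base_addr_offsets(core, group):
    """Return the (_l, _h) register offsets for a core (0..3) and group (0..7)."""
    if not 0 <= core < 4 or not 0 <= group < 8:
        raise ValueError("core must be 0..3 and group must be 0..7")
    low = BASE_ADDR_FIRST_OFFSET + core * CORE_STRIDE + group * GROUP_STRIDE
    return low, low + 4


def split_base_addr(addr):
    """Split a 40-bit base address into (_l, _h) register values."""
    if not 0 <= addr < (1 << 40):
        raise ValueError("base address must fit in 40 bits")
    return addr & 0xFFFFFFFF, (addr >> 32) & 0xFF
```

For instance, `base_addr_offsets(3, 7)` reproduces the last two table rows (0x55C and 0x560), and `split_base_addr(0x12_3456_7890)` gives 0x34567890 for the `_l` register and 0x12 for the `_h` register.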
Interrupts
The DPU generates an interrupt to signal the completion of a task. A high state on reg_dpu0_start signals the start of a DPU task for DPU core0. At the end of the task, the DPU generates an interrupt and bit0 in reg_dpu_isr is set to 1. The position of the active bit in the reg_dpu_isr depends on the number of DPU cores. For example, when DPU core1 finishes a task while DPU core0 is still working, reg_dpu_isr would maintain 2'b10.
The width of the dpu_interrupt signal is determined by the number of DPU cores. When the parameter DPU_NUM is set to 2, then the DPU IP contains two DPU cores, and the width of the dpu_interrupt signal is two. The lower bit represents the DPU core0 interrupt and the higher bit represents the DPU core1 interrupt.
The interrupt connection between the DPU and the PS is described in the device tree file, which indicates the interrupt number of the DPU connected to the PS. Any interrupt pin may be used if the device tree file and Vivado assignments match. The reference connection is shown here.
- If the softmax option is enabled, then the softmax interrupt should be correctly connected to the PS according to the device tree description.
- irq7~irq0 corresponds to pl_ps_irq0[7:0].
- irq15~irq8 corresponds to pl_ps_irq1[7:0].
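The irq-to-pin correspondence in the list above can be expressed as a short Python helper, which may be handy when cross-checking a device tree entry against the Vivado connection. The function name `pl_ps_irq_pin` is hypothetical, and the sketch covers only the irq0 to irq15 mapping stated here.

```python
def pl_ps_irq_pin(irq):
    """Map an irq index (0..15) to its PS port name and bit position."""
    if not 0 <= irq <= 15:
        raise ValueError("irq index must be in the range 0..15")
    # irq0..irq7 sit on pl_ps_irq0[7:0]; irq8..irq15 sit on pl_ps_irq1[7:0].
    port, bit = divmod(irq, 8)
    return f"pl_ps_irq{port}", bit
```

With this mapping, irq10 resolves to bit 2 of pl_ps_irq1.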