Below is the code snippet from the Wide Memory Read/Write Example on Xilinx On-boarding Example GitHub that shows
the recommended coding style for automatically inferring burst read for one dimensional
vectors. A local memory v1_local is used for buffering the data from a
single burst. The entire input vector is read in multiple bursts. The choice of
LOCAL_MEM_SIZE depends on the specific application and available
on-chip memory on the target FPGA.
kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
void vadd(
const __global uint16 *in1, // Read-Only Vector 1
const __global uint16 *in2, // Read-Only Vector 2
__global uint16 *out, // Output Result
int size // Size in integer
)
{
local uint16 v1_local[LOCAL_MEM_SIZE]; // Local memory to store vector1
int size_in16 = (size-1) / VECTOR_SIZE + 1;
...
for(int i = 0; i < size_in16; i += LOCAL_MEM_SIZE)
{
...
int chunk_size = LOCAL_MEM_SIZE;
//boundary checks
if ((i + LOCAL_MEM_SIZE) > size_in16)
chunk_size = size_in16 - i;
v1_rd: __attribute__((xcl_pipeline_loop))
for (int j = 0 ; j < chunk_size; j++){
v1_local[j] = in1 [i + j];
}
...
}
}
The Device Hardware Transaction View below shows that multiple read bursts are
sent at the kernel start and all read data come back continuously after the memory read
latency.