xcl_pipeline_workitems
Description
Pipelines work items to improve latency and throughput. Work-item pipelining extends loop pipelining to the kernel work group, and is needed to maximize kernel throughput and performance.
Syntax
Place the attribute in the OpenCL source before the elements to pipeline:
__attribute__((xcl_pipeline_workitems))
Example 1
In order to handle the reqd_work_group_size attribute in the following example, SDAccel automatically inserts a loop nest to handle the three-dimensional characteristics of the ND range (3,1,1). As a result of the added loop nest, the execution profile of this kernel is like an unpipelined loop. Adding the xcl_pipeline_workitems attribute adds concurrency and improves the throughput of the code.
kernel
__attribute__ ((reqd_work_group_size(3,1,1)))
void foo(...)
{
    ...
    // Pipeline the work items that execute this block.
    __attribute__((xcl_pipeline_workitems)) {
        int tid = get_global_id(0);
        op_Read(tid);
        op_Compute(tid);
        op_Write(tid);
    }
    ...
}
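The loop nest described for Example 1 can be pictured with a plain C model. This is only a conceptual sketch, not the code SDAccel generates. The function `foo_work_group`, the `trace` array, and the stub bodies of `op_Read`, `op_Compute`, and `op_Write` are hypothetical stand-ins for the elided kernel body. The sketch assumes a single work group, so the global ID equals the loop index in dimension 0.

```c
#include <stddef.h>

/* Work-group dimensions taken from reqd_work_group_size(3,1,1). */
enum { LOCAL_X = 3, LOCAL_Y = 1, LOCAL_Z = 1 };
enum { WORK_ITEMS = LOCAL_X * LOCAL_Y * LOCAL_Z };

/* Hypothetical stand-ins for the kernel's elided operations.
   Each one records a stage code so the execution order is observable. */
static int trace[3 * WORK_ITEMS];
static size_t trace_len = 0;

static void op_Read(int tid)    { trace[trace_len++] = 10 * tid + 0; }
static void op_Compute(int tid) { trace[trace_len++] = 10 * tid + 1; }
static void op_Write(int tid)   { trace[trace_len++] = 10 * tid + 2; }

/* Conceptual model of one work group: a loop nest over the three
   ND-range dimensions, with one iteration per work item. */
void foo_work_group(void)
{
    trace_len = 0;
    for (int z = 0; z < LOCAL_Z; ++z) {
        for (int y = 0; y < LOCAL_Y; ++y) {
            for (int x = 0; x < LOCAL_X; ++x) {
                int tid = x;  /* models get_global_id(0) for a single work group */
                op_Read(tid);
                op_Compute(tid);
                op_Write(tid);
            }
        }
    }
}
```

In this sequential model each work item completes its read, compute, and write stages before the next one starts, which is the unpipelined profile described above. With `xcl_pipeline_workitems`, the tool is allowed to overlap the stages of successive iterations of this loop nest.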
Example 2
The following example adds the work-item pipeline to the appropriate elements of the kernel:
__kernel __attribute__ ((reqd_work_group_size(8, 8, 1)))
void madd(__global int* a, __global int* b, __global int* output)
{
    int rank = get_local_size(0);
    __local unsigned int bufa[64];
    __local unsigned int bufb[64];
    // Pipeline the copy from global memory into the local buffers.
    __attribute__((xcl_pipeline_workitems)) {
        int x = get_local_id(0);
        int y = get_local_id(1);
        bufa[x*rank + y] = a[x*rank + y];
        bufb[x*rank + y] = b[x*rank + y];
    }
    // Wait until every work item has filled its local buffer elements.
    barrier(CLK_LOCAL_MEM_FENCE);
    // Pipeline the addition and the write back to global memory.
    __attribute__((xcl_pipeline_workitems)) {
        int index = get_local_id(1)*rank + get_local_id(0);
        output[index] = bufa[index] + bufb[index];
    }
}
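One way to read Example 2 is as two separate work-item loop nests, with the barrier marking the point where the first nest must finish before the second begins. The C model below sketches that reading under stated assumptions: it is not SDAccel output, `madd_work_group` is a hypothetical name, and it models a single 8x8 work group with the local buffers as ordinary arrays.

```c
/* Work-group dimensions taken from reqd_work_group_size(8, 8, 1). */
enum { SIZE_X = 8, SIZE_Y = 8 };

/* Conceptual model of one work group of madd (hypothetical helper). */
void madd_work_group(const int *a, const int *b, int *output)
{
    int rank = SIZE_X;                   /* models get_local_size(0) */
    unsigned int bufa[SIZE_X * SIZE_Y];  /* models __local bufa[64] */
    unsigned int bufb[SIZE_X * SIZE_Y];  /* models __local bufb[64] */

    /* First pipelined region: each work item copies one element
       of a and b into the local buffers. */
    for (int y = 0; y < SIZE_Y; ++y) {
        for (int x = 0; x < SIZE_X; ++x) {
            bufa[x * rank + y] = a[x * rank + y];
            bufb[x * rank + y] = b[x * rank + y];
        }
    }

    /* barrier(CLK_LOCAL_MEM_FENCE): in this model the first loop nest
       has completed, so bufa and bufb are fully written. */

    /* Second pipelined region: each work item adds one pair of
       elements and writes the result. */
    for (int y = 0; y < SIZE_Y; ++y) {
        for (int x = 0; x < SIZE_X; ++x) {
            int index = y * rank + x;
            output[index] = bufa[index] + bufb[index];
        }
    }
}
```

Under this reading, each attributed block becomes its own loop over the 64 work items, and the attribute asks the tool to pipeline the iterations of each loop independently.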
See Also
- pragma HLS pipeline
- SDAccel Environment Optimization Guide (UG1207)
- Vivado Design Suite User Guide: High-Level Synthesis (UG902)