xcl_max_work_group_size
Description
Use this attribute instead of reqd_work_group_size when you need to specify a work-group size larger than the 4K (4096) size supported with the reqd_work_group_size attribute. SDx supports work sizes larger than 4096 with the Xilinx attribute xcl_max_work_group_size.
Syntax
__attribute__((xcl_max_work_group_size(X, Y, Z)))
- X, Y, Z: Specify the ND range of the kernel. Each value is one dimension of a three-dimensional matrix that defines the size of the work-group for the kernel.
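The choice between the two attributes comes down to simple arithmetic on X, Y, and Z. The following host-side C sketch is an illustration only: it assumes the 4096 figure quoted in the Description bounds the total number of work-items in a work-group (the product X * Y * Z), and the helper names work_group_items and needs_xcl_max_work_group_size are hypothetical, not part of SDx or OpenCL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 4K size quoted in the Description for reqd_work_group_size. */
#define REQD_WORK_GROUP_LIMIT 4096u

/* Hypothetical helper: number of work-items in an X x Y x Z work-group. */
static size_t work_group_items(size_t x, size_t y, size_t z) {
    assert(x > 0 && y > 0 && z > 0);  /* every dimension must be at least 1 */
    return x * y * z;
}

/* Hypothetical helper: true when the work-group exceeds the 4K size and,
   under the assumption stated above, calls for xcl_max_work_group_size. */
static bool needs_xcl_max_work_group_size(size_t x, size_t y, size_t z) {
    return work_group_items(x, y, z) > REQD_WORK_GROUP_LIMIT;
}
```

Under this assumption, a (64, 64, 1) work-group holds exactly 4096 work-items and stays within the size, while a (128, 128, 1) work-group holds 16384 and would need xcl_max_work_group_size.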
Example 1
Below is the kernel source code for an unoptimized adder. The only attribute specified for this design is the work size, which equals the size of the matrices (64x64). Iterating over an entire work-group therefore adds the input matrices a and b in full and writes the result to output. All three are global integer pointers, so each value in the matrices is four bytes and is stored in off-chip DDR global memory.
#define RANK 64
__kernel __attribute__ ((reqd_work_group_size(RANK, RANK, 1)))
void madd(__global int* a, __global int* b, __global int* output) {
    // Flatten the 2D local ID into a row-major offset into the matrices.
    int index = get_local_id(1)*get_local_size(0) + get_local_id(0);
    output[index] = a[index] + b[index];
}
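The index expression in the kernel flattens the two-dimensional local ID into a single row-major offset, so each work-item handles exactly one matrix element. The host-side C sketch below emulates one work-group to make that mapping concrete. It is an illustration under stated assumptions: the function name madd_work_group is hypothetical, the parameters size0 and size1 stand in for get_local_size(0) and get_local_size(1), and a sequential loop replaces the work-items that the device would run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for one work-group of the madd kernel.
   size0 and size1 play the role of get_local_size(0) and get_local_size(1). */
static void madd_work_group(const int *a, const int *b, int *output,
                            size_t size0, size_t size1) {
    assert(a && b && output);
    for (size_t id1 = 0; id1 < size1; ++id1) {      /* get_local_id(1) */
        for (size_t id0 = 0; id0 < size0; ++id0) {  /* get_local_id(0) */
            /* Same flattening as the kernel: row-major offset. */
            size_t index = id1 * size0 + id0;
            output[index] = a[index] + b[index];
        }
    }
}
```

Calling madd_work_group(a, b, output, 64, 64) visits indices 0 through 4095 once each, which corresponds to the 64x64 work size of the kernel.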
A 64x64 work-group already contains 4096 work-items, so with reqd_work_group_size any matrix larger than 64x64 would need to use only one dimension to define the work size. For example, a 128x128 matrix could be operated on by a kernel with a work size of (128, 1, 1), where each invocation operates on an entire row or column of data.
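One possible reading of the (128, 1, 1) arrangement is that the work-item with local ID r loops over every column of row r. The host-side C sketch below emulates that scheme. It is an assumption-laden illustration, not code from SDx: the names madd_row and madd_by_rows are hypothetical, the row parameter stands in for get_local_id(0), and the outer loop replaces the 128 work-items of the work-group.

```c
#include <assert.h>
#include <stddef.h>

#define RANK 128

/* Hypothetical body of one work-item in a (RANK, 1, 1) work-group:
   row stands in for get_local_id(0), and the work-item adds one full row. */
static void madd_row(const int *a, const int *b, int *output, size_t row) {
    assert(a && b && output && row < RANK);
    for (size_t col = 0; col < RANK; ++col) {
        size_t index = row * RANK + col;  /* row-major offset */
        output[index] = a[index] + b[index];
    }
}

/* Sequential stand-in for the RANK work-items of the work-group. */
static void madd_by_rows(const int *a, const int *b, int *output) {
    for (size_t row = 0; row < RANK; ++row) {
        madd_row(a, b, output, row);
    }
}
```

Each work-item in this scheme performs 128 additions instead of one. Going by the Description, specifying xcl_max_work_group_size(128, 128, 1) would be the alternative that keeps one work-item per matrix element.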
See Also
- SDAccel Environment Optimization Guide (UG1207)
- https://www.khronos.org/
- The OpenCL C Specification