reqd_work_group_size
Description
When OpenCL™ kernels are submitted for execution on an OpenCL device, they execute within an index space, called an ND range, which can have 1, 2, or 3 dimensions. The size of the ND range is called the global size in the OpenCL API. The work-group size defines the portion of the ND range that can be processed by a single invocation of a kernel compute unit, and is called the local size in the OpenCL API. The OpenCL compiler can determine the work-group size based on the properties of the kernel and the selected device. Once the work-group size (local size) has been determined, the ND range (global size) is divided automatically into work-groups, and the work-groups are scheduled for execution on the device.
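The division of the ND range into work-groups can be illustrated with a small host-side calculation. The following is a minimal sketch in plain C, not part of the OpenCL API: the helper names work_groups_in_dim and total_work_groups are hypothetical, and the code assumes the OpenCL 1.x rule that each global dimension is an exact multiple of the corresponding local dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups along one dimension.
 * Assumes local_size is non-zero and divides global_size evenly. */
size_t work_groups_in_dim(size_t global_size, size_t local_size)
{
    assert(local_size != 0);
    assert(global_size % local_size == 0);
    return global_size / local_size;
}

/* Hypothetical helper: total number of work-groups in a 1-, 2-, or
 * 3-dimensional ND range (work_dim is the number of dimensions). */
size_t total_work_groups(unsigned work_dim,
                         const size_t *global_size,
                         const size_t *local_size)
{
    size_t total = 1;
    assert(work_dim >= 1 && work_dim <= 3);
    for (unsigned d = 0; d < work_dim; ++d)
        total *= work_groups_in_dim(global_size[d], local_size[d]);
    return total;
}
```

For example, a one-dimensional global size of 1024 with a local size of 16 yields 64 work-groups to be scheduled on the device.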
Although the OpenCL compiler can define the work-group size, specifying the reqd_work_group_size attribute on the kernel is highly recommended for FPGA implementations of the kernel. The attribute is recommended for performance optimization during the generation of the custom logic for a kernel. See "OpenCL Execution Model" in the SDAccel Environment Profiling and Optimization Guide (UG1207) for more information.
OpenCL kernel functions are executed exactly one time for each point in the ND range index space. This unit of work for each point in the ND range is called a work-item. Work-items are organized into work-groups, which are the unit of work scheduled onto compute units. The optional reqd_work_group_size attribute defines the work-group size of a compute unit, which must be used as the local_work_size argument to clEnqueueNDRangeKernel. This allows the compiler to optimize the generated code appropriately for this kernel.
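On the host side, the requirement amounts to passing a local_work_size that matches the attribute. The sketch below is a plain C check that a host program might run before calling clEnqueueNDRangeKernel. The helper local_size_matches_required is hypothetical and not part of the OpenCL API. It assumes that dimensions beyond work_dim are treated as size 1, and that a NULL local_work_size is rejected for a kernel that declares reqd_work_group_size, as the OpenCL 1.x specification describes for CL_INVALID_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when local_work_size agrees with the
 * (X, Y, Z) values given to reqd_work_group_size on the kernel.
 * Assumption: dimensions not covered by work_dim count as size 1. */
bool local_size_matches_required(const size_t required[3],
                                 unsigned work_dim,
                                 const size_t *local_work_size)
{
    assert(work_dim >= 1 && work_dim <= 3);

    /* Letting the runtime choose the size is not allowed for a kernel
     * that declares a required work-group size. */
    if (local_work_size == NULL)
        return false;

    for (unsigned d = 0; d < 3; ++d) {
        size_t requested = (d < work_dim) ? local_work_size[d] : 1;
        if (requested != required[d])
            return false;
    }
    return true;
}
```

With a kernel declared as reqd_work_group_size(16, 1, 1), a one-dimensional enqueue with a local size of 16 passes this check, while a local size of 32 or a NULL pointer does not.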
Syntax
__attribute__((reqd_work_group_size(X, Y, Z)))
- X, Y, Z: Specifies the work-group size of the kernel. Each value gives the number of work-items along one dimension of the three-dimensional work-group.
Examples
The following OpenCL C kernel code shows a vector addition design in which two arrays of data are summed into a third array. The required size of the work-group is 16x1x1, so the kernel body executes 16 times per work-group, once for each work-item, to produce a valid result.
#include <clc.h>
// For VHLS OpenCL C kernels, the full work group is synthesized
__attribute__ ((reqd_work_group_size(16, 1, 1)))
__kernel void
vadd(__global int* a,
     __global int* b,
     __global int* c)
{
    int idx = get_global_id(0);
    c[idx] = a[idx] + b[idx];
}
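To make the relationship between work-groups, work-items, and get_global_id(0) concrete, the kernel above can be emulated in plain C on the host. This is only an illustrative sketch, not how the compiler synthesizes the kernel: the function vadd_emulated and the macro WG_SIZE are hypothetical names, and the code assumes a one-dimensional ND range with a zero global offset whose size is a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 16  /* mirrors reqd_work_group_size(16, 1, 1) */

/* Hypothetical host-side emulation of the vadd kernel over a 1-D ND range.
 * The outer loop walks the work-groups; the inner loop runs the kernel body
 * once per work-item, that is, 16 times per work-group. */
void vadd_emulated(const int *a, const int *b, int *c, size_t global_size)
{
    assert(global_size % WG_SIZE == 0);

    for (size_t group = 0; group < global_size / WG_SIZE; ++group) {
        for (size_t local = 0; local < WG_SIZE; ++local) {
            /* Equivalent of get_global_id(0) with a zero global offset. */
            size_t idx = group * WG_SIZE + local;
            c[idx] = a[idx] + b[idx];
        }
    }
}
```

Because the inner loop bound is a compile-time constant, it has a fixed trip count of 16, which is the kind of information the reqd_work_group_size attribute gives the compiler about the kernel.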
See Also
- SDAccel Environment Profiling and Optimization Guide (UG1207)
- https://www.khronos.org/
- The OpenCL C Specification