Enabling DATAFLOW on OpenCL C Kernels
DATAFLOW is supported in OpenCL™ C kernels with a Xilinx
vendor extension to the OpenCL specification. The attribute xcl_dataflow
can be added to a kernel to enable concurrent scheduling of
sub-functions and loops within the kernel function.
The following function dataflow example comes from the dataflow category of the Xilinx On-boarding Example GitHub repository. The top-level kernel adder
consists of three sub-functions, with the xcl_dataflow
attribute applied to the kernel definition.
__kernel
__attribute__ ((reqd_work_group_size(1, 1, 1)))
__attribute__ ((xcl_dataflow))
void adder(__global int *in, __global int *out, int inc, int size)
{
    int buffer_in[BUFFER_SIZE];
    int buffer_out[BUFFER_SIZE];
    read_input(in, buffer_in, size);
    compute_add(buffer_in, buffer_out, inc, size);
    write_result(out, buffer_out, size);
}
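The kernel above calls three sub-functions whose bodies are not shown here. The following is a minimal sketch of how they might look, not the code from the Xilinx example itself. The function bodies, the BUFFER_SIZE value of 256, and the __OPENCL_VERSION__ shim (which only lets the sketch compile as plain C for checking on a host) are all assumptions. The sketch also assumes size does not exceed BUFFER_SIZE.

```c
/* Shim (assumption): lets this sketch compile as plain C on a host.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips it. */
#ifndef __OPENCL_VERSION__
#define __global
#endif

/* Assumed value; the excerpt above does not show how BUFFER_SIZE is defined. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

/* Stage 1: copy `size` elements from global memory into a local buffer.
   Assumes size <= BUFFER_SIZE. */
static void read_input(__global int *in, int *buffer_in, int size)
{
    for (int i = 0; i < size; i++) {
        buffer_in[i] = in[i];
    }
}

/* Stage 2: add the increment to every element, in order. */
static void compute_add(int *buffer_in, int *buffer_out, int inc, int size)
{
    for (int i = 0; i < size; i++) {
        buffer_out[i] = buffer_in[i] + inc;
    }
}

/* Stage 3: copy the results from the local buffer back to global memory. */
static void write_result(__global int *out, int *buffer_out, int size)
{
    for (int i = 0; i < size; i++) {
        out[i] = buffer_out[i];
    }
}
```

In this sketch each intermediate array is written by exactly one stage and read by exactly one stage, element by element in the same order. That sequential, single-producer, single-consumer access pattern is what one would expect a FIFO channel between stages to need. The loops also run explicitly from 0 to size instead of relying on work-item built-in functions.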
When using xcl_dataflow, Xilinx recommends that you use the reqd_work_group_size(1, 1, 1) attribute.
This enables the logic in the kernel to execute for the longest contiguous amount of time and
provides the greatest opportunity for parallelism, because more hardware units
operate concurrently for the longest possible time.
The current version of the SDAccel™ compiler places some limitations on xcl_dataflow usage:
- OpenCL work-item built-in functions such as get_global_size() and get_local_id() cannot be used in a kernel with the xcl_dataflow attribute.
- The SDAccel compiler uses FIFOs for data channels between processes by default when DATAFLOW is enabled. The default FIFO depth is the same as the array size. The
--xp "param:compiler.xclDataflowFifoDepth=depth_value"
option can be passed on the xocc command line to change the default FIFO depth.