Enabling DATAFLOW on OpenCL C Kernels

DATAFLOW is supported in OpenCL™ C kernels with a Xilinx vendor extension to the OpenCL specification. The attribute xcl_dataflow can be added to a kernel to enable concurrent scheduling of sub-functions and loops within the kernel function.

The following function dataflow example is taken from the dataflow category of the Xilinx On-boarding Example GitHub repository. The top-level kernel, adder, consists of three sub-functions, and the xcl_dataflow attribute is applied to the kernel definition.

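__kernel
__attribute__ ((reqd_work_group_size(1, 1, 1)))
__attribute__ ((xcl_dataflow))
void adder(__global int *in, __global int *out, int inc, int size)
{
    // Local arrays that carry data between the three sub-functions.
    // BUFFER_SIZE is defined elsewhere in the example source.
    int buffer_in[BUFFER_SIZE];
    int buffer_out[BUFFER_SIZE];

    // With xcl_dataflow, these three calls can be scheduled concurrently.
    read_input(in, buffer_in, size);
    compute_add(buffer_in, buffer_out, inc, size);
    write_result(out, buffer_out, size);
}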
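
The three sub-functions are not listed in this section. The following is a minimal sketch of what they might look like, not the code from the GitHub example: the function bodies, the BUFFER_SIZE value of 256, the __global shim, and the adder_model helper are all assumptions made for illustration. The shim lets the same sub-functions compile as plain C, so the data path can be checked on a host before building the kernel.

```c
#ifndef __OPENCL_VERSION__
/* Plain C build only: an OpenCL compiler already understands __global. */
#include <assert.h>
#define __global
#endif

#define BUFFER_SIZE 256 /* assumed value; the real example defines its own */

/* Stage 1: copy `size` elements from global memory into a local buffer. */
static void read_input(__global const int *in, int *buffer_in, int size)
{
    for (int i = 0; i < size; i++)
        buffer_in[i] = in[i];
}

/* Stage 2: add `inc` to every element. */
static void compute_add(const int *buffer_in, int *buffer_out, int inc, int size)
{
    for (int i = 0; i < size; i++)
        buffer_out[i] = buffer_in[i] + inc;
}

/* Stage 3: copy the results from the local buffer back to global memory. */
static void write_result(__global int *out, const int *buffer_out, int size)
{
    for (int i = 0; i < size; i++)
        out[i] = buffer_out[i];
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host-side stand-in for the adder kernel body. It runs the
 * three stages one after another, so it checks results, not concurrency. */
static void adder_model(const int *in, int *out, int inc, int size)
{
    int buffer_in[BUFFER_SIZE];
    int buffer_out[BUFFER_SIZE];

    assert(size >= 0 && size <= BUFFER_SIZE);
    read_input(in, buffer_in, size);
    compute_add(buffer_in, buffer_out, inc, size);
    write_result(out, buffer_out, size);
}
#endif
```

In this sketch each stage walks its buffer once, in index order, which matches how a FIFO channel between two processes is written and read.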

When using xcl_dataflow, Xilinx recommends that you also use the reqd_work_group_size(1, 1, 1) attribute. This lets the logic in the kernel execute for the longest contiguous amount of time and provides the greatest opportunity for parallelism, because more hardware units operate concurrently for the longest possible time.

Note: Do not use the vec_type_hint attribute with a reqd_work_group_size of (1, 1, 1).

The current version of the SDAccel™ compiler has the following limitations on xcl_dataflow usage:

  • OpenCL work-item built-in functions, such as get_global_size() and get_local_id(), cannot be used in a kernel with the xcl_dataflow attribute.
  • The SDAccel compiler uses FIFOs for the data channels between processes by default when DATAFLOW is enabled. The default FIFO depth is the same as the array size. To change the default FIFO depth, pass the --xp "param:compiler.xclDataflowFifoDepth=depth_value" option on the xocc command line.
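
Because of the first limitation, code written in the NDRange style, where each work-item picks its element through a built-in index, has to be expressed as an explicit loop whose trip count is passed in as an argument, as the size argument is in the adder kernel. The following plain C sketch illustrates the rewrite. All names in it (add_one_item, add_all_items, forms_agree, MAX_ITEMS) are hypothetical, and the gid parameter only stands in for the value that get_global_id(0) would return.

```c
#include <assert.h>

#define MAX_ITEMS 64 /* hypothetical bound, used only by the check below */

/* NDRange form: one work-item handles one element. `gid` stands in for
 * get_global_id(0), which cannot be called in an xcl_dataflow kernel. */
static void add_one_item(const int *in, int *out, int inc, int gid)
{
    out[gid] = in[gid] + inc;
}

/* Single work-item form: one loop over all elements, with the element
 * count passed in as an argument instead of queried from a built-in. */
static void add_all_items(const int *in, int *out, int inc, int size)
{
    for (int i = 0; i < size; i++)
        out[i] = in[i] + inc;
}

/* Returns 1 if both forms produce the same output for `in`, else 0. */
static int forms_agree(const int *in, int inc, int size)
{
    int a[MAX_ITEMS];
    int b[MAX_ITEMS];

    assert(size >= 0 && size <= MAX_ITEMS);
    for (int gid = 0; gid < size; gid++) /* emulate the NDRange launch */
        add_one_item(in, a, inc, gid);
    add_all_items(in, b, inc, size);

    for (int i = 0; i < size; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

The loop form is the one that fits a kernel with reqd_work_group_size(1, 1, 1), since a single work-item then processes the whole data set.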