Enabling Concurrent Processing with DATAFLOW

DATAFLOW expresses parallelism at a coarse-grain level. It allows the SDAccel compiler to schedule multiple sequential loops and functions concurrently to achieve higher throughput and lower latency. The figure below shows a conceptual view of dataflow pipelining. After synthesis, the default behavior is to execute and complete func_A, then func_B, and finally func_C. With DATAFLOW enabled, the SDAccel compiler can schedule each function to execute as soon as data is available. In this example, the original function has a latency and interval of 8 clock cycles. With DATAFLOW optimization, the interval is reduced to only three clock cycles. The tasks shown in this example are functions, but dataflow optimization can be applied to any combinations of functions and loops.