DATAFLOW expresses parallelism at a coarse-grained level. It allows the SDAccel
compiler to schedule multiple sequential loops and functions concurrently to achieve higher
throughput and lower latency. The figure below shows a conceptual view of dataflow pipelining.
After synthesis, the default behavior is to execute and complete func_A, then func_B, and finally func_C. With DATAFLOW enabled, the SDAccel compiler can schedule
each function to execute as soon as data is available. In this example, the original function
has a latency and interval of eight clock cycles. With DATAFLOW optimization, the interval is
reduced to only three clock cycles. The tasks shown in this example are functions, but
dataflow optimization can be applied to any combination of functions and loops.
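
The structure described above can be sketched in C++ kernel source as follows. This is a minimal illustration, not code from the SDAccel documentation: the function names func_A, func_B, and func_C come from the text, but their bodies, the array size N, the intermediate buffers a_out and b_out, and the top-level name top are all assumptions made for the example. It assumes the kernel is written in C/C++ and that the DATAFLOW optimization is requested with the HLS dataflow pragma placed inside the top-level function.

```cpp
// Hypothetical problem size, chosen only for illustration.
constexpr int N = 8;

// Three placeholder tasks. The names follow the text; the bodies are invented.
static void func_A(const int in[N], int a_out[N]) {
    for (int i = 0; i < N; ++i) a_out[i] = in[i] + 1;
}

static void func_B(const int a_out[N], int b_out[N]) {
    for (int i = 0; i < N; ++i) b_out[i] = a_out[i] * 2;
}

static void func_C(const int b_out[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = b_out[i] - 3;
}

// Hypothetical top-level function. Each intermediate buffer has exactly one
// producer and one consumer, so every task can start as soon as its input
// data is available.
void top(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
    int a_out[N];  // written by func_A, read by func_B
    int b_out[N];  // written by func_B, read by func_C

    func_A(in, a_out);
    func_B(a_out, b_out);
    func_C(b_out, out);
}
```

An ordinary C++ compiler ignores the pragma (possibly with an unknown-pragma warning) and runs the three calls in sequence, so the same source can be checked in software before synthesis. The pragma changes only the schedule, not the results: the tasks are allowed to overlap, and the interval is then bounded by the slowest task instead of the sum of all three.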