Using Multiple Compute Units
Depending on available resources on the target device, multiple compute units of the same kernel or different kernels can be created and run in parallel to improve the system processing time and throughput.
An application can use multiple compute units in the target device by creating multiple in-order command queues or a single out-of-order command queue.