OpenCL Memory Model
Figure: OpenCL Memory Model
The OpenCL memory model defines the behavior and hierarchy of memory that can be used by OpenCL applications. This hierarchical representation of memory is common across all OpenCL implementations, but it is up to individual vendors to define how the OpenCL memory model maps to specific hardware. This section defines the mapping used by SDAccel.
Host Memory
The host memory is defined as the region of system memory that is directly (and only) accessible from the host processor. Any data needed by compute kernels must be transferred to and from OpenCL device global memory using the OpenCL API.
Global Memory
The global memory is defined as the region of device memory that is accessible to both the OpenCL host and device. Global memory permits read/write access to the host processor as well to all compute units in the device. The host is responsible for the allocation and de-allocation of buffers in this memory space. There is a handshake between host and device over control of the data stored in this memory. The host processor transfers data from the host memory space into the global memory space. Then, once a kernel is launched to process the data, the host loses access rights to the buffer in global memory. The device takes over and is capable of reading and writing from the global memory until the kernel execution is complete. Upon completion of the operations associated with a kernel, the device turns control of the global memory buffer back to the host processor. Once it has regained control of a buffer, the host processor can read and write data to the buffer, transfer data back to the host memory, and de-allocate the buffer.
Constant Global Memory
Constant global memory is defined as the region of system memory that is accessible with read and write access for the OpenCL host and with read only access for the OpenCL device. As the name implies, the typical use for this memory is to transfer constant data needed by kernel computation from the host to the device.
Local Memory
Local memory is a region of memory that is local to a single compute unit. The host processor has no visibility and no control on the operations that occur in this memory space. This memory space allows read and write operations by all the processing elements with a compute units. This level of memory is typically used to store data that must be shared by multiple work-items. Operations on local memory are un-ordered between work-items but synchronization and consistency can be achieved using barrier and fence operations. In SDAccel, the structure of local memory can be customized to meet the requirements of an algorithm or application.
Private Memory
Private memory is the region of memory that is private to an individual work-item executing within an OpenCL processing element. As with local memory, the host processor has no visiblilty into this memory region. This memory space can be read from and written to by all work-items, but variables defined in one work-item's private memory are not visible to another work-item. In SDAccel, the structure of private memory can be customized to meet the requirements of an algorithm or application.
For devices using an FPGA device, the physical mapping of the OpenCL memory model is the following:
- Host memory is any memory connected to the host processor only.
- Global and constant memories are any memory that is connected to the FPGA device. These are usually memory chips (e.g. SDRAM) that are physically connected to the FPGA device, but might also include distributed memories (e.g. BlockRAM) within the FPGA fabric. The host processor has access to these memory banks through infrastructure provided by the FPGA platform.
- Local memory is memory inside of the FPGA device. This memory is typically implemented using registers or BlockRAMs in the FPGA fabric.
- Private memory is memory inside of the FPGA device. This memory is typically implemented using registers or BlockRAMs in the FPGA fabric.