Getting Started with SDAccel
This chapter describes how to use xfOpenCV in the SDAccel™ environment. The following sections describe the methodology to create a kernel, the corresponding host code, and a suitable makefile to compile an xfOpenCV kernel for any of the supported platforms in SDAccel. The subsequent sections also explain how to verify the kernel in the various emulation modes and on hardware.
Prerequisites
- A valid installation of SDx™ 2019.1 or a later version, and the corresponding licenses.
- Install the xfOpenCV libraries, if you intend to use libraries compiled differently from what is provided in SDx.
- Install a card for which the platform is supported in SDx 2019.1 or later versions.
- Xilinx® Runtime (XRT) must be installed. XRT provides a software interface to Xilinx FPGAs.
- libOpenCL.so must be installed, if it is not already present along with the platform.
SDAccel Design Methodology
There are three critical components in making a kernel work on a platform using SDAccel:
- Host code with OpenCL constructs
- Wrappers around HLS Kernel(s)
- Makefile to compile the kernel for emulation or running on hardware.
Host Code with OpenCL
The host code, written in C++ using OpenCL APIs, performs the following operations:
- Loading the kernel binary on the FPGA – xcl::import_binary_file() loads the bitstream and programs the FPGA to enable the required processing of data.
- Setting up memory buffers for data transfer – Data needs to be sent to and read from the DDR memory on the hardware. cl::Buffer objects are created to allocate the required memory for transferring data to and from the hardware.
- Transferring data to and from the hardware – enqueueWriteBuffer() and enqueueReadBuffer() are used to transfer the data to and from the hardware at the required time.
- Executing the kernel on the FPGA – There are functions to execute kernels on the FPGA. A single kernel or multiple kernels can be executed, synchronously or asynchronously with each other. The most commonly used command is enqueueTask().
- Profiling the performance of kernel execution – The OpenCL host code also enables measurement of the execution time of a kernel on the FPGA. The function used for profiling in these examples is getProfilingInfo(). A minimal skeleton combining these steps is shown after this list.
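Put together, these steps form the skeleton below. This is a minimal sketch, not a drop-in program: the binary name krnl_top, the kernel name func_top, and the buffer sizes and host pointers (in_size_bytes, in_data, and so on) are placeholders, and error checking is omitted. A complete working version appears in the design example later in this chapter.
#include "xcl2.hpp" // helper header shipped with the SDAccel examples; provides the xcl:: utilities
// Set up the device, context, and a profiling-enabled command queue
std::vector<cl::Device> devices = xcl::get_xil_devices();
cl::Device device = devices[0];
cl::Context context(device);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
std::string device_name = device.getInfo<CL_DEVICE_NAME>();
// Load the kernel binary and program the FPGA
std::string binaryFile = xcl::find_binary_file(device_name, "krnl_top");
cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
devices.resize(1);
cl::Program program(context, devices, bins);
cl::Kernel krnl(program, "func_top");
// Allocate device buffers and transfer the input
cl::Buffer in_buf(context, CL_MEM_READ_ONLY, in_size_bytes);
cl::Buffer out_buf(context, CL_MEM_WRITE_ONLY, out_size_bytes);
krnl.setArg(0, in_buf);
krnl.setArg(1, out_buf);
q.enqueueWriteBuffer(in_buf, CL_TRUE, 0, in_size_bytes, in_data);
// Execute the kernel and profile its run time
cl::Event event;
q.enqueueTask(krnl, NULL, &event);
event.wait();
cl_ulong start = 0, end = 0;
event.getProfilingInfo(CL_PROFILING_COMMAND_START, &start);
event.getProfilingInfo(CL_PROFILING_COMMAND_END, &end);
std::cout << ((end - start) / 1000000.0) << " ms" << std::endl; // timestamps are in ns
// Read the result back to the host
q.enqueueReadBuffer(out_buf, CL_TRUE, 0, out_size_bytes, out_data);
q.finish();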
Wrappers around HLS Kernel(s)
All xfOpenCV kernels are provided as C++ function templates (located at <Github repo>/include) with image containers as objects of the xf::Mat class. These kernels work either in a stream-based manner (where the complete image is read continuously) or in a memory-mapped manner (where image data is accessed in blocks).
The SDAccel flow (OpenCL) requires kernel interfaces to be memory pointers with widths in powers of 2. Glue logic is therefore required to convert memory pointers to the xf::Mat class data type and vice versa when interacting with xfOpenCV kernel(s). Wrapper(s) are built over the kernel(s) with this glue logic. The following examples provide a methodology to handle the different kernel types (stream-based and memory-mapped; the xfOpenCV kernels are located at <Github repo>/include).
Stream Based Kernels
To facilitate the conversion of a pointer to xf::Mat and vice versa, two adapter functions, xf::Array2xfMat() and xf::xfMat2Array(), are included as part of xfOpenCV. The xf::Mat objects must be invoked as streams using the HLS stream pragma with a minimum depth of 2. This results in a top-level (or wrapper) function for the kernel, as shown below:
extern "C"
{
void func_top (ap_uint<PTR_WIDTH> *gmem_in, ap_uint<PTR_WIDTH> *gmem_out, ...) {
xf::Mat<…> in_mat(…), out_mat(…);
#pragma HLS stream variable=in_mat.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
xf::Array2xfMat<…> (gmem_in, in_mat);
xf::xfopencv-func<…> (in_mat, out_mat…);
xf::xfMat2Array<…> (gmem_out, out_mat);
}
}
The above illustration assumes that the data in xf::Mat is streamed in and streamed out. You can also create a pipeline with multiple functions instead of just one xfOpenCV function; a sketch is shown below.
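For example, chaining two functions only requires an intermediate xf::Mat stream between the stages. In the sketch below, xf_func_a and xf_func_b are placeholders for any two stream-capable xfOpenCV functions, not actual library names:
extern "C" {
void func_top (ap_uint<PTR_WIDTH> *gmem_in, ap_uint<PTR_WIDTH> *gmem_out, ...) {
xf::Mat<…> in_mat(…), tmp_mat(…), out_mat(…);
#pragma HLS stream variable=in_mat.data depth=2
#pragma HLS stream variable=tmp_mat.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
xf::Array2xfMat<…> (gmem_in, in_mat);
xf::xf_func_a<…> (in_mat, tmp_mat);  // stage 1
xf::xf_func_b<…> (tmp_mat, out_mat); // stage 2
xf::xfMat2Array<…> (gmem_out, out_mat);
}
}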
For stream-based kernels with multiple inputs of different sizes, multiple instances of the adapter functions are necessary, as shown below:
extern "C" {
void func_top (ap_uint<PTR_WIDTH> *gmem_in1, ap_uint<PTR_WIDTH> *gmem_in2, ap_uint<PTR_WIDTH> *gmem_in3, ap_uint<PTR_WIDTH> *gmem_out, ...) {
xf::Mat<...,HEIGHT,WIDTH,…> in_mat1(…), out_mat(…);
xf::Mat<...,HEIGHT/4,WIDTH,…> in_mat2(…), in_mat3(…);
#pragma HLS stream variable=in_mat1.data depth=2
#pragma HLS stream variable=in_mat2.data depth=2
#pragma HLS stream variable=in_mat3.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
xf::accel_utils obj_a, obj_b;
obj_a.Array2xfMat<…,HEIGHT,WIDTH,…> (gmem_in1, in_mat1);
obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in2, in_mat2);
obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in3, in_mat3);
xf::xfopencv-func(in_mat1, in_mat2, in_mat3, out_mat…);
xf::xfMat2Array<…> (gmem_out, out_mat);
}
}
For stream-based implementations, the data must be fetched from the input AXI interface and pushed to the xf::Mat as required by the xfOpenCV kernels for that particular configuration. Likewise, the same operations must be performed for the output of the xfOpenCV kernel. The two utility functions xf::Array2xfMat() and xf::xfMat2Array() perform exactly this.
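Array2xfMat() is the adapter for the input direction. Assuming it mirrors the xfMat2Array() signature documented in the next section, with the pointer as the source and the xf::Mat as the destination, it takes the following form:
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void Array2xfMat(ap_uint<PTR_WIDTH> *srcPtr, xf::Mat<MAT_T,ROWS,COLS,NPC>& dstMat)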
xfMat2Array
This function converts the input xf::Mat to an output array. The output of an xfOpenCV kernel function is an xf::Mat, which needs to be converted to the output pointer.
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void xfMat2Array(xf::Mat<MAT_T,ROWS,COLS,NPC>& srcMat, ap_uint< PTR_WIDTH > *dstPtr)
Parameter | Description |
---|---|
PTR_WIDTH | Data width of the output pointer. The value must be a power of 2, from 8 to 512. |
MAT_T | Input Mat type. Examples: XF_8UC1, XF_16UC1, XF_8UC3, and XF_8UC4. |
ROWS | Maximum height of the image. |
COLS | Maximum width of the image. |
NPC | Number of pixels computed in parallel. Examples: XF_NPPC1, XF_NPPC8. |
srcMat | Input image of type xf::Mat. |
dstPtr | Output pointer. The type of the pointer is based on PTR_WIDTH. |
Interface pointer widths
MAT type | Parallelism | Min PTR_WIDTH | Max PTR_WIDTH |
---|---|---|---|
XF_8UC1 | XF_NPPC1 | 8 | 512 |
XF_16UC1 | XF_NPPC1 | 16 | 512 |
XF_8UC1 | XF_NPPC8 | 64 | 512 |
XF_16UC1 | XF_NPPC8 | 128 | 512 |
XF_8UC3 | XF_NPPC1 | 32 | 512 |
XF_8UC3 | XF_NPPC8 | 256 | 512 |
XF_8UC4 | XF_NPPC8 | 256 | 512 |
XF_8UC3 | XF_NPPC16 | 512 | 512 |
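For instance, according to the table above, an XF_8UC1 image processed at eight pixels per clock (XF_NPPC8) requires a pointer width of at least 64. A call of the following form is consistent with those widths (a sketch: HEIGHT and WIDTH are the compile-time maximum dimensions, and out_mat/gmem_out are the wrapper's local Mat and interface pointer):
// Write an XF_8UC1, 8-pixel-parallel xf::Mat to a 64-bit wide output pointer
xf::xfMat2Array<64, XF_8UC1, HEIGHT, WIDTH, XF_NPPC8>(out_mat, gmem_out);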
Design Example Using the Library on SDAccel
The following is a multi-kernel example, in which different kernels run sequentially in a pipeline to form an application. The example performs Canny edge detection using two kernels, Canny and edge tracing. The Canny function takes a gray-scale image as input and provides the edge information in three states (weak edge (1), strong edge (3), and background (0)), which is fed into edge tracing, which filters out the weak edges. The former works as a stream-based implementation and the latter in a memory-mapped manner.
Host code
// setting up device and platform
std::vector<cl::Device> devices = xcl::get_xil_devices();
cl::Device device = devices[0];
cl::Context context(device);
cl::CommandQueue q(context, device,CL_QUEUE_PROFILING_ENABLE);
std::string device_name = device.getInfo<CL_DEVICE_NAME>();
// Kernel 1: Canny
std::string binaryFile=xcl::find_binary_file(device_name,"krnl_canny");
cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
devices.resize(1);
cl::Program program(context, devices, bins);
cl::Kernel krnl(program,"canny_accel");
// creating necessary cl buffers for input and output
cl::Buffer imageToDevice(context, CL_MEM_READ_ONLY,(height*width));
cl::Buffer imageFromDevice(context, CL_MEM_WRITE_ONLY,(height*width/4));
// Set the kernel arguments
krnl.setArg(0, imageToDevice);
krnl.setArg(1, imageFromDevice);
krnl.setArg(2, height);
krnl.setArg(3, width);
krnl.setArg(4, low_threshold);
krnl.setArg(5, high_threshold);
// write the input image data from host to device memory
q.enqueueWriteBuffer(imageToDevice, CL_TRUE, 0,(height*(width)),img_gray.data);
// Profiling Objects
cl_ulong start= 0;
cl_ulong end = 0;
double diff_prof = 0.0f;
cl::Event event_sp;
// Launch the kernel
q.enqueueTask(krnl,NULL,&event_sp);
clWaitForEvents(1, (const cl_event*) &event_sp);
// profiling (the OpenCL timestamps are reported in nanoseconds)
event_sp.getProfilingInfo(CL_PROFILING_COMMAND_START,&start);
event_sp.getProfilingInfo(CL_PROFILING_COMMAND_END,&end);
diff_prof = end-start;
std::cout<<(diff_prof/1000000)<<"ms"<<std::endl;
// Kernel 2: edge tracing
cl::Kernel krnl2(program,"edgetracing_accel");
cl::Buffer imageFromDeviceedge(context, CL_MEM_WRITE_ONLY,(height*width));
// Set the kernel arguments
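// Note: imageFromDevice, the output buffer of the Canny kernel, is passed
// directly as the input here, so the intermediate result never leaves the device.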
krnl2.setArg(0, imageFromDevice);
krnl2.setArg(1, imageFromDeviceedge);
krnl2.setArg(2, height);
krnl2.setArg(3, width);
// Profiling Objects
cl_ulong startedge= 0;
cl_ulong endedge = 0;
double diff_prof_edge = 0.0f;
cl::Event event_sp_edge;
// Launch the kernel
q.enqueueTask(krnl2,NULL,&event_sp_edge);
clWaitForEvents(1, (const cl_event*) &event_sp_edge);
// profiling
event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_START,&startedge);
event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_END,&endedge);
diff_prof_edge = endedge-startedge;
std::cout<<(diff_prof_edge/1000000)<<"ms"<<std::endl;
//Copying Device result data to Host memory
q.enqueueReadBuffer(imageFromDeviceedge, CL_TRUE, 0,(height*width),out_img_edge.data);
q.finish();
Top level kernel
Below is the top-level/wrapper function with all the necessary glue logic.
// streaming based kernel
#include "xf_canny_config.h"
extern "C" {
void canny_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols,int low_threshold,int high_threshold)
{
#pragma HLS INTERFACE m_axi port=img_inp offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=img_out offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=img_inp bundle=control
#pragma HLS INTERFACE s_axilite port=img_out bundle=control
#pragma HLS INTERFACE s_axilite port=rows bundle=control
#pragma HLS INTERFACE s_axilite port=cols bundle=control
#pragma HLS INTERFACE s_axilite port=low_threshold bundle=control
#pragma HLS INTERFACE s_axilite port=high_threshold bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
xf::Mat<XF_8UC1, HEIGHT, WIDTH, INTYPE> in_mat(rows,cols);
#pragma HLS stream variable=in_mat.data depth=2
xf::Mat<XF_2UC1, HEIGHT, WIDTH, XF_NPPC32> dst_mat(rows,cols);
#pragma HLS stream variable=dst_mat.data depth=2
#pragma HLS DATAFLOW
xf::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC1,HEIGHT,WIDTH,INTYPE>(img_inp,in_mat);
xf::Canny<FILTER_WIDTH,NORM_TYPE,XF_8UC1,XF_2UC1,HEIGHT, WIDTH,INTYPE,XF_NPPC32,XF_USE_URAM>(in_mat,dst_mat,low_threshold,high_threshold);
xf::xfMat2Array<OUTPUT_PTR_WIDTH,XF_2UC1,HEIGHT,WIDTH,XF_NPPC32>(dst_mat,img_out);
}
}
// memory mapped kernel
#include "xf_canny_config.h"
extern "C" {
void edgetracing_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols)
{
#pragma HLS INTERFACE m_axi port=img_inp offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi port=img_out offset=slave bundle=gmem4
#pragma HLS INTERFACE s_axilite port=img_inp bundle=control
#pragma HLS INTERFACE s_axilite port=img_out bundle=control
#pragma HLS INTERFACE s_axilite port=rows bundle=control
#pragma HLS INTERFACE s_axilite port=cols bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
xf::Mat<XF_2UC1, HEIGHT, WIDTH, XF_NPPC32> _dst1(rows,cols,img_inp);
xf::Mat<XF_8UC1, HEIGHT, WIDTH, XF_NPPC8> _dst2(rows,cols,img_out);
xf::EdgeTracing<XF_2UC1,XF_8UC1,HEIGHT, WIDTH, XF_NPPC32,XF_NPPC8,XF_USE_URAM>(_dst1,_dst2);
}
}
Evaluating the Functionality
You can build the kernels and test the functionality through software emulation, hardware emulation, and by running directly on supported hardware with the FPGA. For PCIe based platforms, use the following commands to set up the environment:
$ cd <path to the proj folder, where makefile is present>
$ source <path to the SDx installation folder>/SDx/<version number>/settings64.sh
$ source <path to Xilinx_xrt>/packages/setenv.sh
$ export PLATFORM_PATH=<path to the platform folder>
$ export XLNX_SRC_PATH=<path to the xfOpenCV repo>
$ export XILINX_CL_PATH=/usr
Software Emulation
Software emulation is equivalent to running a C simulation of the kernel. The compilation time is minimal, so software emulation is recommended as the first step in testing the kernel. Following are the steps to build and run for software emulation:
$ make all TARGETS=sw_emu
$ export XCL_EMULATION_MODE=sw_emu
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<sdx installation path>/SDx/2019.1/lnx64/tools/opencv:/usr/lib64
$ ./<executable> <args>
Hardware Emulation
Hardware emulation runs the test on the RTL generated after synthesis of the C/C++ code. Because the simulation is performed on RTL, it takes longer to complete than software emulation. Following are the steps to build and run for hardware emulation:
$ make all TARGETS=hw_emu
$ export XCL_EMULATION_MODE=hw_emu
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<sdx installation path>/SDx/2019.1/lnx64/tools/opencv:/usr/lib64
$ ./<executable> <args>
Testing on the Hardware
To test on the hardware, the kernel must be compiled into a bitstream (building for hardware).
$ make all TARGETS=hw
This takes some time, because the C/C++ code must be converted to RTL and run through the synthesis and implementation processes before a bitstream is created. As a prerequisite, the drivers must be installed for the corresponding DSA for which the example was built. Following are the steps to run the kernel on hardware:
$ source /opt/xilinx/xrt/setup.sh
$ export XILINX_XRT=/opt/xilinx/xrt
$ cd <path to the executable and the corresponding xclbin>
$ ./<executable> <args>