Getting Started with SDAccel
This chapter describes how to use xfOpenCV in the SDAccel™ environment. The following sections describe the methodology to create a kernel, the corresponding host code, and a suitable makefile to compile an xfOpenCV kernel for any of the supported platforms in SDAccel. The subsequent sections also explain how to verify the kernel in the various emulation modes and on hardware.
Prerequisites
- A valid installation of SDx™ 2019.1 or a later version, and the corresponding licenses.
- Install the xfOpenCV libraries, if you intend to use libraries compiled differently from what is provided in SDx.
- Install a card for which the platform is supported in SDx 2019.1 or later versions.
- Xilinx® Runtime (XRT) must be installed. XRT provides a software interface to Xilinx FPGAs.
- libOpenCL.so must be installed, if it is not already present along with the platform.
SDAccel Design Methodology
There are three critical components in making a kernel work on a platform using SDAccel:
- Host code with OpenCL constructs
- Wrappers around HLS Kernel(s)
- Makefile to compile the kernel for emulation or running on hardware.
Host Code with OpenCL
The host code, written in C++ using OpenCL APIs, performs the following operations:
- Loading the kernel binary on the FPGA – xcl::import_binary_file() loads the bitstream and programs the FPGA to enable the required processing of data.
- Setting up memory buffers for data transfer – Data needs to be sent to and read from the DDR memory on the hardware. cl::Buffer objects are created to allocate the required memory for transferring data to and from the hardware.
- Transferring data to and from the hardware – enqueueWriteBuffer() and enqueueReadBuffer() are used to transfer the data to and from the hardware at the required time.
- Executing the kernel on the FPGA – There are functions to execute kernels on the FPGA. A single kernel or multiple kernels can be executed, synchronously or asynchronously with each other. The most commonly used command is enqueueTask().
- Profiling the performance of kernel execution – The OpenCL host code also enables measurement of the execution time of a kernel on the FPGA. The function used for profiling in these examples is getProfilingInfo(). A minimal skeleton combining these steps is shown after this list.
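Put together, these steps form the skeleton below. This is a minimal sketch, not a drop-in program: the binary name krnl_top, the kernel name func_top, and the buffer sizes and host pointers (in_size_bytes, in_data, and so on) are placeholders, and error checking is omitted. A complete working version appears in the design example later in this chapter.
#include "xcl2.hpp" // helper header shipped with the SDAccel examples; provides the xcl:: utilities
// Set up the device, context, and a profiling-enabled command queue
std::vector<cl::Device> devices = xcl::get_xil_devices();
cl::Device device = devices[0];
cl::Context context(device);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
std::string device_name = device.getInfo<CL_DEVICE_NAME>();
// Load the kernel binary and program the FPGA
std::string binaryFile = xcl::find_binary_file(device_name, "krnl_top");
cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
devices.resize(1);
cl::Program program(context, devices, bins);
cl::Kernel krnl(program, "func_top");
// Allocate device buffers and transfer the input
cl::Buffer in_buf(context, CL_MEM_READ_ONLY, in_size_bytes);
cl::Buffer out_buf(context, CL_MEM_WRITE_ONLY, out_size_bytes);
krnl.setArg(0, in_buf);
krnl.setArg(1, out_buf);
q.enqueueWriteBuffer(in_buf, CL_TRUE, 0, in_size_bytes, in_data);
// Execute the kernel and profile its run time
cl::Event event;
q.enqueueTask(krnl, NULL, &event);
event.wait();
cl_ulong start = 0, end = 0;
event.getProfilingInfo(CL_PROFILING_COMMAND_START, &start);
event.getProfilingInfo(CL_PROFILING_COMMAND_END, &end);
std::cout << ((end - start) / 1000000.0) << " ms" << std::endl; // timestamps are in ns
// Read the result back to the host
q.enqueueReadBuffer(out_buf, CL_TRUE, 0, out_size_bytes, out_data);
q.finish();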
Wrappers around HLS Kernel(s)
All xfOpenCV kernels are provided as C++ function templates (located at <Github repo>/include) with image containers as objects of the xf::Mat class. These kernels work either in a stream-based manner (where the complete image is read continuously) or in a memory-mapped manner (where image data is accessed in blocks).
The SDAccel flow (OpenCL) requires kernel interfaces to be memory pointers with widths in powers of 2. Glue logic is therefore required to convert memory pointers to the xf::Mat class data type and vice versa when interacting with xfOpenCV kernel(s). Wrapper(s) are built over the kernel(s) with this glue logic. The following examples provide a methodology to handle the different kernel types (stream-based and memory-mapped; the xfOpenCV kernels are located at <Github repo>/include).
Stream Based Kernels
To facilitate the conversion of a pointer to xf::Mat and vice versa, two adapter functions, xf::Array2xfMat() and xf::xfMat2Array(), are included as part of xfOpenCV. The xf::Mat objects must be invoked as streams using the HLS stream pragma with a minimum depth of 2. This results in a top-level (or wrapper) function for the kernel, as shown below:
extern "C"
{
void func_top (ap_uint<PTR_WIDTH> *gmem_in, ap_uint<PTR_WIDTH> *gmem_out, ...) {
xf::Mat<…> in_mat(…), out_mat(…);
#pragma HLS stream variable=in_mat.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
xf::Array2xfMat<…> (gmem_in, in_mat);
xf::xfopencv-func<…> (in_mat, out_mat…);
xf::xfMat2Array<…> (gmem_out, out_mat);
}
}
The above illustration assumes that the data in xf::Mat is streamed in and streamed out. You can also create a pipeline with multiple functions instead of just one xfOpenCV function; a sketch is shown below.
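For example, chaining two functions only requires an intermediate xf::Mat stream between the stages. In the sketch below, xf_func_a and xf_func_b are placeholders for any two stream-capable xfOpenCV functions, not actual library names:
extern "C" {
void func_top (ap_uint<PTR_WIDTH> *gmem_in, ap_uint<PTR_WIDTH> *gmem_out, ...) {
xf::Mat<…> in_mat(…), tmp_mat(…), out_mat(…);
#pragma HLS stream variable=in_mat.data depth=2
#pragma HLS stream variable=tmp_mat.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
xf::Array2xfMat<…> (gmem_in, in_mat);
xf::xf_func_a<…> (in_mat, tmp_mat);  // stage 1
xf::xf_func_b<…> (tmp_mat, out_mat); // stage 2
xf::xfMat2Array<…> (gmem_out, out_mat);
}
}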
For stream-based kernels with multiple inputs of different sizes, multiple instances of the adapter functions are necessary, as shown below:
extern "C" {
void func_top (ap_uint<PTR_WIDTH> *gmem_in1, ap_uint<PTR_WIDTH> *gmem_in2, ap_uint<PTR_WIDTH> *gmem_in3, ap_uint<PTR_WIDTH> *gmem_out, ...) {
xf::Mat<...,HEIGHT,WIDTH,…> in_mat1(…), out_mat(…);
xf::Mat<...,HEIGHT/4,WIDTH,…> in_mat2(…), in_mat3(…);
#pragma HLS stream variable=in_mat1.data depth=2
#pragma HLS stream variable=in_mat2.data depth=2
#pragma HLS stream variable=in_mat3.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
xf::accel_utils obj_a, obj_b;
obj_a.Array2xfMat<…,HEIGHT,WIDTH,…> (gmem_in1, in_mat1);
obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in2, in_mat2);
obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in3, in_mat3);
xf::xfopencv-func(in_mat1, in_mat2, in_mat3, out_mat…);
xf::xfMat2Array<…> (gmem_out, out_mat);
}
}
For stream-based implementations, the data must be fetched from the input AXI interface and pushed to the xf::Mat as required by the xfOpenCV kernels for that particular configuration. Likewise, the same operations must be performed for the output of the xfOpenCV kernel. The two utility functions xf::Array2xfMat() and xf::xfMat2Array() perform exactly this.
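Array2xfMat() is the adapter for the input direction. Assuming it mirrors the xfMat2Array() signature documented in the next section, with the pointer as the source and the xf::Mat as the destination, it takes the following form:
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void Array2xfMat(ap_uint<PTR_WIDTH> *srcPtr, xf::Mat<MAT_T,ROWS,COLS,NPC>& dstMat)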
xfMat2Array
This function converts the input xf::Mat to an output array. The output of an xfOpenCV kernel function is an xf::Mat, which needs to be converted to the output pointer.
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void xfMat2Array(xf::Mat<MAT_T,ROWS,COLS,NPC>& srcMat, ap_uint< PTR_WIDTH > *dstPtr)
Parameter | Description |
---|---|
PTR_WIDTH | Data width of the output pointer. The value must be a power of 2, from 8 to 512. |
MAT_T | Input Mat type. Examples: XF_8UC1, XF_16UC1, XF_8UC3, and XF_8UC4. |
ROWS | Maximum height of the image. |
COLS | Maximum width of the image. |
NPC | Number of pixels computed in parallel. Examples: XF_NPPC1, XF_NPPC8. |
srcMat | Input image of type xf::Mat. |
dstPtr | Output pointer. The type of the pointer is based on PTR_WIDTH. |
Interface pointer widths
MAT type | Parallelism | Min PTR_WIDTH | Max PTR_WIDTH |
---|---|---|---|
XF_8UC1 | XF_NPPC1 | 8 | 512 |
XF_16UC1 | XF_NPPC1 | 16 | 512 |
XF_8UC1 | XF_NPPC8 | 64 | 512 |
XF_16UC1 | XF_NPPC8 | 128 | 512 |
XF_8UC3 | XF_NPPC1 | 32 | 512 |
XF_8UC3 | XF_NPPC8 | 256 | 512 |
XF_8UC4 | XF_NPPC8 | 256 | 512 |
XF_8UC3 | XF_NPPC16 | 512 | 512 |
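For instance, according to the table above, an XF_8UC1 image processed at eight pixels per clock (XF_NPPC8) requires a pointer width of at least 64. A call of the following form is consistent with those widths (a sketch: HEIGHT and WIDTH are the compile-time maximum dimensions, and out_mat/gmem_out are the wrapper's local Mat and interface pointer):
// Write an XF_8UC1, 8-pixel-parallel xf::Mat to a 64-bit wide output pointer
xf::xfMat2Array<64, XF_8UC1, HEIGHT, WIDTH, XF_NPPC8>(out_mat, gmem_out);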
Design Example Using the Library on SDAccel
The following is a multi-kernel example, in which different kernels run sequentially in a pipeline to form an application. The example performs Canny edge detection using two kernels, Canny and edge tracing. The Canny function takes a gray-scale image as input and provides the edge information in three states (weak edge (1), strong edge (3), and background (0)), which is fed into edge tracing, which filters out the weak edges. The former works as a stream-based implementation and the latter in a memory-mapped manner.
Host code
// setting up device and platform
std::vector<cl::Device> devices = xcl::get_xil_devices();
cl::Device device = devices[0];
cl::Context context(device);
cl::CommandQueue q(context, device,CL_QUEUE_PROFILING_ENABLE);
std::string device_name = device.getInfo<CL_DEVICE_NAME>();
// Kernel 1: Canny
std::string binaryFile=xcl::find_binary_file(device_name,"krnl_canny");
cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
devices.resize(1);
cl::Program program(context, devices, bins);
cl::Kernel krnl(program,"canny_accel");
// creating necessary cl buffers for input and output
cl::Buffer imageToDevice(context, CL_MEM_READ_ONLY,(height*width));
cl::Buffer imageFromDevice(context, CL_MEM_WRITE_ONLY,(height*width/4));
// Set the kernel arguments
krnl.setArg(0, imageToDevice);
krnl.setArg(1, imageFromDevice);
krnl.setArg(2, height);
krnl.setArg(3, width);
krnl.setArg(4, low_threshold);
krnl.setArg(5, high_threshold);
// write the input image data from host to device memory
q.enqueueWriteBuffer(imageToDevice, CL_TRUE, 0,(height*(width)),img_gray.data);
// Profiling Objects
cl_ulong start= 0;
cl_ulong end = 0;
double diff_prof = 0.0f;
cl::Event event_sp;
// Launch the kernel
q.enqueueTask(krnl,NULL,&event_sp);
clWaitForEvents(1, (const cl_event*) &event_sp);
// profiling (the OpenCL timestamps are reported in nanoseconds)
event_sp.getProfilingInfo(CL_PROFILING_COMMAND_START,&start);
event_sp.getProfilingInfo(CL_PROFILING_COMMAND_END,&end);
diff_prof = end-start;
std::cout<<(diff_prof/1000000)<<"ms"<<std::endl;
// Kernel 2: edge tracing
cl::Kernel krnl2(program,"edgetracing_accel");
cl::Buffer imageFromDeviceedge(context, CL_MEM_WRITE_ONLY,(height*width));
// Set the kernel arguments
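// Note: imageFromDevice, the output buffer of the Canny kernel, is passed
// directly as the input here, so the intermediate result never leaves the device.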
krnl2.setArg(0, imageFromDevice);
krnl2.setArg(1, imageFromDeviceedge);
krnl2.setArg(2, height);
krnl2.setArg(3, width);
// Profiling Objects
cl_ulong startedge= 0;
cl_ulong endedge = 0;
double diff_prof_edge = 0.0f;
cl::Event event_sp_edge;
// Launch the kernel
q.enqueueTask(krnl2,NULL,&event_sp_edge);
clWaitForEvents(1, (const cl_event*) &event_sp_edge);
// profiling
event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_START,&startedge);
event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_END,&endedge);
diff_prof_edge = endedge-startedge;
std::cout<<(diff_prof_edge/1000000)<<"ms"<<std::endl;
//Copying Device result data to Host memory
q.enqueueReadBuffer(imageFromDeviceedge, CL_TRUE, 0,(height*width),out_img_edge.data);
q.finish();
Top level kernel
Below is the top-level/wrapper function with all the necessary glue logic.
// streaming based kernel
#include "xf_canny_config.h"
extern "C" {
void canny_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols,int low_threshold,int high_threshold)
{
#pragma HLS INTERFACE m_axi port=img_inp offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=img_out offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=img_inp bundle=control
#pragma HLS INTERFACE s_axilite port=img_out bundle=control
#pragma HLS INTERFACE s_axilite port=rows bundle=control
#pragma HLS INTERFACE s_axilite port=cols bundle=control
#pragma HLS INTERFACE s_axilite port=low_threshold bundle=control
#pragma HLS INTERFACE s_axilite port=high_threshold bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
xf::Mat<XF_8UC1, HEIGHT, WIDTH, INTYPE> in_mat(rows,cols);
#pragma HLS stream variable=in_mat.data depth=2
xf::Mat<XF_2UC1, HEIGHT, WIDTH, XF_NPPC32> dst_mat(rows,cols);
#pragma HLS stream variable=dst_mat.data depth=2
#pragma HLS DATAFLOW
xf::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC1,HEIGHT,WIDTH,INTYPE>(img_inp,in_mat);
xf::Canny<FILTER_WIDTH,NORM_TYPE,XF_8UC1,XF_2UC1,HEIGHT, WIDTH,INTYPE,XF_NPPC32,XF_USE_URAM>(in_mat,dst_mat,low_threshold,high_threshold);
xf::xfMat2Array<OUTPUT_PTR_WIDTH,XF_2UC1,HEIGHT,WIDTH,XF_NPPC32>(dst_mat,img_out);
}
}
// memory mapped kernel
#include "xf_canny_config.h"
extern "C" {
void edgetracing_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols)
{
#pragma HLS INTERFACE m_axi port=img_inp offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi port=img_out offset=slave bundle=gmem4
#pragma HLS INTERFACE s_axilite port=img_inp bundle=control
#pragma HLS INTERFACE s_axilite port=img_out bundle=control
#pragma HLS INTERFACE s_axilite port=rows bundle=control
#pragma HLS INTERFACE s_axilite port=cols bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
xf::Mat<XF_2UC1, HEIGHT, WIDTH, XF_NPPC32> _dst1(rows,cols,img_inp);
xf::Mat<XF_8UC1, HEIGHT, WIDTH, XF_NPPC8> _dst2(rows,cols,img_out);
xf::EdgeTracing<XF_2UC1,XF_8UC1,HEIGHT, WIDTH, XF_NPPC32,XF_NPPC8,XF_USE_URAM>(_dst1,_dst2);
}
}
Evaluating the Functionality
You can build the kernels and test the functionality through software emulation, hardware emulation, and by running directly on supported hardware with the FPGA. For PCIe based platforms, use the following commands to set up the environment:
$ cd <path to the proj folder, where makefile is present>
$ source <path to the SDx installation folder>/SDx/<version number>/settings64.sh
$ source <path to Xilinx_xrt>/packages/setenv.sh
$ export PLATFORM_PATH=<path to the platform folder>
$ export XLNX_SRC_PATH=<path to the xfOpenCV repo>
$ export XILINX_CL_PATH=/usr
Software Emulation
Software emulation is equivalent to running a C simulation of the kernel. The compilation time is minimal, so software emulation is recommended as the first step in testing the kernel. Following are the steps to build and run for software emulation:
$ make all TARGETS=sw_emu
$ export XCL_EMULATION_MODE=sw_emu
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<sdx installation path>/SDx/2019.1/lnx64/tools/opencv:/usr/lib64
$ ./<executable> <args>
Hardware Emulation
Hardware emulation runs the test on the RTL generated after synthesis of the C/C++ code. Because the simulation is performed on RTL, it takes longer to complete than software emulation. Following are the steps to build and run for hardware emulation:
$ make all TARGETS=hw_emu
$ export XCL_EMULATION_MODE=hw_emu
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<sdx installation path>/SDx/2019.1/lnx64/tools/opencv:/usr/lib64
$ ./<executable> <args>
Testing on the Hardware
To test on the hardware, the kernel must be compiled into a bitstream (building for hardware).
$ make all TARGETS=hw
This takes some time, because the C/C++ code must be converted to RTL and run through the synthesis and implementation processes before a bitstream is created. As a prerequisite, the drivers must be installed for the corresponding DSA for which the example was built. Following are the steps to run the kernel on hardware:
$ source /opt/xilinx/xrt/setup.sh
$ export XILINX_XRT=/opt/xilinx/xrt
$ cd <path to the executable and the corresponding xclbin>
$ ./<executable> <args>