c++ - CL_MEM_ALLOC_HOST_PTR slower than CL_MEM_USE_HOST_PTR

January 15, 2012

So I've been playing around with OpenCL a bit, testing the speed of memory transfers between host and device. I'm using the Intel OpenCL SDK, running on an Intel i5 processor with integrated graphics. I discovered that using clEnqueueMapBuffer instead of clEnqueueWriteBuffer turned out to be about 10 times faster when using pinned memory, like so:

    int amt = 16*1024*1024;
    ...
    k_a = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int)*amt, a, NULL);
    k_b = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int)*amt, b, NULL);
    k_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int)*amt, ret, NULL);

    int* map_a = (int*) clEnqueueMapBuffer(c_q, k_a, CL_TRUE, CL_MAP_READ, 0, sizeof(int)*amt, 0, NULL, NULL, &error);
    int* map_b = (int*) clEnqueueMapBuffer(c_q, k_b, CL_TRUE, CL_MAP_READ, 0, sizeof(int)*amt, 0, NULL, NULL, &error);
    int* map_c = (int*) clEnqueueMapBuffer(c_q, k_c, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int)*amt, 0, NULL, NULL, &error);
    clFinish(c_q);

where a, b and ret are 128-bit aligned int arrays. The time came out to 22.026186 ms, compared to 198.604528 ms when using clEnqueueWriteBuffer. However, when I changed the code to

    k_a = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(int)*amt, NULL, NULL);
    k_b = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(int)*amt, NULL, NULL);
    k_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(int)*amt, NULL, NULL);

    int* map_a = (int*)clEnqueueMapBuffer(c_q, k_a, CL_TRUE, CL_MAP_READ, 0, sizeof(int)*amt, 0, NULL, NULL, &error);
    int* map_b = (int*)clEnqueueMapBuffer(c_q, k_b, CL_TRUE, CL_MAP_READ, 0, sizeof(int)*amt, 0, NULL, NULL, &error);
    int* map_c = (int*)clEnqueueMapBuffer(c_q, k_c, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int)*amt, 0, NULL, NULL, &error);

    /** initiate map_a, map_b **/

the time increases to 91.350065 ms.

What is the problem? Or is it a problem at all?
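The question states that a, b and ret are 128-bit (16-byte) aligned but doesn't show how they were allocated. Below is a minimal C++17 sketch of one way to get such host arrays with std::aligned_alloc. The helper names alloc_aligned_ints and is_aligned are hypothetical. The alignment is left as a parameter because the value a particular OpenCL runtime wants for zero-copy CL_MEM_USE_HOST_PTR buffers is an assumption you should check against that SDK's documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: allocates `count` ints on an `alignment`-byte boundary.
// std::aligned_alloc expects the size to be a multiple of the alignment,
// so the byte count is rounded up first. Release the memory with std::free.
int* alloc_aligned_ints(std::size_t count, std::size_t alignment) {
    // alignment must be a non-zero power of two (16 bytes = 128 bits)
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t bytes = count * sizeof(int);
    std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    void* p = std::aligned_alloc(alignment, rounded);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    return static_cast<int*>(p);
}

// Hypothetical helper: true when `p` lies on an `alignment`-byte boundary.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For the question's sizes, this would be `int* a = alloc_aligned_ints(16*1024*1024, 16);`. Free the memory with `std::free(a)` only after the buffer created from it has been released.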

EDIT: This is how I initialize the arrays in the second code:

    for (int i = 0; i < amt; i++) {
        map_a[i] = i;
        map_b[i] = i;
    }
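The post doesn't show how the times were taken, or whether this fill loop sits inside the timed region of the second version (the first snippet contains only the buffer creation and the maps). One way to narrow down where the time goes is to time each phase separately. The following is a minimal host-side sketch with std::chrono, assuming wall-clock timing is good enough. time_ms and fill_inputs are hypothetical helpers, and map_a/map_b stand for the pointers returned by clEnqueueMapBuffer. For device-side numbers, OpenCL event profiling (clGetEventProfilingInfo on a queue created with CL_QUEUE_PROFILING_ENABLE) is the usual alternative.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds. steady_clock is monotonic, so it never runs backwards.
template <typename F>
double time_ms(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: the question's initialization loop as a function.
// In the real program map_a and map_b are the pointers returned by
// clEnqueueMapBuffer; here they can be any int arrays of length amt.
void fill_inputs(int* map_a, int* map_b, int amt) {
    assert(amt >= 0);
    for (int i = 0; i < amt; i++) {
        map_a[i] = i;
        map_b[i] = i;
    }
}
```

Calling `double fill = time_ms([&] { fill_inputs(map_a, map_b, amt); });` gives the loop's share of the total. You can then compare it against a timing that wraps only the clEnqueueMapBuffer calls.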

And when I check, map_a and map_b do contain the right elements at the end of the program, but map_c contains all 0's. I did this:

    clEnqueueUnmapMemObject(c_q, k_a, map_a, 0, NULL, NULL);
    clEnqueueUnmapMemObject(c_q, k_b, map_b, 0, NULL, NULL);
    clEnqueueUnmapMemObject(c_q, k_c, map_c, 0, NULL, NULL);

and my kernel is just

    __kernel void test(__global int* a, __global int* b, __global int* c) {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }
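Since map_c comes back as all 0's, a host-side reference of the kernel makes it easy to tell whether the kernel's output ever reached the buffer. This is a minimal sketch. count_mismatches is a hypothetical helper, and it assumes the three buffers have been mapped for reading (CL_MAP_READ) after the kernel has finished, for example after clFinish.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: host-side reference for the kernel c[i] = a[i] + b[i].
// Returns how many of the first n elements of c differ from a[i] + b[i];
// 0 means the kernel output matches.
std::size_t count_mismatches(const int* a, const int* b, const int* c, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (c[i] != a[i] + b[i]) {
            ++bad;
        }
    }
    return bad;
}
```

With the question's inputs (map_a[i] = map_b[i] = i), an all-zero map_c would give amt - 1 mismatches, since only element 0 happens to match.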

My understanding is that CL_MEM_ALLOC_HOST_PTR allocates but doesn't copy. Does the 2nd block of code actually get any data onto the device?

Also, clCreateBuffer, when used with CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR, shouldn't require a clEnqueueWrite, since the buffer is created with the memory pointed to by void *host_ptr.

Using "pinned" memory in OpenCL should be a process like this:

    int amt = 16*1024*1024;
    int *array = new int[amt];
    cl_int error = 0;

    //note: since we pass a NULL data pointer, we have to use CL_MEM_ALLOC_HOST_PTR
    //this lets the OpenCL implementation allocate the buffer from host-accessible memory
    cl_mem b1 = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, sizeof(int)*amt, NULL, &error);

    //map the device memory into host memory, aka pinning
    int *host_ptr = (int*) clEnqueueMapBuffer(queue, b1, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, sizeof(int)*amt, 0, NULL, NULL, &error);

    //copy the host memory into the pinned host memory
    memcpy(host_ptr, array, sizeof(int)*amt);

    //unmap before any kernel uses b1; kernels must not access a buffer while it is mapped
    clEnqueueUnmapMemObject(queue, b1, host_ptr, 0, NULL, NULL);

    //call the kernel and everything else, then map again and memcpy from the
    //pinned host memory back into array when you are done

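EDIT: One final thing that can speed up the program is to make the memory reads/writes non-blocking by using CL_FALSE instead of CL_TRUE. Just make sure to call clFinish() before the data gets copied back to the host, so that the command queue is emptied and all commands have been processed. With a non-blocking map, the returned pointer must not be used until the map command has completed.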

Source: OpenCL in Action

Sharma