multiprocessing - Issue working with multi-process programming and OpenMP
I am sending a toy program for multi-process programming. The program works, but I get at most about 50% more performance. If I enable the #pragma omp parallel line, the program does not work anymore. How can I improve performance? How can I tell how many processes I should run for the best performance, for example on a 4-core or 8-core machine?
#include <stdio.h>
#include <windows.h>
#include <process.h>
#include <time.h>
#include <stdlib.h>

#define x 100000

char matrix[8000*x];
volatile long barrier = 0;

/* 8 slices of 1000*x elements, one per thread: thread k zeroes [start[k], stop[k]) */
unsigned long long start[] = { 0, 1000*x, 2000*x, 3000*x, 4000*x, 5000*x, 6000*x, 7000*x };
unsigned long long stop[]  = { 1000*x, 2000*x, 3000*x, 4000*x, 5000*x, 6000*x, 7000*x, 8000*x };

void init(void *arg1)
{
    long i;
    const long s0 = (long)start[(ULONG_PTR)arg1];
    const long s1 = (long)stop[(ULONG_PTR)arg1];
    // #pragma omp parallel for   <------ *** pragma does not work! ***
    for (i = s0; i < s1; i++)
    {
        matrix[i] = 0;
    }
    InterlockedIncrement(&barrier);   /* atomic: a plain ++barrier can lose updates between threads */
}

int main(void)
{
    long i, zzz;
    clock_t tempo0;
    clock_t tempo1;

    // ********************************************************* #1
    printf("now in main() function.\n");
    tempo0 = clock();
    for (zzz = 0; zzz < 100; zzz++)
    {
        for (i = 0; i < 8000*x; i++)
            matrix[i] = 0;
    }
    tempo1 = clock();
    printf("\nsequential <%lf>\n", (double)(tempo1 - tempo0));
    // return 0;

    // ********************************************************* #2
    tempo0 = clock();
    for (zzz = 0; zzz < 100; zzz++)
    {
        barrier = 0;
        _beginthread(init, 0, (void*) 0);
        _beginthread(init, 0, (void*) 1);
        _beginthread(init, 0, (void*) 2);
        _beginthread(init, 0, (void*) 3);
        _beginthread(init, 0, (void*) 4);
        _beginthread(init, 0, (void*) 5);
        _beginthread(init, 0, (void*) 6);
        _beginthread(init, 0, (void*) 7);
        while (barrier != 8)
            ;   /* busy-wait until all 8 threads have finished */
    }
    tempo1 = clock();
    printf("\nthread <%lf>\n", (double)(tempo1 - tempo0));
    return 0;
}
Thanks in advance.
First, if you look at the work you put in the sequential and threaded sections, it's not the same, so the time/performance comparison doesn't make sense:
for (zzz = 0; zzz < 100; zzz++)
{
    for (i = 0; i < 8000*x; i++)
        matrix[i] = 0;
}
Here you initialize the matrix to 0 100 times; in the threaded code, you initialize the matrix to 0 only 1 time!
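A minimal sketch of an equal-work comparison, assuming OpenMP is available (for example GCC with -fopenmp or MSVC with /openmp; without it the pragma is ignored and both functions run sequentially). fill_seq and fill_par are hypothetical names: both zero n bytes reps times, so timing one against the other compares like with like.

```c
/* Sequential baseline: zero all n bytes, reps times. */
void fill_seq(char *m, long n, int reps)
{
    for (int r = 0; r < reps; r++)
        for (long i = 0; i < n; i++)
            m[i] = 0;
}

/* Parallel variant doing exactly the same work: the repetition loop stays
   sequential and only the element loop is split across OpenMP threads. */
void fill_par(char *m, long n, int reps)
{
    for (int r = 0; r < reps; r++) {
#pragma omp parallel for
        for (long i = 0; i < n; i++)
            m[i] = 0;
    }
}
```

Time both calls with the same reps (for example 100, as in the question) and the ratio of the two timings is a meaningful speedup.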
Now, the performance question.
In fact, you made several errors in the way you want to scale, and that's why, in the end, it doesn't work when you uncomment #pragma omp parallel for.
Imagine you have 8 cores (1 thread per core) in your sample. In the loop
for (zzz = 0; zzz < 100; zzz++)
{
    barrier = 0;
    _beginthread(init, 0, (void*) 0);
    _beginthread(init, 0, (void*) 1);
    _beginthread(init, 0, (void*) 2);
    _beginthread(init, 0, (void*) 3);
    _beginthread(init, 0, (void*) 4);
    _beginthread(init, 0, (void*) 5);
    _beginthread(init, 0, (void*) 6);
    _beginthread(init, 0, (void*) 7);
    while (barrier != 8)
        ;
}
you have 8 threads running, and if you uncomment the OpenMP directive in the init function, you ask OpenMP to split the job of each init call across multiple (8) threads.
So you theoretically have 8x8 = 64 threads running concurrently, but only 8 cores. That's why it doesn't work: performance decreases because of thread context switches!
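If you want to keep the eight _beginthread workers and still use OpenMP inside init, one way to avoid the 8x8 oversubscription is to cap each worker's OpenMP team with the num_threads clause, so that workers x team size stays at or below the core count. This is a sketch under that assumption; team_size_for and zero_slice are hypothetical helpers, and the simpler fix remains a single OpenMP parallel loop with no hand-made threads.

```c
/* Hypothetical helper: share `cores` hardware threads among `workers`
   concurrent callers so that workers * team_size <= cores (minimum 1). */
int team_size_for(int cores, int workers)
{
    int t = (workers > 0) ? cores / workers : cores;
    return (t < 1) ? 1 : t;
}

/* Zero the slice [s0, s1) using at most `team` OpenMP threads.
   Without OpenMP the pragma is ignored and the loop runs sequentially. */
void zero_slice(char *matrix, long s0, long s1, int team)
{
#pragma omp parallel for num_threads(team)
    for (long i = s0; i < s1; i++)
        matrix[i] = 0;
    (void)team; /* avoids an unused-parameter warning when built without OpenMP */
}
```

With 8 workers on 8 cores, team_size_for(8, 8) is 1, which shows that nesting OpenMP inside the eight threads gains nothing on that machine.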
In fact, to answer your last question, "How can I tell how many processes should run for best performance?":
- We are talking about threads, not processes!
- The zzz loop has coarser granularity; it's the loop OpenMP can use.
- OpenMP splits the 100 iterations across 4, 8, 16, etc. cores for free.
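One practical way to choose the thread count is to measure it: time the same loop with 1, 2, 4, ... threads up to omp_get_num_procs() and keep the fastest. On POSIX systems clock() reports CPU time summed over all threads, so it can hide a parallel speedup; this sketch therefore uses omp_get_wtime() (wall-clock time) when OpenMP is enabled and C11 timespec_get otherwise. time_fill, best_thread_count and wall_seconds are hypothetical names.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds (not CPU time). */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
#endif
}

/* Time one zeroing pass over n bytes using `threads` OpenMP threads. */
double time_fill(char *m, long n, int threads)
{
    double t0 = wall_seconds();
#pragma omp parallel for num_threads(threads)
    for (long i = 0; i < n; i++)
        m[i] = 0;
    (void)threads; /* unused when built without OpenMP */
    return wall_seconds() - t0;
}

/* Try 1, 2, 4, ... threads up to max_threads; return the fastest count. */
int best_thread_count(char *m, long n, int max_threads)
{
    (void)time_fill(m, n, 1); /* warm-up pass: touch the pages once */
    int best = 1;
    double best_t = time_fill(m, n, 1);
    for (int t = 2; t <= max_threads; t *= 2) {
        double dt = time_fill(m, n, t);
        if (dt < best_t) {
            best_t = dt;
            best = t;
        }
    }
    return best;
}
```

For example, best_thread_count(matrix, 8000L*x, omp_get_num_procs()). For a memory-bound loop like this one, the fastest count may be lower than the number of cores, and timings vary between runs, so repeat the measurement before trusting it.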
So rewrite your code as below:
tempo0 = clock();
#pragma omp parallel for
for (int z = 0; z < 8000*x; z++)
{
    matrix[z] = 0;
}
tempo1 = clock();
I have 2 cores: without OpenMP it took 630 ticks, and 452 ticks with OpenMP on 2 cores; it should go down automatically with more cores.