multiprocessing - Issue working with multiprocessing programming and OpenMP -


I am sending a toy program for multi-process programming. I don't know whether the program is working or not; I can get about 50% more performance. If I enable the #pragma omp parallel for, the program does not work anymore. How can I improve performance? How can I tell how many processes I should run for the best performance, for example on a 4-core or 8-core machine?

    #include <stdio.h>
    #include <windows.h>
    #include <process.h>
    #include <time.h>
    #include <stdlib.h>

    #define x 100000

    char matrix[8000*x];
    volatile long barrier = 0;

    /* one [start, stop) range per thread, 8 threads in total */
    unsigned long long start[] = { 0, 1000*x+1, 2000*x+1, 3000*x+1, 4000*x+1, 5000*x+1, 6000*x+1, 7000*x+1 };
    unsigned long long stop[]  = { 1000*x, 2000*x, 3000*x, 4000*x, 5000*x, 6000*x, 7000*x, 8000*x };

    void init( void *arg1 )
    {
        long i;
        const long s0 = start[(ULONG_PTR)arg1];
        const long s1 = stop[(ULONG_PTR)arg1];

        // #pragma omp parallel for <------ *** pragma not work ! ***
        for ( i = s0; i < s1; i++ )
        {
            matrix[i] = 0;
        }
        ++barrier;
    }

    int main()
    {
        register long i, zzz;
        clock_t tempo0;
        clock_t tempo1;

        // ********************************************************#1
        printf( "now in main() function.\n" );
        tempo0 = clock();
        for ( zzz = 0; zzz < 100; zzz++ )
        {
            for ( i = 0; i < 8000*x; i++ )
                matrix[i] = 0;
        }
        tempo1 = clock();
        printf( "\nsequenziale <%lf>\n", (double) tempo1-tempo0 );

        //  return 0 ;
        // ******************************************************* #2
        tempo0 = clock();
        for ( zzz = 0; zzz < 100; zzz++ )
        {
            barrier = 0;
            _beginthread( init, 0, (void*) 0 );
            _beginthread( init, 0, (void*) 1 );
            _beginthread( init, 0, (void*) 2 );
            _beginthread( init, 0, (void*) 3 );
            _beginthread( init, 0, (void*) 4 );
            _beginthread( init, 0, (void*) 5 );
            _beginthread( init, 0, (void*) 6 );
            _beginthread( init, 0, (void*) 7 );

            /* spin until all 8 threads have finished */
            while ( barrier != 8 )
                ;
        }
        tempo1 = clock();
        printf( "\nthread <%lf>\n", (double) tempo1-tempo0 );
    }

Thanks in advance.

First, if you look at the work you put in the sequential section and in the threaded section, it's not the same, so the time/performance comparison doesn't make sense:

    for (zzz = 0; zzz < 100; zzz++)
    {
        for (i = 0; i < 8000*x; i++)
            matrix[i] = 0;
    }

Here you run the matrix initialization to 0 100 times, while in the threaded code you initialize the matrix to 0 only 1 time!
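A fair comparison runs the same number of passes over the same range in every variant. One way to enforce that is to push each variant through a single timing helper. The following is only a minimal sketch: `fill_sequential` and `time_passes` are hypothetical names introduced for illustration and are not part of the original program.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: zero n bytes of buf with a plain loop. */
static void fill_sequential(char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}

/* Run `fill` over the same buffer `passes` times and return the elapsed
   clock() ticks. Every variant being compared goes through this one
   function, so each is timed on the same amount of work. */
static double time_passes(void (*fill)(char *, size_t),
                          char *buf, size_t n, int passes)
{
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++)
        fill(buf, n);
    clock_t t1 = clock();
    return (double)(t1 - t0);
}
```

Calling `time_passes(fill_sequential, matrix, sizeof matrix, 100)` and then the same call with a threaded fill function would time both on identical work.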

Now, the performance question.

In fact, you made several errors in the way you want to scale, and that's why in the end it doesn't work when you uncomment #pragma omp parallel for.

Imagine you have 8 cores (1 thread per core) in your sample. In the loop

    for (zzz = 0; zzz < 100; zzz++)
    {
        barrier = 0;
        _beginthread( init, 0, (void*) 0 );
        _beginthread( init, 0, (void*) 1 );
        _beginthread( init, 0, (void*) 2 );
        _beginthread( init, 0, (void*) 3 );
        _beginthread( init, 0, (void*) 4 );
        _beginthread( init, 0, (void*) 5 );
        _beginthread( init, 0, (void*) 6 );
        _beginthread( init, 0, (void*) 7 );

        while ( barrier != 8 )
            ;
    }

you already have 8 threads running, and if you uncomment the OpenMP directive in the init function, you ask OpenMP to split the job of each init call across multiple (8) threads.

So you theoretically have 8x8 threads running concurrently, but you only have 8 cores, and that's why it doesn't work. Performance decreases because of thread context switches!

In fact, to answer your last question, "How can I tell how many processes I should run for the best performance?":

  1. We are talking about threads, not processes!
  2. The zzz loop has the coarser granularity; it's the loop that OpenMP can use.
  3. OpenMP will split the 100 iterations across your cores (4, 8, 16, etc.) for free.
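To see how many threads OpenMP would pick on a given machine, the runtime can be queried directly. This is a minimal sketch, assuming an OpenMP-capable compiler; `default_thread_count` and `processor_count` are hypothetical wrapper names, while `omp_get_max_threads()` and `omp_get_num_procs()` are standard OpenMP runtime calls. The `_OPENMP` guard lets the same file build without OpenMP, in which case both wrappers return 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads OpenMP would use for the next parallel region,
   or 1 when the program is compiled without OpenMP support. */
static int default_thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Number of processors available to the program,
   or 1 when the program is compiled without OpenMP support. */
static int processor_count(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}
```

Printing both values on a 4-core or 8-core machine shows what the runtime chooses by default; setting the `OMP_NUM_THREADS` environment variable overrides that choice.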

So you can rewrite your code as below:

    tempo0 = clock();
    #pragma omp parallel for
    for (int z = 0; z < 8000*x; z++)
    {
        matrix[z] = 0;
    }
    tempo1 = clock();

I have 2 cores: without OpenMP it took 630 ticks, and with OpenMP on 2 cores it took 452. The time will automatically go down with more cores.
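One caveat when reproducing such measurements: on POSIX systems `clock()` reports CPU time summed over all threads, so it can hide a parallel speedup. The sketch below times the fill with `omp_get_wtime()` (wall-clock time) when OpenMP is enabled and falls back to `clock()` otherwise. The names `now_seconds`, `fill_parallel`, and `timed_fill` are hypothetical and chosen only for illustration.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Current time in seconds: wall-clock with OpenMP, clock() without. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Zero n bytes of buf; OpenMP splits the iterations across its threads.
   Without OpenMP the pragma is ignored and the loop runs serially. */
static void fill_parallel(char *buf, long n)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        buf[i] = 0;
}

/* Time one parallel fill and return the elapsed seconds. */
static double timed_fill(char *buf, long n)
{
    double t0 = now_seconds();
    fill_parallel(buf, n);
    return now_seconds() - t0;
}
```

With GCC or Clang the pragma takes effect only when compiling with `-fopenmp`; with MSVC the corresponding switch is `/openmp`.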

