algorithm - How to make OpenMP code work on Xeon Phi?


Hello everyone, I hope you can help me. I have code for a parallel version of Prim's algorithm in OpenMP, and I need to make it work on a Xeon Phi. Please help me; I cannot understand how to do this. Here is my OpenMP code:

    #include <cstddef>  // NULL

    // The question does not show these types; minimal definitions are
    // assumed from the fields the code uses.
    struct tgraphnode { int nodenum; double distance; int parentnodenum; };
    struct ttreenode  { int nodenum; double distance; };

    void parallelprim(double *pmatrix, ttreenode **pminspanningtree, int size)
    {
        int lastadded;
        tgraphnode nearestnode;
        // Vertices 1..size-1 start outside the tree; vertex 0 is the root.
        tgraphnode **notinminspanningtree = new tgraphnode*[size - 1];
        lastadded = 0;
        for (int i = 0; i < size - 1; i++) {
            notinminspanningtree[i] = new tgraphnode;
            notinminspanningtree[i]->nodenum = i + 1;
            notinminspanningtree[i]->distance = -1.0f;   // -1: no edge seen yet
            notinminspanningtree[i]->parentnodenum = -1;
        }

        for (int iter = 1; iter < size; iter++) {
            // Relax every remaining vertex against the vertex added last.
            #pragma omp parallel for
            for (int i = 0; i < size - 1; i++)
                if (notinminspanningtree[i] != NULL) {
                    double t1 = notinminspanningtree[i]->distance;
                    double t2 = pmatrix[(notinminspanningtree[i]->nodenum) * size + lastadded];
                    if (((t1 < 0) && (t2 > 0)) || ((t1 > 0) && (t2 > 0) && (t1 > t2))) {
                        notinminspanningtree[i]->distance = t2;
                        notinminspanningtree[i]->parentnodenum = lastadded;
                    }
                }

            // Find the nearest remaining vertex: per-thread minimum, then merge.
            // 3000 acts as "infinity", so all edge weights must be below it.
            nearestnode.nodenum = -1;
            nearestnode.distance = 3000;
            #pragma omp parallel
            {
                tgraphnode threadnearestnode;
                threadnearestnode.nodenum = -1;
                threadnearestnode.distance = 3000;
                #pragma omp for
                for (int i = 0; i < size - 1; i++) {
                    if (notinminspanningtree[i] != NULL) {
                        double t1 = notinminspanningtree[i]->distance;
                        double t2 = threadnearestnode.distance;
                        if ((t1 > 0) && (t1 < t2)) {
                            threadnearestnode.distance = t1;
                            threadnearestnode.nodenum = notinminspanningtree[i]->nodenum;
                        }
                    }
                }
                #pragma omp critical
                {
                    if (threadnearestnode.distance < nearestnode.distance) {
                        nearestnode.distance = threadnearestnode.distance;
                        nearestnode.nodenum = threadnearestnode.nodenum;
                    }
                }
            }

            // Record the new edge between the nearest vertex and its parent.
            pminspanningtree[nearestnode.nodenum] = new ttreenode;
            pminspanningtree[nearestnode.nodenum]->nodenum = notinminspanningtree[nearestnode.nodenum - 1]->parentnodenum;
            pminspanningtree[nearestnode.nodenum]->distance = nearestnode.distance;

            int parent = notinminspanningtree[nearestnode.nodenum - 1]->parentnodenum;
            if (pminspanningtree[parent] != NULL) {
                // NOTE: tmp is never linked into the tree, so this allocation leaks.
                ttreenode *tmp = new ttreenode;
                tmp->distance = nearestnode.distance;
                tmp->nodenum = nearestnode.nodenum;
            } else {
                pminspanningtree[parent] = new ttreenode;
                pminspanningtree[parent]->distance = nearestnode.distance;
                pminspanningtree[parent]->nodenum = nearestnode.nodenum;
            }
            lastadded = nearestnode.nodenum;
            delete notinminspanningtree[nearestnode.nodenum - 1];
            notinminspanningtree[nearestnode.nodenum - 1] = NULL;
        }
        delete[] notinminspanningtree;
    }

You have 2 basic options for running code on the Intel Xeon Phi coprocessor. You can compile the entire program with the -mmic and -qopenmp flags and run it either with micnativeloadex or by copying the executable and the required libraries onto the coprocessor with scp. Alternately, you can omit -mmic and instead modify the code so that the portion you want to run on the coprocessor falls inside an offload section; that portion of the code is sent to the coprocessor to run, and the remainder of the code runs on the host.
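A minimal sketch of the offload option, assuming the Intel C++ compiler with Xeon Phi offload support (the `#pragma offload target(mic)` form with `in`/`inout` clauses). The function name `offload_relax` and its flat-array parameters are hypothetical, chosen for illustration; it reproduces only the first (relaxation) loop of the question. Offload clauses transfer contiguous arrays, so an array of pointers to separately allocated `tgraphnode` objects cannot be copied as-is, which is why the data is passed as plain `dist`, `parent` and `in_tree` arrays here. A compiler without offload support should ignore the unknown pragma and run the region on the host.

```cpp
#include <cassert>

// Hypothetical helper: relaxes dist[]/parent[] against the vertex that was
// added to the tree last, using the same rule as the question's first loop.
// Assumes Intel's offload extension; other compilers ignore the `offload`
// pragma and execute the block on the host.
void offload_relax(const double *matrix, int size, int last_added,
                   double *dist, int *parent, const char *in_tree) {
    assert(size > 0 && last_added >= 0 && last_added < size);
    const int n = size * size;

    // Only this block is sent to the coprocessor. `in` copies data to the
    // card; `inout` copies it there and back after the block finishes.
#pragma offload target(mic) in(matrix : length(n)) in(in_tree : length(size)) \
                            inout(dist : length(size)) inout(parent : length(size))
    {
#pragma omp parallel for
        for (int v = 0; v < size; ++v) {
            double w = matrix[v * size + last_added];
            // dist < 0 means "no edge seen yet"; w <= 0 means "no edge".
            if (!in_tree[v] && w > 0 && (dist[v] < 0 || w < dist[v])) {
                dist[v] = w;
                parent[v] = last_added;
            }
        }
    }
}
```

In a full program the matrix would normally be copied to the coprocessor once and kept resident there (Intel's `alloc_if`/`free_if` clause modifiers exist for this purpose), because transferring it on every iteration would likely dominate the run time.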

The presentation Avi sent is a lovely overview of programming the coprocessor. In addition, you can find basic information on compiling and optimizing for the coprocessor at: https://software.intel.com/en-us/articles/programming-and-compiling-for-intel-many-integrated-core-architecture.

However, and this is a big however, your code does not vectorize and has significant serial sections. To obtain optimum performance on the coprocessor, code must both vectorize and parallelize.
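One possible restructuring that can both vectorize and parallelize is sketched below, under assumptions: the adjacency matrix is symmetric (an undirected graph, as Prim's algorithm expects) and a weight <= 0 means "no edge", as in the question's code. The name `prim_soa` is hypothetical. Flat arrays replace the array of pointers, and the relaxation loop reads one contiguous matrix row with no branches on `NULL`, so `#pragma omp parallel for simd` can apply to it; the arg-min keeps the question's per-thread-minimum-plus-critical scheme. It returns the parent of every vertex instead of building `ttreenode` objects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical structure-of-arrays Prim's algorithm.
// Assumes pmatrix is a symmetric size x size row-major matrix where a value
// <= 0 means "no edge". Returns parent[v] for every vertex (root 0 gets -1).
std::vector<int> prim_soa(const double *pmatrix, int size) {
    assert(size > 0);
    std::vector<double> dist(size, -1.0);  // -1: no edge to the tree yet
    std::vector<int> parent(size, -1);
    std::vector<char> in_tree(size, 0);    // char, not vector<bool>: plain bytes
    in_tree[0] = 1;
    int last_added = 0;

    for (int iter = 1; iter < size; ++iter) {
        double *d = dist.data();
        int *p = parent.data();
        const char *t = in_tree.data();
        // Symmetric matrix: read row last_added, which is contiguous in memory.
        const double *row = pmatrix + static_cast<long>(last_added) * size;

        // Relaxation: every vertex is independent, so the loop is split across
        // threads and SIMD lanes; selects instead of branches aid vectorization.
#pragma omp parallel for simd
        for (int v = 0; v < size; ++v) {
            double w = row[v];
            bool better = !t[v] && w > 0 && (d[v] < 0 || w < d[v]);
            d[v] = better ? w : d[v];
            p[v] = better ? last_added : p[v];
        }

        // Arg-min over vertices not yet in the tree: each thread keeps its own
        // best candidate, then the candidates are merged one at a time.
        double best = -1.0;
        int best_v = -1;
#pragma omp parallel
        {
            double my_best = -1.0;
            int my_v = -1;
#pragma omp for nowait
            for (int v = 0; v < size; ++v) {
                if (!t[v] && d[v] > 0 && (my_v < 0 || d[v] < my_best)) {
                    my_best = d[v];
                    my_v = v;
                }
            }
#pragma omp critical
            if (my_v >= 0 && (best_v < 0 || my_best < best)) {
                best = my_best;
                best_v = my_v;
            }
        }
        if (best_v < 0) break;  // remaining vertices are unreachable
        in_tree[best_v] = 1;
        last_added = best_v;
    }
    return parent;
}
```

Each of the size-1 outer iterations still ends with a serial choice of the next vertex, so on a coprocessor with many threads this is only likely to pay off for large graphs; for small sizes the cost of entering two parallel regions per iteration can outweigh the work done inside them.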

