algorithm - How to make OpenMP code work on Xeon Phi?
Hello everyone. I hope you can help me. I have code for a parallel Prim's algorithm written with OpenMP, and I need to make it work on Xeon Phi. I cannot figure out how to do this. Here is the OpenMP code.
void parallelprim(double *pmatrix, ttreenode** pminspanningtree, int size)
{
    int lastadded;
    tgraphnode nearestnode;
    tgraphnode **notinminspanningtree = new tgraphnode* [size-1];
    lastadded = 0;

    // every node except node 0 starts outside the tree, with no known distance
    for(int i = 0; i < size-1; i++)
    {
        notinminspanningtree[i] = new tgraphnode;
        notinminspanningtree[i]->nodenum = i+1;
        notinminspanningtree[i]->distance = -1.0f;
        notinminspanningtree[i]->parentnodenum = -1;
    }

    for(int iter = 1; iter < size; iter++)
    {
        // step 1: update each outside node's distance using the node added last
        #pragma omp parallel for
        for(int i = 0; i < size-1; i++)
            if(notinminspanningtree[i] != NULL)
            {
                double t1 = notinminspanningtree[i]->distance;
                double t2 = pmatrix[(notinminspanningtree[i]->nodenum) * size + lastadded];
                if(((t1 < 0) && (t2 > 0)) || (t1 > 0) && (t2 > 0) && (t1 > t2))
                {
                    notinminspanningtree[i]->distance = t2;
                    notinminspanningtree[i]->parentnodenum = lastadded;
                }
            }

        // step 2: find the outside node nearest to the tree
        nearestnode.nodenum = -1;
        nearestnode.distance = 3000;
        #pragma omp parallel
        {
            // each thread keeps its own candidate first
            tgraphnode threadnearestnode;
            threadnearestnode.nodenum = -1;
            threadnearestnode.distance = 3000;
            #pragma omp for
            for(int i = 0; i < size-1; i++)
            {
                if(notinminspanningtree[i] != NULL)
                {
                    double t1 = notinminspanningtree[i]->distance;
                    double t2 = threadnearestnode.distance;
                    if((t1 > 0) && (t1 < t2))
                    {
                        threadnearestnode.distance = t1;
                        threadnearestnode.nodenum = notinminspanningtree[i]->nodenum;
                    }
                }
            }
            // merge the per-thread candidates into the shared result
            #pragma omp critical
            {
                if(threadnearestnode.distance < nearestnode.distance)
                {
                    nearestnode.distance = threadnearestnode.distance;
                    nearestnode.nodenum = threadnearestnode.nodenum;
                }
            }
        }

        // step 3 (serial): record the new edge in the tree
        pminspanningtree[nearestnode.nodenum] = new ttreenode;
        pminspanningtree[nearestnode.nodenum]->nodenum = notinminspanningtree[nearestnode.nodenum-1]->parentnodenum;
        pminspanningtree[nearestnode.nodenum]->distance = nearestnode.distance;
        int parent = notinminspanningtree[nearestnode.nodenum-1]->parentnodenum;
        if(pminspanningtree[parent] != NULL)
        {
            // parent already has an entry; tmp is not linked anywhere in the code as posted
            ttreenode *tmp = new ttreenode;
            tmp->distance = nearestnode.distance;
            tmp->nodenum = nearestnode.nodenum;
        }
        else
        {
            pminspanningtree[parent] = new ttreenode;
            pminspanningtree[parent]->distance = nearestnode.distance;
            pminspanningtree[parent]->nodenum = nearestnode.nodenum;
        }

        // the chosen node leaves the outside set
        lastadded = nearestnode.nodenum;
        delete notinminspanningtree[nearestnode.nodenum - 1];
        notinminspanningtree[nearestnode.nodenum - 1] = NULL;
    }
    delete[] notinminspanningtree;
}
You have two basic options for running code on the Intel Xeon Phi coprocessor. You can compile the entire program with the -mmic and -qopenmp flags and run it natively, either with micnativeloadex or by copying the executable and the required libraries onto the coprocessor with scp. Alternatively, you can omit -mmic and instead modify the code so that the portion you want to run on the coprocessor falls inside an offload section. That portion of the code is sent to the coprocessor to run, and the remainder of the code runs on the host.
The presentation Avi sent is a lovely overview of programming for the coprocessor. In addition, you can find basic information on compiling and optimizing for the coprocessor at: https://software.intel.com/en-us/articles/programming-and-compiling-for-intel-many-integrated-core-architecture.
However, and this is a big however, your code does not vectorize and has significant serial sections. To obtain optimum performance on the coprocessor, code must both vectorize and parallelize.