for loop - Optimizing neighbor count function for conway's game of life in C -
having trouble optimizing function returns number of neighbors of cell in conway's game of life implementation. i'm trying learn c , better @ coding. i'm not @ recognizing potential optimizations, , i've spent lot of time online reading various methods it's not clicking me yet.
specifically i'm trying figure out how unroll nested loop in efficient way, each time try make runtime longer. i'm including function, don't think other context needed. advice can give!
here code countneighbors() function:
static int countneighbors(board b, int x, int y) { int n = 0; int x_left = max(0, x-1); int x_right = min(height, x+2); int y_left = max(0, y-1); int y_right = min(width, y+2); int xx, yy; (xx = x_left; xx < x_right; ++xx) { (yy = y_left; yy < y_right; ++yy) { n += b[xx][yy]; } } return n - b[x][y]; }
instead of declaring board b[width][height] declare b[width + 2][height + 2]. gives margin have zeros, prevents index out of bounds. so, instead of:
x x x x we have:
0 0 0 0 0 x x 0 0 x x 0 0 0 0 0 x denotes used cells, 0 unused.
typical trade off: bit of memory speed.
thanks don't have call min , max functions (which have bad performance if statements).
finally, write function that:
int countneighborsfast(board b, int x, int y) { int n = 0; n += b[x-1][y-1]; n += b[x][y-1]; n += b[x+1][y-1]; n += b[x-1][y]; n += b[x+1][y]; n += b[x-1][y+1]; n += b[x][y+1]; n += b[x+1][y+1]; return n; } benchmark (updated)
thanks jongware comment added linearization (reducing array's dimensions 2 1) , changing int char.
i made main loop linear , calculate returned sum directly, without intermediate n variable.
2d array 10002 x 10002, 1d had 100040004 elements.
the cpu have pentium dual-core t4500 @ 2.30 ghz, further details here (output of cat /prof/cpuinfo).
results on default optimization level o0:
original: 15.50s mine: 10.13s linear: 2.51s linearandchars: 2.48s linearandcharsandlinearloop: 2.32s linearandcharsandlinearloopandsum: 1.53s that's 10x faster compared original version.
results on o2:
original: 6.42s mine: 4.17s linear: 0.55s linearandchars: 0.53s linearandcharsandlinearloop: 0.42s linearandcharsandlinearloopandsum: 0.44s about 15x faster.
on o3:
original: 10.44s mine: 1.47s linear: 0.26s linearandchars: 0.26s linearandcharsandlinearloop: 0.25s linearandcharsandlinearloopandsum: 0.24s about 44x faster.
the last version, linearandcharsandlinearloopandsum is:
typedef char board3[(height + 2) * (width + 2)]; int i; (i = width + 3; <= (width + 2) * (height + 1) - 2; i++) countneighborslinearandcharsandlinearloopandsum(b3, i); int countneighborslinearandcharsandlinearloopandsum(board3 b, int pos) { return b[pos - 1 - (width + 2)] + b[pos - (width + 2)] + b[pos + 1 - (width + 2)] + b[pos - 1] + b[pos + 1] + b[pos - 1 + (width + 2)] + b[pos + (width + 2)] + b[pos + 1 + (width + 2)]; } changing 1 + (width + 2) width + 3 won't help, because compiler takes care of anyway (even on o0 optimization level).
Comments
Post a Comment