auto-vectorization - Cython boundscheck=True faster than boundscheck=False
Consider the following minimal example:
    #cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True
    cimport cython
    from libc.stdlib cimport malloc

    def main(size_t ni, size_t nt, size_t nx):
        cdef:
            size_t i, j, t, x, y
            double[:, :, ::1] a = <double[:ni, :ni, :nx]>malloc(ni * ni * nx * sizeof(double))
            double[:, :, ::1] b = <double[:nt, :ni, :nx]>malloc(nt * ni * nx * sizeof(double))
            size_t[:, :, ::1] best = <size_t[:nt, :ni, :nx]>malloc(nt * ni * nx * sizeof(size_t))
            size_t mxi
            double s, mxs
        for t in range(nt):
            for j in range(ni):
                for y in range(nx):  # this loop does nothing but is needed for the effect below.
                    mxs = -1e300
                    for i in range(ni):
                        for x in range(nx):
                            with cython.boundscheck(False):  # faster!?!?
                                s = b[t, i, x] + a[i, j, x]
                                if s >= mxs:
                                    mxs = s
                                    mxi = i
                    best[t + 1, j, y] = mxi
        return best[0, 0, 0]
Essentially, it is summing two 2D arrays along some specific axes and finding the maximizing index along another axis.
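To make the intent of the kernel concrete, here is a minimal pure-Python sketch of what the two inner loops compute for one fixed pair (t, j). The function name `argmax_over_i` and the nested-list layout are assumptions made for illustration; they are not part of the original Cython code and say nothing about its performance.

```python
def argmax_over_i(a, b, t, j):
    """Return the index i that maximizes b[t][i][x] + a[i][j][x] over all i and x.

    a is indexed as a[i][j][x] and b as b[t][i][x]; this nested-list layout is
    a hypothetical stand-in for the memoryviews in the question.
    """
    mxs = -1e300  # running maximum, same sentinel value as in the question
    mxi = 0       # index i at which the running maximum was found
    for i in range(len(a)):
        for x in range(len(a[i][j])):
            s = b[t][i][x] + a[i][j][x]
            if s >= mxs:  # ">=" means the last maximizer wins on ties
                mxs = s
                mxi = i
    return mxi


# Two values of i, one value of j, two values of x.
a = [[[0.0, 1.0]], [[5.0, 0.0]]]  # a[i][j][x]
b = [[[1.0, 1.0], [0.0, 2.0]]]    # b[t][i][x]
result = argmax_over_i(a, b, t=0, j=0)
```

Here the largest sum is b[0][1][0] + a[1][0][0] = 5.0, so `result` is 1.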
When compiled with gcc -O3 and called with arguments (1, 2000, 2000), adding boundscheck=True results in execution that is twice as fast as with boundscheck=False.
Any hint as to why this would be the case? (Well, I can guess that this again has to do with GCC autovectorization...)
Thanks in advance.
(Cross-posted from cython-users)
boundscheck is a safety check that verifies the indices you access are inside the bounds of the array. If you don't bother to check whether the indices can go out of bounds, access is normally faster, because performing the check takes time.
That is, if boundscheck is true, the code checks whether the index is inside the range of the array before reading from or writing to memory, and raises an error if it is not. If boundscheck is false, it reads or writes through the pointer even if the index is out of bounds, which can return bogus data when reading and corrupt data when writing.
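The difference between the two modes can be sketched in pure Python by emulating a 2D array stored in one flat, row-major buffer. The helper names `get_checked` and `get_unchecked` are hypothetical, and this is only an emulation: a Python list still guards the end of the buffer itself, so the sketch reproduces the "silently wrong value" case but not a real out-of-allocation read or a crash.

```python
def get_checked(buf, ncols, row, col):
    """Read element (row, col) of a row-major buffer, validating both indices."""
    nrows = len(buf) // ncols
    if not (0 <= row < nrows and 0 <= col < ncols):
        raise IndexError(f"index ({row}, {col}) out of bounds")
    return buf[row * ncols + col]


def get_unchecked(buf, ncols, row, col):
    """Read element (row, col) with no validation, like raw pointer arithmetic."""
    return buf[row * ncols + col]


# A 2 x 3 array stored row by row in one flat list.
buf = [10, 11, 12,
       20, 21, 22]

# Column 4 does not exist, but the unchecked offset 0 * 3 + 4 = 4 still lands
# inside the buffer, on element (1, 1), so a wrong value comes back silently.
wrong_value = get_unchecked(buf, 3, 0, 4)

# The checked version rejects the same access.
try:
    get_checked(buf, 3, 0, 4)
    caught = False
except IndexError:
    caught = True
```

The checked version pays for two comparisons per access; the unchecked one is cheaper per access but returns 21 for an element that does not exist.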
From the documentation:
The array lookups are still slowed down by two factors:
1) Bounds checking is performed.
2) Negative indices are checked for and handled correctly.
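The two per-lookup costs listed above can be sketched as a single index-normalization step. This is a pure-Python illustration of Python's indexing semantics under a hypothetical helper name, `normalize_index`; it is not the C code that Cython actually generates.

```python
def normalize_index(index, length):
    """Map a possibly negative index to a non-negative offset, then bounds-check it."""
    if index < 0:
        index += length  # wraparound: -1 refers to the last element
    if not 0 <= index < length:
        raise IndexError("index out of bounds")
    return index


last = normalize_index(-1, 5)    # last element of a length-5 axis
middle = normalize_index(2, 5)   # non-negative indices pass through unchanged
```

Disabling wraparound corresponds to dropping the first `if`, and disabling boundscheck to dropping the second; both branches would otherwise run on every array access.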
The consequences of not bounds checking are:
Now bounds checking is not performed (and, as a side-effect, if you ‘’do’’ happen to access out of bounds you will in the best case crash your program and in the worst case corrupt data).
Where this is particularly important is that you can have None vectors. Here is the warning from the documentation:
Warning
Speed comes with some cost. Especially it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other use (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).
The actual rules are a bit more complicated but the main message is clear: do not use typed objects without knowing that they are not set to None.
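One way to follow that advice is to reject None once, at the function boundary, before any indexing takes place. The sketch below is plain Python with a hypothetical function name, `total`; in Cython the same guard can be written by hand like this, or expressed with the `not None` clause on a typed argument.

```python
def total(values):
    """Sum a sequence, refusing None up front rather than failing inside the loop."""
    if values is None:
        raise ValueError("values must not be None")
    result = 0.0
    for v in values:
        result += v
    return result


ok = total([1.0, 2.0])

try:
    total(None)
    rejected = False
except ValueError:
    rejected = True
```

With the check done once outside the loop, the inner code can index without further None tests.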