python - Testing whether a NumPy array contains a given row


Is there a Pythonic and efficient way to check whether a NumPy array contains at least one instance of a given row? By "efficient" I mean that it terminates upon finding the first matching row rather than iterating over the entire array even if a result has already been found.

With Python lists this can be accomplished very cleanly with if row in array:, but this does not work as I would expect for NumPy arrays, as illustrated below.

With Python lists:

>>> a = [[1,2],[10,20],[100,200]]
>>> [1,2] in a
True
>>> [1,20] in a
False

But NumPy arrays give different and rather odd-looking results. (The __contains__ method of ndarray seems to be undocumented.)

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> np.array([1,2]) in a
True
>>> np.array([1,20]) in a
True
>>> np.array([1,42]) in a
True
>>> np.array([42,1]) in a
False

NumPy's __contains__ is, at the time of writing, (a == b).any(), which is arguably only correct if b is a scalar. It is a bit hairy, but I believe the right general method would be (a == b).all(np.arange(a.ndim - b.ndim, a.ndim)).any(), which makes sense for all combinations of a and b dimensionality (this works only in 1.7 or later).
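
To make the difference concrete, here is a minimal sketch comparing the current (a == b).any() behaviour with the general expression suggested above. The helper names naive_contains and contains_rowwise are made up for illustration, the sketch assumes b has no more dimensions than a, and it passes the axes as a plain tuple, which NumPy reductions are documented to accept.

```python
import numpy as np

def naive_contains(a, b):
    """Mimics what `b in a` is described as doing: (a == b).any()."""
    return bool((np.asarray(a) == np.asarray(b)).any())

def contains_rowwise(a, b):
    """Hypothetical helper: True only if b matches a over all trailing axes."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Reduce with all() over the trailing axes that b spans, then ask
    # whether any remaining position matched completely.
    axes = tuple(range(a.ndim - b.ndim, a.ndim))
    return bool((a == b).all(axis=axes).any())

a = np.array([[1, 2], [10, 20], [100, 200]])
print(naive_contains(a, np.array([1, 20])))    # element-wise match is enough
print(contains_rowwise(a, np.array([1, 20])))  # the whole row must match
```

Note that this still compares the whole array before reducing, so it fixes the semantics but does not stop early.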

Edit: To be clear, this is not necessarily the expected result when broadcasting is involved. Also, someone might argue that it should handle the items in a separately, as np.in1d does. I am not sure there is one clear way it should work.

Now you want NumPy to stop when it finds the first occurrence. This, AFAIK, does not exist at this time. It is difficult because NumPy is based mostly on ufuncs, which do the same thing over the whole array. NumPy does optimize these kinds of reductions, but effectively that only works when the array being reduced is already a boolean array (i.e. np.ones(10, dtype=bool).any()).

Otherwise it would need a special function for __contains__, which does not exist. That may seem odd, but you have to remember that NumPy supports many data types and has a bigger machinery to select the correct ones and to select the correct function to work on them. In other words, the ufunc machinery cannot do it, and implementing __contains__ or such specially is not actually that trivial because of the data types.

You can of course write it in Python, or, since you probably know your data type, writing it yourself in Cython/C is very simple.
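
A minimal pure-Python sketch of that idea follows. The function name contains_row is made up for illustration, and the sketch assumes a is 2-D and that iterating over it yields its rows.

```python
import numpy as np

def contains_row(a, row):
    """Hypothetical helper: stop at the first row of a that equals row."""
    row = np.asarray(row)
    # any() over a generator short-circuits, so rows after the first
    # match are never compared.
    return any(np.array_equal(r, row) for r in a)

a = np.array([[1, 2], [10, 20], [100, 200]])
print(contains_row(a, [10, 20]))
print(contains_row(a, [1, 20]))
```

Each comparison carries Python-level overhead, so this is likely to pay off only when a match tends to appear early; otherwise the vectorized comparison may well be faster despite scanning everything.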


That said, it is often much better anyway to use a sorting-based approach for these things. That is a little tedious, as there is no such thing as searchsorted for a lexsort, but it works (you could also abuse scipy.spatial.cKDTree if you like). This assumes you want to compare along the last axis only:

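import numpy as np

# Unfortunately you need to use structured arrays:
a_sorted = np.ascontiguousarray(a).view([('', a.dtype)] * a.shape[-1]).ravel()

# Actually at this point, you can also use np.in1d; if you already have many b,
# then that is even better.

a_sorted.sort()

b_comp = np.ascontiguousarray(b).view(a_sorted.dtype)
ind = a_sorted.searchsorted(b_comp)

# searchsorted returns len(a_sorted) for rows beyond the last one,
# so clip before indexing to avoid an IndexError.
ind = np.minimum(ind, len(a_sorted) - 1)

result = a_sorted[ind] == b_comp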

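This also works for an array b, and if you keep the sorted array around, it is also much better if you do it for a single value (row) in b at a time while a stays the same (otherwise I would just use np.in1d after viewing it as a recarray). Important: you must do the np.ascontiguousarray for safety. It will typically do nothing, but if it does, it would be a big potential bug otherwise.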
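
As a rough sketch of keeping the sorted array around for repeated single-row lookups, the wrapper below sorts once and then answers each query with one binary search. The class name SortedRows is made up for illustration, and the sketch assumes a is 2-D and that every queried row has the same length as a row of a and can be converted to the dtype of a.

```python
import numpy as np

class SortedRows:
    """Hypothetical wrapper: sorts the rows of a 2-D array once for lookups."""

    def __init__(self, a):
        a = np.ascontiguousarray(a)
        self._base = a.dtype
        # One structured element per row, so rows compare as whole records.
        self._dtype = np.dtype([('', a.dtype)] * a.shape[-1])
        self._rows = np.sort(a.view(self._dtype).ravel())

    def __contains__(self, row):
        key = np.ascontiguousarray(row, dtype=self._base).view(self._dtype)
        ind = int(self._rows.searchsorted(key)[0])
        if ind == self._rows.size:
            # The row sorts after everything stored, so it cannot be present.
            return False
        return bool((self._rows[ind:ind + 1] == key)[0])

a = np.array([[1, 2], [10, 20], [100, 200]])
lookup = SortedRows(a)
print([10, 20] in lookup)
print([1, 20] in lookup)
```

The sort costs O(n log n) up front, after which each membership test is a single O(log n) search, so this only makes sense when a stays the same across many queries.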

