python - Distinct contiguous blocks in a pandas DataFrame
I have a pandas DataFrame that looks like this:
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})
Do you know how I can get a list of tuples containing the start and end indices of the distinct contiguous blocks? For the example DataFrame above, the result would be
[(1,3), (6,7), (8,8), (10,11)].
You can use shift and cumsum to create 'id's for each contiguous block:
In [5]: blocks = (ty1 != ty1.shift()).cumsum()

In [6]: blocks
Out[6]:
    name
0      1
1      2
2      2
3      2
4      3
5      4
6      5
7      5
8      6
9      7
10     8
11     8
12     9
You are only interested in the blocks that are not NaN, so filter for those:
In [7]: blocks = blocks[ty1['name'].notnull()]

In [8]: blocks
Out[8]:
    name
1      2
2      2
3      2
6      5
7      5
8      6
10     8
11     8
And then you can get the first and last index of each 'id':
In [10]: blocks.groupby('name').apply(lambda x: (x.index[0], x.index[-1]))
Out[10]:
name
2      (1, 3)
5      (6, 7)
6      (8, 8)
8    (10, 11)
dtype: object
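Put together, the steps above could be wrapped into one function that returns the plain list of tuples asked for in the question. This is only a minimal sketch: the helper name `contiguous_blocks` is made up for illustration, and it works on the 'name' column as a Series rather than on the whole DataFrame.

```python
import numpy as np
import pandas as pd


def contiguous_blocks(series):
    """Return (start, end) index labels for each run of equal, non-null values.

    Hypothetical helper, not part of pandas.
    """
    # A new block starts wherever a value differs from the one before it.
    # NaN != NaN is True, so every NaN gets its own id; those are dropped next.
    ids = (series != series.shift()).cumsum()
    ids = ids[series.notnull()]
    # Group the index labels by block id and take the first and last label.
    labels = ids.index.to_series()
    grouped = labels.groupby(ids)
    return list(zip(grouped.first().tolist(), grouped.last().tolist()))


x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

result = contiguous_blocks(ty1['name'])
print(result)
```

Grouping the index labels directly avoids calling apply with a lambda, and `tolist()` converts the labels to plain Python values.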
Whether that last step is necessary depends on what you want, though (working with tuples as elements in DataFrames is not recommended). Maybe having the 'id's is already enough.
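If tuples inside a pandas object are to be avoided, one possible alternative is to keep the start and end of each block as separate columns. The sketch below is an assumption about what such a layout could look like; the column names `idx`, `block`, `value`, `start` and `end` are chosen here for illustration and do not come from the original answer.

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

# Same block ids as before: increment whenever the value changes.
ids = (ty1['name'] != ty1['name'].shift()).cumsum()
valid = ty1['name'].notnull()

# Keep only non-null rows, attach the block id, and expose the index as a column.
tmp = ty1[valid].assign(block=ids[valid]).rename_axis('idx').reset_index()

# One row per block: its value plus the first and last index label.
summary = tmp.groupby('block').agg(
    value=('name', 'first'),
    start=('idx', 'first'),
    end=('idx', 'last'),
)
print(summary)
```

Each block then occupies one row, so `start` and `end` stay ordinary integer columns that can be filtered or joined like any other.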