python - Distinct contiguous blocks in a pandas DataFrame

March 15, 2015

I have a pandas DataFrame that looks like this:

```python
import numpy as np
import pandas as pd
x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})
```

Do you know how I can get a list of tuples containing the start and end indices of the distinct contiguous blocks? For the example DataFrame above, the result would be:

[(1,3), (6,7), (8,8), (10,11)].

You can use shift and cumsum to create an 'id' for each contiguous block:

```
In [5]: blocks = (ty1 != ty1.shift()).cumsum()

In [6]: blocks
Out[6]:
    name
0      1
1      2
2      2
3      2
4      3
5      4
6      5
7      5
8      6
9      7
10     8
11     8
12     9
```
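To see why this works, it may help to look at the intermediate boolean step. The sketch below assumes the same x1 list as above and uses a plain Series named s (an illustrative name, not from the answer): each row is compared with the row above it, True marks the start of a new block, and the cumulative sum of those flags becomes the running id. Note that NaN != NaN evaluates to True, so every NaN row starts a block of its own.

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
s = pd.Series(x1, name='name')

# True wherever the value differs from the previous row (NaN != NaN is also True)
starts = s != s.shift()

# Summing the True flags cumulatively gives a running block id
ids = starts.cumsum()

print(starts.tolist())
print(ids.tolist())
```

The second list matches the name column of Out[6] above.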

You are only interested in the blocks that are not NaN, so filter on that:

```
In [7]: blocks = blocks[ty1['name'].notnull()]

In [8]: blocks
Out[8]:
    name
1      2
2      2
3      2
6      5
7      5
8      6
10     8
11     8
```
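A quick way to check the filtered ids is to list which row labels fall into each block and which value each block holds. This is a sketch assuming the same data; the names s, valid, members and values are illustrative. It shows why grouping on the id rather than on the value matters: the two runs of 'b' keep separate ids (5 and 8).

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
s = pd.Series(x1, name='name')

# Block ids as before, then drop the rows whose value is NaN
ids = (s != s.shift()).cumsum()
valid = ids[s.notnull()]

# Row labels belonging to each remaining block id
members = {int(k): [int(i) for i in v] for k, v in valid.groupby(valid).groups.items()}

# The value each block consists of; both 'b' runs keep distinct ids
values = s[s.notnull()].groupby(valid).first().to_dict()

print(members)
print(values)
```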

And then you can get the first and last index of each 'id':

```
In [10]: blocks.groupby('name').apply(lambda x: (x.index[0], x.index[-1]))
Out[10]:
name
2      (1, 3)
5      (6, 7)
6      (8, 8)
8    (10, 11)
dtype: object
```
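If you want the plain list of tuples from the question rather than a Series of tuples, one possible variation (a swapped-in technique, not the answer's own) replaces the apply with a 'first'/'last' aggregation over the row labels and zips the two resulting columns. The names s, valid, idx and bounds are illustrative.

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
s = pd.Series(x1, name='name')

# Block ids, restricted to the non-NaN rows
ids = (s != s.shift()).cumsum()
valid = ids[s.notnull()]

# Turn the row labels into values so they can be aggregated per block
idx = valid.index.to_series()
bounds = idx.groupby(valid).agg(['first', 'last'])

# One (start, end) tuple per block, in block order
result = list(zip(bounds['first'], bounds['last']))
print(result)
```

With the default integer index, 'min'/'max' would give the same bounds; 'first'/'last' simply mirrors x.index[0] and x.index[-1] from the apply version.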

Whether this last step is necessary depends on what you want to do (working with tuples as elements in DataFrames is not recommended). Maybe having the 'id's is already enough.
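If the ids alone are enough, one option is to keep them as a column next to the data and group on that column directly. A minimal sketch, assuming the same data; the column name 'block' is hypothetical. NaN rows get a missing id through where() and the nullable Int64 dtype, and groupby skips missing keys by default.

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

# Running block id computed on the 'name' column
block_id = (ty1['name'] != ty1['name'].shift()).cumsum()

# Keep ids only for non-NaN rows; Int64 stores <NA> for the others
ty1['block'] = block_id.where(ty1['name'].notnull()).astype('Int64')

# groupby drops <NA> keys by default, so only real blocks are counted
sizes = ty1.groupby('block').size()
print(ty1)
print(sizes)
```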
