python - Assigning result of pandas groupby -

July 15, 2011

i have following dataframe:

date,       industry, symbol, roc 25-02-2015, health,   abc,    200 25-02-2015, health,   xyz,    150 25-02-2015, mining,   tyr,    45 25-02-2015, mining,   ujk,    70 26-02-2015, health,   abc,    60 26-02-2015, health,   xyz,    310 26-02-2015, mining,   tyr,    65 26-02-2015, mining,   ujk,    23

i need determine average 'roc', max 'roc', min 'roc' how many symbols exist each date+industry. in other words need groupby date , industry, , determine various averages, max/min etc.

so far doing following, working seems slow , inefficient:

sector_df = primary_df.groupby(['date', 'industry'], sort=true).mean() tmp_max_df = primary_df.groupby(['date', 'industry'], sort=true).max() tmp_min_df = primary_df.groupby(['date', 'industry'], sort=true).min() tmp_count_df = primary_df.groupby(['date', 'industry'], sort=true).count() sector_df['max_roc'] = tmp_max_df['roc'] sector_df['min_roc'] = tmp_min_df['roc'] sector_df['count'] = tmp_count_df['roc'] sector_df.reset_index(inplace=true) sector_df.set_index(['date', 'industry'], inplace=true)

the above code works, resulting in dataframe indexed date+industry, showing me min/max 'roc' each date+industry, how many symbols existed each date+industry.

i doing complete groupby multiple times (to determine mean, max, min, count of 'roc'). slow because it's doing same thing on , over.

is there way group once. perform mean, max etc on object , assign result sector_df?

you want perform aggregate using agg:

in [72]:  df.groupby(['date','industry']).agg([pd.series.mean, pd.series.max, pd.series.min, pd.series.count]) out[72]:                        roc                                       mean  max  min count date       industry                        2015-02-25 health    175.0  200  150     2            mining     57.5   70   45     2 2015-02-26 health    185.0  310   60     2            mining     44.0   65   23     2

this allows pass iterable (a list in case) of functions perform.

edit

to access individual results need pass tuple each axis:

in [78]: gp.loc[('2015-02-25','health'),('roc','mean')]  out[78]: 175.0

where gp = df.groupby(['date','industry']).agg([pd.series.mean, pd.series.max, pd.series.min, pd.series.count])

Search This Blog

Call

python - Assigning result of pandas groupby -

Comments

Post a Comment

Popular posts from this blog

node.js - Using Node without global install -

php - CakePHP HttpSockets send array of paramms -

angularjs - ADAL JS Angular- WebAPI add a new role claim to the token -