R: creating start and stop times from a single column
I'm an R beginner and have got stuck on a seemingly simple problem. I have a large data frame with 4 columns: id, date of observation, a value (alb), and an end date. A single id may have between 1 and 15 or so observations at different dates. The end date is the time of the event or censoring, and there is only 1 per id.
id   date       alb end
1143 2010-03-23 41  2010-12-15
1143 2010-06-29 39  2010-12-15
1144 2008-01-01 34  2009-08-06
1145 2010-03-23 42  2012-10-25
1145 2011-01-12 45  2012-10-25
For a survival analysis using alb as a time-varying covariate, I am trying to create an episode for each observation, with a start and a stop time column. The stop time should be the start time of the next alb observation, or the end time if there is no further alb observation for that id. So:
id   date       alb end        start      stop
1143 2010-03-23 41  2010-12-15 2010-03-23 2010-06-29
1143 2010-06-29 39  2010-12-15 2010-06-29 2010-12-15
1144 2008-01-01 34  2009-08-06 2008-01-01 2009-08-06
1145 2010-03-23 42  2012-10-25 2010-03-23 2011-01-12
1145 2011-01-12 45  2012-10-25 2011-01-12 2012-10-25
I am getting stuck creating the column of stop times. I got in a mess trying to make a function with nested if/else statements. Is there a simple approach? Thanks in advance!
In reply to r2evans: for a large portion of the data frame, the values returned by the dplyr approach are 1970-01-01 (the full data frame has 130,000 rows). Thanks! A sample of the data:
id   date       alb  end
1143 2010-03-23 41.0 1996-08-10
1143 2010-06-29 39.0 1996-08-10
1143 2011-01-12 42.0 1996-08-10
1143 2010-09-28 47.0 1996-08-10
1143 2011-07-19 40.0 1996-08-10
1143 2012-06-12 41.0 1996-08-10
1143 2013-06-25 40.0 1996-08-10
1143 2013-12-26 40.0 1996-08-10
1143 2014-06-15 40.0 1996-08-10
1143 2014-12-26 39.9 1996-08-10
1144 2008-01-01 34.0 2015-04-28
1145 2010-03-23 42.0 2015-04-28
1145 2012-01-13 44.0 2015-04-28
1145 2012-06-15 41.0 2015-04-28
Your data:
# Store the dates as Date objects rather than strings
df <- data.frame(
  id   = c(1143, 1143, 1144, 1145, 1145),
  date = as.Date(c("2010-03-23", "2010-06-29", "2008-01-01", "2010-03-23", "2011-01-12")),
  alb  = c(41, 39, 34, 42, 45),
  end  = as.Date(c("2010-12-15", "2010-12-15", "2009-08-06", "2012-10-25", "2012-10-25"))
)
One technique uses dplyr:
library(dplyr)
df %>%
  group_by(id) %>%
  # stop = the next row's start within the same id;
  # the last row of each id falls back to that id's end date
  mutate(start = date,
         stop  = lead(start, default = end[1]))
## Source: local data frame [5 x 6]
## Groups: id
##
##     id       date alb        end      start       stop
## 1 1143 2010-03-23  41 2010-12-15 2010-03-23 2010-06-29
## 2 1143 2010-06-29  39 2010-12-15 2010-06-29 2010-12-15
## 3 1144 2008-01-01  34 2009-08-06 2008-01-01 2009-08-06
## 4 1145 2010-03-23  42 2012-10-25 2010-03-23 2011-01-12
## 5 1145 2011-01-12  45 2012-10-25 2011-01-12 2012-10-25
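lead() takes the value from the next row, so it only gives the right stop times if rows are already in date order within each id. The larger sample in the question is not: for id 1143, the 2011-01-12 row comes before the 2010-09-28 row. Below is a minimal base-R sketch that sorts first and then builds stop without ifelse(), which drops the Date class and returns bare day counts since 1970-01-01. Whether either of these explains the 1970-01-01 values is an assumption; the input is the question's five example rows, deliberately shuffled, and the names obs and is_last are illustrative only.

```r
# The question's five example rows, deliberately shuffled (illustrative input)
obs <- data.frame(
  id   = c(1145, 1143, 1144, 1145, 1143),
  date = as.Date(c("2011-01-12", "2010-06-29", "2008-01-01", "2010-03-23", "2010-03-23")),
  alb  = c(45, 39, 34, 42, 41),
  end  = as.Date(c("2012-10-25", "2010-12-15", "2009-08-06", "2012-10-25", "2010-12-15"))
)

# "Next row" logic needs rows sorted by id, then by date
obs <- obs[order(obs$id, obs$date), ]
rownames(obs) <- NULL

n <- nrow(obs)
# TRUE for the last observation of each id
is_last <- c(obs$id[-1] != obs$id[-n], TRUE)

obs$start <- obs$date
# Shift dates up one row; c() on Date vectors keeps the Date class
obs$stop <- c(obs$date[-1], obs$end[n])
# The last observation of each id stops at that id's end date.
# Indexed assignment keeps the Date class, unlike ifelse()
obs$stop[is_last] <- obs$end[is_last]

obs
```

The same sort step (e.g. arrange(id, date) before group_by()) could be added in front of the dplyr pipeline.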
If you'd rather use base functions only:
do.call(rbind, by(df, df$id, function(x) {
  # stop = next observation's date, or this id's end date for the last row
  # (c() replaces dplyr::lead() so that no package is needed)
  cbind(x, start = x$date, stop = c(x$date[-1], x$end[1]))
}))
##          id       date alb        end      start       stop
## 1143.1 1143 2010-03-23  41 2010-12-15 2010-03-23 2010-06-29
## 1143.2 1143 2010-06-29  39 2010-12-15 2010-06-29 2010-12-15
## 1144   1144 2008-01-01  34 2009-08-06 2008-01-01 2009-08-06
## 1145.4 1145 2010-03-23  42 2012-10-25 2010-03-23 2011-01-12
## 1145.5 1145 2011-01-12  45 2012-10-25 2011-01-12 2012-10-25
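For the survival step itself, counting-process data (e.g. Surv(start, stop, event) in the survival package) needs stop strictly after start on every row, and it is usually fed numeric times. As far as I know, Surv() flags rows that break this rule. In the larger sample, id 1143 has an end date (1996-08-10) earlier than all its observations, so its last episode would be such a row. The base-R sketch below uses the question's expected episodes as assumed input, lists any rows with stop not after start, and converts dates to days since each id's first observation. The names ep, bad, tstart, and tstop are hypothetical, and the event indicator is omitted because the question does not show one.

```r
# Assumed input: the expected start/stop episodes from the question
ep <- data.frame(
  id    = c(1143, 1143, 1144, 1145, 1145),
  start = as.Date(c("2010-03-23", "2010-06-29", "2008-01-01", "2010-03-23", "2011-01-12")),
  stop  = as.Date(c("2010-06-29", "2010-12-15", "2009-08-06", "2011-01-12", "2012-10-25")),
  alb   = c(41, 39, 34, 42, 45)
)

# Rows a counting-process model would reject: stop not after start
# (for example an end date earlier than the observation dates)
bad <- ep[!(ep$stop > ep$start), ]
print(bad)

# Day 0 = each id's first observation; ave() is applied to plain numbers
first     <- ave(as.numeric(ep$start), ep$id, FUN = min)
ep$tstart <- as.numeric(ep$start) - first
ep$tstop  <- as.numeric(ep$stop)  - first

ep
```

Measuring time from each id's first observation is one choice; measuring from a common calendar origin would also work as long as it is applied consistently.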