R: trying to create start and stop times from a single date column


I am an R beginner and have got stuck on a seemingly simple problem. I have a large data frame with 4 columns: id, date of observation, value (alb), and end date. A single id may have between 1 and 15 or so observations at different dates. The end date is the time of the event or censoring, one per id.

      id        date  alb         end
    1143  2010-03-23   41  2010-12-15
    1143  2010-06-29   39  2010-12-15
    1144  2008-01-01   34  2009-08-06
    1145  2010-03-23   42  2012-10-25
    1145  2011-01-12   45  2012-10-25

For a survival analysis using alb as a time-varying covariate, I am trying to create an episode for each observation, with a start and a stop time column. The stop time should be the start time of the next alb observation, or the end time if there is no further alb observation for that id. So:

      id        date  alb         end       start        stop
    1143  2010-03-23   41  2010-12-15  2010-03-23  2010-06-29
    1143  2010-06-29   39  2010-12-15  2010-06-29  2010-12-15
    1144  2008-01-01   34  2009-08-06  2008-01-01  2009-08-06
    1145  2010-03-23   42  2012-10-25  2010-03-23  2011-01-12
    1145  2011-01-12   45  2012-10-25  2011-01-12  2012-10-25

I am getting stuck creating the column of stop times. I got into a mess trying to write a function with nested if-else statements. Does anyone have a simple approach? Thanks in advance!

In reply to r2evans: below is a portion of the data.frame for which the dplyr approach returns 1970-01-01 (the full data frame has 130,000 rows). Thanks.

      id        date   alb         end
    1143  2010-03-23  41.0  1996-08-10
    1143  2010-06-29  39.0  1996-08-10
    1143  2011-01-12  42.0  1996-08-10
    1143  2010-09-28  47.0  1996-08-10
    1143  2011-07-19  40.0  1996-08-10
    1143  2012-06-12  41.0  1996-08-10
    1143  2013-06-25  40.0  1996-08-10
    1143  2013-12-26  40.0  1996-08-10
    1143  2014-06-15  40.0  1996-08-10
    1143  2014-12-26  39.9  1996-08-10
    1144  2008-01-01  34.0  2015-04-28
    1145  2010-03-23  42.0  2015-04-28
    1145  2012-01-13  44.0  2015-04-28
    1145  2012-06-15  41.0  2015-04-28

Your data:

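    # Assign the sample data to `df`, which the snippets below refer to.
    # stringsAsFactors = FALSE keeps the dates as character rather than factor.
    df <- data.frame(
        id   = c(1143, 1143, 1144, 1145, 1145),
        date = c("2010-03-23", "2010-06-29", "2008-01-01", "2010-03-23", "2011-01-12"),
        alb  = c(41, 39, 34, 42, 45),
        end  = c("2010-12-15", "2010-12-15", "2009-08-06", "2012-10-25", "2012-10-25"),
        stringsAsFactors = FALSE
    )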

One technique is to use dplyr:

    library(dplyr)
    df %>%
        group_by(id) %>%
        mutate(start=date, stop=lead(start, default=end[1]))
    ## Source: local data frame [5 x 6]
    ## Groups: id
    ##
    ##     id       date alb        end      start       stop
    ## 1 1143 2010-03-23  41 2010-12-15 2010-03-23 2010-06-29
    ## 2 1143 2010-06-29  39 2010-12-15 2010-06-29 2010-12-15
    ## 3 1144 2008-01-01  34 2009-08-06 2008-01-01 2009-08-06
    ## 4 1145 2010-03-23  42 2012-10-25 2010-03-23 2011-01-12
    ## 5 1145 2011-01-12  45 2012-10-25 2011-01-12 2012-10-25
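On the follow-up about 1970-01-01 values: the real column types aren't shown, so this is only a guess, but it may be worth checking that `date` and `end` are stored as `Date` and that rows are in date order within each id (in the follow-up sample, 2011-01-12 is listed before 2010-09-28). The sketch below uses the question's sample rows, deliberately shuffled, converts both columns with `as.Date()`, sorts, and builds `stop` with `c(date[-1], end[1])` rather than `lead()`, so it does not depend on how a particular dplyr version handles a date `default`. The name `res` is just for illustration.

```r
library(dplyr)

# The question's sample rows, shuffled on purpose to show why sorting matters.
df <- data.frame(
  id   = c(1145, 1143, 1144, 1143, 1145),
  date = c("2011-01-12", "2010-06-29", "2008-01-01", "2010-03-23", "2010-03-23"),
  alb  = c(45, 39, 34, 41, 42),
  end  = c("2012-10-25", "2010-12-15", "2009-08-06", "2010-12-15", "2012-10-25"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Make sure both columns are real Date objects, not character or factor.
  mutate(date = as.Date(date), end = as.Date(end)) %>%
  # lead-style logic only makes sense if rows are in date order within each id.
  arrange(id, date) %>%
  group_by(id) %>%
  # stop = next observation's date; the last row of each id gets the end date.
  mutate(start = date, stop = c(date[-1], end[1])) %>%
  ungroup()

print(res)
```

If the result still contains 1970-01-01, `str(df)` on the real data should show whether the columns are factors or numbers rather than dates.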

If you'd rather use base functions only:

    # lead() comes from dplyr, so the "next date" is built with plain indexing here:
    # drop the first date and append the id's end date.
    do.call('rbind', by(df, df$id, function(x) {
        cbind(x, start=x$date, stop=c(x$date[-1], x$end[1]))
    }))
    ##          id       date alb        end      start       stop
    ## 1143.1 1143 2010-03-23  41 2010-12-15 2010-03-23 2010-06-29
    ## 1143.2 1143 2010-06-29  39 2010-12-15 2010-06-29 2010-12-15
    ## 1144   1144 2008-01-01  34 2009-08-06 2008-01-01 2009-08-06
    ## 1145.4 1145 2010-03-23  42 2012-10-25 2010-03-23 2011-01-12
    ## 1145.5 1145 2011-01-12  45 2012-10-25 2011-01-12 2012-10-25
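The follow-up sample also has an end date (1996-08-10) that is earlier than every observation for id 1143, so the last episode for that id would end before it starts. Start/stop style survival models generally expect each interval to have a stop later than its start, so a quick check may save trouble later. This is a base R sketch using four rows from the follow-up sample; `pieces`, `res` and `bad_interval` are illustrative names, and the rows are assumed to be sorted by date within each id already.

```r
# Four rows taken from the follow-up sample in the question (ids 1143 and 1145).
df <- data.frame(
  id   = c(1143, 1143, 1145, 1145),
  date = as.Date(c("2014-06-15", "2014-12-26", "2012-01-13", "2012-06-15")),
  end  = as.Date(c("1996-08-10", "1996-08-10", "2015-04-28", "2015-04-28"))
)

# Build start/stop per id with base R only.
pieces <- lapply(split(df, df$id), function(x) {
  x$start <- x$date
  x$stop  <- c(x$date[-1], x$end[1])  # next date, or the end date on the last row
  x
})
res <- do.call(rbind, pieces)

# Flag episodes whose stop is not strictly after their start.
res$bad_interval <- res$stop <= res$start
print(res[res$bad_interval, ])
```

Here only the last row of id 1143 is flagged, which points at the end date for that id rather than at the start/stop construction.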
