R - How to Detect and Mark Change within a Column in Another Column
I'm trying to mark when a process starts and ends. The code needs to detect when a change begins and when it ends, marking it in another column.
Example data:

    date process
    2007       0
    2008       1
    2009       1
    2010       1
    2011       1
    2012       1
    2013       0

Goal:

    date process status
    2007       0 NA
    2008       1 process_start
    2009       1 NA
    2010       1 NA
    2011       1 NA
    2012       1 process_end
    2013       0 NA
Maybe by calculating diff, lagging in both directions:

    dif <- diff(df1$process)
    df1$status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
    levels(df1$status) <- c(rep(NA, 4), "start", "end", "start&end")
    #   date process status
    # 1 2007       0   <NA>
    # 2 2008       1  start
    # 3 2009       1   <NA>
    # 4 2010       1   <NA>
    # 5 2011       1   <NA>
    # 6 2012       1    end
    # 7 2013       0   <NA>
Update

A version without factors:

    dif <- diff(df1$process)
    df1$status <- c(NA, dif) - 2 * c(dif, NA)
    df1$status <- c(rep(NA, 4), "start", "end", "start&end")[df1$status + 4]
Note that in the case of a single-year process we have a "start&end" situation.
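To see where the codes 1, 2 and 3 come from, the arithmetic can be checked by hand on a toy series. This is only an illustrative sketch: the vector `process` and the names `before`, `after` and `code` are made up for the example and are not part of the original data.

```r
# Toy series: one two-year run followed by one single-year run
process <- c(0, 1, 1, 0, 1, 0)

dif    <- diff(process)
before <- c(NA, dif)  # change between the previous row and this row
after  <- c(dif, NA)  # change between this row and the next row

# start only:    before = 1, after = 0   ->  1
# end only:      before = 0, after = -1  ->  2
# start and end: before = 1, after = -1  ->  3
code <- before - 2 * after

data.frame(process, before, after, code)
```

Every other combination gives a value of 0 or less (or NA at the edges), which is why the first four entries of the lookup vector are NA.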
Update 2

If the series starts (or ends) with process = 1, the expected output might be not NA but start (or end):

    dif <- diff(df1$process)
    df1$status <- c(df1$process[1], dif) - 2 * c(dif, -tail(df1$process, 1))
    df1$status <- c(rep(NA, 4), "start", "end", "start&end")[df1$status + 4]
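For reuse, the boundary-aware version could be wrapped in a small function. The sketch below assumes a numeric 0/1 vector with no missing values; the name `mark_status` is hypothetical and chosen only for this example.

```r
# Hypothetical helper: returns "start", "end", "start&end" or NA per element.
# Assumes `process` is a numeric 0/1 vector without NAs.
mark_status <- function(process) {
  dif <- diff(process)
  # Pad as if the series were preceded and followed by 0
  code <- c(process[1], dif) - 2 * c(dif, -tail(process, 1))
  c(rep(NA, 4), "start", "end", "start&end")[code + 4]
}

# A series that both begins and ends inside a process
mark_status(c(1, 1, 0, 1))
# The example data from the question
mark_status(c(0, 1, 1, 1, 1, 1, 0))
```

With a data frame like the one in the question, this could be used as `df1$status <- mark_status(df1$process)`.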
A more complicated example:

    set.seed(4)
    df1 <- data.frame(date = 2007:(2007 + 24),
                      process = sample(c(0, 1, 1), 25, TRUE))
The last version produces:

    #    date process    status
    # 1  2007       1 start&end
    # 2  2008       0      <NA>
    # 3  2009       0      <NA>
    # 4  2010       0      <NA>
    # 5  2011       1 start&end
    # 6  2012       0      <NA>
    # 7  2013       1     start
    # 8  2014       1      <NA>
    # 9  2015       1       end
    # 10 2016       0      <NA>
    # 11 2017       1 start&end
    # 12 2018       0      <NA>
    # 13 2019       0      <NA>
    # 14 2020       1     start
    # 15 2021       1      <NA>
    # 16 2022       1      <NA>
    # 17 2023       1      <NA>
    # 18 2024       1      <NA>
    # 19 2025       1      <NA>
    # 20 2026       1      <NA>
    # 21 2027       1      <NA>
    # 22 2028       1      <NA>
    # 23 2029       1      <NA>
    # 24 2030       1      <NA>
    # 25 2031       1       end