regex - Use lapply on a subset of list elements and return a list of the same length as the original in R
I want to apply a regex operation to a subset of list elements (which are character strings) using lapply, and return a list of the same length as the original. The list elements are long strings (derived from reading in long text files and collapsing the paragraphs into a single string). The regex operation is valid only for a subset of the list elements/strings. I want the non-subsetted list elements (character strings) to be returned in their original state.
The regex operation is str_extract from the stringr package, i.e. I want to extract a substring from a longer string. I subset the list elements based on a regex pattern in the filename.
An example with simplified data:
    library(stringr)
    texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
    filenames <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
    names(texts) <- filenames
    regexp <- "abcdef"
I know in advance which strings I want to apply the regex operation to, and hence how I want to subset these strings. That is, I don't want to run the regex on all elements in the list, as doing so would return some invalid results (which is not apparent in this simplified example).
I've made a few naive efforts, e.g.:
    x <- lapply(texts[str_detect(names(texts), "1997")], str_extract, regexp)
    > x
    $ab1997r.txt
    [1] "abcdef"

    $dc1997s.txt
    [1] "abcdef"
which returns a reduced-length list containing only the substrings found. The results I want are:
    > x
    $ab1997r.txt
    [1] "abcdef"

    $bg2000s.txt
    [1] "mnopqrstuvwxyz"

    $mn1999r.txt
    [1] "ghijklmnopqrs"

    $dc1997s.txt
    [1] "abcdef"
where the strings not containing the regex pattern are returned in their original state.
I have read up on stringr, lapply, and llply (in the plyr package), but many operations are illustrated using data frames as examples, not lists, and don't involve regex operations on character strings. I can achieve my goal using a loop, but I'm trying to get away from that, as advised, and get better at using the apply class of functions.
You can use the subset assignment operator [<-, which replaces only the selected elements and leaves the rest of the list untouched:
    x <- texts
    is1997 <- str_detect(names(texts), "1997")
    x[is1997] <- lapply(texts[is1997], str_extract, regexp)
    x
    # $ab1997r.txt
    # [1] "abcdef"
    #
    # $bg2000s.txt
    # [1] "mnopqrstuvwxyz"
    #
    # $mn1999r.txt
    # [1] "ghijklmnopqrs"
    #
    # $dc1997s.txt
    # [1] "abcdef"
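If you would rather build the full-length list in a single pass instead of copying and then assigning into it, one possible alternative is to walk the texts and the logical index in parallel with Map. This is only a sketch in base R, not something taken from the answer above: extract_first is a hypothetical helper written here to mimic what str_extract returns (the first match, or NA when there is none), and grepl stands in for str_detect.

```r
# Same example data as in the question.
texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
names(texts) <- c("ab1997r.txt", "bg2000s.txt", "mn1999r.txt", "dc1997s.txt")
regexp <- "abcdef"

# Logical index of the elements to transform, based on the file name.
is1997 <- grepl("1997", names(texts), fixed = TRUE)

# Hypothetical helper (not part of stringr): return the first match of
# `pattern` in the single string `s`, or NA if there is no match.
extract_first <- function(s, pattern) {
  m <- regmatches(s, regexpr(pattern, s))
  if (length(m) == 0) NA_character_ else m
}

# Map iterates over texts and is1997 together; flagged elements are
# replaced by the extracted substring, all others are returned unchanged.
# The names of the first argument (texts) are carried over to the result.
x <- Map(function(s, flag) if (flag) extract_first(s, regexp) else s,
         texts, is1997)
```

The result x is a list with the same length and names as texts. The [<- approach is shorter and probably clearer; the Map version may be handy when the decision to transform depends on more than one parallel vector.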