Perform multiple survival analysis with a loop in R


I am working on survival analysis in R. I have two data frames: genedf, which holds gene expression, and survdf, which holds follow-up data. The following are samples:

    # data frame: genedf
    geneid=c("egfr","her2","e2f1","pten")
    patient1=c(12,23,56,23)
    patient2=c(23,34,11,6)
    patient3=c(56,44,32,45)
    patient4=c(23,64,45,23)
    genedf=data.frame(patient1,patient2,patient3,patient4,geneid)
    > genedf
      patient1 patient2 patient3 patient4 geneid
    1       12       23       56       23   egfr
    2       23       34       44       64   her2
    3       56       11       32       45   e2f1
    4       23        6       45       23   pten

    # data frame: survdf
    id=c("patient1","patient2","patient3","patient4")
    time=c(23,7,34,56)
    status=c(1,0,1,1)
    survdf=data.frame(id,time,status)
    > survdf
            id time status
    1 patient1   23      1
    2 patient2    7      0
    3 patient3   34      1
    4 patient4   56      1

I extract the expression data of a specific gene from genedf, use the median expression as the cutoff value to perform survival analysis with the "survival" package, and obtain the p-value from survdiff. The following code uses the "egfr" gene as an example.

    # extract the expression of the gene
    targetgene<-subset(genedf,grepl("egfr",genedf$geneid))
    targetgene$geneid<-NULL
    # transpose the table and adjust the format
    targetgene<-t(targetgene[,1:ncol(targetgene)])
    targetgene<-data.frame(as.factor(rownames(targetgene)),targetgene)
    colnames(targetgene)<-c("id","expression")
    rownames(targetgene)<-NULL
    # median split: 1 = below the median, 2 = at or above it
    targetgene$expression1<-targetgene$expression
    targetgene$expression1[targetgene$expression<median(targetgene$expression)]<-1
    targetgene$expression1[targetgene$expression>=median(targetgene$expression)]<-2
    # survival analysis
    library(survival)
    ## add the survival object
    survdf$survobj<-with(survdf, Surv(time,status==1))
    ## Kaplan-Meier estimator
    km<-survfit(survobj~targetgene$expression1, data=survdf, conf.type = "log-log")
    sdf<-survdiff(Surv(time, status) ~targetgene$expression1, data=survdf)
    # obtain the p-value
    p.val <-1-pchisq(sdf$chisq, length(sdf$n) - 1)
    > p.val
    [1] 0.1572992

I can go through the genes one by one. My question is: there are more than 10,000 genes that need to be analyzed. I want to obtain the p-values for all of them and put them into a new data frame. Do I need to use a loop or apply?

This ugly script works.

In data10, the first column must hold the time, the second one the status, and the remaining columns the treatments (grouping variables) you want to test. The patients are the row names.
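As a rough sketch of how the question's data could be brought into that layout: the snippet below rebuilds the sample genedf and survdf, applies the same median split to every gene, and assembles a data10 with time, status, and one grouping column per gene. The intermediate names expr and groups are made up for illustration, and the sketch assumes the patient columns of genedf match survdf$id.

```r
# Sample data from the question
genedf <- data.frame(
  patient1 = c(12, 23, 56, 23),
  patient2 = c(23, 34, 11, 6),
  patient3 = c(56, 44, 32, 45),
  patient4 = c(23, 64, 45, 23),
  geneid   = c("egfr", "her2", "e2f1", "pten")
)
survdf <- data.frame(
  id     = c("patient1", "patient2", "patient3", "patient4"),
  time   = c(23, 7, 34, 56),
  status = c(1, 0, 1, 1)
)

# Transpose so that patients are rows and genes are columns,
# with the rows in the same order as survdf$id
expr <- t(genedf[, as.character(survdf$id)])
colnames(expr) <- as.character(genedf$geneid)

# Median split per gene: 1 = below the median, 2 = at or above it
groups <- apply(expr, 2, function(x) ifelse(x < median(x), 1, 2))

# Time first, status second, then one grouping column per gene
data10 <- data.frame(
  time      = survdf$time,
  status    = survdf$status,
  groups,
  row.names = as.character(survdf$id)
)
```

A data frame built this way can then be passed to the loopsurff function defined below, which tests each column from the third one onward.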

    loopsurff<-function(data10){
      library(plyr)
      library(survival)
      # one column per treatment: row 1 = time column, row 2 = status column,
      # row 3 = index of the treatment column to test
      combos<-rbind(rep(1,ncol(data10)-2),
                    rep(2,ncol(data10)-2),
                    3:ncol(data10))
      vv<-adply(combos, 2, function(x) {
        fit<-survdiff(Surv(data10[,1], data10[,2]) ~ data10[, x[3]], data=data10)
        # 1 degree of freedom, i.e. two groups per treatment column
        p<-1 - pchisq(fit$chisq, 1)
        out<-data.frame("var1"=colnames(data10)[x[3]],
                        "p.value"=as.numeric(sprintf("%.3f", p)))
        return(out)
      })
      # return the result visibly
      vv
    }

You get a data frame with the column names of yourdata[,3:ncol(yourdata)] and the p-value for each one.
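For comparison, the same idea can be sketched with base R's apply, working directly on the question's genedf and survdf without plyr. This is only one possible variant, not the method of the answer above: the helper gene_pvalue and the names expr and result are hypothetical, and the sketch assumes the patient columns of genedf match survdf$id. Genes whose median split leaves a single group get NA instead of a p-value.

```r
library(survival)

# Sample data from the question
genedf <- data.frame(
  patient1 = c(12, 23, 56, 23),
  patient2 = c(23, 34, 11, 6),
  patient3 = c(56, 44, 32, 45),
  patient4 = c(23, 64, 45, 23),
  geneid   = c("egfr", "her2", "e2f1", "pten")
)
survdf <- data.frame(
  id     = c("patient1", "patient2", "patient3", "patient4"),
  time   = c(23, 7, 34, 56),
  status = c(1, 0, 1, 1)
)

# Expression matrix: one row per gene, columns ordered like survdf$id
expr <- as.matrix(genedf[, as.character(survdf$id)])
rownames(expr) <- as.character(genedf$geneid)

# Hypothetical helper: log-rank p-value for one gene's median split
gene_pvalue <- function(x, time, status) {
  group <- ifelse(x < median(x), 1, 2)
  if (length(unique(group)) < 2) return(NA_real_)  # no split possible
  sdf <- survdiff(Surv(time, status) ~ group)
  1 - pchisq(sdf$chisq, length(sdf$n) - 1)
}

# Apply the helper to every gene (row) and collect the results
p.values <- apply(expr, 1, gene_pvalue,
                  time = survdf$time, status = survdf$status)
result <- data.frame(geneid = names(p.values), p.value = unname(p.values))
```

For the egfr row this reproduces the p-value shown in the question (0.1572992). With 10,000 genes the call stays the same, since apply simply walks over more rows.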

