Performing multiple survival analyses with a loop in R
I am working on survival analysis in R. I have two data frames: genedf, with gene expression, and survdf, with follow-up data. Here are samples of both:
    # data frame: genedf
    geneid <- c("egfr", "her2", "e2f1", "pten")
    patient1 <- c(12, 23, 56, 23)
    patient2 <- c(23, 34, 11, 6)
    patient3 <- c(56, 44, 32, 45)
    patient4 <- c(23, 64, 45, 23)
    genedf <- data.frame(patient1, patient2, patient3, patient4, geneid)

    > genedf
      patient1 patient2 patient3 patient4 geneid
    1       12       23       56       23   egfr
    2       23       34       44       64   her2
    3       56       11       32       45   e2f1
    4       23        6       45       23   pten

    # data frame: survdf
    id <- c("patient1", "patient2", "patient3", "patient4")
    time <- c(23, 7, 34, 56)
    status <- c(1, 0, 1, 1)
    survdf <- data.frame(id, time, status)

    > survdf
            id time status
    1 patient1   23      1
    2 patient2    7      0
    3 patient3   34      1
    4 patient4   56      1
I extract the expression data of a specific gene from genedf, use the median expression as the cut-off value, perform the survival analysis with the "survival" package, and obtain the p-value from survdiff. The following code uses the "egfr" gene as an example.
    # extract the expression of the gene
    targetgene <- subset(genedf, grepl("egfr", genedf$geneid))
    targetgene$geneid <- NULL
    # transpose the table and adjust the format (patients become rows)
    targetgene <- t(targetgene)
    targetgene <- data.frame(as.factor(rownames(targetgene)), targetgene)
    colnames(targetgene) <- c("id", "expression")
    rownames(targetgene) <- NULL
    # split at the median: 1 = below, 2 = at or above
    targetgene$expression1 <- targetgene$expression
    targetgene$expression1[targetgene$expression < median(targetgene$expression)] <- 1
    targetgene$expression1[targetgene$expression >= median(targetgene$expression)] <- 2
    # survival analysis (assumes targetgene rows are in the same patient order as survdf)
    library(survival)
    ## add survival object
    survdf$survobj <- with(survdf, Surv(time, status == 1))
    ## Kaplan-Meier estimator stage
    km <- survfit(survobj ~ targetgene$expression1, data = survdf, conf.type = "log-log")
    sdf <- survdiff(Surv(time, status) ~ targetgene$expression1, data = survdf)
    # obtain the p-value
    p.val <- 1 - pchisq(sdf$chisq, length(sdf$n) - 1)

    > p.val
    [1] 0.1572992
I can go through the genes one by one. My question is: there are more than 10,000 genes that need to be analyzed. I want to obtain the p-values for all of them and put them into a new data frame. Do I need to use a loop or apply?
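Either works; below is a minimal sketch using vapply, assuming the genedf/survdf layout from the question (one row per gene, patients as columns, patient ids in survdf$id). It aligns the expression columns to survdf by id instead of relying on row order, applies the same median split and log-rank test to every gene, and collects the p-values in a new data frame. The names expr, logrank_p and result are hypothetical, and pchisq(..., lower.tail = FALSE) stands in for 1 - pchisq(...): it gives the same value but keeps precision for very small p-values.

```r
library(survival)

# Toy data copied from the question
genedf <- data.frame(patient1 = c(12, 23, 56, 23),
                     patient2 = c(23, 34, 11, 6),
                     patient3 = c(56, 44, 32, 45),
                     patient4 = c(23, 64, 45, 23),
                     geneid = c("egfr", "her2", "e2f1", "pten"))
survdf <- data.frame(id = c("patient1", "patient2", "patient3", "patient4"),
                     time = c(23, 7, 34, 56),
                     status = c(1, 0, 1, 1))

# Expression matrix: genes in rows, patients in columns (geneid column dropped)
expr <- as.matrix(genedf[, setdiff(names(genedf), "geneid")])
rownames(expr) <- as.character(genedf$geneid)
# Reorder the columns to follow survdf$id so each patient lines up with its follow-up
expr <- expr[, as.character(survdf$id), drop = FALSE]

# Hypothetical helper: log-rank p-value for one gene split at its median
logrank_p <- function(x) {
  group <- ifelse(x < median(x), 1, 2)  # 1 = below median, 2 = at or above
  if (length(unique(group)) < 2) return(NA_real_)  # nothing to compare
  sdf <- survdiff(Surv(survdf$time, survdf$status) ~ group)
  pchisq(sdf$chisq, df = length(sdf$n) - 1, lower.tail = FALSE)
}

# One p-value per gene (row); vapply guarantees a single numeric per call
pvals <- vapply(seq_len(nrow(expr)), function(i) logrank_p(expr[i, ]), numeric(1))
result <- data.frame(geneid = rownames(expr), p.value = pvals)
result
```

For egfr this reproduces the 0.1572992 from the single-gene code. A plain for loop filling a preallocated numeric vector would do the same job; vapply mainly saves the bookkeeping.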
This ugly script is working.
In data10, the first column must hold the time, the second the status, and the following columns the treatments (here, genes) you want to test, with the patients as row names.
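A sketch of how data10 could be built from the question's genedf and survdf, assuming each gene should enter as its median-split group (1 below, 2 at or above) rather than as raw expression: survdiff treats every distinct value of the right-hand-side variable as its own group, so raw values would not give a two-group test. The objects expr and groups are hypothetical helper names.

```r
# Toy data copied from the question
genedf <- data.frame(patient1 = c(12, 23, 56, 23),
                     patient2 = c(23, 34, 11, 6),
                     patient3 = c(56, 44, 32, 45),
                     patient4 = c(23, 64, 45, 23),
                     geneid = c("egfr", "her2", "e2f1", "pten"))
survdf <- data.frame(id = c("patient1", "patient2", "patient3", "patient4"),
                     time = c(23, 7, 34, 56),
                     status = c(1, 0, 1, 1))

# Patients in rows, genes in columns; t() turns the gene rows into columns
expr <- t(as.matrix(genedf[, setdiff(names(genedf), "geneid")]))
colnames(expr) <- as.character(genedf$geneid)

# Dichotomise each gene at its own median: 1 = below, 2 = at or above
groups <- apply(expr, 2, function(x) ifelse(x < median(x), 1, 2))
dimnames(groups) <- dimnames(expr)

# Align the rows to survdf by patient id, then put time and status first
ids <- as.character(survdf$id)
data10 <- data.frame(time = survdf$time,
                     status = survdf$status,
                     groups[ids, , drop = FALSE],
                     row.names = ids)
data10
```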
    library(plyr)
    library(survival)

    loopsurff <- function(data10) {
      # one column per variable to test: row 1 = time column, row 2 = status column,
      # row 3 = index of the variable column (3 to ncol)
      combos <- rbind(1, 2, 3:ncol(data10))
      vv <- adply(combos, 2, function(x) {
        # the variable column must already hold groups (e.g. 1/2 from a median split)
        fit <- survdiff(Surv(data10[, 1], data10[, 2]) ~ data10[, x[3]], data = data10)
        # df = 1 assumes exactly two groups per variable
        p <- 1 - pchisq(fit$chisq, 1)
        out <- data.frame("var1" = colnames(data10)[x[3]],
                          "p.value" = as.numeric(sprintf("%.3f", p)))
        return(out)
      })
      vv
    }
You get back a data frame with the column names of yourdata[,3:ncol(yourdata)] and the p-value for each one.
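If plyr is not available, the same loop can be written in base R. The function loopsurv_base below is a hypothetical sketch, not part of the answer above: it takes data10 in the same layout, runs survdiff once per column from the third onward, and returns one row per column with its p-value. It uses length(fit$n) - 1 degrees of freedom instead of a fixed 1, so it also covers variables with more than two groups, and it does not round the p-values.

```r
library(survival)

# Hypothetical base-R counterpart of loopsurff(): columns 1 and 2 are time and
# status, every further column is a grouping variable to test
loopsurv_base <- function(data10) {
  vars <- colnames(data10)[3:ncol(data10)]
  p <- vapply(vars, function(v) {
    fit <- survdiff(Surv(data10[[1]], data10[[2]]) ~ data10[[v]])
    pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
  }, numeric(1))
  data.frame(var1 = vars, p.value = unname(p))
}

# Small data10 in the described layout (groups already split at the median)
data10 <- data.frame(time = c(23, 7, 34, 56),
                     status = c(1, 0, 1, 1),
                     egfr = c(1, 2, 2, 2),
                     her2 = c(1, 1, 2, 2),
                     row.names = c("patient1", "patient2", "patient3", "patient4"))
res <- loopsurv_base(data10)
res
```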