r - Generate multiple permutations of vector with non-repeating elements -


i have vector:

seq1<-c('a','b','c','b','a','b','c','b','a','b','c') 

i wish permute elements of vector create multiple (ideally 5000) vectors condition permuted vectors cannot have repeated elements within vector in consecutive elements. e.g. "abbca...." not allowed 'b-b' repeat.

i realize small example there not 5000 solutions. typically dealing larger vectors. willing consider sampling replacement, though i'm working on solutions without replacement.

i looking better solutions current thinking.

option 1. - brute force.

here, repeatedly sample , check if successive elements duplicates.

set.seed(18) seq1b <-  sample(seq1a) seq1b #[1] "b" "b" "a" "a" "c" "b" "b" "c" "a" "c" "b" sum(seq1b[-length(seq1b)]==seq1b[-1])  #3 

this not solution there 3 duplicated consecutive elements. realize lag better way check duplicated elements reason being finicky (i think being masked package have loaded).

set.seed(1000) res<-null (i in 1:10000){res[[i]]<-sample(seq1a)} res1 <- lapply(res, function(x) sum(x[-length(x)]==x[-1])) sum(unlist(res1)==0) #228 

this produces 228 options out of 10000 iterations. let's see how many unique ones:

res2 <- res[which(unlist(res1)==0)] unique(unlist(lapply(res2, paste0, collapse="")))  #134 

out of 10000 attempts 134 unique ones short example vector.

here 3 of 134 example sequences produced:

# "bcbabcbabca" "cbabababcbc" "bcbcababacb" 

in fact, if try on 500,000 samples, can 212 unique sequences match non-repeating criteria. close upper limit of possible ones.

option 2. - iteratively

a second idea had more iterative approach.

seq1a table(seq1a) #a b c  #3 5 3 

we sample 1 of these letters our starting point. sample remaining ones, check if same chosen 1 , if not, add end. , on , forth...

set.seed(10) newseq <- sample(seq1a,1)  #b newseq #[1] "b"  remaining <-seq1a[!seq1a %in% newseq | duplicated(seq1a)] table(remaining) #a b c  #3 4 3   set.seed(10) newone <- sample(remaining,1) #c  #check if newone same previous one. newone==newseq[length(newseq)] #false newseq <- c(newseq, newone) #update newseq newseq #[1] "b" "c"  remaining <-seq1a[!seq1a %in% newseq | duplicated(seq1a)] #update remaining remaining table(remaining)  #a b c  #3 4 2  

this might work, can see running lots of issues - e.g. go:

# "a" "c" "a" "c" "a" "b"  ... 

and left 3 more 'b's cannot go @ end they'd duplicates.

of course, lot easier if allowed sampling replacement, i'm trying without replacement.

you can use iterpc package work combinations , iterations. hadn't heard of until trying answer question there might more effective ways use same package.

here i've used iterpc set iterator, , getall find combinations of vector based on iterator. seems report unique combinations, making bit nicer finding combinations expand.grid.

#install.packages("iterpc") require("iterpc")  seq1 <- c('a','b','c','b','a','b','c','b','a','b','c')  <- iterpc(n = table(seq1), ordered=true)  all_seqs <- getall(i)  # result matrix permutations rows: head(all_seqs) #     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] #[1,] "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "c"  "c"   "c"   #[2,] "a"  "a"  "a"  "b"  "b"  "b"  "b"  "c"  "b"  "c"   "c"   #[3,] "a"  "a"  "a"  "b"  "b"  "b"  "b"  "c"  "c"  "b"   "c"   #[4,] "a"  "a"  "a"  "b"  "b"  "b"  "b"  "c"  "c"  "c"   "b"   #[5,] "a"  "a"  "a"  "b"  "b"  "b"  "c"  "b"  "b"  "c"   "c"   #[6,] "a"  "a"  "a"  "b"  "b"  "b"  "c"  "b"  "c"  "b"   "c"  

the rle function tells consecutive values equal each other in vector. lengths component of output tells how many times each element of values repeated:

rle(c("a", "a", "b", "b", "b", "c", "b"))  # run length encoding #   lengths: int [1:3] 2 3 1 1 #   values : chr [1:3] "a" "b" "c" "b" 

the length of values or lengths equal length of original vector combinations have no consecutive repeats.

you can therefore apply rle each row, calculate length of values or lengths , keep rows all_seqs calculated value same length of seqs1.

#apply rle function  all_seqs_rle <- apply(getall(i), 1, function(x) length(rle(x)$values))  # keep rows have rle length equal length(seq1) all_seqs_good <- all_seqs[which(all_seqs_rle == length(seq1)), ] 

all_seqs_good has nrow of 212, suggesting did indeed find possible combinations example vector.

nrow(all_seqs_good) # 212  

technically still brute forcing (except doesn't calculate every possible combination - unique ones), quick example. i'm not sure how cope larger vectors yet...

edit: seem fail larger vectors. 1 solution break larger vectors smaller chunks, process chunks above , combine them - keeping combinations meet criteria.

for example, breaking vector of length 24 2 vectors of length 12, combining results can give 200,000+ combinations meet critera , pretty quick (around 1 minute me):

# function based on above solution seq_check <- function(myseq){ = iterpc(n = table(myseq), ordered=true) all_seqs <- getall(i) all_seqs_rle <- apply(getall(i), 1, function(x) length(rle(x)$values)) all_seqs_good <- all_seqs[which(all_seqs_rle == length(myseq)), ] return(all_seqs_good) }  set.seed(1) seq1<-sample(c(rep("a", 8), rep("b", 8), rep("c", 8)),24)  seq1a <- seq1[1:12] seq1b <- seq1[13:24]  #get permutations no consecutive repeats seq1a = apply(seq_check(seq1a), 1, paste0, collapse="") seq1b = apply(seq_check(seq1b), 1, paste0, collapse="")  #combine seq1a , seq1b:  combined_seqs <- expand.grid(seq1a, seq1b) combined_seqs <- apply(combined_seqs, 1, paste0, collapse="")   #function calculate rle lengths rle_calc <- function(x) length(rle(unlist(strsplit(x, "")))$values)  #keep combined sequences have rle lengths of 24 combined_seqs_rle <- sapply(combined_seqs, rle_calc) passed_combinations <- combined_seqs[which(combined_seqs_rle == 24)]  #find number of solutions length(passed_combinations) #[1] 245832 length(unique(passed_combinations)) #[1] 245832 

you might need re-order starting vector best results. example, if seq1 in above example had started "a" 8 times in row, there no passing solutions. example, try splitting solution seq1 <- c(rep("a", 8), rep("b", 8), rep("c", 8)) , no solutions back, though there same number of solutions random sequence.

it doesn't need find every possible passing combination, if larger vectors you'll need iterate through i using getnext function iterpc, , check each 1 in loop slow.


Comments

Popular posts from this blog

angularjs - ADAL JS Angular- WebAPI add a new role claim to the token -

php - CakePHP HttpSockets send array of paramms -

node.js - Using Node without global install -