r - Generate multiple permutations of vector with non-repeating elements -
i have vector:
seq1<-c('a','b','c','b','a','b','c','b','a','b','c')
i wish permute elements of vector create multiple (ideally 5000) vectors condition permuted vectors cannot have repeated elements within vector in consecutive elements. e.g. "abbca...." not allowed 'b-b' repeat.
i realize small example there not 5000 solutions. typically dealing larger vectors. willing consider sampling replacement, though i'm working on solutions without replacement.
i looking better solutions current thinking.
option 1. - brute force.
here, repeatedly sample , check if successive elements duplicates.
set.seed(18) seq1b <- sample(seq1a) seq1b #[1] "b" "b" "a" "a" "c" "b" "b" "c" "a" "c" "b" sum(seq1b[-length(seq1b)]==seq1b[-1]) #3
this not solution there 3 duplicated consecutive elements. realize lag
better way check duplicated elements reason being finicky (i think being masked package have loaded).
set.seed(1000) res<-null (i in 1:10000){res[[i]]<-sample(seq1a)} res1 <- lapply(res, function(x) sum(x[-length(x)]==x[-1])) sum(unlist(res1)==0) #228
this produces 228 options out of 10000 iterations. let's see how many unique ones:
res2 <- res[which(unlist(res1)==0)] unique(unlist(lapply(res2, paste0, collapse=""))) #134
out of 10000 attempts 134 unique ones short example vector.
here 3 of 134 example sequences produced:
# "bcbabcbabca" "cbabababcbc" "bcbcababacb"
in fact, if try on 500,000 samples, can 212 unique sequences match non-repeating criteria. close upper limit of possible ones.
option 2. - iteratively
a second idea had more iterative approach.
seq1a table(seq1a) #a b c #3 5 3
we sample 1 of these letters our starting point. sample remaining ones, check if same chosen 1 , if not, add end. , on , forth...
set.seed(10) newseq <- sample(seq1a,1) #b newseq #[1] "b" remaining <-seq1a[!seq1a %in% newseq | duplicated(seq1a)] table(remaining) #a b c #3 4 3 set.seed(10) newone <- sample(remaining,1) #c #check if newone same previous one. newone==newseq[length(newseq)] #false newseq <- c(newseq, newone) #update newseq newseq #[1] "b" "c" remaining <-seq1a[!seq1a %in% newseq | duplicated(seq1a)] #update remaining remaining table(remaining) #a b c #3 4 2
this might work, can see running lots of issues - e.g. go:
# "a" "c" "a" "c" "a" "b" ...
and left 3 more 'b's cannot go @ end they'd duplicates.
of course, lot easier if allowed sampling replacement, i'm trying without replacement.
you can use iterpc
package work combinations , iterations. hadn't heard of until trying answer question there might more effective ways use same package.
here i've used iterpc
set iterator, , getall
find combinations of vector based on iterator. seems report unique combinations, making bit nicer finding combinations expand.grid
.
#install.packages("iterpc") require("iterpc") seq1 <- c('a','b','c','b','a','b','c','b','a','b','c') <- iterpc(n = table(seq1), ordered=true) all_seqs <- getall(i) # result matrix permutations rows: head(all_seqs) # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] #[1,] "a" "a" "a" "b" "b" "b" "b" "b" "c" "c" "c" #[2,] "a" "a" "a" "b" "b" "b" "b" "c" "b" "c" "c" #[3,] "a" "a" "a" "b" "b" "b" "b" "c" "c" "b" "c" #[4,] "a" "a" "a" "b" "b" "b" "b" "c" "c" "c" "b" #[5,] "a" "a" "a" "b" "b" "b" "c" "b" "b" "c" "c" #[6,] "a" "a" "a" "b" "b" "b" "c" "b" "c" "b" "c"
the rle
function tells consecutive values equal each other in vector. lengths
component of output tells how many times each element of values
repeated:
rle(c("a", "a", "b", "b", "b", "c", "b")) # run length encoding # lengths: int [1:3] 2 3 1 1 # values : chr [1:3] "a" "b" "c" "b"
the length of values
or lengths
equal length of original vector combinations have no consecutive repeats.
you can therefore apply rle
each row, calculate length of values
or lengths
, keep rows all_seqs
calculated value same length of seqs1
.
#apply rle function all_seqs_rle <- apply(getall(i), 1, function(x) length(rle(x)$values)) # keep rows have rle length equal length(seq1) all_seqs_good <- all_seqs[which(all_seqs_rle == length(seq1)), ]
all_seqs_good
has nrow
of 212, suggesting did indeed find possible combinations example vector.
nrow(all_seqs_good) # 212
technically still brute forcing (except doesn't calculate every possible combination - unique ones), quick example. i'm not sure how cope larger vectors yet...
edit: seem fail larger vectors. 1 solution break larger vectors smaller chunks, process chunks above , combine them - keeping combinations meet criteria.
for example, breaking vector of length 24 2 vectors of length 12, combining results can give 200,000+ combinations meet critera , pretty quick (around 1 minute me):
# function based on above solution seq_check <- function(myseq){ = iterpc(n = table(myseq), ordered=true) all_seqs <- getall(i) all_seqs_rle <- apply(getall(i), 1, function(x) length(rle(x)$values)) all_seqs_good <- all_seqs[which(all_seqs_rle == length(myseq)), ] return(all_seqs_good) } set.seed(1) seq1<-sample(c(rep("a", 8), rep("b", 8), rep("c", 8)),24) seq1a <- seq1[1:12] seq1b <- seq1[13:24] #get permutations no consecutive repeats seq1a = apply(seq_check(seq1a), 1, paste0, collapse="") seq1b = apply(seq_check(seq1b), 1, paste0, collapse="") #combine seq1a , seq1b: combined_seqs <- expand.grid(seq1a, seq1b) combined_seqs <- apply(combined_seqs, 1, paste0, collapse="") #function calculate rle lengths rle_calc <- function(x) length(rle(unlist(strsplit(x, "")))$values) #keep combined sequences have rle lengths of 24 combined_seqs_rle <- sapply(combined_seqs, rle_calc) passed_combinations <- combined_seqs[which(combined_seqs_rle == 24)] #find number of solutions length(passed_combinations) #[1] 245832 length(unique(passed_combinations)) #[1] 245832
you might need re-order starting vector best results. example, if seq1
in above example had started "a" 8 times in row, there no passing solutions. example, try splitting solution seq1 <- c(rep("a", 8), rep("b", 8), rep("c", 8))
, no solutions back, though there same number of solutions random sequence.
it doesn't need find every possible passing combination, if larger vectors you'll need iterate through i
using getnext
function iterpc
, , check each 1 in loop slow.
Comments
Post a Comment