Tabulating characters with diacritics in R
I'm trying to tabulate the occurrences of phones (characters) in a string, but diacritics are tabulated as characters on their own. Ideally, I would have a wordlist in the International Phonetic Alphabet, with a fair amount of diacritics and several combinations of them with base characters. I give here an MWE with one word, but the same goes for a list of words and for more types of combinations.
    > word <- "n̥ana"  # word made up of 4 phones: [n̥], [a], [n], [a]
    > table(strsplit(word, ""))

     ̥ a n
     1 2 2

But the wanted result is:

     a  n n̥
     2  1  1

How can I get this kind of result?
Try

    library(stringi)
    table(stri_split_boundaries(word, type='character'))
    #a  n n̥
    #2  1  1

or

    table(strsplit(word, '(?<=\\P{Ll}|\\w)(?=\\w)', perl=TRUE))
    #a  n n̥
    #2  1  1
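Since the question mentions a whole wordlist, here is a minimal sketch of how the stringi approach might extend to several words. It relies on `stri_split_boundaries` splitting at grapheme-cluster boundaries, which keeps a base character together with its combining marks. The `words` vector is a made-up example, not data from the question, and the diacritics are written as escapes (`\u0325` combining ring below, `\u0303` combining tilde) so the code does not depend on the file encoding.

```r
library(stringi)

# Hypothetical word list (an assumption for illustration only).
# "\u0325" = combining ring below, "\u0303" = combining tilde.
words <- c("n\u0325ana", "man\u0325", "a\u0303na")

# Split every word at grapheme-cluster boundaries; the result is a list
# with one character vector of phones per word.
phones <- stri_split_boundaries(words, type = "character")

# Pool the phones from all words and count each distinct one.
counts <- table(unlist(phones))
print(counts)
```

Keeping `phones` as a list also makes per-word counts possible, for example with `lapply(phones, table)`.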