unicode - Tabulating characters with diacritics in R -


I'm trying to tabulate the occurrences of phones (characters) in a string, but diacritics are tabulated as characters on their own. Ideally, I would have a wordlist in the International Phonetic Alphabet, with a fair amount of diacritics and several combinations of them with base characters. I give here a MWE with just one word, but the same goes for a list of words and more types of combinations.

> word <- "n̥ana" # word constituted of 4 phones: [n̥],[a],[n],[a]
> table(strsplit(word, ""))

 ̥ a n 
1 2 2 

But the wanted result is:

a n n̥ 
2 1 1 

How can I get this kind of result?

Try

library(stringi)
table(stri_split_boundaries(word, type='character'))
#a n n̥ 
#2 1 1 

or

table(strsplit(word, '(?<=\\P{Ll}|\\w)(?=\\w)', perl=TRUE))
#a n  n̥ 
#2 1 1 
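
Since the question mentions a whole word list, here is a minimal sketch of how the stringi approach might extend to several words at once. The vector `words` and its contents are made up for illustration, and the combining ring below is written as the escape `\u0325` so the example does not depend on the file encoding.

```r
library(stringi)

# Hypothetical word list (an assumption, not from the question).
# "n\u0325ana" is n + COMBINING RING BELOW (U+0325) + "ana".
words <- c("n\u0325ana", "ana")

# Split every word into user-perceived characters (a base character
# together with its combining marks), then pool them into one vector.
phones <- unlist(stri_split_boundaries(words, type = "character"))

# Tabulate the pooled phones across the whole word list.
counts <- table(phones)
print(counts)
```

`stri_split_boundaries()` returns one character vector per word, so `unlist()` is what pools them before counting. To keep per-word counts instead, something like `lapply(stri_split_boundaries(words, type = "character"), table)` should work.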
