regex - How can I split street addresses in the format below using Unix tools (grep or awk) or R?


I have a file of Italian street addresses, and I need to split the address column into two columns: street name and street number. The catch is that a street name can have two or three words and can contain numbers, and the street number sometimes contains a letter (e.g. 15/a). Some addresses look like 12-maggio 23, which should split into 12-maggio in the first column and 23 in the second.

Below is the format of the file:

    street.adress
    falcone n. 1
    fortunato giustino 2
    pisacane 3
    fabrizio de andre' 8
    s. satta 7
    agnesi 16
    volturno cigni 80
    montepenice 6
    cucchiari 15
    molinetto di lorenteggio 15/t 7
    don minzoni 15
    senigallia 4
    milano 38/a
    l. da vinci 13/a
    27-novembre 9

The output should be in two separate columns:

    street.name               street.number
    falcone n.                1
    fortunato giustino        2
    pisacane                  3
    fabrizio de andre'        8
    s. satta                  7
    agnesi                    16
    volturno cigni            80
    montepenice               6
    cucchiari                 15
    molinetto di lorenteggio  15/t 7
    don minzoni               15
    senigallia                4
    milano                    38/a
    l. da vinci               13/a
    27-novembre               9

How can I achieve this? I have tried Excel formulas and unsplit, but they did not work. I have also tried the R code below, but it fails:

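    for (i in 1:nrow(df)) {
      # takes only the first word of row i as the street name...
      new_df[i, "street.name"] <- unlist(strsplit(df[["street.addresses"]], " ")[i])[1]
      # ...and every remaining word as the street number,
      # so "fortunato giustino 2" becomes "fortunato" / "giustino 2"
      new_df[i, "street.number"] <- paste(unlist(strsplit(df[["street.addresses"]], " ")[i])[-1], collapse = " ")
    }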

I also tried:

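    # note: the leading $ is an end-of-string anchor, so this pattern
    # can only match at the very end of each string
    df <- gsub("$([0-9]+ +)?(.*)", "\\1\t\\2", df)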

Nothing works. Any leads?

This regular expression, combined with gsub() and strsplit(), works on the data you provided.

The trick here is to first insert a \t at the location where you want to split the string, and then use strsplit() with \t as the separator.
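As a minimal illustration of this trick, here it is applied only to three address shapes from the question: a name that starts with a number (12-maggio 23), a multi-word name with a lettered number (l. da vinci 13/a), and a number followed by extra text (molinetto di lorenteggio 15/t 7). It adds perl = TRUE, which is not in the original answer, so that the lazy .*? is handled by PCRE; treat it as a sketch rather than a tested drop-in.

```r
# Three sample addresses taken from the question
addr <- c("12-maggio 23", "l. da vinci 13/a", "molinetto di lorenteggio 15/t 7")

# Lazily capture the street name, then everything from the first
# space-preceded number to the end of the string
pattern <- "(.*?) +(\\d+.*)"

# Step 1: insert a tab where the split should happen
# (perl = TRUE is an addition; PCRE handles the lazy .*?)
z <- gsub(pattern, "\\1\t\\2", addr, perl = TRUE)
print(z)

# Step 2: split each string on the tab into c(name, number)
parts <- strsplit(z, "\t")
print(parts)
```

Because the name is matched lazily, the split lands at the first space that is followed by a digit, so the leading 12 in 12-maggio is never mistaken for the street number.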

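    x <- read.table(sep = "\n",
                    header = TRUE,   # first line "street.adress" is the column name
                    quote = "\"",    # only " is a quote, so the ' in andre' is kept as text
                    text = "street.adress
    falcone n. 1
    fortunato giustino 2
    pisacane 3
    fabrizio de andre' 8
    s. satta 7
    agnesi 16
    volturno cigni 80
    montepenice 6
    cucchiari 15
    molinetto di lorenteggio 15/t 7
    don minzoni 15
    senigallia 4
    milano 38/a
    l. da vinci 13/a
    27-novembre 9"
    )

    # lazily capture the street name, then everything from the first number on
    pattern <- "(.*?) +(\\d+.*)"
    # put a tab between the two captured parts
    z <- gsub(pattern, "\\1\t\\2", x[[1]])
    # split on the tab
    unlist(
      strsplit(z, "\t")
    )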

The results:

     [1] "falcone n."               "1"
     [3] "fortunato giustino"       "2"
     [5] "pisacane"                 "3"
     [7] "fabrizio de andre'"       "8"
     [9] "s. satta"                 "7"
    [11] "agnesi"                   "16"
    [13] "volturno cigni"           "80"
    [15] "montepenice"              "6"
    [17] "cucchiari"                "15"
    [19] "molinetto di lorenteggio" "15/t 7"
    [21] "don minzoni"              "15"
    [23] "senigallia"               "4"
    [25] "milano"                   "38/a"
    [27] "l. da vinci"              "13/a"
    [29] "27-novembre"              "9"
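The result above is a single flat vector that alternates name and number. Since the question asks for two separate columns, one way to get there (a sketch, not part of the original answer) is to stack the split pieces row-wise into a matrix and wrap it in a data frame using the street.name / street.number column names from the question. It assumes every address contains a number: a row with no match would produce only one piece, and rbind() would recycle that single piece into both columns.

```r
# A subset of the addresses from the question, for brevity
addr <- c("falcone n. 1", "volturno cigni 80", "milano 38/a", "27-novembre 9")

pattern <- "(.*?) +(\\d+.*)"
z <- gsub(pattern, "\\1\t\\2", addr, perl = TRUE)

# Each element of strsplit() is c(name, number); stack them as rows
parts <- do.call(rbind, strsplit(z, "\t"))

# Turn the 2-column character matrix into a data frame
result <- data.frame(street.name = parts[, 1],
                     street.number = parts[, 2],
                     stringsAsFactors = FALSE)
print(result)
```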

PS: I edited the answer to deal with the fact that there is a quote (') in the input data. To handle this, you have to set the quote = "\"" argument in read.table(); otherwise some lines are skipped.
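An alternative that sidesteps quote handling entirely (a sketch, not what the answer uses) is to read the raw lines with readLines(), which returns each line verbatim and never treats ' or " as quote characters. The temporary file below is a stand-in for the real input file, whose name the question does not give.

```r
# Write a few sample lines (including the apostrophe) to a temporary
# file that stands in for the real input file
tmp <- tempfile(fileext = ".txt")
writeLines(c("street.adress",
             "falcone n. 1",
             "fabrizio de andre' 8",
             "27-novembre 9"), tmp)

# readLines() returns every line as-is; quotes are not interpreted
lines <- readLines(tmp)
addr <- lines[-1]  # drop the header line

# Same insert-a-tab-then-split trick, stacked into a 2-column matrix
pattern <- "(.*?) +(\\d+.*)"
parts <- do.call(rbind, strsplit(gsub(pattern, "\\1\t\\2", addr, perl = TRUE), "\t"))
print(parts)

unlink(tmp)  # clean up the temporary file
```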

