python - Regex uniform pattern requirements


I've been trying to parse parcel numbers with a regex, and have run into issues. I started with this:

r'(?<=[":#][\s\n])(\d{2}[-:\s]*\d{2}[-:\s]*\d{3}[-:\s]*\d{3}(?:\-{1}\d{4})?)' 

I used the lookbehind to make sure I didn't accidentally return a phone number or an internal file number of 10 or 14 digits. It turned out that one listing might contain several parcel numbers (up to 40+) separated by any number of characters (whitespace, "and", /, &, etc.), so I chopped off the lookbehind to deal with that:

r'\d{2}[-:\s]*\d{2}[-:\s]*\d{3}[-:\s]*\d{3}(?:[-:\s]*\d{4}$)?' 
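For comparison, here is a small sketch, assuming Python's standard `re` module and a made-up two-parcel string (not a real listing), of how the two patterns differ. The fixed-width lookbehind only accepts a parcel that directly follows `"`, `:` or `#` plus one whitespace character, so a second parcel after `&` is missed, while the pattern without the lookbehind finds both.

```python
import re

# First attempt: the fixed-width lookbehind requires '"', ':' or '#'
# followed by one whitespace character right before the parcel number.
WITH_LOOKBEHIND = r'(?<=[":#][\s\n])(\d{2}[-:\s]*\d{2}[-:\s]*\d{3}[-:\s]*\d{3}(?:\-{1}\d{4})?)'

# Second attempt: same body, lookbehind removed.
WITHOUT_LOOKBEHIND = r'\d{2}[-:\s]*\d{2}[-:\s]*\d{3}[-:\s]*\d{3}(?:[-:\s]*\d{4}$)?'

# Hypothetical listing with two parcels joined by '&'.
text = "# 22-33-155-003 & 22-33-155-009"

first = re.findall(WITH_LOOKBEHIND, text)
second = re.findall(WITHOUT_LOOKBEHIND, text)
print(first)   # ['22-33-155-003']
print(second)  # ['22-33-155-003', '22-33-155-009']
```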

But on an example containing:

# 22-33-155-003 nka 22-33-155-009 ...... h/w # 41877 1021690 upaxlp

it returned:

['22-33-155-009', '22-33-155-003', '1877 102169'] 
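A minimal reproduction, assuming Python's `re` module and the sample text quoted above. Because nothing in the pattern says where a number may start or end, the engine is free to begin a match one digit into `41877` and stop one digit short of the end of `1021690`, which is where the `'1877 102169'` fragment comes from; printing one character of context on each side of every match makes that visible.

```python
import re

PATTERN = r'\d{2}[-:\s]*\d{2}[-:\s]*\d{3}[-:\s]*\d{3}(?:[-:\s]*\d{4}$)?'
SAMPLE = "# 22-33-155-003 nka 22-33-155-009 ...... h/w # 41877 1021690 upaxlp"

matches = re.findall(PATTERN, SAMPLE)
print(matches)

# Show each match with one character of surrounding context; the unwanted
# hit is preceded by '4' and followed by '0', i.e. it sits inside digit runs.
for m in re.finditer(PATTERN, SAMPLE):
    print(repr(m.group(0)), repr(SAMPLE[m.start() - 1 : m.end() + 1]))
```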

I have tried adding ^ at the beginning and $ at the end to prevent the last bit (41877 1021690 upaxlp) from returning '1877 102169', but then it returns nothing.
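The anchors behave this way because, without `re.MULTILINE`, `^` and `$` refer to the start and end of the whole string, so the pattern can only succeed if the entire listing is a single parcel number. A sketch of one alternative (my assumption, not something from the question): keep the pattern unanchored but wrap it in the zero-width guards `(?<!\d)` and `(?!\d)`, which only forbid a digit immediately before or after the match. The inner `$` from the pattern above is dropped here.

```python
import re

SAMPLE = "# 22-33-155-003 nka 22-33-155-009 ...... h/w # 41877 1021690 upaxlp"
BODY = r'\d{2}[-:\s]*\d{2}[-:\s]*\d{3}[-:\s]*\d{3}(?:[-:\s]*\d{4})?'

# ^ and $ anchor to the start and end of the whole string (no re.MULTILINE),
# so this only matches if the entire listing is one parcel number.
anchored = re.findall('^' + BODY + '$', SAMPLE)

# Zero-width digit guards: a match may not start or end inside a longer
# run of digits, but several parcels in one listing can still be found.
guarded = re.findall(r'(?<!\d)' + BODY + r'(?!\d)', SAMPLE)

print(anchored)  # []
print(guarded)   # ['22-33-155-003', '22-33-155-009']
```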

Each listing comes from a different source with a different format for showing parcel numbers, so the surest way I can think of is to identify 10-digit patterns with possible separating characters (-, /, space, etc.) and use a lookbehind/lookahead to ensure each one is in fact a parcel number.

My questions are:

1) How can I keep the lookahead/lookbehind while accounting for the possibility of several parcels separated by several possible characters?

2) How can I enforce that, if a separating character is used, the same one is used the entire way through? For example, 12-34-567-890 or 12 34 567 890, but not 1234 567890 or 12-34:567 890. This would prevent the last example shown above.

3) Is there a better way of doing this?

You can enforce identical separation characters using a named group and a backreference:

r""" \d{2}(?p<separator>[-:\s]?) \d{2}(?p=separator) \d{3}(?p=separator) \d{3}(?:(?p=separator)\d{4})?""" 

I think this regex matches the pattern you described. I took your own regex, added the separator feature, and removed the '$'; I think the '$' was gumming up the works...
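A runnable sketch of the answer's idea in Python, extended with the digit guards `(?<!\d)`/`(?!\d)` to also address question 1. The guards, the group name `sep`, and the helpers `find_parcels`/`is_parcel` are my own hypothetical additions, not part of the answer. Python spells named groups `(?P<name>...)` and backreferences `(?P=name)` with an uppercase `P`, and the whitespace and `#` comments inside the pattern are only ignored because `re.VERBOSE` is set. Since the pattern has one capturing group, `re.findall` would return only the captured separators, so `find_parcels` uses `re.finditer` and keeps each whole match.

```python
import re

# Sketch: the answer's backreference pattern plus hypothetical digit guards.
PARCEL = re.compile(r"""
    (?<!\d)                   # no digit immediately before the match
    \d{2}(?P<sep>[-:\s]?)     # capture the first separator (may be empty)
    \d{2}(?P=sep)             # every later separator must repeat it exactly
    \d{3}(?P=sep)
    \d{3}(?:(?P=sep)\d{4})?   # optional 4-digit suffix, same separator
    (?!\d)                    # no digit immediately after the match
""", re.VERBOSE)

def find_parcels(text):
    # group(0) is the whole match; findall() would return only "sep".
    return [m.group(0) for m in PARCEL.finditer(text)]

def is_parcel(candidate):
    # fullmatch() succeeds only if the whole string is one parcel number.
    return PARCEL.fullmatch(candidate) is not None

sample = "# 22-33-155-003 nka 22-33-155-009 ...... h/w # 41877 1021690 upaxlp"
print(find_parcels(sample))  # ['22-33-155-003', '22-33-155-009']

for s in ["12-34-567-890", "12 34 567 890", "1234 567890", "12-34:567 890"]:
    print(s, is_parcel(s))
```

The guards are zero-width and only inspect the characters right next to each match, so they keep working when a listing holds many parcels. For the four strings from question 2 the loop reports `True`, `True`, `False`, `False`.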

