Unicode Tagging in Python NLTK

March 15, 2013

I am working on a Python NLTK tagging program. My input file is Hindi text containing several lines. When I tokenize the text and run pos_tag, every token in the output gets the NN tag. With an English sentence as input, the tagging is correct. Version: Python 3.4.1, NLTK 3.0.

Kindly help! Here is what I tried.

word_to_be_tagged = u"ताजो स्वास आनी चकचकीत दांत तुमचें व्यक्तीमत्व परजळायतात."

from nltk.corpus import indian

train_data = indian.tagged_sents('hindi.pos')[:300]
test_data = indian.tagged_sents('hindi.pos')[301:]

print(word_to_be_tagged)
print(train_data)

And the output is different:

ताजो स्वास आनी चकचकीत दांत तुमचें व्यक्तीमत्व परजळायतात.
[[('पूर्ण', 'JJ'), ('प्रतिबंध', 'NN'), ('हटाओ', 'VFM'), (':', 'SYM'), ('इराक', 'NNP')], [('संयुक्त', 'NNC'), ('राष्ट्र', 'NN'), ('।', 'SYM')], ...]

The problem is that you should use a Hindi POS tagger:

import nltk
from nltk.corpus import indian
from nltk.tag import tnt

train_data = indian.tagged_sents('hindi.pos')
tnt_pos_tagger = tnt.TnT()
tnt_pos_tagger.train(train_data)  # train the TnT part-of-speech tagger on the Hindi data

print(tnt_pos_tagger.tag(nltk.word_tokenize(word_to_be_tagged)))

The underlying issue is that a part-of-speech tagger is accurate only in a specific domain (mostly a combination of language and topic). In English, most of the words the tagger hasn't seen yet are nouns (NN), which is why it tags all of your data as NN.
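To make the fallback behavior concrete, here is a minimal pure-Python sketch of a lookup tagger with a default tag. It is not how NLTK's pos_tag is implemented; the helper names `train_lookup_tagger` and `tag_with_default` and the tiny English training set are made up for illustration.

```python
from collections import Counter, defaultdict

def train_lookup_tagger(tagged_sents):
    """Map each word to the tag it carried most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_with_default(model, tokens, default="NN"):
    """Tag known words from the model; unseen words get the default tag."""
    return [(tok, model.get(tok, default)) for tok in tokens]

# Hypothetical English-only training data (illustration only).
english_train = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_lookup_tagger(english_train)

# Words seen in training get their learned tags.
english_tags = tag_with_default(model, ["the", "cat", "runs"])
# Hindi tokens were never seen, so every one falls back to the default.
hindi_tags = tag_with_default(model, ["संयुक्त", "राष्ट्र", "।"])

print(english_tags)
print(hindi_tags)
```

A tagger trained only on English has no entry for any Hindi token, so the default is all it can return, which matches the NN-only output described in the question.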

If you train on the same domain as the text you want to tag afterwards (Hindi), it should be OK.

See this for more explanations.
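One way to check whether the training domain fits is to hold out some tagged sentences and measure accuracy on them. The sketch below is pure Python and uses a hypothetical `accuracy` helper; the two training sentences are the ones printed in the question, and the held-out sentence is a made-up recombination of their words, used only for illustration.

```python
def accuracy(tag_fn, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        words = [word for word, _ in sent]
        predicted = tag_fn(words)
        for (_, gold_tag), (_, pred_tag) in zip(sent, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0

# Training sentences taken from the output shown in the question.
train = [
    [("पूर्ण", "JJ"), ("प्रतिबंध", "NN"), ("हटाओ", "VFM"), (":", "SYM"), ("इराक", "NNP")],
    [("संयुक्त", "NNC"), ("राष्ट्र", "NN"), ("।", "SYM")],
]
# Hypothetical held-out sentence (assumption, not from the corpus).
held_out = [[("संयुक्त", "NNC"), ("राष्ट्र", "NN"), ("इराक", "NNP"), ("।", "SYM")]]

# Toy in-domain model: remember the tag seen for each training word.
lookup = {word: tag for sent in train for word, tag in sent}

def nn_only(tokens):
    """Baseline that mimics an out-of-domain tagger: everything is NN."""
    return [(tok, "NN") for tok in tokens]

def in_domain(tokens):
    """Use the learned tag when the word was seen, NN otherwise."""
    return [(tok, lookup.get(tok, "NN")) for tok in tokens]

nn_only_acc = accuracy(nn_only, held_out)
in_domain_acc = accuracy(in_domain, held_out)
print(nn_only_acc, in_domain_acc)
```

With a real tagger, the same helper could presumably be given `tnt_pos_tagger.tag` as `tag_fn` and a slice of `indian.tagged_sents('hindi.pos')` that was not used for training as `gold_sents`.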
