java - Classifier4J output is flawed


I'm working through a book on machine learning, and it gives an example of how to check a string input to see whether a word is a misspelling of another word (different spellings of a celebrity's name in this case). After running the example, the output is always either 0.0, .999, or 0.7071067811865475. I looked through the API, and it's supposed to be able to give a range of values between 0 and 1, but I couldn't find anything to explain the issue. I know there are other tools out there that do the same thing, but I'd like to get this tool working properly. Here is the code I used to test it:

```java
import java.util.ArrayList;
import java.util.List;

import net.sf.classifier4J.ClassifierException;
import net.sf.classifier4J.vector.HashMapTermVectorStorage;
import net.sf.classifier4J.vector.TermVectorStorage;
import net.sf.classifier4J.vector.VectorClassifier;

public class BritneyDilemma {

    public BritneyDilemma() {
        // Misspelled variants to score against the correct spelling
        List<String> terms = new ArrayList<String>();
        terms.add("brittany spears");
        terms.add("brittney spears");
        terms.add("britany spears");
        terms.add("britny spears");
        terms.add("briteny spears");
        terms.add("britteny spears");
        terms.add("briney spears");
        terms.add("brittny spears");
        terms.add("brintey spears");
        terms.add("britanny spears");
        terms.add("britiny spears");
        terms.add("britnet spears");
        terms.add("britiney spears");
        terms.add("britney spears");
        terms.add("britney spearssssss");
        terms.add("britne spessssss");

        TermVectorStorage storage = new HashMapTermVectorStorage();
        VectorClassifier vc = new VectorClassifier(storage);

        String correctString = "britney spears";
        for (String term : terms) {
            try {
                // Teach the correct spelling, then score the variant against it
                vc.teachMatch("britcatagory", correctString);
                double result = vc.classify("britcatagory", term);
                System.out.println(term + " = " + result);
            } catch (ClassifierException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        BritneyDilemma bd = new BritneyDilemma();
    }
}
```

VectorClassifier splits the incoming string into tokens (words) and checks whether or not they appear in the correct strings (i.e. the strings passed to the teachMatch method), taking the frequency of these words into account. VectorClassifier does not calculate how alike two words are, so a misspelled word simply counts as a non-matching word. You can confirm this by looking at its source code.
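To make the word-level behaviour concrete, here is a small hypothetical helper that turns a string into word counts in the way the paragraph above describes. `WordFrequencies` is for illustration only and is not part of Classifier4J; its lowercase, whitespace-only tokenizer is an assumption, and Classifier4J's own tokenizer may split text differently. The point it shows is that a misspelled first name just becomes a different key, with nothing recording that it is close to "britney":

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not Classifier4J code): split a string into
// lowercase, whitespace-separated words and count each word's occurrences.
final class WordFrequencies {

    static Map<String, Integer> of(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // skip the empty token produced by leading whitespace
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The taught string and a misspelling share only the word "spears".
        System.out.println(of("britney spears"));
        System.out.println(of("brittany spears"));
    }
}
```

Running `main` prints `{britney=1, spears=1}` followed by `{brittany=1, spears=1}`; the two maps have only the `spears` key in common.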

Since the correct last name "spears" appears in almost every string, and the first name "britney" is misspelled in most of them, VectorClassifier finds one matching word and one non-matching word in each. For "britney spearssssss" it also finds one matching word (the first name "britney" in this case) and one non-matching word (the last name). That's why VectorClassifier gives identical results for most of these strings.
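As a worked check, assume VectorClassifier scores an input by the cosine of the angle between word-count vectors, with both vectors built over the words of the taught string only. That is an assumption about its internals, but it matches the 0.7071067811865475 in the question. Take $\mathbf{t}$ as the counts of (britney, spears) in the taught string and $\mathbf{x}$ as the counts of the same two words in "brittany spears":

$$
\cos\theta
= \frac{\mathbf{t}\cdot\mathbf{x}}{\lVert\mathbf{t}\rVert\,\lVert\mathbf{x}\rVert}
= \frac{1\cdot 0 + 1\cdot 1}{\sqrt{1^2+1^2}\,\sqrt{0^2+1^2}}
= \frac{1}{\sqrt{2}} \approx 0.7071
$$

Every string with exactly one of the two taught words has $\mathbf{x} = (0, 1)$ or $(1, 0)$, so each one lands on the same $1/\sqrt{2}$.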

For the string that matches exactly ("britney spears"), it gives the best score, close to 1.

For the string that has no matching words ("britne spessssss"), it gives zero.
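The three values from the question can be reproduced with a small hypothetical re-implementation. `CosineSketch` is not Classifier4J's code: it assumes that VectorClassifier compares word-count vectors with cosine similarity and builds both vectors from the taught string's words only, which is consistent with the 1/√2 seen in the output. Because every variant keeps both taught words, one of them, or neither, only three scores are possible:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration only (not Classifier4J's source):
// score an input against one taught string using the cosine of the angle
// between word-count vectors built over the taught string's words.
final class CosineSketch {

    // Count lowercase, whitespace-separated words (assumed tokenization).
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> m = new HashMap<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                m.merge(w, 1, Integer::sum);
            }
        }
        return m;
    }

    static double score(String taught, String input) {
        Map<String, Integer> t = counts(taught);
        Map<String, Integer> x = counts(input);
        double dot = 0, tNorm = 0, xNorm = 0;
        // Only the taught words are vector dimensions, so a misspelled
        // word in the input contributes nothing at all.
        for (Map.Entry<String, Integer> e : t.entrySet()) {
            int tv = e.getValue();
            int xv = x.getOrDefault(e.getKey(), 0);
            dot += tv * xv;
            tNorm += tv * tv;
            xNorm += xv * xv;
        }
        if (dot == 0) {
            return 0.0; // no shared words; also avoids dividing by a zero norm
        }
        return dot / (Math.sqrt(tNorm) * Math.sqrt(xNorm));
    }

    public static void main(String[] args) {
        String correct = "britney spears";
        List<String> terms = List.of("brittany spears", "britney spears",
                "britney spearssssss", "britne spessssss");
        for (String term : terms) {
            System.out.println(term + " = " + score(correct, term));
        }
    }
}
```

In this sketch the exact match prints 0.9999999999999998 rather than 1.0, because `Math.sqrt(2) * Math.sqrt(2)` evaluates to 2.0000000000000004 in double arithmetic. That may be where the .999 in the question comes from, although Classifier4J's actual arithmetic could differ.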
