ios - How to effectively use the Levenshtein algorithm for text auto-completion -


i'm using levenshtein distance algorithm filter through text in order determine best matching result purpose of text field auto-completion (and top 5 best results).

currently, have array of strings, , apply algorithm each 1 in attempt determine how close of match text typed user. problem i'm not sure how interpret values outputted algorithm rank results expected.

for example: (text typed = "nvmb")

  1. result: "game" ; levenshtein distance = 3 (best match)
  2. result: "number stars" ; levenshtein distance = 13 (second best match)

this technically makes sense; second result needs many more 'edits', because of it's length. problem second result logically , visually closer match first one. it's if should ignore characters longer length of typed text.

any ideas on how achieve this?

the levenshtein distance corresponds number of single-character insertions, deletions , substitutions in optimal global pairwise alignment of 2 sequences if gap , mismatch costs 1.

the needleman-wunsch dp algorithm find such alignment, in addition score (it's same dp algorithm 1 used calculate levenshtein distance, option weight gaps, , mismatches between given pair of characters, arbitrarily). there more general models of alignment allow reduced penalties gaps @ start or end (and reduced penalties contiguous blocks of gaps, may useful here, although doesn't directly answer question). @ 1 extreme, have local alignment, pay no penalty @ gaps @ ends -- computed smith-waterman dp algorithm. think want here in-between: want penalise gaps @ start of both query , test strings, , gaps in test string @ end, not gaps in query string @ end. way, trailing mismatches cost nothing, , costs like:

query:    nvmb costs:    0100000000000000      =  1 in total against:  number stars  query:    nvmb costs:    1101                  =  3 in total against:  game  query:    number stars costs:    0100111111111111      = 13 in total against:  nvmb  query:       ber     star costs:    1110001111100000      =  8 in total against:  number stars  query:    numbor costs:    111110000100000000000 =  6 in total against:       number stars 

(in fact might want give trailing mismatches small nonzero penalty, exact match preferred prefix-only match.)

the algorithm

suppose query string has length n, , string b testing against has length m. let d[i][j] dp table value @ (i, j) -- is, cost of optimal alignment of length-i prefix of length-j prefix of b. if go 0 penalty trailing mismatches, need modify nw algorithm in simple way: instead of calculating , returning dp table value d[n][m], need calculate table before, , find minimum of d[n][j], 0 <= j <= m. corresponds best match of query string against any prefix of test string.


Comments

Popular posts from this blog

angularjs - ADAL JS Angular- WebAPI add a new role claim to the token -

node.js - Using Node without global install -

php - CakePHP HttpSockets send array of paramms -