algorithm - Sequence Alignment: Avoid improbable alignments


I am using an algorithm equivalent to the Needleman-Wunsch algorithm for fuzzy sequence matching, with a similarity matrix.
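For reference, the baseline can be sketched as follows. This is a minimal sketch, not the exact implementation behind the results in this post: the names `needleman_wunsch` and `toy_sim`, the linear `gap` penalty, and the 1.0/-1.0 similarity values are all assumptions made for illustration. A real similarity matrix filled with probabilities would take the place of `toy_sim`.

```python
def toy_sim(x, y):
    # Hypothetical stand-in for a similarity matrix lookup.
    return 1.0 if x == y else -1.0


def needleman_wunsch(a, b, sim, gap=-1.0):
    """Globally align token lists a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # pair
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Traceback: the path is only recovered after all scores are known.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sim(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]
```

With these toy values, aligning `["d", "e:", "n"]` against `["d", "n"]` pairs `d` with `d` and `n` with `n` and puts a gap opposite `e:`.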

Some of the results are near optimal:

sil d   e:  n   sil a+  r   t   i:  k   e+  l   sil sil a+  f   t   @   sil b   u:  @   n   @   sil au  s
-   d   e:  n   -   -   @   t   e:  k   9   l   sil "   a+  f   d   @   -   b   9   a+  n   @   sil au  s

But others are not:

sil d   e+  r   sil i+  n   h   a+  l   t   sil s+  t   e:  t   sil u:
-   -   -   -   -   -   -   z   i+  -   k   -   -   -   -   f   -   -

The problem occurs around deletions and insertions: the algorithm aligns single letters near the deletion, even though they hardly match the missing parts.

I have tried to penalize the beginning of gaps, so that the algorithm favors large gaps over small ones. The results were horrible because, as you can see above, gaps of length 1 and 2 are common in the correctly aligned parts.
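Penalizing the beginning of a gap is usually done with an affine gap scheme. The sketch below assumes a Gotoh-style three-matrix recurrence in which a gap of length k costs `gap_open + (k - 1) * gap_extend`. The function name, the penalty values, and `toy_sim` are illustrative assumptions, not the code used for the results above. It computes only the score, not the traceback.

```python
NEG = float("-inf")


def toy_sim(x, y):
    # Hypothetical stand-in for a similarity matrix lookup.
    return 1.0 if x == y else -1.0


def affine_gap_score(a, b, sim, gap_open=-2.0, gap_extend=-0.5):
    """Best global alignment score with affine gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]
    # X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = sim(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            X[i][j] = max(
                M[i - 1][j] + gap_open,    # open a new gap
                Y[i - 1][j] + gap_open,    # switch gap direction
                X[i - 1][j] + gap_extend,  # extend the current gap
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open,
                X[i][j - 1] + gap_open,
                Y[i][j - 1] + gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])
```

With these example penalties, one gap of length 2 costs -2.5 while two separate gaps of length 1 cost -4.0, which is exactly the bias that works against the short gaps common in the correctly aligned parts.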

How can I modify the algorithm to avoid these wrong alignments, which consist of spread-out letters with bad scores (such as the f in - - - - f - -, which should be a -)?

Edit: For those of you not familiar with the algorithm: when the scores are calculated, the path taken is not yet known, because the path depends on, guess what, the scores.

This means that when calculating the scores I cannot take the neighboring alignments into account, because they are still unknown. Yet whether an alignment is good enough or not depends on its neighbors: if a pair is a bad fit (remember: I use a similarity matrix filled with probabilities) and is surrounded by gaps, it should get a bad score (see the second example). If it is surrounded by other, better-fitting pairs, it should get a good score (see the first example).

So I am having a bit of a chicken-and-egg problem when calculating the scores.
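The neighbor-dependent criterion is easy to state once an alignment already exists, which is what makes it awkward inside the dynamic program. The sketch below only checks a finished alignment and flags weak pairs that are flanked by gaps on both sides. It does not resolve the chicken-and-egg problem, and `isolated_weak_pairs`, `toy_sim`, and the `threshold` value are assumptions made for illustration.

```python
def toy_sim(x, y):
    # Hypothetical stand-in for a similarity matrix lookup (values in [0, 1]).
    return 1.0 if x == y else 0.0


def isolated_weak_pairs(row_a, row_b, sim, threshold=0.5, gap="-"):
    """Return column indices of low-similarity pairs flanked by gap columns.

    A column is a gap column if either row holds the gap symbol there.
    Positions beyond either end of the alignment are treated as gaps.
    """
    assert len(row_a) == len(row_b)

    def is_gap(k):
        if k < 0 or k >= len(row_a):
            return True
        return row_a[k] == gap or row_b[k] == gap

    flagged = []
    for k in range(len(row_a)):
        if is_gap(k):
            continue
        weak = sim(row_a[k], row_b[k]) < threshold
        if weak and is_gap(k - 1) and is_gap(k + 1):
            flagged.append(k)
    return flagged


# The second (bad) example from above, split into tokens.
bad_a = "sil d e+ r sil i+ n h a+ l t sil s+ t e: t sil u:".split()
bad_b = "- - - - - - - z i+ - k - - - - f - -".split()
flagged = isolated_weak_pairs(bad_a, bad_b, toy_sim)
```

On the second example this flags columns 10 and 15 (0-based), the `k` and the `f`. The adjacent `z` and `i+` are not flagged under this strict rule because they neighbor each other.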

