Sequence Alignment: Avoid improbable alignments
I am using an algorithm equivalent to the Needleman-Wunsch algorithm for fuzzy sequence matching, using a similarity matrix.
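For readers who want a concrete baseline, here is a minimal sketch of Needleman-Wunsch with a pluggable similarity function. It is not my actual code: the names `needleman_wunsch` and `toy_similarity`, the linear gap penalty of -1.0, and the match/mismatch scores are assumptions made for illustration, and a real setup would look up probabilities in the similarity matrix instead.

```python
from typing import Callable, List, Sequence, Tuple

GAP = "-"

def toy_similarity(x: str, y: str) -> float:
    # Hypothetical stand-in for a similarity-matrix lookup.
    return 1.0 if x == y else -1.0

def needleman_wunsch(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    gap: float = -1.0,
) -> Tuple[List[str], List[str], float]:
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # a[i-1] against a gap
                score[i][j - 1] + gap,                          # b[j-1] against a gap
            )
    # Traceback: the path is only recovered after all scores are known.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            top.append(a[i - 1]); bottom.append(GAP); i -= 1
        else:
            top.append(GAP); bottom.append(b[j - 1]); j -= 1
    return top[::-1], bottom[::-1], score[n][m]
```

With the toy scorer, aligning `sil d e: n` against `d e: n` pairs the leading `sil` with a gap and matches the remaining three symbols.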
Some of the results are near optimal:
sil d   e:  n   sil a+  r   t   i:  k   e+  l   sil sil a+  f   t   @   sil b   u:  @   n   @   sil au  s
-   d   e:  n   -   -   @   t   e:  k   9   l   sil "   a+  f   d   @   -   b   9   a+  n   @   sil au  s
But others are not:
sil d   e+  r   sil i+  n   h   a+  l   t   sil s+  t   e:  t   sil u:
-   -   -   -   -   -   -   z   i+  -   k   -   -   -   -   f   -   -
The problem occurs around deletions and insertions: the algorithm aligns single letters near the deletion, even though they hardly match the missing parts.
I have tried to penalize the beginning of gaps, but then the algorithm favors large gaps over small ones. The results are horrible because, as you can see above, gaps of length 1 and 2 are common in correctly aligned parts.
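To make the gap-opening attempt concrete: penalizing the start of a gap is usually done with an affine gap cost, where a gap of length k costs `gap_open + (k - 1) * gap_extend`. The sketch below computes only the best score with three DP tables (a Gotoh-style recurrence). The function name `affine_gap_score`, the toy scorer, and the penalty values are assumptions for illustration, not what I actually ran.

```python
import math
from typing import Callable, Sequence

def toy_similarity(x: str, y: str) -> float:
    # Hypothetical stand-in for a similarity-matrix lookup.
    return 1.0 if x == y else -1.0

def affine_gap_score(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    gap_open: float = -3.0,
    gap_extend: float = -0.5,
) -> float:
    n, m = len(a), len(b)
    neg = -math.inf
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = sim(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            # Opening a new gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open,
                Y[i - 1][j] + gap_open,
                X[i - 1][j] + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open,
                X[i][j - 1] + gap_open,
                Y[i][j - 1] + gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])
```

With these example values, one gap of length 2 costs -3.5 while two separate gaps of length 1 cost -6.0, which is exactly the bias toward fewer, longer gaps that hurts here.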
How can I modify the algorithm to avoid these wrong alignments, which consist of spread-out letters with bad scores (such as the f in "- - - - f - -", which should be "-")?
Edit: For those not familiar with the algorithm: when the scores are calculated, the path taken is not yet known, because the path depends on, guess what, the scores.
This means that when calculating the scores I cannot take neighboring alignments into account, because they are still unknown. Yet whether an alignment is good enough depends on its neighbors: if a pair is a bad fit (remember: I use a similarity matrix filled with probabilities) and is surrounded by gaps, it should get a bad score (see the second example). If it is surrounded by other, better-fitting pairs, it should get a good score (see the first example).
So I have a bit of a chicken-and-egg problem when calculating the scores.
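To pin down what "a bad fit surrounded by gaps" means, here is a small sketch that flags such pairs in a finished alignment. It only illustrates the criterion after the path is known; it does not solve the scoring problem inside the DP. The function name `isolated_weak_pairs`, the threshold of 0.5, the 0/1 toy scorer, and treating the sequence ends like gaps are all assumptions.

```python
from typing import Callable, List, Sequence

GAP = "-"

def toy_similarity(x: str, y: str) -> float:
    # Hypothetical stand-in for a probability from the similarity matrix.
    return 1.0 if x == y else 0.0

def isolated_weak_pairs(
    top: Sequence[str],
    bottom: Sequence[str],
    sim: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return column indices of low-scoring pairs with a gap column on both sides."""
    def is_gap_column(k: int) -> bool:
        return top[k] == GAP or bottom[k] == GAP

    flagged: List[int] = []
    last = len(top) - 1
    for k in range(len(top)):
        if is_gap_column(k):
            continue
        # Assumption: the start and end of the alignment count as gap neighbors.
        left_is_gap = k == 0 or is_gap_column(k - 1)
        right_is_gap = k == last or is_gap_column(k + 1)
        if left_is_gap and right_is_gap and sim(top[k], bottom[k]) < threshold:
            flagged.append(k)
    return flagged

# The second example from above, as two rows of equal length.
top = "sil d e+ r sil i+ n h a+ l t sil s+ t e: t sil u:".split()
bottom = "- - - - - - - z i+ - k - - - - f - -".split()
print(isolated_weak_pairs(top, bottom, toy_similarity))  # prints [10, 15]
```

The two flagged columns are the lone k and the lone f; the z and i+ pairs are not flagged by this rule because each has a non-gap neighbor.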