bash - Compare consecutive columns of a file and obtain the number of matched elements

March 15, 2013

I want to compare consecutive columns of a file and return the number of matched elements. I would prefer to use shell scripting or awk. Here is a sample bash/awk script that I am trying to use.

#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        `awk "$i == $j" phased.txt | wc -l`
    done
done

I have a file of size 147189*828 and want to compare each pair of columns and return the number of matched elements in an 828*828 matrix (a similarity matrix). This is easy in MATLAB, but it takes a long time to load huge files. I can compare two columns and return the number of matched elements with the following awk command:

awk '$3==$4' phased.txt | wc -l

But I need to do this for the entire file.

A snippet of the data I'm working on is:

# sampleid  hgdp00511  hgdp00511   hgdp00512   hgdp00512   hgdp00513  hgdp00513
m rs4124251       0       0                       g          0
m rs6650104       0                  c            t          0          0
m rs12184279      0       0            g                     t          0
................................................................................

After comparing, I want to compute a 6*6 matrix in this case, containing the matching percentage of these columns.

In bash, variables need a $ to be interpreted, so your awk "$i == $j" phased.txt | wc -l is evaluated as awk "3 == 4" phased.txt | wc -l; then, because of the backticks (`), the shell will try to execute the output as a command. For awk to see $3 == $4, you need to add \$: awk "\$$i == \$$j" phased.txt | wc -l.
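A quick way to check the expansion described above is to print the program text instead of handing it to awk. This is only a sketch; the variable names unescaped and escaped are made up for the illustration.

```shell
#!/bin/bash
# Show the program text awk would receive with each quoting style.
i=3
j=4

# Inside double quotes the shell replaces $i and $j with their values.
unescaped="$i == $j"

# \$ survives as a literal dollar sign, followed by the expanded value.
escaped="\$$i == \$$j"

echo "unescaped: $unescaped"   # unescaped: 3 == 4
echo "escaped:   $escaped"     # escaped:   $3 == $4
```

The first form compares two constants, so awk never looks at the fields; the second form is the field comparison that was intended.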

#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        awk "\$$i == \$$j" phased.txt | wc -l
    done
done

Though you'll probably want to show which combination you're evaluating:

#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        echo "$i $j: $(awk "\$$i == \$$j" phased.txt | wc -l)"
    done
done

You can also count in awk directly:

#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        # count+0 prints 0 instead of an empty string when nothing matches
        echo "$i $j: $(awk "\$$i == \$$j {count++}; END {print count+0}" phased.txt)"
    done
done
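If the escaping gets hard to read, another option is to pass the column numbers to awk with -v and keep the awk program in single quotes. The sketch below assumes a small made-up file in place of phased.txt; the function name count_matches and the sample rows are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample data standing in for phased.txt (whitespace-separated).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  'M rs1 A A G G' \
  'M rs2 C T C T' \
  'M rs3 G G G A' > "$tmp"

# count_matches COL_A COL_B FILE
# The column numbers arrive as awk variables a and b, so $a means
# "field number a" and no shell escaping is needed.
count_matches() {
  awk -v a="$1" -v b="$2" '$a == $b { n++ } END { print n + 0 }' "$3"
}

for i in 3 4; do
  for j in 5 6; do
    echo "$i $j: $(count_matches "$i" "$j" "$tmp")"
  done
done
```

With the sample rows above, columns 3 and 5 agree on two rows and columns 3 and 6 on none.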

Finally, you can do the whole thing in awk; it'll be faster, but to be honest it's not much cleaner: [untested]

#!/usr/bin/env awk -f
# For every pair of columns 3..9, count the rows where both fields match.
{
    for (i = 3; i <= 9; i++) {
        for (j = 3; j <= 9; j++) {
            if ($i == $j) {
                counts[i, j]++
            }
        }
    }
}

END {
    for (i = 3; i <= 9; i++) {
        for (j = 3; j <= 9; j++) {
            printf "%d = %d: %d\n", i, j, counts[i, j]
        }
    }
}
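The question asks for a matrix of matching percentages rather than raw counts. A single-pass variant along the same lines might look like the sketch below. It rests on several assumptions that are not in the original: data columns start at field 3 (the awk variable first), lines beginning with # are headers to skip, the percentage is taken over all data rows, and the sample file is made up.

```shell
#!/bin/bash
# Hypothetical sample data standing in for phased.txt.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  '# sampleid s1 s1 s2 s2' \
  'M rs1 A A G G' \
  'M rs2 C T C T' \
  'M rs3 G G G A' > "$tmp"

matrix=$(awk -v first=3 '
  /^#/ { next }                       # assumption: header lines start with #
  NF > ncol { ncol = NF }             # remember the widest row seen
  {
    rows++
    # The matrix is symmetric, so only count pairs with i <= j.
    for (i = first; i <= NF; i++)
      for (j = i; j <= NF; j++)
        if ($i == $j) counts[i, j]++
  }
  END {
    if (rows == 0) exit               # nothing to report, avoid dividing by 0
    for (i = first; i <= ncol; i++) {
      line = ""
      for (j = first; j <= ncol; j++) {
        c = (i <= j) ? counts[i, j] : counts[j, i]
        line = line sprintf("%s%.1f", (j > first ? " " : ""), 100 * c / rows)
      }
      print line
    }
  }
' "$tmp")

echo "$matrix"
```

For the sample rows this prints a 4*4 matrix whose first line is 100.0 66.7 66.7 0.0. The work grows with the number of rows times the square of the number of columns, so on a file as large as the one described it may still take a long time.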
