bash - Compare consecutive columns of a file and obtain the number of matched elements
I want to compare consecutive columns of a file and return the number of matched elements. I would prefer to use shell scripting or awk. Here is a sample bash/awk script I am trying to use:
    #!/bin/bash
    for i in 3 4 5 6 7 8 9; do
        for j in 3 4 5 6 7 8 9; do
            `awk "$i == $j" phased.txt | wc -l`
        done
    done

I have a file of size 147189*828, and I want to compare each pair of columns and return the number of matched elements as an 828*828 matrix (a similarity matrix). This is easy in MATLAB, but it takes a long time to load such a huge file. I can compare 2 columns and get the number of matched elements with the following awk command:
    awk '$3==$4' phased.txt | wc -l

but I need to do this for the entire file.
A snippet of the data I'm working on is:
    # sampleid hgdp00511 hgdp00511 hgdp00512 hgdp00512 hgdp00513 hgdp00513
    m rs4124251 0 0 g 0
    m rs6650104 0 c t 0 0
    m rs12184279 0 0 g t 0
    ...

After comparing, I want to compute a 6*6 matrix in this case, containing the matching percentage of these columns.
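One way to read "matching percentage", assuming m_{ij} is the number of data rows (header excluded) whose values in columns i and j are equal and N is the total number of data rows (both symbols are introduced here for illustration):

```latex
P_{ij} = 100 \cdot \frac{m_{ij}}{N}, \qquad P_{ij} = P_{ji}, \qquad P_{ii} = 100
```

Because the matrix is symmetric with a constant diagonal, only the pairs with i < j actually have to be counted: 15 pairs for the 6 sample columns here, and 828 · 827 / 2 = 342,378 pairs for an 828-column matrix. If 0 marks a missing genotype (an assumption; the post does not say), rows where both values are 0 would also count as matches under this definition.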
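In bash, $i and $j inside double quotes are expanded by the shell before awk runs, so awk "$i == $j" phased.txt | wc -l is evaluated as awk "3 == 4" phased.txt | wc -l. awk treats "3 == 4" as a constant condition, so it matches every line when i equals j and no line otherwise. Then, because of the backticks (`), the shell tries to execute the resulting line count as a command. For awk to see $3 == $4, you need to add \$: awk "\$$i == \$$j" phased.txt | wc -l.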
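Printing the two program strings makes the difference visible. The sketch below does that for i=3, j=4 (example values), and also shows a common alternative that sidesteps the escaping: pass the column numbers in with awk's -v option so the awk program can stay in single quotes. count_matches is a hypothetical helper name; skipping lines that start with # is an assumption based on the header in the sample data, and count + 0 makes awk print 0 rather than an empty line when nothing matches.

```bash
#!/bin/bash
# Show what the shell hands to awk for i=3, j=4 (example values).
i=3
j=4
unescaped="$i == $j"    # both variables expand: awk gets the constant test 3 == 4
escaped="\$$i == \$$j"  # \$ stays a literal $: awk gets $3 == $4
printf '%s\n' "$unescaped" "$escaped"

# count_matches FILE A B  (hypothetical helper)
# Counts non-header lines of FILE whose fields A and B are equal, without
# any shell escaping: the column numbers reach awk as variables via -v.
count_matches() {
    awk -v a="$2" -v b="$3" '
        /^#/     { next }            # skip header lines such as "# sampleid ..."
        $a == $b { count++ }         # $a is the field whose number is stored in a
        END      { print count + 0 } # + 0 turns "no matches" into 0
    ' "$1"
}
```

Inside the question's loop, echo "$i $j: $(count_matches phased.txt "$i" "$j")" would then replace the escaped awk call.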
    #!/bin/bash
    for i in 3 4 5 6 7 8 9; do
        for j in 3 4 5 6 7 8 9; do
            awk "\$$i == \$$j" phased.txt | wc -l
        done
    done

You'll probably want to show which combination you're evaluating, though:
    #!/bin/bash
    for i in 3 4 5 6 7 8 9; do
        for j in 3 4 5 6 7 8 9; do
            echo "$i $j: $(awk "\$$i == \$$j" phased.txt | wc -l)"
        done
    done

You can also do the counting in awk directly:
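    #!/bin/bash
    for i in 3 4 5 6 7 8 9; do
        for j in 3 4 5 6 7 8 9; do
            # END must be uppercase; count+0 prints 0 instead of an empty string
            echo "$i $j: $(awk "\$$i == \$$j {count++}; END {print count+0}" phased.txt)"
        done
    done

Finally, you can do the whole thing in awk. It will be faster, since the file is read once instead of once per column pair, though to be honest it's not cleaner: [untested]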
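    #!/usr/bin/awk -f
    /^#/ { next }    # skip the "# sampleid ..." header line
    {
        for (i = 3; i <= 9; i++) {
            for (j = 3; j <= 9; j++) {
                if ($i == $j) {
                    counts[i, j]++
                }
            }
        }
    }
    END {
        for (i = 3; i <= 9; i++) {
            for (j = 3; j <= 9; j++) {
                # a pair that never matched is empty; %d prints it as 0
                printf "%d %d: %d\n", i, j, counts[i, j]
            }
        }
    }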
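The script above has the column range 3 to 9 hard-coded and reports raw counts. Below is a sketch of the same one-pass idea extended toward the full similarity matrix with matching percentages, assuming that every row has the same number of fields, that the sample columns run from field 3 to the last field as in the snippet, and that header lines start with #. similarity_matrix is a hypothetical name. It compares each pair only once (j > i), since the matrix is symmetric, and fills the diagonal with 100%.

```bash
#!/bin/bash
# similarity_matrix FILE  (hypothetical helper)
# One pass over FILE: for every pair of columns from field 3 to the last
# field, count the non-header rows where both columns hold the same value,
# then print the matrix of matching percentages (one matrix row per line).
similarity_matrix() {
    awk '
        /^#/ { next }                  # skip header lines
        {
            rows++
            if (NF > maxnf) maxnf = NF
            # compare each pair once (j > i): the matrix is symmetric
            for (i = 3; i <= NF; i++)
                for (j = i + 1; j <= NF; j++)
                    if ($i == $j) m[i, j]++
        }
        END {
            # with no data rows maxnf stays unset, so nothing is printed
            for (i = 3; i <= maxnf; i++) {
                line = ""
                for (j = 3; j <= maxnf; j++) {
                    if (i == j)     c = rows    # a column always matches itself
                    else if (i < j) c = m[i, j]
                    else            c = m[j, i]
                    sep = (j > 3) ? " " : ""
                    line = line sep sprintf("%.2f", 100 * c / rows)
                }
                print line
            }
        }
    ' "$1"
}
```

Run it as similarity_matrix phased.txt. Even with the symmetric shortcut, 828 columns mean about 342,000 comparisons per row, or roughly 5 × 10^10 over 147,189 rows, so expect a long run time in awk.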