Rose coded this up as a faster, more efficient way to combine all the SNP calls into one table. I’ve made a few modifications; hopefully it’s not broken. Updates are likely in the future.
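Copy the script below and save it as bin/merge_ann.sh. Run
chmod u+x bin/merge_ann.sh
to make it executable. To run it, simply call
merge_ann.sh
from the directory that contains the “snp_calls” folder (this assumes bin/ is on your PATH; otherwise give the path to the script). It will find every file inside “snp_calls” whose name contains “_calls_” and merge them into a single table called radML_calls. Warning: this will overwrite any files in the directory you run it in called list.calls, tmplist, tmplist2, tmp1, tmp2, tmp3, tmp4, merged_calls.list, merged_calls.prelim, samplelist, samplelist1, header and radML_calls, and it deletes anything matching tmplist* before it starts.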
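Because of the overwrite warning above, it can be worth checking the working directory before running the script. The following is a minimal sketch, not part of Rose's script: the function name clobber_check is made up for illustration, and the file list is simply read off the output redirections in merge_ann.sh, so it would need updating if the script changes.

```shell
#!/bin/bash
# Pre-flight check (illustrative sketch): report files in the current
# directory that merge_ann.sh would overwrite. clobber_check is a
# hypothetical helper name; the file list is taken from the script itself.
clobber_check() {
    local f found=0
    for f in list.calls tmplist tmplist2 merged_calls.list merged_calls.prelim \
             tmp1 tmp2 tmp3 tmp4 samplelist samplelist1 header radML_calls; do
        if [ -e "$f" ]; then
            echo "would be overwritten: $f"
            found=1
        fi
    done
    # Return 0 when nothing would be overwritten, 1 otherwise.
    return "$found"
}

clobber_check && echo "safe to run merge_ann.sh here"
```

If any file is reported, move it out of the way or run the merge from a fresh directory that contains (or links to) snp_calls.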
merge_ann.sh:
#!/bin/bash
#RA
LANG=C

# this part generates the list of positions to which the genotypes will be added
ls snp_calls/ | grep _calls_ | sed s/^/'snp_calls\/'/ > list.calls
rm -f tmplist*
while read -r line
do
    # collect contig and position from every calls file, keeping unique pairs
    awk '{print $1 "\t" $2}' "$line" >> tmplist
    sort -f -k1,1 tmplist | uniq > tmplist2
    cp tmplist2 tmplist
done < list.calls
cp tmplist merged_calls.list

# not sure about this one
# build the key column (contig_pos) used for joining
awk '{print $1 "_" $2 "\t" $1 "\t" $2}' merged_calls.list | sort -k1,1 > merged_calls.prelim

# add one genotype column per sample; positions with no call get '-'
while read -r line
do
    awk '{print $1 "_" $2 "\t" $3}' "$line" | sort -f -k1,1 > tmp1
    join -i -a 1 -e '-' -o '1.1,2.2' merged_calls.prelim tmp1 > tmp2
    join -i -a 1 merged_calls.prelim tmp2 > tmp3
    cp tmp3 merged_calls.prelim
done < list.calls
sort -nk2,3 tmp3 > tmp4

# build the header from the sample file names
sed s/'snp_calls\/'// list.calls > samplelist
sed 's/\.sam//' samplelist > samplelist1
awk '{printf ("%s%s", tab, $1); tab=" "} END {print ""}' samplelist1 | sed s/^/"list contig pos "/ > header

# join writes space-separated output, so convert to tabs at the end
cat header tmp4 | tr ' ' '\t' > radML_calls
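To see what the two join calls in the second loop do, here is a minimal sketch on toy data. The file names (positions, sample_calls, filled) and the genotype values are made up for illustration and are not files the script creates.

```shell
#!/bin/bash
# Toy demonstration of the join step; all file names and values are invented.
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Master list of positions, keyed by contig_pos (already sorted on the key).
printf 'c1_10\tc1\t10\nc1_20\tc1\t20\nc2_5\tc2\t5\n' > positions

# Calls for one sample: key and genotype. Position c1_20 has no call.
printf 'c1_10\tAA\nc2_5\tAG\n' > sample_calls

# Keep every position (-a 1) and fill the missing genotype with '-' (-e).
join -a 1 -e '-' -o '1.1,2.2' positions sample_calls > filled

# Append the filled genotype column to the master list.
merged=$(join positions filled)
echo "$merged"
```

On this input the second line of the output is "c1_20 c1 20 -": the position is kept and the missing call is shown as a dash. Repeating this once per sample is how the table grows by one column per calls file.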