Email for v7. Bug found that printed G’s as C’s and vise versa.
Call SNPs from sam files in a method similar to Hohenlohe et al 2010. Updated to v4 Feb 9. Previous version had a bug.
It is now fixed up for all of BWAs cigars flavors.
This only deals with reads that fit one of the following:
Full alignment (55M)
Soft clip at the start (10S45M)
Soft clip at end (45M10S)
One deletion (25M10D25M)
One insertion (20M10I20M)
This means it ignores reads with a cigar fields that have N, H, P, = or X and it ignores reads with a cigar more complicate then a single soft clip or a single indel. It also does not penalize reads adjacent to indels.
It ignores bases in soft clipped parts of reads
Hopefully Rose and/or Chris can comment on the specifics of the algorithm as we have implemented here.
#usage perl bin/rad_lrt_or_ml_v2.pl sam_files.list snp_calls/project1
sam_files.list is a list of the samfiles you want parsed. Simply:
/home/greg/project/sam_files/sample1.sam /home/greg/project/sam_files/sample2.sam /home/greg/project/sam_files/sample3.sam ...
The last argument is where you want them printed, here I have it go to a new folder (which has to exist) and append project1 onto the name. The out put will look something like this:
/home/greg/project/snp_calls_project1_calls_sample1.sam /home/greg/project/snp_calls_project1_counts_sample1.sam /home/greg/project/snp_calls_project1_calls_sample2.sam ...
For your parsing pleasure you can have it print out the read depth count for each nt for each location just set line 14 to “true” (or anything).
#line 14 print out counts my $print_counts = "T"; #if you do not want the counts my $print_counts;
Next merge the SNP calls
Back to population genomics
Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, et al. 2010 Population Genomics of Parallel Adaptation in Threespine Stickleback using
Sequenced RAD Tags. PLoS Genet 6(2): e1000862. doi:10.1371/journal.pgen.1000862
This should probably only be used on genomic DNA.