Turning STACKS output into IMa2 input files

Posted on October 30, 2013 by Rose

This script extracts sequence haplotypes from the “alleles.tsv” files generated by STACKS and does some light filtering (you may want to add more). It’s very similar to the one I used for our 2013 Molecular Ecology paper, and still has some Great Sand Dunes-specific parameter names, but should work OK for other data sets. Oh, and I was using the “pstacks” reference-guided workflow in a slightly older version of STACKS, in case that matters.

extract_haplotype_sequences_v4_annotated.r

example_alleles.tsv

Please let me know if you use this script and whether it needs tweaking.
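For anyone who wants to see the general idea before opening the R script, here is a minimal Python sketch of reading haplotypes from an alleles file and applying a light count filter. It is not the attached script. The column layout (SQL ID, sample ID, locus ID, haplotype, percent, count), the function name read_haplotypes and the min_count threshold are all assumptions for illustration, so check them against the files your version of STACKS produces.

```python
import csv
import io
from collections import defaultdict


def read_haplotypes(handle, min_count=2):
    """Group haplotypes by locus, dropping any seen fewer than min_count times.

    Assumed column order (hypothetical, verify for your STACKS version):
    SQL ID, sample ID, locus ID, haplotype, percent, count.
    """
    loci = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment/header lines.
        if not row or row[0].startswith("#"):
            continue
        locus_id, haplotype, count = row[2], row[3], int(row[5])
        if count >= min_count:
            loci[locus_id].append((haplotype, count))
    return dict(loci)


# Tiny made-up example: one sample, two loci.
example = io.StringIO(
    "0\t12\t1\tAC\t60.00\t6\n"
    "0\t12\t1\tGT\t40.00\t4\n"
    "0\t12\t2\tA\t95.00\t19\n"
    "0\t12\t2\tC\t5.00\t1\n"
)
haplotypes = read_haplotypes(example, min_count=2)

# Loci that still carry exactly two haplotypes after filtering.
heterozygous = {locus: haps for locus, haps in haplotypes.items() if len(haps) == 2}
```

In the made-up example, the second haplotype at locus 2 is supported by a single read and is filtered out, so only locus 1 is kept as heterozygous.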

Genotype By Sequencing (GBS) Barcodes

Posted on November 23, 2012 by Rose

Here are GBS_Barcodes and adapters that we currently have in the lab for GBS, for sequencing on an Illumina machine. They were designed using the site Deena Bioinformatics.

This information came from Greg Baute’s blog and I’ve just converted the file to .xls.

Jaatha – training data sets (Rose)

Posted on April 27, 2012 by Rose

I’ve generated three training data sets, which will save you around 5 days if you decide to run Jaatha, a molecular demography program. It uses the joint site frequency spectrum of two populations to model various aspects of population history (split time, population size and growth, migration). Here’s the paper: Naduvilezhath et al. 2011.

1. Using the default model, with the following maxima: tmax=20, mmax=5, qmax=10.

2. Alternative maxima: tmax=5, mmax=20, qmax=20.

3. Alternative maxima: tmax=5, mmax=20, qmax=5.

They can’t be uploaded because they’re compressed R data structures, but let me know if you’d like to give them a whirl.
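If the “joint site frequency spectrum” is new to you, this small Python sketch shows what it is: a table counting how many SNPs have i derived alleles in population 1 and j in population 2. This is not Jaatha’s code, and the function name joint_sfs and the toy counts are made up for illustration.

```python
def joint_sfs(counts_pop1, counts_pop2, n1, n2):
    """Build an (n1 + 1) x (n2 + 1) table of SNP counts.

    counts_pop1[k] and counts_pop2[k] are the derived-allele counts of SNP k
    in samples of n1 and n2 chromosomes respectively.
    """
    if len(counts_pop1) != len(counts_pop2):
        raise ValueError("both populations must be scored at the same SNPs")
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts_pop1, counts_pop2):
        sfs[i][j] += 1
    return sfs


# Five toy SNPs scored in four chromosomes per population.
pop1 = [0, 1, 2, 2, 4]
pop2 = [1, 1, 0, 3, 4]
jsfs = joint_sfs(pop1, pop2, n1=4, n2=4)
total_snps = sum(sum(row) for row in jsfs)
```

Each SNP lands in exactly one cell, so the table always sums to the number of SNPs.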

If BWA wants *.nt.ann file… (Rose)

Posted on February 3, 2012 by Rose

Recently BWA (an alignment program) suddenly started giving a strange error message, indicating that a reference file ending in *.nt.ann was missing. This file type was unfamiliar to me, with good reason: it’s a colourspace reference file, which shouldn’t be generated when we index the fasta-based references we’re using (at least, I don’t know of anyone in our lab using SOLiD data as a reference). DO NOT rebuild the reference with the -c (colourspace) flag, as you might see suggested on the web, because we don’t know what effect that might have on our alignments. DO rebuild it with the usual settings.
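Before rebuilding, it can help to see which index files are actually sitting next to the reference. Here is a rough Python sketch that does that. The suffix list is my assumption of what a plain (non-colourspace) bwa index run leaves behind in recent versions; older releases may write additional files, so adjust the list for your install. The function name missing_index_files and the file name reference.fasta are placeholders.

```python
import tempfile
from pathlib import Path

# Assumed suffixes of a base-space BWA index (adjust for your BWA version).
EXPECTED_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(reference):
    """Return the names of expected index files not found beside the reference."""
    ref = Path(reference)
    return [
        ref.name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not Path(str(ref) + suffix).exists()
    ]


# Demonstration in a scratch directory with a deliberately incomplete index.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "reference.fasta"
    for suffix in (".amb", ".ann", ".bwt"):
        Path(str(ref) + suffix).touch()
    missing = missing_index_files(ref)
```

If anything turns up missing, re-running bwa index on the FASTA file without -c is the rebuild described above.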

Lab camera manual: Panasonic Lumix DMC-ZS7 (Rose)

Posted on January 31, 2012 by Rose

A couple of points:

1. There is a spare battery, so please swap out and charge the one that you have just used.

2. The photos stored on the card could be deleted at any time (if a big job needs more room on the storage card), so PLEASE download them before you take the camera back to the lab to avoid losing them.

3. The GPS should be turned on only when you need it (and set to OFF when you get on a plane).

DMCZS7 Basic Operating Instructions

DMCZS7 Operating Instructions

Bioportal (Rose)

Posted on January 30, 2012 by Rose

Bioportal is a free computing resource that provides several applications in our area. I’ve been running STRUCTURE on both the “low priority” and normal queues and it’s been fantastic (unlike Westgrid, who haven’t even responded to my application). For those of you who are struggling to find room on the cluster, it might be useful to you too. Much as I’d like to keep it to myself and exploit the hell out of it, here’s the address:

https://www.bioportal.uio.no/

SNP summary statistics in R: ‘hierfstat’ is back and better than before! (Rose)

Posted on January 2, 2012 by Rose

After being disabled and not supported for several months, ‘hierfstat’ (by Jerome Goudet) now has lots of useful (and fast) calculations of summary statistics, including expected and observed heterozygosity, Fst and Jost’s Dest.

Continue reading →

STACKS installation (Rose)

Posted on December 12, 2011 by Rose

Installing stacks on Ubuntu Natty Narwhal or Oneiric Ocelot

STACKS is a piece of software produced by Julian Catchen in the Cresko lab. It’s designed to identify loci and alleles from RAD (or GBS) reads, either de novo or after alignment to a reference. It consists of several modules that can be run separately, but installing it as a complete pipeline unfortunately requires a web server. Many of the required instructions are given in the README file, but because nobody in our lab is an expert on this, we had to fiddle around to get the program running on our Ubuntu machines.

Continue reading →

R script for plotting STRUCTURE results (Q values) (Rose)

Posted on November 16, 2011 by Rose

This is an R Script that plots individual Q values and labels populations. It can be modified to take average group membership from CLUMPP output and/or to import different population names and higher level groupings from elsewhere.

N.B. I haven’t run this on very many data sets, so it will probably need to be tweaked for your results. But please leave a comment if you run into any problems.

Continue reading →

Our favourite text editors (Rose)

Posted on November 9, 2011 by Rose

I hope we can start a conversation about this because a good text editor can make a big difference to a newbie, so PLEASE REPLY!!! I wanted to proselytise about Npp, but it only runs on Windows. So if you use a different OS, please make that BLEEDINGLY OBVIOUS.

Notepad++ (WINDOWS)

I’ve tried numerous text editors over the years (like Context), but Notepad++ (Npp) is easily my favourite. It only runs on Windows, but I routinely use it to export Unix-formatted files. You can set shortcut keys to change formats very easily. Npp can highlight lots of languages, including R, perl and unix. You can also define your own languages for highlighting – I did that to make my Migrate parameter files easier to read.
Continue reading →

Filtering unmapped/unaligned reads from SAM files (Rose)

Posted on November 2, 2011 by Rose

This is a post about some time-saving help Chris Grassa gave me.

STACKS (post coming soon) doesn’t deal well with all of the unaligned reads in SAM files, so I tried using PICARD to remove them. PICARD doesn’t like the SAM output of BWA, though, so Chris G showed me how to use the Unix command awk to do it much more easily. This is his command for my file 1076.sam:
Continue reading →
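For anyone who prefers Python, here is a minimal sketch of the same idea. It is not Chris’s awk command (that is in the full post). It keeps header lines and drops any read whose FLAG field has the “unmapped” bit (0x4) set, as defined in the SAM format. The function name mapped_only and the example reads are made up.

```python
import io

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def mapped_only(lines):
    """Yield header lines and alignment lines whose FLAG lacks the unmapped bit."""
    for line in lines:
        if line.startswith("@"):
            # Header lines pass through untouched.
            yield line
            continue
        fields = line.split("\t")
        # Field 2 (index 1) is the bitwise FLAG.
        if len(fields) > 1 and not (int(fields[1]) & UNMAPPED):
            yield line


# Made-up SAM content: one header line, two mapped reads, one unmapped read.
example = io.StringIO(
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\n"
    "read3\t16\tchr1\t200\t37\t5M\t*\t0\t0\tGGGGG\tIIIII\n"
)
kept = list(mapped_only(example))
kept_names = [line.split("\t")[0] for line in kept if not line.startswith("@")]
```

To use it on a real file, open the SAM file for reading, pass the file handle to mapped_only, and write each yielded line to a new file.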

Rieseberg Lab Resources

RLR: Technical resources for Rieseberglers

Author Archives: Rose
