Text File To kml – Perl Script

Google Earth reads and writes a special form of XML file called KML (Keyhole Markup Language). Many other geographic viewers and GIS packages can also read KML files, so it's handy to be able to make KML files for sample location data. I assume there are many ways to do this; the way I have done it is via a perl script that I wrote. This post provides that script and explains what it does.

Here is the script; it's called texttokml.pl.

It's very simple and heavily commented, so even the most naive perl programmer should be able to figure it out and change it. If you want me to hold your hand, just ask.
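
The actual script is perl, but to give a flavour of what the conversion looks like, here is a rough sketch of the same idea in R. The file and column names below are made up for illustration; they are not what texttokml.pl expects.

locs <- read.csv("sample_locations.csv", stringsAsFactors = FALSE)  # hypothetical file with columns: name, lat, lon

kml <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<kml xmlns="http://www.opengis.net/kml/2.2">',
  '<Document>',
  sprintf('  <Placemark><name>%s</name><Point><coordinates>%f,%f</coordinates></Point></Placemark>',
          locs$name, locs$lon, locs$lat),  # note: KML wants longitude,latitude order
  '</Document>',
  '</kml>')

writeLines(kml, "sample_locations.kml")   # open the result in Google Earth

The whole trick is just wrapping each sample location in a Placemark element inside one Document; everything else is boilerplate.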

Explanation follows . . .

Continue reading

Cheap, Easy DIY Barcodes in R

I couldn't believe how expensive the software for writing barcodes was, so I wrote a short program in R to do it for FREE. And frankly, it should be faster and easier if you already have your labels in an Excel file. You don't really need to understand the program, or even R, to use it, as long as you know how to run an R program.

Setup and Overview:

[UPDATED (see notes below)] – R code. Start with this. (Note: I could not upload a .R file, so this is a .txt file, but it is still an R program.)

Input – barcodes128.csv – You need this file to run the program. Save it in your working directory (see the comments in the R code for how to set this). AND labels.csv – This is a sample file showing the format for your labels. Even though it's a .csv, it is a single column with each label as a separate row, so there are no actual commas.

Output – BarcodesOut.pdf – A sample output: a PDF file for the 0.5″ x 1.75″ Worth Poly Label WP0517 (Polyester Label Stock), currently in the lab.
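
In practice, running it boils down to something like this in an R console. The path and script name here are hypothetical (the downloaded .txt saved with a .R extension); the two .csv names are the real ones from above.

setwd("C:/path/to/your/label/folder")  # hypothetical path; the folder must contain barcodes128.csv and labels.csv
source("barcode_program.R")            # hypothetical name for the saved script
# the script writes BarcodesOut.pdf into the same folder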

That's really all you need to know; everything that follows is extraneous info. If you have any problems, check out the Detailed Instructions, Troubleshooting Tips, or add a comment below. Continue reading

Old lab PC – new Ubuntu computer

I’ve installed the latest version of Ubuntu (12.04) on the old PC lab computer:

-Username, computer name and password are written on the computer itself, if needed.
-I’ve also installed on it a few of my favorite programs (LibreOffice, Inkscape, Gimp, R, Chrome).
-It boots in about 35 seconds, not bad for an “old piece of junk”!

Feel free to use it!

seb

GBS Protocol (GregO)

Kristin and I have been working on GBS for a long time, and since it now seems to be working, we wrote up a protocol. It is mostly the same as Greg Baute's previous protocol, but with a few key changes (more DNA, more PCR). I've made it look nice and included a diagram for ease of thought.

Also, the official pronunciation of GBS is ‘jibs’

Continue reading

RLR Image Library (Dan E.)

Hello all. I've created an image library here at RLR. I think it's a good idea and I'm hoping that you do too. If we accumulate a collection of good images, mostly photos I assume, it will become useful and interesting.

The idea is that we can post image galleries – collections of photos of or about a particular project/trip/experiment/event – to share with the lab. I hope this sharing will be informative and entertaining but also practical – we can share images for use in presentations and posters and the like.

Right off the bat I want it to be clear that if you use any image from RLR that you did not create yourself, that image must be attributed to its creator. It's easy: just give the person who made and uploaded the image a credit on or near the image in your presentation, poster, etc. Pretty obvious really.

The image library comprises galleries displayed on new pages added to RLR under the “Image Library” page. If you wish, these pages can be hidden such that only registered users who have logged in can see your images.

Check out Brook’s opening effort for an excellent example of what I’m talking about.

Greg O. has also put up some nice shots of Californian sunflowers.

I’ve added instructions on the “How to: contribute content” page and on the “How to: use RLR” page.

If my instructions are insufficient, or you can see obvious problems or improvements, please let me know.

Answers to some questions that you may have . . .

Continue reading

SnoWhite Tips and Troubleshooting (Thuy)

SnoWhite is a tool for cleaning 454 and Illumina reads. There are quite a few gotchas that can take you half a day to debug. This wiki has a lot of good tips.

SnoWhite invokes other bioinformatics programs, one of them being TagDust. If you get a segfault error from TagDust, it may be because you are searching for contaminant sequences larger than TagDust can handle. TagDust can only handle a maximum of 1000 characters per line in the contaminant fasta file and contaminant sequences of at most 1000 bases.

A segfault (or segmentation fault) happens when a program accesses memory it shouldn't. Once a line or sequence exceeds the 1000-character/1000-base limit, TagDust keeps writing past the 1000 slots it has allocated, so it may touch non-existent or off-limits memory locations. You need to edit the TagDust source code so it allocates enough memory for your sequences and does not wander into bad memory locations.

  • Go into your TagDust source code directory and edit file “input.c”.
  • Go to line 68:

char line[MAX_LINE];

  • Change MAX_LINE to a number larger than the number of characters in the longest line in your contaminant fasta file. You can probably skip this step if you are using the NCBI UniVec.fasta files, since the default of 1000 is enough.
  • Go to line 69:

char tmp_seq[MAX_LINE];

  • Change MAX_LINE to a number larger than the number of bases in the longest contaminant sequence in your contaminant fasta file.  I tried 1000000 with a recent NCBI UniVec.fasta file and it worked for me.
  • Recompile your TagDust source code
    • Delete all the existing executables by executing make clean in the same directory as the Makefile
    • Compile all your files again by executing make in the same directory as the Makefile
    • If you decided to allocate a lot of memory to your arrays, and the program's statically allocated data exceeds 2 GB, you may run into “relocation truncated to fit: R_X86_64_PC32 against symbol” errors at link time. This occurs when the statically allocated objects no longer fit in the 2 GB address range assumed by gcc's default code model. Edit the Makefile so that

CC = gcc
becomes
CC = gcc -mcmodel=medium

Compiled Sunflower QTLs (GregO)

Last year I worked on a project to see whether any of the domestication outlier genes overlapped with previously mapped QTLs. The project ultimately fell flat when new data showed that the outlier I was working on wasn't an outlier, but I did compile a large table of sunflower QTLs which may be useful. The table has 369 mapped QTLs.

I've shared this with a couple of people, but I'm posting it here as a Google Doc for everyone to use. Here is the link: https://docs.google.com/spreadsheet/ccc?key=0AgfXIvTZMEqPdHdJWTk3UVlVa3dkdGFTak9ySlUtNkE

A couple notes:
-It was compiled about a year ago, so it may be out of date. Also, although I tried to include every applicable study, I may have missed some. If you do find a study that I missed, I encourage you to add it to the table.
-It only covers annuus crosses, and the majority involve domesticated lines
-The position values are in cM

Anyway, read and enjoy. Change it if you find errors or new papers!

Jaatha – training data sets (Rose)

I’ve generated three training data sets, which will save you around 5 days if you decide to run Jaatha, a molecular demography program. It uses the joint site frequency spectrum of two populations to model various aspects of population history (split time, population size and growth, migration). Here’s the paper: Naduvilezhath et al 2011.

1. Using the default model, with the following maxima: tmax=20, mmax=5, qmax=10.

2. Alternative maxima: tmax=5, mmax=20, qmax=20.

3. Alternative maxima: tmax=5, mmax=20, qmax=5.

They can’t be uploaded because they’re compressed R data structures, but let me know if you’d like to give them a whirl.
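
Since they are ordinary R objects saved as compressed .Rda files, passing them around is just a matter of save() and load(). A minimal sketch, with hypothetical object and file names:

save(training_default, file = "jaatha_training_default.Rda")  # hypothetical object and file names
# ...and in a fresh R session:
load("jaatha_training_default.Rda")   # restores training_default into the workspace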

Making Illumina Whole Genome Shotgun Sequencing Libraries – (Dan E.)

I’ve been making whole genome shotgun sequencing libraries (for the purposes of this post: WGSS libraries) to sequence sunflower genomes on the Biodiversity Centre’s Illumina HiSeq. I haven’t been doing it for very long and its likely that my approach will change in the future as costs and products change but, as of early 2012, I’ve landed on a hybrid protocol based on kits from an outfit called Bioo Scientific. I use the Bioo Sci. adapter kit and their library prep kit up to the final PCR step at which point I switch to a PCR kit from another outfit called KAPA. I also use a KAPA kit to quant libraries with qPCR. In this post I give a little context then describe what I do to make WGSS libraries . . . Continue reading