Everything you need should be checked into BitBucket
BitBucket Sunflower Genome Repository
Workflow
Integrate the physical map with each genetic map. Each integration generates three tabular placements files (well-placed BACs, poorly placed BACs, and unplaced BACs) with the following columns. Not every column is populated in every row:
- PHYSICAL_CONTIG: physical map contig id
- BAC: BAC id
- TAG: tag id
- CHROMOSOME: chromosome
- CM: genetic map position (cM)
- LO_CM: lower bound of the cM range
- HI_CM: upper bound of the cM range
- LO_LINKAGE_GROUP_BIN: lower genetic map linkage group bin for this GENOME_CONTIG & GENOME_SCAFFOLD
- HI_LINKAGE_GROUP_BIN: upper genetic map linkage group bin for this GENOME_CONTIG & GENOME_SCAFFOLD
- GENOME_CONTIG: genome assembly contig that is used in the genetic map and has a perfect BLAST hit to TAG
- GENOME_SCAFFOLD: genome assembly scaffold that is used in the genetic map and has a perfect BLAST hit to TAG
- TAG_START_POS_IN_SCAFF: 1-based start position of the tag in GENOME_SCAFFOLD
- TAG_END_POS_IN_SCAFF: 1-based end position of the tag in GENOME_SCAFFOLD
- IS_SINGLECOPY_TAG: whether the tag is single copy; a tag is not single copy if it occurs multiple times in the genome assembly or if its coverage in the read libraries is higher than expected
- IS_BAC_PLACED: 1 = BAC confidently placed on CHROMOSOME, 0 = poorly placed due to conflicts, “” = undetermined
- IS_BAC_PLACED_BY_TAG_LOCUS: 1 = placed using the minimum tag hit threshold method, 0 = poorly placed via the minimum tag hit threshold method, “” = not enough tags to attempt placement via the minimum tag hit threshold method
- IS_TAG_PLACEMENT_CONFLICT_IN_BAC: 1=BAC has enough tags to attempt the minimum tag hit threshold method of placing BAC onto locus, and this tag locus conflicts with other tag loci in the BAC
- IS_BAC_PLACED_BY_UNIQUE_TAGSET: 1 = BAC confidently placed on CHROMOSOME via unique tagsets method, 0 = poorly placed due to conflicts via unique tagset method, “” = no unique tagset for this BAC – scaffold pair
- IS_UNIQUE_TAGSET: whether this tag is part of the unique tagset for the BAC-GENOME_SCAFFOLD pair
- IS_LOCUS_CONFLICT_WITH_UNIQUE_TAGSET: 1 = BAC-GENOME_SCAFFOLD share a unique tagset and this tag locus conflicts with majority TAG CHROMOSOME in the unique tagset, 0 = BAC-GENOME_SCAFFOLD share a unique tagset and this TAG CHROMOSOME does not conflict with majority TAG CHROMOSOME in the unique tagset, “” = BAC-scaffold do not share a unique tagset
- IS_BAC_PLACEMENT_CONFLICT_IN_PHYS_CONTIG: 1 = BAC is confidently placed on a CHROMOSOME by any method but it conflicts with the majority CHROMOSOME of the physical map contig
- IS_CHIMERIC_PHYS_CONTIG: 1 = physical map contig is chimeric, 0 = not chimeric, “” = unknown because the contig has no placed BACs
- BAC_START_IN_PHYS_CONTIG: start of the BAC's FPC coordinate range in the physical map contig
- BAC_END_IN_PHYS_CONTIG: end of the BAC's FPC coordinate range in the physical map contig
- TAG_GROUP: FPC coordinate ranges for TAG; only ranges that intersect the BAC are listed
- AVE_CM_FOR_PHYS_CONTIG: average cM for the physical map contig
- PHYS_CONTIG_GROUP: comma-delimited list of linked PHYSICAL_CONTIG ids
- LO_CM_FOR_PHYS_CONTIG_GROUP: lower bound of the cM range for PHYS_CONTIG_GROUP
- HI_CM_FOR_PHYS_CONTIG_GROUP: upper bound of the cM range for PHYS_CONTIG_GROUP
- IS_REVERSE_COMP_SCAFF: whether scaffold should be reverse complemented
- GENOME_ASSEMBLY: genome assembly name
- SCORE: alignment score between GENOME_SCAFFOLD and BAC_GROUP
- BAC_GROUP: group of overlapping BACs that are aligned to GENOME_SCAFFOLD
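The IS_BAC_PLACEMENT_CONFLICT_IN_PHYS_CONTIG rule (a placed BAC that disagrees with the majority CHROMOSOME of its physical map contig) can be illustrated with a short Python sketch. It is not the pipeline's implementation. It assumes the placements file has a header row using the column names above, treats IS_BAC_PLACED == "1" as "confidently placed", and takes the most common chromosome among placed BACs as the contig's majority. Treating a tie as "no majority, flag nothing" is an assumption of this sketch.

```python
import csv
from collections import Counter, defaultdict


def read_placements(path):
    """Read a tab-delimited placements file into a list of dicts.

    Assumption: the first row is a header with the column names listed above.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def flag_contig_conflicts(rows):
    """Return {(PHYSICAL_CONTIG, BAC): bool} for confidently placed BACs.

    True means the BAC's CHROMOSOME differs from the most common CHROMOSOME
    among placed BACs in the same physical map contig. A tie means there is
    no majority, so nothing in that contig is flagged (sketch assumption).
    """
    # One chromosome vote per placed BAC, even if the BAC spans many tag rows.
    bac_chrom = {}
    for row in rows:
        if row["IS_BAC_PLACED"] == "1":
            bac_chrom[(row["PHYSICAL_CONTIG"], row["BAC"])] = row["CHROMOSOME"]

    # Count chromosome votes per physical map contig.
    votes = defaultdict(Counter)
    for (contig, _bac), chrom in bac_chrom.items():
        votes[contig][chrom] += 1

    flags = {}
    for (contig, bac), chrom in bac_chrom.items():
        ranked = votes[contig].most_common(2)
        has_majority = len(ranked) == 1 or ranked[0][1] > ranked[1][1]
        flags[(contig, bac)] = has_majority and chrom != ranked[0][0]
    return flags
```

For example, `flag_contig_conflicts(read_placements("wellplaced.tsv"))` (hypothetical file name) returns one flag per placed BAC; poorly placed and unplaced BACs are left out because they do not vote.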
For usage details, run:
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/aggregate_indep_integrations.pl --help
Split chimeric contigs in the FPC physical map file.
If you haven’t already, create FPC markers using:
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/CreateFPCMarker.pl ...
replacing ... with the following parameters, in order:
- Full path to the output marker .fw file
- Full path to the output marker .ace file
- Full path to the output marker .remarks file
- Full path to the input tabular integrated physical map – genetic map placements file
- Full path to the input .bands file
- Whether to use only single-copy tags when placing BACs onto chromosomes
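Because the parameters are positional, building the command programmatically helps keep them in order. The Python sketch below assembles the CreateFPCMarker.pl invocation with the same -I include directories used throughout this page. Passing the single-copy flag as "1" or "0" is an assumption; check the script before relying on it. The function name is hypothetical.

```python
import os

# Include directories used by every command in this workflow.
REPO = "./sunflowergenome"
INCLUDE_DIRS = ["Combo_physical_genetic_map", "Common", "gMap2", "PhysicalMap"]


def create_fpc_marker_cmd(fw_out, ace_out, remarks_out, placements_tsv,
                          bands_file, single_copy_only):
    """Build the CreateFPCMarker.pl argument list in the documented order.

    Assumption: the single-copy option is passed as "1" (only single-copy
    tags) or "0"; this encoding is not confirmed by the documentation.
    """
    cmd = ["perl"]
    for d in INCLUDE_DIRS:
        cmd += ["-I", os.path.join(REPO, d)]
    cmd.append(os.path.join(REPO, "Combo_physical_genetic_map", "CreateFPCMarker.pl"))
    # The script expects full paths, so resolve any relative paths first.
    paths = [fw_out, ace_out, remarks_out, placements_tsv, bands_file]
    cmd += [os.path.abspath(p) for p in paths]
    cmd.append("1" if single_copy_only else "0")
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.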
In FPC, load your .fpc file, insert the markers via File > Replace framework markers and select your .fw file, then save the .fpc file.
Create a newline-separated list of suspect BACs. For example, with awk:
awk -F '\t' '{print $2}' <tabular placements file for BACs with too many conflicting tag loci> | sort | uniq > myBadBacList.txt
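If awk is unavailable, the same list can be produced in Python. This sketch mirrors the awk pipeline: it reads the second tab-delimited column (awk's $2), de-duplicates, and sorts. Two small differences are deliberate: blank BAC fields are skipped, and Python sorts by code point whereas `sort` follows the locale's collation order. Like the awk command, it does not skip a header row. The function name is hypothetical.

```python
import csv


def suspect_bacs(lines, bac_column=1):
    """Return sorted, de-duplicated BAC ids from tab-delimited lines.

    bac_column=1 corresponds to awk's $2 (the BAC column).
    QUOTE_NONE keeps quote characters literal, as awk would.
    """
    bacs = set()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > bac_column and row[bac_column]:
            bacs.add(row[bac_column])
    return sorted(bacs)
```

Usage, writing the same myBadBacList.txt as the awk command:

```python
with open("placements.tsv", newline="") as src, open("myBadBacList.txt", "w") as out:
    out.write("\n".join(suspect_bacs(src)) + "\n")
```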
Make a copy of the directory containing your .fpc, .cor, and Bands/*.bands files.
Then pass them to the following script, which rebuilds the .fpc file from scratch using an iterative build, DQ, and merge process that runs from strict to loose stringency cutoffs, and then splits the chimeric contigs.
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/rebuildChimericFPC.pl ...
where ... is replaced with the following parameters, in order:
- Full path to the revised FPC executable (you must download and compile the revised FPC from the BitBucket link at the top of this page)
- Full path to the .bands file
- Start stringency cutoff (e.g. 1e-75)
- End stringency cutoff (e.g. 1e-15)
- The amount to decrease the stringency in each iteration (e.g. 1e-10)
- CpM entries: if a BAC hits many markers, reduce the stringency to the cutoff given for that number of markers (e.g. 3=1e-10,4=1e-09,5=1e-08)
- Full path and prefix of the .fpc file (without the .fpc extension)
- Full path to the well-placed BACs tabular placements file
- Full path to the poorly placed BACs tabular placements file
- Full path to the newline-separated list of suspect clones causing chimerism in physical map contigs
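The stringency parameters are easier to choose once the iteration schedule is spelled out. The Python sketch below parses a CpM string and lists the cutoffs from strict to loose. It assumes the cutoffs are powers of ten and that "decrease the stringency by 1e-10" means raising the exponent by 10 each iteration, since a linear step of 1e-10 would overshoot 1e-15 on the first iteration. Whether rebuildChimericFPC.pl includes the end cutoff exactly this way is also an assumption; the function names are hypothetical.

```python
import math


def parse_cpm(spec):
    """Parse CpM entries like "3=1e-10,4=1e-09,5=1e-08" into {3: 1e-10, ...}."""
    table = {}
    for entry in spec.split(","):
        markers, cutoff = entry.strip().split("=")
        table[int(markers)] = float(cutoff)
    return table


def stringency_schedule(start, end, step):
    """List cutoffs from strict (start) to loose (end) as strings.

    Assumption: all three values are powers of ten, and each iteration
    multiplies the cutoff by 1/step (1e-10 -> exponent +10).
    """
    start_exp = round(math.log10(float(start)))
    end_exp = round(math.log10(float(end)))
    step_exp = -round(math.log10(float(step)))
    if step_exp <= 0 or start_exp > end_exp:
        raise ValueError("expected start <= end and a step below 1")
    return [f"1e{exp}" for exp in range(start_exp, end_exp + 1, step_exp)]
```

With the example values above, `stringency_schedule("1e-75", "1e-15", "1e-10")` yields seven cutoffs, 1e-75 through 1e-15, in steps of ten orders of magnitude.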
Run the physical map – genetic map integration again, because the physical map contig IDs have changed.
Feed the tabular integrated placements file into the pseudomolecule generation program.
For usage details, run:
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl --help
Which Script Do I Run?
To integrate the physical and genetic maps and then generate scaffold-to-BAC associations, use
SunflowerGenome/Combo_physical_genetic_map/combo_physical_windowedGeneticMap.pl
To get usage details, run:
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/combo_physical_windowedGeneticMap.pl --help
To integrate the physical map with multiple genetic maps and aggregate the placements, use
SunflowerGenome/Combo_physical_genetic_map/aggregate_indep_integrations.pl
To get usage details, run:
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/aggregate_indep_integrations.pl --help
To split chimeric contigs in FPC files, use
SunflowerGenome/PhysicalMap/RemoveChimericBacs.pl
To generate pseudomolecules, use
SunflowerGenome/Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl
To get usage details, run:
perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl --help
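Every help command above repeats the same four -I include directories. A small Python helper can build them from a task name. The task keys ("integrate", "aggregate", "pseudomolecules") are hypothetical labels for the scripts listed above. RemoveChimericBacs.pl is left out because no --help invocation is documented for it.

```python
import os

REPO = "./sunflowergenome"
INCLUDE_DIRS = ["Combo_physical_genetic_map", "Common", "gMap2", "PhysicalMap"]

# Hypothetical task labels mapped to the scripts named on this page.
SCRIPTS = {
    "integrate": "Combo_physical_genetic_map/combo_physical_windowedGeneticMap.pl",
    "aggregate": "Combo_physical_genetic_map/aggregate_indep_integrations.pl",
    "pseudomolecules": "Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl",
}


def help_command(task):
    """Return the perl argv that prints usage details for the given task."""
    cmd = ["perl"]
    for d in INCLUDE_DIRS:
        cmd += ["-I", os.path.join(REPO, d)]
    cmd += [os.path.join(REPO, SCRIPTS[task]), "--help"]
    return cmd
```

For example, `subprocess.run(help_command("aggregate"), check=True)` runs the same command as the aggregate_indep_integrations.pl help line above, and `shlex.join(help_command("aggregate"))` prints it for copying into a terminal.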