Sequence, Assemble, Annotate, Align, Species Tree ... next? (#148)
The falling cost of high-throughput sequencing (HTS) means that de novo sequencing of entire genomes is now tractable for many academic groups. Transforming those data into a form useful for addressing questions in evolutionary or functional genomics is not trivial. We are developing software tools for extracting evolutionary information from whole-genome data. I will focus on software suites that address (1) the de novo assembly and annotation workflow and (2) the challenge of drawing statistically robust phylogenomic inferences from the resulting data. I will expand particularly on (2), an integrated suite customised for using whole-genome sequences to estimate a species tree and to identify sequences evolving in a distinctive manner. Its tools are centred on a maximum-likelihood phylogenetic framework that can simultaneously apply codon and nucleotide substitution models to a mixture of protein-coding and non-protein-coding DNA sequences. For illustrative purposes, we applied the suite to published whole-genome data from Cryptococcus strains. Our results are highly concordant with those obtained from multilocus sequence typing (MLST) sequences. Additional capabilities of the software, including the identification of genes exhibiting the signature of clade-specific natural selection, will also be discussed.