De novo assembly of the genome of a haloacid-degrading Burkholderia caribensis (#327)
Burkholderia caribensis MBA4 is a soil bacterium that can utilize 2-haloacids as its carbon and energy source. A dehalogenase (Deh4a) removes the halogen from the carbon backbone, producing an alkanoic acid that can subsequently be utilized. However, little is known about the genome of MBA4. Most bacteria of the genus Burkholderia have a genome size of ~9Mb. Preliminary pulsed-field gel electrophoresis analysis of the genome of MBA4 suggested a genome size of about 9.4Mb with at least three replicons. Genomic data for MBA4 were obtained with Illumina (Solexa) and 454 pyrosequencing technologies. Four sets of Illumina paired-end reads with insert sizes of 100bp, 300bp, 500bp and 2000bp are available. After discarding low-quality reads and trimming, the 454 single-end reads have an average length of 450bp. Raw data were assembled de novo with CLC Genomics Workbench 6.01. SSPACE was used to join contigs into scaffolds using information derived from the paired-end reads. As a result, sixty-six scaffolds containing 82,106 Ns were obtained. The N50 of these scaffolds is 463,897bp and the total size of the draft genome sequence is 9,453,932bp. The GC content of MBA4 was determined to be about 62%, which is consistent with the result obtained from HPLC analysis. Bioinformatic prediction of repeats indicates the presence of 530 tandem repeat regions and 172 transposons. The complete genome sequence of MBA4 is yet to be determined because of the presence of long repetitive sequences.
The quality of the genome assembly was greatly improved by combining de novo assembled RNA-seq contigs with the genomic draft scaffolds. The improvements include filling some unknown regions inside the scaffolds, correcting base pairs and joining scaffolds. Fifteen scaffolds containing 24,664 Ns are now obtained. The N50 is 2,442,814bp and the total size is 9,416,790bp.
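
The summary figures reported for each assembly (scaffold count, total size, number of Ns, N50 and GC content) can be recomputed from a set of scaffold sequences. The following is a minimal sketch only, not the procedure used in this work: the function names `n50` and `assembly_stats` are hypothetical, the input sequences are toy strings rather than MBA4 data, and GC content is assumed to be calculated over called bases only (Ns excluded).

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no sequences supplied


def assembly_stats(scaffolds: Iterable[str]) -> Dict[str, float]:
    """Summarize an assembly given its scaffold sequences as strings."""
    seqs = [s.upper() for s in scaffolds]
    total = sum(len(s) for s in seqs)
    n_count = sum(s.count("N") for s in seqs)  # unknown bases (gaps)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    called = total - n_count  # assumption: GC% excludes Ns
    return {
        "scaffolds": len(seqs),
        "total_bp": total,
        "n_count": n_count,
        "n50": n50(len(s) for s in seqs),
        "gc_percent": 100.0 * gc / called if called else 0.0,
    }


if __name__ == "__main__":
    # Toy scaffolds for illustration only.
    toy = ["GGCCNNAT", "GCGC", "AT"]
    for key, value in assembly_stats(toy).items():
        print(key, value)
```

In practice the scaffold strings would be read from the assembler's FASTA output; the parsing step is omitted here to keep the sketch independent of any particular library.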