Analytical Sciences, Invited lecture
AS-011

Pushing the limits of de novo genome assembly for complex prokaryotic genomes and establishing a basis to utilize microbiome isolates for plant protection

 

C. H. Ahrens1,2
1Agroscope, 2SIB Swiss Institute of Bioinformatics

My genomics and bioinformatics team collaborates closely with experimental groups that have established functional assays to isolate individual, functionally relevant microorganisms from complex microbiomes such as those present on plant surfaces, in fermented foods, or in the soil. These microorganisms are involved in, for example, biocontrol, biostimulation, or antibiotic resistance. Such interdisciplinary collaborations are essential for Agroscope’s aim to bring microbiome research into applied practice.

We subsequently de novo assemble complete genomes of such strains using the latest next-generation sequencing (NGS) technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) and state-of-the-art assembly algorithms. Generating a complete de novo genome assembly for prokaryotes is generally considered a solved problem because prokaryotic genomes are relatively small and of low complexity. However, an analysis of 9300 complete prokaryotic genomes indicated that a sizable fraction (10%) harbored either several hundred repeats of up to 7 kb in length (so-called class II genomes) or very long, near-identical repeats that can exceed 100 kb (so-called class III genomes), both of which impede complete genome assembly.

Using long PacBio reads, we recently showed that repeat-rich class II genomes of Lactobacillus strains can be readily assembled into complete genomes, and that this approach offers a distinct advantage over the fragmented assemblies generated from short-read Illumina data alone. Using very long reads from ONT, we were even able to completely assemble the genome of a highly complex Pseudomonas koreensis strain that harbored near-identical repeats of 70 kb. As accurate genome annotation is critical for exploiting the deluge of completely sequenced genomes, a publicly available proteogenomics solution for improved genome annotation was developed with funding from the Swiss National Science Foundation.

In this talk, I will highlight some of the lessons learnt over the past four years and present an outlook on how we plan to apply functional genomics technologies, with the ultimate aim of using naturally occurring microorganisms for plant protection.