Metatranscriptomics Bioinformatics
 


ORB's comprehensive metatranscriptome analysis pipeline supports studies of diverse microbial populations using RNA sequencing of biospecimens and environmental samples. ORB's computational and statistical strategies include targeted genome alignment, taxonomic classification, and de novo transcriptome assembly. These research tools provide an unprecedented opportunity to examine gene regulation in many microbial species simultaneously and to determine which genes encoded in a metagenome are actually transcribed. Each of ORB's metatranscriptome bioinformatics service packages can be combined with analysis of human epithelial cell transcripts to better understand host-microbiome relationships. Contact us with your unique microbiome research objectives to determine which strategy or combination of analyses is most appropriate for your study!

Fig. 1. Classification of stool samples based on alignment and read counting using a 47-species reference genome set.

Alignment to a set of target genomes

This straightforward approach aligns sequencing reads to a set of bacterial, fungal, or viral reference genomes chosen for the research project, e.g. 116 bacterial strains from 47 species representing the metatranscriptome of human stool samples (see Figure 1). ORB provides complimentary consultations to determine which reference genomes would be most useful for a specific research project. Advantages of targeted alignment include a minimized risk of mapping reads to unintended species, a simpler analysis that is easier to interpret, and detailed, specific annotation for mapped genes.
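
As a rough illustration of the read-counting step behind Figure 1, the sketch below tallies reads per species from alignment records. It assumes the alignments have already been reduced to hypothetical `(read_id, reference_name)` pairs and that a hypothetical `ref_to_species` table maps each strain's reference sequence to its species. It is a minimal sketch under those assumptions, not ORB's pipeline code.

```python
from collections import Counter


def count_reads_per_species(alignments, ref_to_species):
    """Tally reads per species from (read_id, reference_name) pairs.

    Each read is counted at most once (its first record), so a read that
    aligns to several strains of the same species is not double-counted.
    Unmapped reads (reference_name is None) and references missing from
    the hypothetical ref_to_species table are skipped.
    """
    seen_reads = set()
    counts = Counter()
    for read_id, ref_name in alignments:
        if read_id in seen_reads:
            continue  # later records for an already-counted read
        seen_reads.add(read_id)
        species = ref_to_species.get(ref_name)
        if species is not None:
            counts[species] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical strain-to-species table and alignment records.
    ref_to_species = {
        "strainA_contig": "Species X",
        "strainB_contig": "Species X",
        "strainC_contig": "Species Y",
    }
    alignments = [
        ("r1", "strainA_contig"),
        ("r2", "strainB_contig"),
        ("r3", "strainC_contig"),
        ("r3", "strainA_contig"),  # extra record for r3, ignored
        ("r4", None),              # unmapped read
    ]
    print(dict(count_reads_per_species(alignments, ref_to_species)))
    # {'Species X': 2, 'Species Y': 1}
```

Counting at the species level lets several strain genomes (here, two for "Species X") pool their reads. A read whose records point to strains of different species is credited to whichever record appears first; a production pipeline would typically rely on the aligner's primary-alignment flag instead.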

Taxonomic classification using Kraken

To enable ultrafast and highly accurate assignment of taxonomic labels to metagenomic DNA sequences, ORB uses the Kraken taxonomic classification system. Kraken is much faster than earlier classification programs: in its fastest mode it classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. At the same time, its classification accuracy is comparable to that of the fastest BLAST program [1].

This combination of speed and accuracy comes from examining the k-mers within each read. Each k-mer is looked up in a database that maps k-mers to the lowest common ancestor (LCA) of all organisms whose genomes contain that k-mer. The taxa hit by a read's k-mers, together with their ancestors, form a subtree of the taxonomy; each root-to-leaf path in that subtree is scored by the number of the read's k-mers mapped to taxa along the path, and the read is assigned to the leaf of the highest-scoring path [1]. The k-mer-to-LCA database is built from all genomes of organisms in the RefSeq bacterial, viral, and archaeal domains. ORB uses Kraken's algorithm to combine NCBI taxonomic information with these bacterial, archaeal, and viral genomes. See Figure 2 for an interactive visualization of Kraken taxonomic assignment output.
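
The k-mer/LCA idea described above can be sketched in a few lines. The example below is a toy re-implementation under stated assumptions, not Kraken's code: it uses a hypothetical six-taxon taxonomy (`PARENT`), made-up genome sequences, a tiny k so the example stays readable, and a plain dictionary in place of Kraken's optimized database. Ties between equally scoring paths are resolved by taking the LCA of the tied leaves.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy taxonomy: child -> parent. "root" has no parent.
PARENT = {
    "Bacteria": "root",
    "GenusA": "Bacteria",
    "SpeciesA1": "GenusA",
    "SpeciesA2": "GenusA",
    "GenusB": "Bacteria",
    "SpeciesB1": "GenusB",
}


def path_to_root(taxon):
    """Return [taxon, parent, ..., "root"]."""
    path = [taxon]
    while taxon != "root":
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(path_to_root(a))
    for taxon in path_to_root(b):
        if taxon in ancestors_of_a:
            return taxon
    return "root"


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_db(genomes, k):
    """Map every k-mer to the LCA of all taxa whose genomes contain it."""
    db = {}
    for taxon, seq in genomes:
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read, db, k):
    """Assign a read to the leaf of its highest-scoring root-to-leaf path.

    A taxon's weight is the number of the read's k-mers mapped to it; a
    path's score is the sum of weights along it. Returns None when no
    k-mer is found in the database (unclassified read).
    """
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None
    # Score the path ending at each hit taxon (missing taxa count as 0).
    scores = {t: sum(hits[a] for a in path_to_root(t)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    return reduce(lca, tied)


if __name__ == "__main__":
    toy_genomes = [  # made-up sequences, not real genomes
        ("SpeciesA1", "ACGTACGGTT"),
        ("SpeciesA2", "ACGTACGCAA"),
        ("SpeciesB1", "TTGACCATGG"),
    ]
    db = build_db(toy_genomes, k=4)
    print(classify("ACGTACGGT", db, k=4))  # SpeciesA1
```

In the usage example, four of the read's k-mers are shared by both GenusA species and therefore map to GenusA, while two are unique to SpeciesA1. The SpeciesA1 path collects both groups of hits (score 6 versus 4 for GenusA alone), so the read is assigned to the species rather than the genus.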

De Novo Transcriptome Assembly with Trinity

For most metatranscriptomics projects, the species, genera, and even phyla present in the samples are largely unknown, so characterizing the microbial community without any reference genome is essential. ORB employs Trinity, which combines three independent software modules (Inchworm, Chrysalis, and Butterfly), for efficient de novo reconstruction of full-length transcripts; ORB's Trinity analysis pipeline is shown in Figure 3. This strategy not only reflects sample composition accurately but also enables the potential discovery of new genes and species [2].
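
Trinity's first module, Inchworm, builds contigs from the reads by greedy k-mer extension. The sketch below is a simplified, hypothetical illustration of that one idea: it seeds on the most abundant unused k-mer and repeatedly extends with the most abundant overlapping k-mer. It omits error filtering, reverse complements, and the Chrysalis clustering and Butterfly graph resolution that follow in Trinity, and it is not Trinity's implementation. It assumes k is at least 2 and reads use the letters A, C, G, T.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def best(options, counts):
    """Pick the most abundant k-mer; break ties lexicographically."""
    return max(options, key=lambda km: (counts[km], km))


def greedy_extend(reads, k):
    """Simplified Inchworm-style assembly; returns a list of contigs.

    Each k-mer is used at most once: it is deleted from the table when
    it seeds or extends a contig, so the loops always terminate.
    """
    counts = kmer_counts(reads, k)
    contigs = []
    while counts:
        seed = best(counts, counts)
        del counts[seed]
        contig = seed
        # Extend right: candidates start with the contig's last k-1 bases.
        while True:
            suffix = contig[-(k - 1):]
            options = [suffix + b for b in "ACGT" if suffix + b in counts]
            if not options:
                break
            nxt = best(options, counts)
            del counts[nxt]
            contig += nxt[-1]
        # Extend left: candidates end with the contig's first k-1 bases.
        while True:
            prefix = contig[:k - 1]
            options = [b + prefix for b in "ACGT" if b + prefix in counts]
            if not options:
                break
            prv = best(options, counts)
            del counts[prv]
            contig = prv[0] + contig
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    transcript = "ATGGCGTACGTTAGC"  # hypothetical transcript
    reads = [transcript[0:10], transcript[3:13], transcript[5:15]]
    print(greedy_extend(reads, k=5))  # ['ATGGCGTACGTTAGC']
```

Here three overlapping reads from one made-up transcript are reassembled into a single full-length contig. Seeding on the most abundant k-mer mirrors the intuition that well-covered k-mers are least likely to contain sequencing errors.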


Typical Results Package

Raw Data Files

  • Raw sequencing reads in FASTQ format
  • Annotated assemblies
  • Tables of annotated raw and RPM (reads per million) counts
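
RPM values rescale each raw count by the size of its sample so that samples sequenced to different depths can be compared: RPM = raw count / total reads × 1,000,000. The sketch below assumes the total is the sum of the counts in the sample's table unless another total (e.g. all sequenced reads) is passed in; which denominator a given report uses is an assumption here, not something this page specifies.

```python
def counts_to_rpm(counts, total=None):
    """Convert one sample's raw counts to reads per million (RPM).

    RPM = raw count / total reads * 1,000,000. By default the total is
    the sum of the counts provided; pass `total` to normalize to a
    different read total (for example, all sequenced reads).
    """
    if total is None:
        total = sum(counts.values())
    if total <= 0:
        raise ValueError("total read count must be positive")
    return {feature: n * 1_000_000 / total for feature, n in counts.items()}


if __name__ == "__main__":
    raw = {"gene_a": 150, "gene_b": 350, "gene_c": 500}  # hypothetical counts
    print(counts_to_rpm(raw))
    # {'gene_a': 150000.0, 'gene_b': 350000.0, 'gene_c': 500000.0}
```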

Analysis Report

  • PowerPoint presentation containing details of laboratory and bioinformatic processing methods, quality control metrics, summary of statistical analysis results, graphs, charts, and links to all primary data files.
  • HTML reports containing hierarchical clustering heat maps, figures, and detailed results from principal components analysis (PCA)
  • Sample correlation matrix
  • Plots of expression for most significant genes based on custom-selected criteria
  • Detailed tables and summary graphs from taxonomic analysis
  • Krona taxonomic classification visualization report
  • Excel-format statistical analysis report containing a summary of statistical findings, full results tables with formatted and color-coded fold changes, P and FDR values from custom statistical analysis, and detailed annotation for each gene with links
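
FDR values adjust P values for the many genes tested at once. This page does not name the adjustment procedure, so the sketch below assumes the widely used Benjamini-Hochberg method: each P value is multiplied by m/rank (m tests, ranks in ascending order of P), and the results are made monotone and capped at 1. The function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down so each adjusted value is the
    # minimum of p * m / rank over itself and all larger P values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-gene P values
    print([round(q, 4) for q in benjamini_hochberg(p)])
    # [0.04, 0.0533, 0.0533, 0.2]
```

The downward pass is what keeps the adjusted values in the same order as the raw P values: the gene with P = 0.03 would get 0.06 on its own, but it inherits the smaller 0.0533 from the gene ranked just above it.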

Fig. 2. Snapshot of a Krona visualization of Kraken taxonomic assignment and counting data, using raw data from a published study examining the effect of diet on the human gut microbiome.

Fig. 3. Diagram of ORB's Trinity analysis pipeline for metatranscriptomic analysis.

Contact us to discuss your metatranscriptomics project and how Ocean Ridge Biosciences can help you achieve your research goals!

References

  1. Wood, D. E., & Salzberg, S. L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology, 15(3), R46.
  2. Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B. W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., & Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29(7), 644-652.