Overview

ORB’s long RNA sequencing bioinformatic services extract meaningful results from your study data and provide you with tools for understanding, visualizing, and communicating your data to the research community. For each project, ORB consults with the client to develop a customized analysis plan that minimizes both cost and the delivery of non-essential data.

Data Processing

ORB mRNA Sequencing Analysis Pipeline
Figure 1. ORB scientists have developed a well-documented and tested software pipeline for the processing and analysis of long RNA sequencing data on ORB's high-performance computing cluster.

 
Generation of Gene and Exon-level Read Counts

ORB's mRNA sequencing analysis pipeline begins by aligning quality-filtered sequencing reads to a species-appropriate reference genome using a splice junction aligner such as TopHat, STAR, or HISAT2. The number of reads overlapping each annotated gene and exon is then calculated using a read-counting tool such as the easyRNASeq package, HTSeq, or featureCounts. The gene and exon coordinates are provided by Ensembl BioMart annotation.
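The counting step can be illustrated with a minimal Python sketch. It is not ORB's implementation and does not reproduce the behavior of HTSeq or featureCounts, which also handle strandedness, spliced alignments, and reads that overlap several features. The function name, the coordinate convention (1-based, inclusive), and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def count_reads_per_feature(reads, features):
    """Count reads overlapping each feature by at least one base.

    reads:    iterable of (chrom, start, end) tuples, 1-based inclusive.
    features: dict mapping feature id -> (chrom, start, end).
    """
    # Group features by chromosome so each read is only compared
    # against features on the same chromosome.
    by_chrom = defaultdict(list)
    for feature_id, (chrom, start, end) in features.items():
        by_chrom[chrom].append((start, end, feature_id))

    counts = {feature_id: 0 for feature_id in features}
    for chrom, r_start, r_end in reads:
        for f_start, f_end, feature_id in by_chrom.get(chrom, []):
            # Two closed intervals overlap when each starts before the other ends.
            if r_start <= f_end and f_start <= r_end:
                counts[feature_id] += 1
    return counts


# Hypothetical toy annotation and alignments.
features = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
reads = [
    ("chr1", 90, 140),     # overlaps geneA
    ("chr1", 450, 520),    # overlaps geneA
    ("chr1", 600, 650),    # intergenic
    ("chr1", 1150, 1250),  # overlaps geneB
    ("chr2", 100, 150),    # no annotated feature on chr2
]
counts = count_reads_per_feature(reads, features)
print(counts)  # {'geneA': 2, 'geneB': 1}
```

The same function applies to exon-level counting when the feature dictionary holds exon coordinates instead of gene coordinates.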


Analysis of Alternative Splicing

Long RNA sequencing affords the opportunity to quantify expression of individual transcript isoforms and to analyze transcript regulation, including transcription start site usage, alternative splicing, and polyadenylation. To assess the usage of alternative splice variants from mRNA sequencing reads, the splicing index for each exon is calculated by dividing the exon's RPKM value by the RPKM value of its corresponding gene. ORB's software includes gene- and exon-level filtering steps as well as minimum read substitution functions in order to generate reliable splicing index values.
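The splicing index calculation can be sketched in a few lines of Python. RPKM follows its standard definition (reads per kilobase of feature per million mapped reads). The filtering threshold `min_gene_reads` and the substitution floor `min_exon_reads` are hypothetical values chosen for illustration; the source does not state which thresholds ORB's software applies.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def splicing_index(exon_count, exon_length, gene_count, gene_length,
                   total_mapped_reads, min_gene_reads=10, min_exon_reads=1):
    """Return exon RPKM divided by gene RPKM.

    Returns None when the gene has too few reads to give a reliable
    ratio (gene-level filter). Exon counts below min_exon_reads are
    raised to that floor (minimum read substitution) so that exons
    with zero reads do not produce an index of exactly zero.
    Both thresholds are illustrative assumptions.
    """
    if gene_count < min_gene_reads:
        return None
    exon_count = max(exon_count, min_exon_reads)
    exon_rpkm = rpkm(exon_count, exon_length, total_mapped_reads)
    gene_rpkm = rpkm(gene_count, gene_length, total_mapped_reads)
    return exon_rpkm / gene_rpkm


# Hypothetical example: a 200 bp exon with 50 reads in a 2,000 bp gene
# with 400 reads, in a library of 20 million mapped reads.
si = splicing_index(50, 200, 400, 2000, 20_000_000)
print(si)  # 1.25
```

A splicing index above 1 indicates that the exon is covered more densely than the gene as a whole, and a value well below 1 suggests the exon is skipped in a share of the transcripts.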

ORB also offers differential exon usage analysis using DEXSeq. For researchers who seek to discover potentially novel splice isoforms, ORB's Isoform Splice pipeline will be of interest. This software uses the highly efficient transcript assembler StringTie to assemble and quantify both known and novel splice isoforms.

De Novo Transcriptome Assembly and Data from Non-Model Organisms

For projects requiring the determination of transcript-level counts in the absence of a reference genome, ORB can generate these counts using additional open-source applications including Cufflinks, RSEM, and Trinity.

Differential Expression Analysis and Custom Statistical Analysis

ORB offers several options for statistical analysis of differential gene expression. ORB's custom statistical analysis software performs 1-, 2-, and 3-way ANOVAs with several options for multiple comparison adjustments, e.g. Tukey's HSD or Dunnett's test, and calculation of the corresponding fold changes. Simple linear regression, multiple linear regression, and logistic regression analyses can be performed when appropriate for the experimental design. ORB also works closely with our clients who want to utilize popular statistical tools specifically designed for mRNA sequencing analysis including DESeq2 and edgeR.
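The core quantities behind a one-way ANOVA and a fold change can be sketched as follows for a single gene. This is an illustrative sketch under stated assumptions, not ORB's statistical software: the function names and the pseudocount are hypothetical, only the F statistic and its degrees of freedom are computed, and p-values and multiple comparison adjustments are left out.

```python
import math
from statistics import mean


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (an assumption here)
    avoids division by zero for genes with no expression."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)
    for a list of groups of expression values for one gene."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression values for one gene in three treatment groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)  # 13.0 2 6

lfc = log2_fold_change(treated=[7, 7], control=[1, 1])
print(lfc)  # 2.0
```

In practice the F statistic is converted to a p-value using the F distribution with the two degrees of freedom shown, and the test is repeated for every gene before multiple testing adjustment.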

All mRNA sequencing analysis projects come with unsupervised analyses as well, including hierarchical clustering of genes and samples based on the log2 RPKM values and principal components analysis to identify the largest sources of variation between samples in the data set. All ORB statistical analyses are customized in consultation with the experimenter to make sure all appropriate experimental designs and sources of variation can be accounted for and to ensure the analysis will answer the research question at hand.

Gene Set Enrichment Analysis

ORB uses WebGestalt in combination with several pathway databases such as KEGG, WikiPathways, and Gene Ontology to assess the enrichment of differentially expressed genes in specific pathways, providing detailed, interactive pathway analysis reports.
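One common way to test whether differentially expressed genes are enriched in a pathway is over-representation analysis based on the hypergeometric distribution. The sketch below shows that test in plain Python. It is an illustration of the general technique, not a description of how WebGestalt or ORB computes its reports, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_de, n_overlap):
    """One-sided over-representation p-value.

    Probability of observing n_overlap or more pathway genes when
    n_de genes are drawn without replacement from n_background genes,
    of which n_pathway belong to the pathway.
    """
    max_overlap = min(n_pathway, n_de)
    total = comb(n_background, n_de)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 4 in the pathway,
# 3 differentially expressed genes, 2 of which are in the pathway.
p = hypergeom_enrichment_p(n_background=10, n_pathway=4, n_de=3, n_overlap=2)
print(round(p, 4))  # 0.3333
```

Because many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before any pathway is reported as enriched.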


Long RNA Data Analysis Packages

Standard Analysis Package
  • Full QC report
  • Gene-level sequence read counts
  • Fold changes, 1-, 2-, and 3-way ANOVA, Tukey tests
  • Hierarchical clustering
  • Principal component analysis (PCA)
  • PowerPoint summary
  • FASTQ and BAM files


Long RNA Sample Data Set

A results package for a sample dataset is provided as a .zip archive; to download the package, click on the PowerPoint slide image below.

Additional Options

  • Visualization of sequence reads on the genome scaffold using IGV software
  • Exon-level count (splicing index) statistical analysis
  • Differential splicing and promoter usage using Cuffdiff software
  • Gene set enrichment analysis
  • Linear and logistic regression

Specialty Analysis

  • Enhanced alternative splicing analysis, including plotting of exon usage in alternatively spliced mRNAs.
  • De novo transcriptome assembly and BLAST-based annotation of transcripts from novel species.
  • Correlation of microRNA and mRNA sequencing data.

Additional Analysis Capabilities

  • Analysis of polymorphisms in RNA.
  • BLAST-based annotation of poorly described transcripts and genes.
  • Methylation sequencing analysis.

Related Bioinformatic Services

Recently, drug discovery and development programs, especially in oncology, have incorporated xenograft models and patient-derived xenograft models to enable investigations of drug therapies in an in vivo system. However, transcriptomic analysis of xenograft samples can be challenging because the data consist of reads from more than one species. ORB computational biologists have developed an efficient and effective pipeline to segregate multi-species reads and ensure gene expression analyses focus on the researchers’ species of interest. Visit our xenograft data segregation page for more information.
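The idea of segregating multi-species reads can be sketched as follows. The source does not describe how ORB's pipeline works, so this is only one plausible approach, shown under loud assumptions: each read is aligned separately to the graft and host genomes, the best alignment score against each genome is available, and a read is assigned to a species only when its score there is better by a chosen margin. The names `assign_species`, `segregate`, and `min_margin` are hypothetical.

```python
def assign_species(graft_score, host_score, min_margin=5):
    """Classify one read from its best alignment score to each genome.

    A score of None means the read did not align to that genome.
    min_margin is an illustrative threshold, not a recommended value.
    """
    if graft_score is None and host_score is None:
        return "unmapped"
    if host_score is None:
        return "graft"
    if graft_score is None:
        return "host"
    if graft_score - host_score >= min_margin:
        return "graft"
    if host_score - graft_score >= min_margin:
        return "host"
    # Scores too close to call, e.g. reads from highly conserved genes.
    return "ambiguous"


def segregate(read_scores, min_margin=5):
    """Split reads into bins; read_scores maps read id -> (graft, host) scores."""
    bins = {"graft": [], "host": [], "ambiguous": [], "unmapped": []}
    for read_id, (graft_score, host_score) in read_scores.items():
        bins[assign_species(graft_score, host_score, min_margin)].append(read_id)
    return bins


# Hypothetical alignment scores for five reads.
read_scores = {
    "read1": (60, 30),      # clearly graft
    "read2": (28, 59),      # clearly host
    "read3": (58, 57),      # conserved region, ambiguous
    "read4": (None, 55),    # aligns to host only
    "read5": (None, None),  # aligns to neither genome
}
bins = segregate(read_scores)
print(bins["graft"], bins["host"], bins["ambiguous"], bins["unmapped"])
# ['read1'] ['read2', 'read4'] ['read3'] ['read5']
```

Only the reads in the bin for the species of interest would then be carried forward into read counting and differential expression analysis.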

ORB scientists have also created a suite of metatranscriptomic analyses to complement our wet lab metatranscriptome profiling services. With ORB’s comprehensive metatranscriptomic data analysis services, researchers can query specific bacterial, fungal, or viral species, obtain taxonomic classification using Kraken software, and/or use de novo transcriptome assembly with metagenomic DNA sequences in the absence of a reference genome.

Long non-coding RNAs (lncRNAs), non-protein-coding RNAs of greater than 200 nucleotides, play a role in many biological processes and have been associated with over 220 diseases [1]. These molecules represent a relatively untapped class of potential drug targets and biomarkers. ORB has created a customizable data analysis workflow which includes mapping reads to long ncRNA databases, counting, collecting and appending annotation information to the count tables, statistical analysis, and classification and characterization of the types of lncRNA found in samples.

To maximize biomarker discovery study results, ORB works with well-established data modeling and classification algorithms such as logistic regression, support vector machines, and random forest in order to assist clients with predictive modeling. These tools can be used to identify sets of RNA markers whose expression patterns can be used as surrogate markers of efficacy, for early detection of toxicity, for patient stratification, and even for disease prognosis. These analyses are instrumental to companion diagnostics programs and the co-development of therapeutic agents. With each project, ORB performs client-directed analyses to ensure specific research goals are achieved efficiently.

Follow the links below to learn more about these specialized data analysis services.

 

  1. The LncRNA and Disease Database