Xenograft Data Segregation
Advancing Novel Oncology Therapies

Xenograft models make it possible to study tumor response to therapeutic agents in vivo by transplanting human tumor cells into immunodeficient mice, such as T-cell-deficient nude mice or B- and T-cell-deficient SCID mice [1]. Because xenograft models allow efficacy and toxicity markers to be evaluated, mechanism of action to be analyzed, and drug response markers to be identified in a living organism, they provide more information than in vitro assays or other preclinical models can provide on their own [2,3].

Patient-Derived Xenograft (PDX) models are powerful tools for identifying patient stratification markers that separate potential responders from non-responders, as well as markers associated with response to therapeutic treatment [4].

Figure 1. Ocean Ridge Biosciences' xenograft data segregation pipeline utilizes two alignment steps to determine species-specific reads.

ORB performs several types of assays using xenograft samples as input, including small RNA sequencing, microRNA microarray, and gene expression microarray. For long RNA sequencing projects in particular, ORB takes advantage of the accuracy of Illumina sequencing to filter out data derived from the mouse host RNA and retain only data derived from the human transplant for further analysis.

A persistent technical challenge in xenograft and PDX studies is the bias introduced by contaminating mouse-derived sequences. To fulfill ORB's commitment to delivering the highest-quality data and results packages, ORB computational scientists have developed an in silico strategy that separates sequences by species of origin. The process uses two alignments against both the human and mouse genomes, with the second alignment targeting ambiguous reads only (see Figure 1). Exon- and gene-level read counting is then performed separately for each species. A demonstration analysis using published sequencing data is provided below.
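
The sorting logic described above can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not ORB's production pipeline: it assumes each read has already been aligned to both genomes and summarized as a mismatch count against each (None meaning no alignment). The names ReadHits, classify_read, sort_reads, and the margin parameter are hypothetical, and the second-pass rule (assign a read that aligns to both genomes only if one species matches with clearly fewer mismatches) is an assumed tie-breaking criterion.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ReadHits:
    """Hypothetical per-read summary of alignments to the two genomes."""
    read_id: str
    human_mismatches: Optional[int]  # None = no alignment to the human genome
    mouse_mismatches: Optional[int]  # None = no alignment to the mouse genome


def classify_read(hit: ReadHits, margin: int = 2) -> str:
    """Assign a read to 'human', 'mouse', 'ambiguous', or 'unmapped'.

    Pass 1: a read aligning to only one genome goes to that species.
    Pass 2 (reads aligning to both): assign the read only if one species
    matches with at least `margin` fewer mismatches (assumed rule).
    """
    h, m = hit.human_mismatches, hit.mouse_mismatches
    if h is None and m is None:
        return "unmapped"
    if m is None:
        return "human"
    if h is None:
        return "mouse"
    if m - h >= margin:
        return "human"
    if h - m >= margin:
        return "mouse"
    return "ambiguous"


def sort_reads(hits: Iterable[ReadHits], margin: int = 2) -> Counter:
    """Count how many reads fall into each species bin."""
    return Counter(classify_read(hit, margin) for hit in hits)


if __name__ == "__main__":
    demo = [
        ReadHits("r1", 0, None),     # aligns to human only
        ReadHits("r2", None, 1),     # aligns to mouse only
        ReadHits("r3", 0, 4),        # aligns to both; human clearly better
        ReadHits("r4", 1, 1),        # aligns to both equally well
        ReadHits("r5", None, None),  # aligns to neither genome
    ]
    print(sort_reads(demo))
```

Keeping a separate ambiguous bin, rather than forcing every read into one species, avoids inflating either species' counts with reads from sequences conserved between human and mouse.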

Segregation of sequencing data derived from multi-species samples is not limited to human/mouse xenografts. Please contact us to discuss your specific xenograft model system and project goals; ORB’s scientists welcome new challenges and relish the opportunity to help clients meet or exceed the milestones required for overall project success!

Figure 2. Lahens et al. investigated species-specific bias in RNA sequencing data produced from ribosomal RNA-depleted pools of human IVT transcripts and mouse liver total RNA. Libraries were prepared using the Illumina TruSeq protocol and sequenced on an Illumina HiSeq 2000 instrument.

Demonstration of Xenograft Data Segregation Pipeline

To demonstrate the efficacy of ORB’s pipeline for sorting human-derived from mouse-derived sequencing data, a published dataset from Lahens et al. [5] was used. SRA data for the analysis were downloaded from the NCBI Gene Expression Omnibus under accession number GSE50445. In this RNA sequencing study, in vitro transcribed (IVT) RNA representing 1,062 unique human transcripts was mixed with mouse liver total RNA. Ribosomal RNA was depleted from these pools, and libraries constructed with an Illumina TruSeq kit were sequenced on an Illumina HiSeq 2000 instrument (see Figure 2). ORB’s xenograft data segregation pipeline sorted human-derived reads into the human-only bin with a false-positive rate below 0.5% and a false-negative rate below 3% (Table I). The pipeline has been used in several client studies, which revealed that xenograft tumors isolated from the mouse following experimental treatment can consist of up to 35% mouse tissue.
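
The error rates quoted above can be read directly off the pure-sample rows of Table I. The Python sketch below makes that arithmetic explicit. It assumes the false-positive rate is the percentage of reads from a 100% mouse sample that land in the human-only bin, and the false-negative rate is the percentage of reads from a 100% human sample that land in any other bin (mouse-only or ambiguous); these definitions are an interpretation of the text, not formulas stated by the source.

```python
# Pure-sample rows of Table I: (% human only, % mouse only, % ambiguous)
pure_human = [(97.6, 2.2, 0.2), (97.5, 2.3, 0.2)]   # 2,500 ng human IVT, 0 ng mouse
pure_mouse = [(0.2, 83.0, 16.8), (0.4, 88.1, 11.5)]  # 0 ng human IVT, 2,500 ng mouse

# False positive (assumed definition): a mouse read placed in the human-only bin.
max_fpr = max(human_only for human_only, _, _ in pure_mouse)

# False negative (assumed definition): a human read placed in any other bin.
max_fnr = max(mouse_only + ambiguous for _, mouse_only, ambiguous in pure_human)

print(f"Max false-positive rate: {max_fpr:.1f}%")  # 0.4%
print(f"Max false-negative rate: {max_fnr:.1f}%")  # 2.5%
```

Both worst-case values fall within the thresholds reported above (below 0.5% and below 3%, respectively).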

Table I. Demonstration of ORB’s species-specific data segregation capabilities using data from Lahens et al.

Human IVT RNA (ng)  Mouse Liver RNA (ng)  Expected % Human  Total Reads  % Human Only  % Mouse Only  % Ambiguous
2,500               0                     100.0%            37,717,501   97.6%         2.2%          0.2%
2,500               0                     100.0%            28,670,167   97.5%         2.3%          0.2%
150                 2,350                 61.5%             33,187,119   61.8%         33.9%         4.3%
150                 2,350                 61.5%             28,913,303   64.6%         32.0%         3.5%
75                  2,425                 43.6%             39,183,205   45.9%         46.4%         7.7%
75                  2,425                 43.6%             32,595,817   46.8%         48.3%         4.9%
15                  2,485                 13.1%             35,745,234   12.8%         74.4%         12.8%
15                  2,485                 13.1%             30,793,506   12.4%         78.5%         9.1%
0                   2,500                 0.0%              37,255,373   0.2%          83.0%         16.8%
0                   2,500                 0.0%              35,410,350   0.4%          88.1%         11.5%

Lahens et al. prepared and sequenced libraries from RNA samples containing the indicated masses of human IVT synthetic transcripts and mouse liver RNA, respectively. Column 3 (Expected % Human) shows the expected percentage of reads that should be human, assuming that 4% of liver RNA is polyadenylated. The last four columns give the total mapped reads from the sequenced FASTQ files and the percentage of reads sorted into each bin.
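
The Expected % Human column follows from a simple mass balance. Assuming, as stated above, that 4% of the mouse liver RNA is polyadenylated, and further assuming that all of the human IVT RNA contributes to the library, the expected human percentage is 100 × H / (H + 0.04 × M), where H and M are the input masses (ng) of human IVT and mouse liver RNA. The Python sketch below reproduces the column; the function name expected_percent_human and its mouse_polya_fraction parameter are illustrative.

```python
def expected_percent_human(human_ng: float, mouse_ng: float,
                           mouse_polya_fraction: float = 0.04) -> float:
    """Expected % of reads that are human, assuming all human IVT RNA and
    only `mouse_polya_fraction` of the mouse liver RNA enter the library."""
    mouse_effective_ng = mouse_polya_fraction * mouse_ng
    total_ng = human_ng + mouse_effective_ng
    if total_ng == 0:
        raise ValueError("sample contains no input RNA")
    return 100.0 * human_ng / total_ng


# Input masses (ng) from Table I: (human IVT, mouse liver)
for human_ng, mouse_ng in [(2500, 0), (150, 2350), (75, 2425), (15, 2485), (0, 2500)]:
    pct = expected_percent_human(human_ng, mouse_ng)
    print(f"{human_ng:>5} ng human + {mouse_ng:>5} ng mouse -> {pct:.1f}% human")
```

For example, the 150 ng / 2,350 ng sample gives 150 / (150 + 94) ≈ 61.5%, matching the table.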


  1. Ito, M., Hiramatsu, H., Kobayashi, K., Suzue, K., Kawahata, M., Hioki, K., ... & Heike, T. (2002). NOD/SCID/γcnull mouse: an excellent recipient mouse model for engraftment of human cells. Blood, 100(9), 3175-3182.
  2. Hidalgo, M., Amant, F., Biankin, A. V., Budinská, E., Byrne, A. T., Caldas, C., ... & Roman-Roman, S. (2014). Patient-derived xenograft models: an emerging platform for translational cancer research. Cancer Discovery, 4(9), 998-1013.
  3. Sano, D., & Myers, J. N. (2009). Xenograft models of head and neck cancers. Head & Neck Oncology, 1(1), 32.
  4. Jung, J. (2014). Human tumor xenograft models for preclinical assessment of anticancer drug development. Toxicological Research, 30(1), 1-5.
  5. Lahens, N. F., Kavakli, I. H., Zhang, R., Hayer, K., Black, M. B., Dueck, H., ... & Grant, G. R. (2014). IVT-seq reveals extreme bias in RNA sequencing. Genome Biology, 15(6), R86.