About dSpliceType

dSpliceType is a fast, effective and accurate computational method for detecting various types of differential splicing and differential expression events between two conditions using RNA-Seq. The five most common types of splicing events are skipped exon (SE), retained intron (RI), alternative 3' or 5' splice sites (A3SS or A5SS), and mutually exclusive exons (MXE). The method utilizes the sequential dependency of base-wise read coverage signals and captures biological variability among replicates using a multivariate statistical model. dSpliceType substantially reduces sequencing biases by taking the ratio of normalized RNA-Seq splicing indexes at each nucleotide between disease and control conditions. To detect differential splicing, dSpliceType applies a change-point analysis followed by a parametric statistical test using the Schwarz Information Criterion (SIC) to each candidate splicing event. It can detect various types of differential splicing events across a wide range of expressed genes, including genes with lower abundance. dSpliceType detects differential expression using a multivariate or a univariate statistical test based on the global RNA-Seq splicing indexes calculated on the left and right common exons.
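The change-point idea described above can be illustrated with a deliberately simplified sketch. It is not the dSpliceType implementation: it assumes a single univariate Gaussian signal (for instance, one logRatio track along a candidate event) with a shared variance, whereas the method itself uses a multivariate model over replicates. The function names and the toy signal are hypothetical. The sketch compares the SIC of a model without change points against the SIC of a model whose mean differs inside a middle segment [i, j).

```python
import math


def sic(rss, n, n_params):
    """SIC for a Gaussian model: -2 * max log-likelihood + n_params * log(n).

    Assumes rss > 0, i.e. the residual variance is not exactly zero.
    """
    return n * math.log(2 * math.pi * rss / n) + n + n_params * math.log(n)


def rss_about_mean(values):
    """Residual sum of squares of the values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sic_no_change(signal):
    """One mean and one variance for the whole signal (2 parameters)."""
    return sic(rss_about_mean(signal), len(signal), 2)


def sic_two_changes(signal, i, j):
    """One mean inside [i, j), another outside, shared variance (3 parameters)."""
    inside = signal[i:j]
    outside = signal[:i] + signal[j:]
    rss = rss_about_mean(inside) + rss_about_mean(outside)
    return sic(rss, len(signal), 3)


def best_change_points(signal):
    """Scan all i < j (both flanks non-empty) and return (SIC, i, j) with minimal SIC."""
    n = len(signal)
    candidates = [
        (sic_two_changes(signal, i, j), i, j)
        for i in range(1, n - 1)
        for j in range(i + 1, n)
    ]
    return min(candidates)


# Toy signal: flanking "common exons" near 0, middle "spliced exon" near 2.
signal = [0.1, -0.1, 0.1, -0.1, 2.1, 1.9, 2.1, 1.9, 0.1, -0.1, 0.1, -0.1]
best_sic, left, right = best_change_points(signal)
change_supported = best_sic < sic_no_change(signal)
```

In this toy case the smaller SIC belongs to the two-change-point model, with `left` equal to the length of the left flank and `right` equal to the combined length of the left flank and the middle segment.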

The figure below shows the workflow of the generalized dSpliceType for detecting various types of differential splicing events and differential expression. A) The five most common types of splicing events. The left panel represents SE, RI, A3SS and A5SS events, and the right panel represents an MXE event. B) Candidate splicing events are compiled by removing introns and concatenating the left common exon, the spliced exon(s) or exonic region, and the right common exon. C) For each candidate splicing event (illustrated by A5SS and MXE events), read coverage signals are calculated at each nucleotide for each replicate in both conditions. D.1) and E) Local RNA-Seq splicing indexes and the normalized logRatio of splicing indexes are calculated from the read coverage signals. dSpliceType detects differential splicing events by identifying change points at the ending locations of the exon(s) or exonic region. D.2) Global RNA-Seq splicing indexes are calculated on the left and right common exons. dSpliceType detects differential expression using a multivariate or a univariate statistical test, depending on the data quality. F) A scatter plot of differential expression against differential splicing, used to detect their synergistic and antagonistic effects.

Software Download

dSpliceType-2.0.0 can be downloaded from here. The package contains an example dataset and an example command. It also includes a Perl script that converts .gtf files to .gff format. Because Ensembl annotation files provide many more transcripts per gene, we suggest that users download and use the Ensembl annotation files for the species they study, especially human and mouse. These can be downloaded from ftp://ftp.ensembl.org/pub/. Please use bedtools to generate the read coverage (.bedgraph) files.

For example, to convert an Ensembl .gtf file to .gff format:

perl ensembl_gtf_to_gff.pl Homo_sapiens.GRCh37.57.gtf > Homo_sapiens.GRCh37.57.gff
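The bedtools step mentioned above is not spelled out, so the following is only a sketch of one common way to produce a .bedgraph file from an aligned, coordinate-sorted BAM file. It uses the bedtools genomecov subcommand with -bg (bedGraph output) and -split (spliced reads are not counted as covering the intron they span). Whether dSpliceType expects exactly these settings is an assumption, and the helper names and file names are hypothetical.

```python
import subprocess


def genomecov_command(bam_path):
    """Build the bedtools call that reports per-base coverage in bedGraph format."""
    return ["bedtools", "genomecov", "-ibam", bam_path, "-bg", "-split"]


def write_bedgraph(bam_path, bedgraph_path):
    """Run bedtools (must be on PATH) and redirect its output to bedgraph_path."""
    with open(bedgraph_path, "w") as out:
        subprocess.run(genomecov_command(bam_path), stdout=out, check=True)


# Hypothetical file names; nothing is executed until write_bedgraph is called.
command = genomecov_command("cntl_1.bam")
```

Calling write_bedgraph("cntl_1.bam", "cntl_1.bedgraph") once per replicate would then give one coverage file per replicate.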

User Guide

Command-line usage:

java -jar [-Xmx memory] dSpliceType.jar -g <.gff> -b1 <c1.bedgraph,...,cn.bedgraph> -b2 <t1.bedgraph,...,tn.bedgraph> -j1 <c1_junc.bed,...,cn_junc.bed> -j2 <t1_junc.bed,...,tn_junc.bed> [options]

Necessary parameters:

-g <.gff> : provide the annotation file in .gff format.

-b1 <c1.bedgraph,...,cn.bedgraph> : provide read coverage (.bedgraph) files for replicates in condition 1.

-b2 <t1.bedgraph,...,tn.bedgraph> : provide read coverage (.bedgraph) files for replicates in condition 2.

-j1 <c1_junc.bed,...,cn_junc.bed> : provide junction (.bed) files for replicates in condition 1. For example, the junction .bed file produced by TopHat or TopHat2.

-j2 <t1_junc.bed,...,tn_junc.bed> : provide junction (.bed) files for replicates in condition 2. For example, the junction .bed file produced by TopHat or TopHat2.

Options:

-o <string> : The output folder. (default: the current working folder)

-L <float> : The lower bound of the splicing ratio. An event is considered a differential splicing event if the average splicing ratio of the spliced region is lower than the provided value [< 1.0]. (default 0.8)

-U <float> : The upper bound of the splicing ratio. An event is considered a differential splicing event if the average splicing ratio of the spliced region is greater than the provided value [> 1.0]. (default 1.2)

-a <float> : The significance level (alpha) of adjusted p-values for differential splicing detection. (default 0.05)

-C <float> : The lowest cut off of the average read coverage of the spliced exon/exonic region. (default 10)

-jL <int> : The number of types of junctions spanning any two exons in the splicing event. 1 (not stringent): at least one type of junctions supporting the splicing event; 2 (stringent): at least two types of junctions supporting the splicing event; 3 (most stringent): all types of junctions supporting the splicing event. (default 2)

All default values are intended for real-world RNA-Seq datasets. Please adjust the values to suit your needs.
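How the -L, -U, -a and -C options interact can be read as a simple decision rule. The sketch below is an interpretation of the option descriptions above, not code taken from dSpliceType: the function name is hypothetical, and the exact way the tool combines the thresholds internally (for example, strict versus non-strict comparisons, or which condition's coverage is checked) is an assumption.

```python
def passes_thresholds(splicing_ratio, adj_p_value, avg_coverage,
                      lower=0.8, upper=1.2, alpha=0.05, min_coverage=10):
    """Return True if an event would pass the documented default cut-offs.

    lower / upper  : -L and -U, bounds on the average splicing ratio
    alpha          : -a, significance level for the adjusted p-value
    min_coverage   : -C, lowest average read coverage of the spliced region
    """
    ratio_is_extreme = splicing_ratio < lower or splicing_ratio > upper
    is_significant = adj_p_value < alpha
    is_covered = avg_coverage >= min_coverage
    return ratio_is_extreme and is_significant and is_covered


# Hypothetical events: (splicing ratio, adjusted p-value, average coverage).
strong_event = passes_thresholds(0.5, 0.01, 25)
unchanged_ratio = passes_thresholds(1.0, 0.01, 25)
not_significant = passes_thresholds(2.0, 0.20, 25)
low_coverage = passes_thresholds(2.0, 0.01, 5)
```

Tightening the bounds (for example -L 0.5 and -U 2) keeps only events whose splicing ratio departs more clearly from 1.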

Example for Linux or Mac OS X 64-bit:

java -jar dSpliceType.jar -g testGenes.gff -b1 Example/cntl_1.bedgraph,Example/cntl_2.bedgraph -b2 Example/case_1.bedgraph,Example/case_2.bedgraph -j1 Example/cntl_1.junc.bed,Example/cntl_2.junc.bed -j2 Example/case_1.junc.bed,Example/case_2.junc.bed -o output

Example for Windows 64-bit:

java -jar dSpliceType.jar -g testGenes.gff -b1 Example\cntl_1.bedgraph,Example\cntl_2.bedgraph -b2 Example\case_1.bedgraph,Example\case_2.bedgraph -j1 Example\cntl_1.junc.bed,Example\cntl_2.junc.bed -j2 Example\case_1.junc.bed,Example\case_2.junc.bed -o output
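When many replicates are involved, the comma-separated arguments are easy to get wrong by hand. The following sketch assembles the same command from Python lists using only the flags documented above. The helper name is hypothetical, and the check that each condition has as many junction files as coverage files is an assumption drawn from the usage line rather than a documented requirement.

```python
import os


def build_dsplicetype_command(gff, cov1, cov2, junc1, junc2,
                              out_dir=None, jar="dSpliceType.jar"):
    """Assemble the dSpliceType command line as an argument list."""
    if len(cov1) != len(junc1) or len(cov2) != len(junc2):
        raise ValueError("each replicate needs one coverage and one junction file")
    cmd = [
        "java", "-jar", jar,
        "-g", gff,
        "-b1", ",".join(cov1),
        "-b2", ",".join(cov2),
        "-j1", ",".join(junc1),
        "-j2", ",".join(junc2),
    ]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    return cmd


def example_path(name):
    """Join with the platform separator, so the same code works on Windows."""
    return os.path.join("Example", name)


cmd = build_dsplicetype_command(
    "testGenes.gff",
    [example_path("cntl_1.bedgraph"), example_path("cntl_2.bedgraph")],
    [example_path("case_1.bedgraph"), example_path("case_2.bedgraph")],
    [example_path("cntl_1.junc.bed"), example_path("cntl_2.junc.bed")],
    [example_path("case_1.junc.bed"), example_path("case_2.junc.bed")],
    out_dir="output",
)
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the folder that contains dSpliceType.jar.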

Output Files:

We output five .txt files based on different types of detected splicing events, which are named as dSpliceType_SE, dSpliceType_RI, dSpliceType_A3SS, dSpliceType_A5SS, dSpliceType_MXE. In each file, we have the following columns (using dSpliceType_SE file as an example):

SEID : The ID of SE candidate event when extracting from annotation file.

geneID : For example, ENSG00000065882.

geneName : For example, chr4 + TBC1D1.

SE_Event : For example, ENSG00000065882_38053520_38053681. The numbers indicate the start (38053520) and end (38053681) coordinates of the splicing region within the gene.

changePoint_L : For example, 280. The location of the first change point along the candidate splicing event. The number also indicates the length of the left exon.

changePoint_R : For example, 441. The location of the second change point along the candidate splicing event. The number also indicates the total length of the left and the middle (spliced) exons. (changePoint_R - changePoint_L) is equal to the length of the spliced exon.

SplicingRatio : The ratio, between case and control conditions, of the average normalized splicing index in the splicing region across replicates. The further SplicingRatio is from 1, the more pronounced the splicing event; for example, values less than 0.5 or greater than 2.

Coverage1 : The average of raw coverage signals in the splicing regions of the replicates in control condition.

Coverage2 : The average of raw coverage signals in the splicing regions of the replicates in case condition. If the ratio of Coverage2 to Coverage1 is roughly equal to the SplicingRatio, no differential expression is involved in the gene. Otherwise, gene-level differential expression may be significant.

SIC_ij : The SIC score specified by two change points at the ending locations of the left and middle (spliced) exons.

SIC_n : The SIC score calculated without change points.

DS_p_value : The raw p-value of the differential splicing test statistic.

DS_adj_p_value : The adjusted p-value of differential splicing using the stringent Bonferroni's procedure.

DE_foldChange : The fold change of differential expression associated with the differential splicing event.

DE_p_value : The raw p-value of the differential expression test statistic.

DE_adj_p_value : The adjusted p-value of differential expression using the stringent Bonferroni's procedure.
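A few of the columns above are linked by simple arithmetic, which the following sketch makes explicit. It is illustrative only: the function names and the tolerance used for "roughly equal" are hypothetical, and the assumption that the Bonferroni correction multiplies each raw p-value by the number of tested events is the textbook definition rather than something confirmed for dSpliceType.

```python
def spliced_length(change_point_l, change_point_r):
    """Length of the spliced exon implied by the two change points."""
    return change_point_r - change_point_l


def bonferroni(p_values):
    """Textbook Bonferroni adjustment: p * number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def coverage_tracks_splicing(splicing_ratio, coverage1, coverage2, tolerance=0.2):
    """True if Coverage2 / Coverage1 is within a relative tolerance of SplicingRatio.

    The tolerance is a hypothetical stand-in for "roughly equal".
    """
    coverage_ratio = coverage2 / coverage1
    return abs(coverage_ratio - splicing_ratio) <= tolerance * splicing_ratio


# Values from the column examples: changePoint_L = 280, changePoint_R = 441.
exon_length = spliced_length(280, 441)
# The SE_Event coordinates give the same span: 38053681 - 38053520.
coordinate_span = 38053681 - 38053520

adjusted = bonferroni([0.01, 0.04, 0.5])
```

Here both exon_length and coordinate_span come to 161, which is a quick consistency check between the change-point columns and the event ID.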

Citation

Zhu D, Deng N and Bai CX. (2015) A generalized dSpliceType framework to detect differential splicing and differential expression events using RNA-Seq. IEEE Transactions on NanoBioscience, doi: 10.1109/TNB.2015.2388593.

Bioinformatics and Machine Learning Research Group

Copyright © 2014-2015. All Rights Reserved.