SCISSOR

SCISSOR is an R package containing tools for statistical analysis and visualization of base-level RNA-seq data.

Overview

SCISSOR (shape changes in selecting sample outliers in RNA-seq) is designed for unsupervised screening of a range of structural alterations in RNA-seq data. It considers a novel shape property of aligned short-read data through a base-level pileup file. This intact and uncompressed view of the RNA-seq profile enables unbiased discovery of structural alterations by looking for anomalous shapes in expression, an approach that holds promise for identifying otherwise obscured genetic aberrations. As a result, SCISSOR identifies known as well as novel aberrations, including abnormal splicing, intra-/intergenic deletions, small indels, and alternative transcription start/termination.

Statistical model

With the goal of detecting samples that exhibit anomalous shapes, SCISSOR models base-level read counts using a high-dimensional latent variable framework that is naturally integrated with its normalization, abnormal feature extraction, and quantification. A latent variable models an underlying abnormal trajectory, i.e. an outlier direction in a high-dimensional space, that is interrogated for outliers. An outlier case with shape changes can then be a data point that is strongly involved in one or more abnormal trajectories, which enables modeling of complex structural variation.

Statistical method

SCISSOR extracts a latent space associated with abnormal sequencing coverage and robustly quantifies the level of abnormality to determine which cases exhibit shape changes. Because the structure of interest is outlying, it uses a projection pursuit approach to measure how outlying a sample is in the most extreme one-dimensional direction. At each gene under consideration, the resulting statistic is an outlyingness score for each sample, with larger values indicating more severe deviation from the other samples in the dataset. For each outlier, SCISSOR produces the most outlying direction as a single best trajectory that describes the abnormalities of that outlier; these directions can be used to recover the latent space of outlier directions.
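
The projection pursuit idea can be illustrated with a minimal, language-agnostic sketch in Python. It is not SCISSOR's implementation or API. It assumes a generic projection outlyingness (absolute deviation from the median divided by the median absolute deviation, maximized over one-dimensional projections) and approximates the maximum with a random search over directions. The function names, the number of directions, and the toy data are all hypothetical.

```python
import math
import random
import statistics


def mad(values):
    """Median absolute deviation, a robust measure of spread."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def projection_outlyingness(samples, n_directions=500, seed=0):
    """Approximate each sample's largest robust z-score over random unit directions.

    Returns (scores, directions): the outlyingness score of every sample and
    the direction in which that score was attained.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    scores = [0.0] * len(samples)
    directions = [None] * len(samples)
    for _ in range(n_directions):
        # Draw a random unit vector as the candidate projection direction.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Project every sample onto the candidate direction.
        proj = [sum(a * b for a, b in zip(x, u)) for x in samples]
        center = statistics.median(proj)
        scale = mad(proj)
        if scale == 0:
            continue  # degenerate projection, skip it
        for i, p in enumerate(proj):
            score = abs(p - center) / scale
            if score > scores[i]:
                scores[i] = score
                directions[i] = u
    return scores, directions


# Toy data (an assumption for illustration): 30 samples in 6 dimensions,
# with sample 7 shifted strongly along one fixed "abnormal" direction.
rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(30)]
shift = [1 / math.sqrt(3)] * 3 + [0.0] * 3
samples[7] = [x + 20.0 * s for x, s in zip(samples[7], shift)]

scores, directions = projection_outlyingness(samples)
most_outlying = scores.index(max(scores))
print("most outlying sample:", most_outlying)
```

In this sketch the direction stored for the top-scoring sample plays the role of its single most outlying direction. A random search is only a crude approximation of the true maximum, so the scores it returns are lower bounds on the exact projection outlyingness.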

Free software

SCISSOR is free software and is available on GitHub.

Documentation