1 Introduction

CRISPR screens are increasingly common, and with them grows the need to easily interpret, visualize, compare, and explore their results.

CRISPRball is a Shiny application to explore, visualize, filter, and integrate CRISPR screens with public data and multiple datasets. In particular, it supports publication-quality figure generation with full aesthetic customization and interactive labeling, filtering of results using DepMap Common Essential genes, and simple comparisons between datasets, timepoints, or treatments.

It is designed for end users and may be particularly useful for bioinformatics/genome editing cores that perform basic analyses before returning results to users. Pointing users to the online version of the app (or a hosted one) will allow them to quickly wade through and interpret their data.

Currently, it supports the output from the MAGeCK RRA and MLE analysis methods. This package supplements the MAGeCKFlute Bioconductor package, adding functionality, visualizations, and a Shiny interface to explore the results generated with that package.

Support for the output of additional analysis tools and methods will be added upon request.

1.1 Installation

CRISPRball is available on Bioconductor and can be installed as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CRISPRball")

1.2 Usage

Starting the app is as simple as calling the CRISPRball function.

library("CRISPRball")
CRISPRball()

Users can then upload their data within the app, which will enable specific tabs in the application as the data is provided.


Figure 1: Screenshot of the CRISPRball application, when launched as a server where users can directly upload MAGeCK RRA or MLE output. Information on the format of the expected data is provided in the following sections.

Input data can also be passed directly as arguments; all that is needed are the file paths to the MAGeCK output files, which are read into data frames and handed to the app.

Passing data directly is useful when hosting the app on a local Shiny server where pre-loaded data for the user is desired. This is particularly helpful for core or shared resource facilities that perform basic analyses for end users.

1.3 MAGeCK RRA Output

In this case, we’ll use the example output from the third MAGeCK tutorial.

In this example, the two datasets are just reverse comparisons (ESC vs plasmid & plasmid vs ESC) where DDX27 has been manually altered in the ESC vs plasmid comparison to no longer be a significant hit.

# Create lists of results summaries for each dataset.
d1.genes <- read.delim(system.file("extdata", "esc1.gene_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)
d2.genes <- read.delim(system.file("extdata", "plasmid.gene_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)

d1.sgrnas <- read.delim(system.file("extdata", "esc1.sgrna_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)
d2.sgrnas <- read.delim(system.file("extdata", "plasmid.sgrna_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)

count.summ <- read.delim(system.file("extdata", "escneg.countsummary.txt",
    package = "CRISPRball"
), check.names = FALSE)
norm.counts <- read.delim(system.file("extdata", "escneg.count_normalized.txt",
    package = "CRISPRball"
), check.names = FALSE)

# Look at the first few rows of the gene summary for the ESC vs plasmid comparison.
head(d1.genes)
##      id num  neg|score neg|p-value  neg|fdr neg|rank neg|goodsgrna neg|lfc
## 1 PMPCB   5 2.7210e-08  4.9505e-06 0.000990        1             4 -4.4769
## 2 DDX27   5 1.9319e-07  4.9505e-06 1.000000        2             5 -4.7853
## 3 PSMD6   5 2.7462e-07  4.9505e-06 0.000990        3             5 -4.3464
## 4  ORC6   4 5.5840e-07  4.9505e-06 0.000990        4             4 -4.7794
## 5  HARS   5 7.2685e-07  4.9505e-06 0.000990        5             5 -4.7221
## 6 PSMB4   5 2.1014e-06  1.4851e-05 0.002122        6             5 -3.4927
##   pos|score pos|p-value  pos|fdr pos|rank pos|goodsgrna pos|lfc
## 1   0.47199     0.63035 0.999995      582             1 -4.4769
## 2   1.00000     1.00000 1.000000     1000             0 -4.7853
## 3   0.99787     0.99790 0.999995      986             0 -4.3464
## 4   1.00000     1.00000 0.999995      999             0 -4.7794
## 5   1.00000     1.00000 0.999995      998             0 -4.7221
## 6   0.99999     1.00000 0.999995      997             0 -3.4927

We can then provide this data to the CRISPRball function.

genes <- list(ESC = d1.genes, plasmid = d2.genes)
sgrnas <- list(ESC = d1.sgrnas, plasmid = d2.sgrnas)

CRISPRball(
    gene.data = genes, sgrna.data = sgrnas,
    count.summary = count.summ, norm.counts = norm.counts
)
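
As a quick sanity check outside the app, a gene summary can be filtered by FDR in base R. The sketch below uses a toy data frame whose values are copied from the rows printed above; the 0.05 cutoff and the names `toy.genes`, `fdr.cutoff`, and `neg.hits` are illustrative assumptions, not package defaults.

```r
# Toy subset of the ESC vs plasmid gene summary (values copied from the output above).
toy.genes <- data.frame(
    id = c("PMPCB", "DDX27", "PSMD6", "PSMB4"),
    "neg|fdr" = c(0.000990, 1.000000, 0.000990, 0.002122),
    "neg|lfc" = c(-4.4769, -4.7853, -4.3464, -3.4927),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Illustrative significance threshold (an assumption for this sketch).
fdr.cutoff <- 0.05

# Genes passing the negative-selection FDR threshold.
neg.hits <- toy.genes$id[toy.genes[["neg|fdr"]] < fdr.cutoff]
neg.hits
```

DDX27 drops out here because its FDR was manually altered, which matches the description of this dataset.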

1.4 MAGeCK MLE Output

CRISPRball also supports the MLE output from MAGeCK. In this case, we’ll use the example data from the fourth MAGeCK tutorial.

# Read the MLE gene summary, then the count summary and normalized counts.
genes <- read_mle_gene_summary(system.file("extdata", "beta_leukemia.gene_summary.txt",
    package = "CRISPRball"
))

count.summ <- read.delim(system.file("extdata", "escneg.countsummary.txt",
    package = "CRISPRball"
), check.names = FALSE)
norm.counts <- read.delim(system.file("extdata", "escneg.count_normalized.txt",
    package = "CRISPRball"
), check.names = FALSE)

CRISPRball(
    gene.data = genes, 
    count.summary = count.summ, norm.counts = norm.counts
)

1.5 The QC Tab

On load, the application will display the QC tab, which provides multiple interactive plots to assess the quality control of all samples in the dataset. Plots include those that assess the Gini Index (a measure of read distribution inequality), counts of fully depleted sgRNAs, percentage of reads mapped, read distributions across sgRNAs, correlation matrix between samples, and a PCA plot containing all samples.

Controls to adjust the plots are provided in the sidebar on the left side of the application. Plots are resizable and can be downloaded as SVGs using the plotly controls. In addition, interactive HTML versions of the plots can be downloaded with the Download buttons above or below each plot.
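
To make the Gini Index mentioned above concrete, here is a minimal base R sketch of the standard Gini coefficient applied to sgRNA read counts. The function name `gini_index`, the toy count vectors, and the log2(count + 1) transform are assumptions for illustration; the exact transformation used by MAGeCK may differ.

```r
# Standard Gini coefficient for a non-negative numeric vector.
# 0 means perfectly even values; values near 1 mean highly unequal values.
gini_index <- function(x) {
    x <- sort(x)
    n <- length(x)
    sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Toy read counts for four sgRNAs in two hypothetical samples.
even.counts <- c(100, 100, 100, 100)
skewed.counts <- c(0, 0, 0, 400)

# Gini Index on log-scaled counts (the transform is an assumption, see above).
gini.even <- gini_index(log2(even.counts + 1))
gini.skewed <- gini_index(log2(skewed.counts + 1))

# Number of fully depleted (zero-count) sgRNAs in the skewed sample.
n.zero <- sum(skewed.counts == 0)
```

A sample with evenly distributed reads yields a Gini Index of 0, while the skewed sample approaches the maximum possible for four sgRNAs.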


Figure 2: Screenshot of the CRISPRball application, when launched with MAGeCK RRA output provided

1.6 The Gene (Overview) Tab

This tab provides an overview of the gene-level results for up to two comparisons. It includes a volcano plot, rank plot, and randomly ordered “lawn” plot for each comparison of interest. All plots are interactive and fully customizable via the controls in the sidebar.


Figure 3: Screenshot of the CRISPRball Gene (Overview) tab

Common hits between two comparisons can easily be highlighted by selecting the Highlight Common Hits checkbox in the sidebar.
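
Conceptually, common hits are the genes that pass the chosen thresholds in both comparisons. The sketch below illustrates that idea with two hypothetical hit lists; it is not the app's internal implementation, and the vectors `hits.a` and `hits.b` are invented for the example.

```r
# Hypothetical hit lists from two comparisons (invented for illustration).
hits.a <- c("PMPCB", "PSMD6", "ORC6", "HARS")
hits.b <- c("PSMD6", "HARS", "PSMB4")

# Genes called as hits in both comparisons.
common.hits <- intersect(hits.a, hits.b)
common.hits
```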

1.6.1 Highlighting Gene(sets)

Often, it can be useful to label a set of genes on the plots. This can be done by passing a named list of gene identifiers to the genesets argument.

library("msigdbr")

# Retrieve MSigDB Hallmark gene sets and convert to a named list.
gene.sets <- msigdbr(species = "Homo sapiens", category = "H")
# Base R split() is used so that no pipe operator needs to be attached.
gene.sets <- split(x = gene.sets$gene_symbol, f = gene.sets$gs_name)

# Can also add genesets manually.
gene.sets[["my_fav_genes"]] <- c("TOP2A", "FECH", "SOX2", "DUT", "RELA")

CRISPRball(
    gene.data = genes, sgrna.data = sgrnas, count.summary = count.summ,
    norm.counts = norm.counts, genesets = gene.sets
)

Such genesets can then be highlighted very easily on the plots in the Gene (Overview) tab using the “Highlight Gene(sets)” inputs in the sidebar.

Individual genes of interest can also be added by the user manually from within the app.
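
The structure expected by the genesets argument, a named list of character vectors, can be seen in a small self-contained sketch. The set names and gene-to-set assignments in `toy.sets` are made up for illustration and are not real MSigDB content.

```r
# Toy long-format table with one row per gene/set pair (assignments are invented).
toy.sets <- data.frame(
    gs_name = c("SET_B", "SET_A", "SET_B"),
    gene_symbol = c("TP53", "MYC", "RELA"),
    stringsAsFactors = FALSE
)

# Convert to a named list: one character vector of gene symbols per set.
toy.list <- split(x = toy.sets$gene_symbol, f = toy.sets$gs_name)

# Append a manually defined set.
toy.list[["my_fav_genes"]] <- c("TOP2A", "SOX2")

str(toy.list)
```

Note that split() orders the list elements by set name, so the order of sets may differ from their order in the input table.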

1.7 The sgRNA Tab

This tab is primarily useful for MAGeCK RRA results. It provides two plots for up to two comparisons. The first shows normalized sgRNA counts across two conditions, which helps identify poor sgRNAs and validate hits.

The second provides the sgRNA rank among all sgRNAs for a chosen effect size variable. This provides a useful view of how a given sgRNA compares to all sgRNAs.
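
The ranking behind such a plot can be sketched in base R. The data frame `toy.sgrnas`, its sgRNA names, and its LFC values are hypothetical; only the idea of ranking every sgRNA by a chosen effect size variable comes from the description above.

```r
# Hypothetical sgRNA-level results (names and values invented for illustration).
toy.sgrnas <- data.frame(
    sgrna = c("sg1", "sg2", "sg3", "sg4"),
    LFC = c(-2.5, 0.3, -0.1, 1.8),
    stringsAsFactors = FALSE
)

# Rank sgRNAs from most depleted (rank 1) to most enriched.
toy.sgrnas$rank <- rank(toy.sgrnas$LFC, ties.method = "first")

# View sgRNAs in rank order.
toy.sgrnas[order(toy.sgrnas$rank), ]
```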

1.8 The DepMap Tab

DepMap contains a multitude of data for hundreds of cell lines, including CRISPR/RNAi screen results, gene expression data, copy number variation, and more. This data can be extremely useful for removing common dependencies from your own results, examining expression or copy number for a given gene across various lineages or diseases, and identifying hits that are selective to a given (sub)lineage.

For fast access to this data, CRISPRball includes a function (build_depmap_db()) to build a SQLite database using the depmap R package. This database will be large, >4 GB.

This SQLite database can then be passed to the app and the data contained therein easily explored in the DepMap tab.

library("depmap")
library("pool")
library("RSQLite")

# This will take a few minutes to run.
# The database will be named "depmap_db.sqlite" and placed in the working directory.
build_depmap_db()

CRISPRball(
    gene.data = genes, sgrna.data = sgrnas, count.summary = count.summ,
    norm.counts = norm.counts, genesets = gene.sets, depmap.db = "depmap_db.sqlite"
)

Figure 4: Screenshot of the CRISPRball application, with focus on the DepMap tab for CDK2

1.8.1 Filtering DepMap Common Essential Hits

When a DepMap database is provided, DepMap common essential hits can also be filtered from the results in the Gene (Overview) tab via the checkboxes in the sidebar.
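
The effect of this filter can be illustrated with a set difference. In the sketch below, both `toy.hits` and `toy.essential` are placeholders; the genes listed as essential are not taken from DepMap annotations and serve only to show the operation.

```r
# Hypothetical screen hits (invented for illustration).
toy.hits <- c("PMPCB", "PSMD6", "ORC6", "HARS")

# Placeholder "common essential" list; NOT real DepMap annotations.
toy.essential <- c("PSMD6", "ORC6")

# Hits remaining after removing common essential genes.
filtered.hits <- setdiff(toy.hits, toy.essential)
filtered.hits
```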

1.9 Additional Help

Almost every input in the app will display a helpful tooltip explaining its function on hover. Plots also have an information icon that will explain the plot contents when hovered or clicked.

1.10 SessionInfo


## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.34.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37       R6_2.5.1            bookdown_0.41      
##  [4] fastmap_1.2.0       xfun_0.48           cachem_1.1.0       
##  [7] knitr_1.48          htmltools_0.5.8.1   rmarkdown_2.28     
## [10] lifecycle_1.0.4     cli_3.6.3           sass_0.4.9         
## [13] jquerylib_0.1.4     compiler_4.4.1      highr_0.11         
## [16] tools_4.4.1         evaluate_1.0.1      bslib_0.8.0        
## [19] yaml_2.3.10         BiocManager_1.30.25 jsonlite_1.8.9     
## [22] rlang_1.1.4