The depmap package aims to provide a reproducible research framework
for the cancer dependency data described by
Tsherniak, Aviad, et al. “Defining a cancer dependency map.” Cell 170.3 (2017): 564-576.
The data found in the depmap package has been formatted to facilitate
the use of common R packages such as dplyr and ggplot2. We hope that this
package will allow researchers to more easily mine, explore and visually
illustrate dependency data taken from the Depmap cancer genomic dependency study.
To install depmap, the BiocManager package manager from the Bioconductor project is required. If BiocManager is not already installed, install it first from within R (this needs to be done just once), then use it to install depmap:
install.packages("BiocManager")
BiocManager::install("depmap")
The depmap package fully depends on the ExperimentHub Bioconductor package,
which allows the data accessed in this package to be stored in and retrieved
from the cloud.
library("depmap")
library("ExperimentHub")
The depmap package currently contains eight datasets available through
ExperimentHub.
The data found in this R package has been converted from a “wide” format
.csv file to a “long” format .rda file. None of the values taken
from the original datasets have been changed, although the columns
have been re-arranged. The changes made are described under the Details
section after querying the relevant dataset.
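As a rough illustration of the wide-to-long conversion described above, the sketch below reshapes a small invented table with tidyr::pivot_longer(). The toy values, the object names wide and long, and the choice of tidyr are assumptions made for this example; the package's own conversion script may proceed differently.

```r
library(tibble)
library(tidyr)

# Toy "wide" table: one row per cell line, one column per gene.
# All values are invented for illustration and are not DepMap data.
wide <- tibble(
  depmap_id   = c("ACH-000001", "ACH-000002"),
  `A1BG (1)`  = c(0.10, -0.20),
  `NAT2 (10)` = c(NA, 0.05)
)

# Pivot to "long" format: one row per cell line / gene pair.
long <- pivot_longer(
  wide,
  cols      = -depmap_id,
  names_to  = "gene",
  values_to = "dependency"
)

# Split the combined "SYMBOL (ENTREZ)" label into two separate columns,
# mirroring the gene_name and entrez_id columns seen in the package tibbles.
long$gene_name <- sub(" \\(.*\\)$", "", long$gene)
long$entrez_id <- sub("^.*\\((.*)\\)$", "\\1", long$gene)

long
```

The long layout puts every measurement in its own row, which is the shape that dplyr verbs and ggplot2 aesthetics generally expect.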
## create ExperimentHub query object
eh <- ExperimentHub()
## snapshotDate(): 2019-10-22
query(eh, "depmap")
## ExperimentHub with 22 records
## # snapshotDate(): 2019-10-22
## # $dataprovider: Broad Institute
## # $species: Homo sapiens
## # $rdataclass: tibble
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## # tags, rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH2260"]]'
##
## title
## EH2260 | rnai_19Q1
## EH2261 | crispr_19Q1
## EH2262 | copyNumber_19Q1
## EH2263 | RPPA_19Q1
## EH2264 | TPM_19Q1
## ... ...
## EH3083 | RPPA_19Q3
## EH3084 | TPM_19Q3
## EH3085 | mutationCalls_19Q3
## EH3086 | metadata_19Q3
## EH3087 | drug_sensitivity_19Q3
Each dataset has an ExperimentHub accession number (e.g. EH2260 refers to
the rnai dataset from the 19Q1 release).
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The rnai dataset contains the combined genetic dependency data for
RNAi-induced gene knockdown for select genes and cancer cell lines. This data
corresponds to the D2_combined_genetic_dependency_scores.csv file found in the
19Q1 depmap release and includes 17309 genes, 712 cell lines, 30 primary
diseases and 31 lineages.
## access `rnai_19Q1` by EH number
rnai <- eh[["EH2260"]]
rnai
## # A tibble: 12,324,008 x 6
## depmap_id cell_line gene gene_name entrez_id dependency
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-001270 127399_SOFT_TIS… A1BG (1) A1BG 1 NA
## 2 ACH-001270 127399_SOFT_TIS… NAT2 (10) NAT2 10 NA
## 3 ACH-001270 127399_SOFT_TIS… ADA (100) ADA 100 NA
## 4 ACH-001270 127399_SOFT_TIS… CDH2 (1000) CDH2 1000 -0.195
## 5 ACH-001270 127399_SOFT_TIS… AKT3 (10000) AKT3 10000 -0.256
## 6 ACH-001270 127399_SOFT_TIS… MED6 (10001) MED6 10001 -0.174
## 7 ACH-001270 127399_SOFT_TIS… NR2E3 (10002) NR2E3 10002 -0.140
## 8 ACH-001270 127399_SOFT_TIS… NAALAD2 (10003) NAALAD2 10003 NA
## 9 ACH-001270 127399_SOFT_TIS… DUXB (10003341… DUXB 100033411 NA
## 10 ACH-001270 127399_SOFT_TIS… PDZK1P1 (10003… PDZK1P1 100034743 NA
## # … with 12,323,998 more rows
The rnai dataset can also be accessed by using the depmap_rnai function.
# depmap_rnai()
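Because the data is in long format, standard dplyr verbs apply directly. The sketch below runs on a small invented stand-in with the same column names as the rnai tibble shown above; rnai_toy, its cell line names and its values are hypothetical. It assumes the usual DepMap convention that more negative scores indicate stronger dependency.

```r
library(dplyr)

# Toy stand-in with the same columns as the rnai tibble; values are invented.
rnai_toy <- tibble::tibble(
  depmap_id  = c("ACH-000001", "ACH-000002", "ACH-000003", "ACH-000001"),
  cell_line  = c("LINE_A_LUNG", "LINE_B_SKIN", "LINE_C_OVARY", "LINE_A_LUNG"),
  gene       = c("CDH2 (1000)", "CDH2 (1000)", "CDH2 (1000)", "AKT3 (10000)"),
  gene_name  = c("CDH2", "CDH2", "CDH2", "AKT3"),
  entrez_id  = c("1000", "1000", "1000", "10000"),
  dependency = c(-0.195, NA, -0.80, -0.256)
)

# Keep one gene, drop missing scores, and order cell lines from the
# most negative (strongest) dependency score upwards.
top_dependent <- rnai_toy %>%
  filter(gene_name == "CDH2", !is.na(dependency)) %>%
  arrange(dependency) %>%
  select(cell_line, gene_name, dependency)

top_dependent
```

The same pipeline should work on the real rnai object once it has been loaded, since it relies only on the column names printed above.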
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The crispr dataset contains the batch-corrected, CERES-inferred gene effect
scores from CRISPR-Cas9 knockout of select genes and cancer cell lines. This
data corresponds to the gene_effect_corrected.csv file from the 19Q1
depmap release. This dataset includes 17634 genes, 558 cell lines, 26 primary
diseases and 28 lineages.
## access `crispr_19Q1` by EH number
crispr <- eh[["EH2261"]]
crispr
## # A tibble: 9,839,772 x 6
## depmap_id cell_line gene gene_name entrez_id dependency
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-000004 HEL_HAEMATOPOIETIC_AND_L… A1BG … A1BG 1 0.135
## 2 ACH-000005 HEL9217_HAEMATOPOIETIC_A… A1BG … A1BG 1 -0.212
## 3 ACH-000007 LS513_LARGE_INTESTINE A1BG … A1BG 1 0.0433
## 4 ACH-000009 C2BBE1_LARGE_INTESTINE A1BG … A1BG 1 0.0705
## 5 ACH-000011 253J_URINARY_TRACT A1BG … A1BG 1 0.191
## 6 ACH-000012 HCC827_LUNG A1BG … A1BG 1 -0.0104
## 7 ACH-000013 ONCODG1_OVARY A1BG … A1BG 1 0.0210
## 8 ACH-000014 HS294T_SKIN A1BG … A1BG 1 0.113
## 9 ACH-000015 NCIH1581_LUNG A1BG … A1BG 1 -0.0742
## 10 ACH-000017 SKBR3_BREAST A1BG … A1BG 1 0.133
## # … with 9,839,762 more rows
The crispr dataset can also be accessed by using the depmap_crispr function.
# depmap_crispr()
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The copyNumber dataset contains the WES copy number data, relating to the
numerical log-fold copy number change measured against the baseline copy number
of select genes and cell lines. This dataset corresponds to the
public_19Q1_gene_cn.csv file from the 19Q1 depmap release.
This dataset includes 23299 genes, 1604 cell lines, 38 primary diseases and 33
lineages.
## access `copyNumber_19Q1` by EH number
copyNumber <- eh[["EH2262"]]
copyNumber
## # A tibble: 37,371,596 x 6
## depmap_id cell_line gene gene_name entrez_id log_copy_number
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-000011 253J_URINARY_TRACT A1BG … A1BG 1 0.131
## 2 ACH-000026 253JBV_URINARY_TRACT A1BG … A1BG 1 -0.237
## 3 ACH-000086 ACCMESO1_PLEURA A1BG … A1BG 1 0.134
## 4 ACH-000557 AML193_HAEMATOPOIET… A1BG … A1BG 1 -0.0208
## 5 ACH-000838 AMO1_HAEMATOPOIETIC… A1BG … A1BG 1 0.170
## 6 ACH-000080 BDCM_HAEMATOPOIETIC… A1BG … A1BG 1 0.00703
## 7 ACH-000992 BICR18_UPPER_AERODI… A1BG … A1BG 1 -0.376
## 8 ACH-000228 BICR31_UPPER_AERODI… A1BG … A1BG 1 1.16
## 9 ACH-000771 BICR56_UPPER_AERODI… A1BG … A1BG 1 0.0197
## 10 ACH-000415 BICR6_UPPER_AERODIG… A1BG … A1BG 1 0.280
## # … with 37,371,586 more rows
The copyNumber dataset can also be accessed by using the depmap_copyNumber
function.
# depmap_copyNumber()
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The RPPA dataset contains the CCLE Reverse Phase Protein Array (RPPA) data,
which corresponds to the CCLE_RPPA_20180123.csv file from the 19Q1
depmap release. This dataset includes 214 genes, 899 cell lines, 28 primary
diseases and 28 lineages.
## access `RPPA_19Q1` by EH number
RPPA <- eh[["EH2263"]]
RPPA
## # A tibble: 192,386 x 4
## depmap_id cell_line antibody expression
## <chr> <chr> <chr> <dbl>
## 1 ACH-000698 DMS53_LUNG 14-3-3_be… -0.105
## 2 ACH-000489 SW1116_LARGE_INTESTINE 14-3-3_be… 0.359
## 3 ACH-000431 NCIH1694_LUNG 14-3-3_be… 0.0287
## 4 ACH-000707 P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_be… 0.120
## 5 ACH-000509 HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_be… -0.269
## 6 ACH-000522 UMUC3_URINARY_TRACT 14-3-3_be… -0.171
## 7 ACH-000613 HOS_BONE 14-3-3_be… -0.0253
## 8 ACH-000829 HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_be… -0.170
## 9 ACH-000557 AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_be… 0.0819
## 10 ACH-000614 RVH421_SKIN 14-3-3_be… 0.222
## # … with 192,376 more rows
The RPPA dataset can also be accessed by using the depmap_RPPA function.
# depmap_RPPA()
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The TPM dataset contains the CCLE RNAseq gene expression data. This shows
expression data only for protein coding genes (on the scale log2(TPM+1)). This
data corresponds to the CCLE_depMap_19Q1_TPM.csv file from the 19Q1
depmap release. This dataset includes 55825 genes, 1165 cell lines, 33 primary
diseases and 32 lineages.
## access `TPM_19Q1` by EH number
TPM <- eh[["EH2264"]]
TPM
## # A tibble: 67,360,300 x 6
## depmap_id cell_line gene gene_name ensembl_id expression
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 ACH-000956 22RV1_PROSTATE TSPAN6 (E… TSPAN6 ENSG000000… 2.65
## 2 ACH-000948 2313287_STOMACH TSPAN6 (E… TSPAN6 ENSG000000… 3.00
## 3 ACH-000026 253JBV_URINARY_TRA… TSPAN6 (E… TSPAN6 ENSG000000… 4.57
## 4 ACH-000011 253J_URINARY_TRACT TSPAN6 (E… TSPAN6 ENSG000000… 4.58
## 5 ACH-000323 42MGBA_CENTRAL_NER… TSPAN6 (E… TSPAN6 ENSG000000… 4.59
## 6 ACH-000905 5637_URINARY_TRACT TSPAN6 (E… TSPAN6 ENSG000000… 5.88
## 7 ACH-000520 59M_OVARY TSPAN6 (E… TSPAN6 ENSG000000… 4.11
## 8 ACH-000973 639V_URINARY_TRACT TSPAN6 (E… TSPAN6 ENSG000000… 5.05
## 9 ACH-000896 647V_URINARY_TRACT TSPAN6 (E… TSPAN6 ENSG000000… 5.94
## 10 ACH-000070 697_HAEMATOPOIETIC… TSPAN6 (E… TSPAN6 ENSG000000… 0.151
## # … with 67,360,290 more rows
The TPM dataset can also be accessed by using the depmap_TPM function.
# depmap_TPM()
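Long-format expression values map straight onto ggplot2 aesthetics. The following is a minimal sketch on invented numbers; tpm_toy, the second gene and all expression values are hypothetical and only the column names follow the TPM tibble shown above.

```r
library(ggplot2)

# Toy stand-in using two of the TPM tibble's columns; values are invented.
tpm_toy <- data.frame(
  gene_name  = rep(c("TSPAN6", "TP53"), each = 4),
  expression = c(2.65, 3.00, 4.57, 4.58, 5.10, 4.90, 6.20, 5.50)
)

# One box per gene, summarising expression across cell lines.
p <- ggplot(tpm_toy, aes(x = gene_name, y = expression)) +
  geom_boxplot() +
  labs(x = "Gene", y = "Expression, log2(TPM + 1)")

p
```

With the real TPM object, filtering to a handful of genes first is advisable, since the full tibble has tens of millions of rows.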
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The metadata dataset contains the metadata about all of the cancer cell lines.
It corresponds to the depmap_19Q1_cell_lines.csv file found in the 19Q1
depmap release. This dataset contains no gene-level data and includes 1676
cell lines, 38 primary diseases and 33 lineages.
## access `metadata_19Q1` by EH number
metadata <- eh[["EH2266"]]
metadata
## # A tibble: 1,676 x 9
## depmap_id cell_line aliases cosmic_id sanger_id primary_disease
## <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 ACH-0000… NIHOVCAR… NIH:OV… 905933 2201 Ovarian Cancer
## 2 ACH-0000… HL60_HAE… HL-60 905938 55 Leukemia
## 3 ACH-0000… CACO2_LA… CACO2;… NA NA Colon/Colorect…
## 4 ACH-0000… HEL_HAEM… HEL 907053 783 Leukemia
## 5 ACH-0000… HEL9217_… HEL 92… NA NA Leukemia
## 6 ACH-0000… MONOMAC6… MONO-M… 908148 2167 Leukemia
## 7 ACH-0000… LS513_LA… LS513 907795 569 Colon/Colorect…
## 8 ACH-0000… C2BBE1_L… C2BBe1 910700 2104 Colon/Colorect…
## 9 ACH-0000… NCIH2077… NCI-H2… NA NA Lung Cancer
## 10 ACH-0000… 253J_URI… 253J NA NA Bladder Cancer
## # … with 1,666 more rows, and 3 more variables: subtype_disease <chr>,
## # gender <chr>, source <chr>
The metadata dataset can also be accessed by using the depmap_metadata
function.
# depmap_metadata()
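The depmap_id column is shared between the metadata and the other datasets, so it can serve as a join key. The sketch below joins two small invented tables and summarises dependency by primary disease; metadata_toy, crispr_toy and their values are hypothetical stand-ins that reuse only the column names shown above.

```r
library(dplyr)

# Toy stand-ins; only the column names follow the real tibbles.
metadata_toy <- tibble::tibble(
  depmap_id       = c("ACH-000001", "ACH-000002", "ACH-000003"),
  primary_disease = c("Leukemia", "Leukemia", "Lung Cancer")
)

crispr_toy <- tibble::tibble(
  depmap_id  = c("ACH-000001", "ACH-000002", "ACH-000003"),
  gene_name  = "A1BG",
  dependency = c(0.1, -0.3, 0.2)
)

# Attach disease annotation to each score, then average per disease.
by_disease <- crispr_toy %>%
  left_join(metadata_toy, by = "depmap_id") %>%
  group_by(primary_disease) %>%
  summarise(
    mean_dependency = mean(dependency, na.rm = TRUE),
    n_lines         = n()
  )

by_disease
```

A left join keeps every score even when a cell line lacks annotation; an inner join would be the alternative if unannotated lines should be dropped.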
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
The mutationCalls dataset contains all merged mutation calls (coding region,
germline filtered) found in the depmap dependency study. This dataset
corresponds to the depmap_19Q1_mutation_calls.csv file found in the
19Q1 depmap release and includes 19350 genes, 1601 cell lines, 37 primary
diseases and 33 lineages.
## access `mutationCalls_19Q1` by EH number
mutationCalls <- eh[["EH2265"]]
mutationCalls
## # A tibble: 1,243,145 x 35
## depmap_id gene_name entrez_id ncbi_build chromosome start_pos end_pos
## <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 ACH-0000… VPS13D 55187 37 1 12359347 1.24e7
## 2 ACH-0000… AADACL4 343066 37 1 12726308 1.27e7
## 3 ACH-0000… IFNLR1 163702 37 1 24484172 2.45e7
## 4 ACH-0000… TMEM57 55219 37 1 25785018 2.58e7
## 5 ACH-0000… ZSCAN20 7579 37 1 33954141 3.40e7
## 6 ACH-0000… POU3F1 5453 37 1 38512139 3.85e7
## 7 ACH-0000… MAST2 23139 37 1 46498028 4.65e7
## 8 ACH-0000… GBP4 115361 37 1 89657103 8.97e7
## 9 ACH-0000… VAV3 10451 37 1 108247170 1.08e8
## 10 ACH-0000… NBPF20 100288142 37 1 148346689 1.48e8
## # … with 1,243,135 more rows, and 28 more variables: strand <chr>,
## # var_class <chr>, var_type <chr>, ref_allele <chr>,
## # tumor_seq_allele1 <chr>, dbSNP_RS <chr>, dbSNP_val_status <chr>,
## # genome_change <chr>, annotation_transcript <chr>,
## # tumor_sample_barcode <chr>, cDNA_change <chr>, codon_change <chr>,
## # protein_change <chr>, is_deleterious <lgl>, is_tcga_hotspot <lgl>,
## # tcga_hsCnt <dbl>, is_cosmic_hotspot <lgl>, cosmic_hsCnt <dbl>,
## # ExAC_AF <dbl>, VA_WES_AC <chr>, CGA_WES_AC <chr>, sanger_WES_AC <chr>,
## # sanger_recalib_WES_AC <chr>, RNAseq_AC <chr>, HC_AC <chr>, RD_AC <chr>,
## # WGS_AC <chr>, var_annotation <chr>
The mutationCalls dataset can also be accessed by using the
depmap_mutationCalls function.
# depmap_mutationCalls()
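A common first look at mutation calls is a tally by variant class. The sketch below does this on an invented table; mut_toy and the variant class labels are hypothetical, and only the column names var_class and is_deleterious are taken from the tibble printed above.

```r
library(dplyr)

# Toy stand-in; the var_class labels and all values are invented.
mut_toy <- tibble::tibble(
  depmap_id      = c("ACH-000001", "ACH-000001", "ACH-000002",
                     "ACH-000002", "ACH-000003", "ACH-000003"),
  gene_name      = c("VPS13D", "MAST2", "VPS13D", "GBP4", "VAV3", "MAST2"),
  var_class      = c("Missense_Mutation", "Missense_Mutation", "Silent",
                     "Missense_Mutation", "Nonsense_Mutation", "Silent"),
  is_deleterious = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Number of calls per variant class, most frequent first.
class_counts <- mut_toy %>%
  count(var_class, sort = TRUE)

# Cell lines carrying at least one call flagged as deleterious.
deleterious_lines <- mut_toy %>%
  filter(is_deleterious) %>%
  distinct(depmap_id)

class_counts
deleterious_lines
```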
If desired, the original data from which the depmap package was derived
can be downloaded from the Broad Institute website. Instructions on how to
download these files, and how the data was transformed and loaded into the
depmap package, can be found in the make_data.R file in ./inst/scripts.
(Note that the original uncompressed .csv files are >1.5GB in
total and take a moderate amount of time to download remotely.)
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ExperimentHub_1.12.0 AnnotationHub_2.18.0 BiocFileCache_1.10.0
## [4] dbplyr_1.4.2 BiocGenerics_0.32.0 depmap_1.0.0
## [7] dplyr_0.8.3 BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.10
## [3] BiocVersion_3.10.1 purrr_0.3.3
## [5] vctrs_0.2.0 htmltools_0.4.0
## [7] stats4_3.6.1 yaml_2.2.0
## [9] utf8_1.1.4 interactiveDisplayBase_1.24.0
## [11] blob_1.2.0 rlang_0.4.1
## [13] pillar_1.4.2 later_1.0.0
## [15] glue_1.3.1 DBI_1.0.0
## [17] rappdirs_0.3.1 bit64_0.9-7
## [19] stringr_1.4.0 memoise_1.1.0
## [21] evaluate_0.14 Biobase_2.46.0
## [23] knitr_1.25 IRanges_2.20.0
## [25] fastmap_1.0.1 httpuv_1.5.2
## [27] curl_4.2 AnnotationDbi_1.48.0
## [29] fansi_0.4.0 Rcpp_1.0.2
## [31] xtable_1.8-4 backports_1.1.5
## [33] promises_1.1.0 BiocManager_1.30.9
## [35] S4Vectors_0.24.0 mime_0.7
## [37] bit_1.1-14 digest_0.6.22
## [39] stringi_1.4.3 bookdown_0.14
## [41] shiny_1.4.0 cli_1.1.0
## [43] tools_3.6.1 magrittr_1.5
## [45] tibble_2.1.3 RSQLite_2.1.2
## [47] crayon_1.3.4 pkgconfig_2.0.3
## [49] zeallot_0.1.0 assertthat_0.2.1
## [51] rmarkdown_1.16 httr_1.4.1
## [53] R6_2.4.0 compiler_3.6.1