CTDquerier 1.2.0
The Comparative Toxicogenomics Database (CTDbase; http://ctdbase.org) is a public resource for toxicogenomic information manually curated from the peer-reviewed scientific literature, providing key information about the interactions of environmental chemicals with gene products and their effect on human disease [1][2].
CTDquerier R package
CTDquerier is an R package that allows R users to download basic data from CTDbase about genes, chemicals and diseases. Once the user's input is validated, the package queries CTDbase and downloads the information associated with that input from the other modules.
CTDbase offers a public web-based interface that includes basic and advanced query options to access data for sequences, references, and toxic agents, as well as a platform for sequence analysis.
To query CTDbase with a single term (i.e. a gene, a chemical or a disease), users can access the web portal and use the keyword search.
Looking up the associations in CTDbase for a set of ten genes of interest therefore implies performing ten separate queries through this interface.
The summary page of the results obtained after searching for the term XKR4 follows:
The Batch Query tool (http://ctdbase.org/tools/batchQuery.go) is provided by CTDbase and allows downloading custom data associated with a set of chemicals, diseases and genes, among others.
Given a set of terms, the tool allows downloading (as .tsv, .xml, …) curated or inferred data from CTDbase associated with the terms of interest. Table 1 indicates the type of data available depending on the input terms, where C stands for curated, I for inferred, E for enriched and A for all.
Data Available/Input Data | Chemicals | Diseases | Genes |
---|---|---|---|
Chemical–gene interactions | C | C | |
Chemical associations | A,C,I | C | |
Gene associations | C | A,C,I | C |
Disease associations | A,C,I | A,C,I | |
Pathway associations | I,E | I | C |
Gene Ontology associations | A,E | A |
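As a rough illustration of handling such a download in R, the sketch below writes a tiny tab-separated file and reads it back with base functions. The file contents are a made-up stand-in rather than a real export, and the `# Input` header is an assumption about the export format.

```r
# Toy stand-in for a Batch Query .tsv export; header and row are illustrative only.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(
  c("# Input\tDiseaseName\tDiseaseID",
    "xkr4\tAbdominal Pain\tMESH:D015746"),
  tsv_path
)

# read.delim() does not treat '#' as a comment character by default, so the
# header line is kept and converted into syntactic column names.
bq_toy <- read.delim(tsv_path, sep = "\t", stringsAsFactors = FALSE)

# Give the first column a friendlier name, addressing it by position.
names(bq_toy)[1] <- "Input"

str(bq_toy)
```

Renaming by position avoids depending on the exact name that R generates for the first column.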
The tables obtained from querying CTDbase with the Batch Query tool for the gene XKR4, asking for associated chemicals and associated diseases (curated, inferred and all), are included in the CTDquerier R package (queries performed 2018/JAN/02).
These four files can be loaded as follows:
# Chemicals - XKR4
bq_xkr4_c <- system.file(
paste0( "extdata", .Platform$file.sep, "bq_xkr4_chem.tsv" ),
package="CTDquerier"
)
nrow( read.delim( bq_xkr4_c, sep = "\t" ) )
## [1] 18
# Diseases curated - XKR4
bq_xkr4_dC <- system.file(
paste0( "extdata", .Platform$file.sep, "bq_xkr4_disease_curated.tsv" ),
package="CTDquerier"
)
nrow( read.delim( bq_xkr4_dC, sep = "\t" ) )
## [1] 1
# Diseases inferred - XKR4
bq_xkr4_dI <- system.file(
paste0( "extdata", .Platform$file.sep, "bq_xkr4_disease_inferred.tsv" ),
package="CTDquerier"
)
nrow( read.delim( bq_xkr4_dI, sep = "\t" ) )
## [1] 1339
# Diseases all - XKR4
bq_xkr4_dA <- system.file(
paste0( "extdata", .Platform$file.sep, "bq_xkr4_disease_all.tsv" ),
package="CTDquerier"
)
nrow( read.delim( bq_xkr4_dA, sep = "\t" ) )
## [1] 1340
These files show that, according to CTDbase, XKR4 has 18 curated associations with chemicals, 1 curated association with diseases, 1339 inferred associations with diseases and 1340 associations with diseases in total (curated plus inferred). Note that these associations are not unique: the same disease, for instance, can appear in several rows.
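The difference between counting rows and counting unique entries can be sketched with a toy data frame. The three rows below are an illustrative subset shaped like a Batch Query disease export (columns `DiseaseID` and `InferenceChemicalName`); they are not the full XKR4 file.

```r
# Toy rows shaped like a Batch Query disease export (illustrative subset only).
bq_toy <- data.frame(
  DiseaseID = c("MESH:D015746", "MESH:D015746", "MESH:D015746"),
  InferenceChemicalName = c("Propylthiouracil", "Tretinoin", "Valproic Acid"),
  stringsAsFactors = FALSE
)

# Number of association rows: one per disease-chemical pair.
n_rows <- nrow(bq_toy)

# Number of distinct diseases behind those rows.
n_diseases <- length(unique(bq_toy$DiseaseID))

# Rows contributed by each disease.
rows_per_disease <- table(bq_toy$DiseaseID)

n_rows
n_diseases
rows_per_disease
```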
CTDquerier
CTDquerier allows downloading the information associated with a single gene or a set of genes by using the function query_ctd_gene:
library( CTDquerier )
xkr4 <- query_ctd_gene( terms = "XKR4", verbose = TRUE )
## Warning in .get_cache(): /home/biocbuild/.cache/CTDQuery
## Using temporary cache /tmp/Rtmp9H9Jhb/BiocFileCache
## Downloading GENE vocabulary from CTDbase
## Loading gene vocabulary.
## Warning in .get_cache(): /home/biocbuild/.cache/CTDQuery
## Using temporary cache /tmp/Rtmp9H9Jhb/BiocFileCache
## 1/tmp/Rtmp9H9Jhb/BiocFileCache/3e5a751f1027_CTD_genes.tsv.gz
## Warning in load_ctd_gene(): 1/tmp/Rtmp9H9Jhb/BiocFileCache/
## 3e5a751f1027_CTD_genes.tsv.gz
## 1/tmp/Rtmp9H9Jhb/BiocFileCache/3e5a751f1027_CTD_genes.tsv.gz
## Warning in load_ctd_gene(): 1/tmp/Rtmp9H9Jhb/BiocFileCache/
## 3e5a751f1027_CTD_genes.tsv.gz
## Staring query for gene 'XKR4' ( 114786 )
## . Downloading 'gene-gene interaction' table.
## . Downloading 'disease' table.
## . Downloading 'gene-chemical interaction' table.
## . Downloading 'GO terms' table.
## . Downloading 'KEGG pathways' table.
## . . No 'KEGG pathways' table available for XKR4' ( 114786 )
xkr4
## Object of class 'CTDdata'
## -------------------------
## . Type: GENE
## . Length: 1
## . Items: XKR4
## . Diseases: 786 ( NA / 786 )
## . Gene-gene interactions: 1 ( 1 )
## . Gene-chemical interactions: 19 ( 30 )
## . KEGG pathways: 0 (-)
## . GO terms: 4 ( 4 )
The summary of the object indicates that 30 gene-chemical interactions, involving 19 unique chemicals, were downloaded from CTDbase. Taking a closer look at them, we see that they include the 18 chemicals obtained from the Batch Query tool.
# How many unique chemicals are there in the result object?
xkr4_chem <- get_table( xkr4, index_name = "chemical interactions" )
length( unique( xkr4_chem$Chemical.Name ) )
## [1] 19
# How many of the chemicals downloaded using CTDquerier are in the Batch Query files?
bq_xkr4_c <- read.delim( bq_xkr4_c, sep = "\t" )
sum( as.character( bq_xkr4_c[ , 2] ) %in% unique( xkr4_chem$Chemical.Name ) )
## [1] 18
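To see which names differ between two sources, rather than only how many match, base R set operations can be used. The sketch below works on toy vectors; `"Example Chemical"` is a hypothetical placeholder, and neither vector is the real XKR4 chemical list.

```r
# Toy chemical name vectors (illustrative; "Example Chemical" is a placeholder).
ctd_chems <- c("Propylthiouracil", "Tretinoin", "Valproic Acid", "Example Chemical")
bq_chems  <- c("Propylthiouracil", "Tretinoin", "Valproic Acid")

# Names present in both sources.
shared <- intersect(bq_chems, ctd_chems)

# Names found in only one of the two sources.
only_ctd <- setdiff(ctd_chems, bq_chems)
only_bq  <- setdiff(bq_chems, ctd_chems)

length(shared)
only_ctd
only_bq
```

With the real objects, the same calls could be applied to `xkr4_chem$Chemical.Name` and the chemical name column of the Batch Query table.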
On the side of disease associations, the data retrieved for XKR4 with CTDquerier indicates that there are 786 gene-disease associations.
dim( get_table( xkr4, index_name = "diseases" ) )
## [1] 786 8
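The Batch Query files date from 2018/JAN/02, so the later CTDquerier query may include newer associations. The 1340 associations obtained from Batch Query reduce to 762 once filtered by unique disease, and all 762 of these diseases are present in the CTDquerier results: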
bq_xkr4_dA <- read.delim( bq_xkr4_dA, sep = "\t" )
length( unique( bq_xkr4_dA$DiseaseID ) )
## [1] 762
sum( as.character( unique( bq_xkr4_dA$DiseaseID ) ) %in%
get_table( xkr4, index_name = "diseases" )$Disease.ID )
## [1] 762
The difference in the number of associations between the results obtained from Batch Query and from CTDquerier comes from the way the chemicals are nested in each table. In the results from Batch Query there is one row for each association:
bq_xkr4_dA[1:3, ]
## X..Input DiseaseName DiseaseID GeneSymbol GeneID DiseaseCategories
## 1 xkr4 Abdominal Pain MESH:D015746 XKR4 114786 Signs and symptoms
## 2 xkr4 Abdominal Pain MESH:D015746 XKR4 114786 Signs and symptoms
## 3 xkr4 Abdominal Pain MESH:D015746 XKR4 114786 Signs and symptoms
## DiseaseCategories.1 DirectEvidence InferenceChemicalName InferenceScore
## 1 Signs and symptoms Propylthiouracil 10.25
## 2 Signs and symptoms Tretinoin 10.25
## 3 Signs and symptoms Valproic Acid 10.25
## OmimIDs PubMedIDs
## 1 NA 15822032|15879050
## 2 NA 9234591
## 3 NA 6206716
In the results from CTDquerier there is a single entry per disease instead of one per disease-chemical pair, as in the previous table from Batch Query. For example, CTDquerier returns a single entry for Abdominal Pain, with the three chemicals joined in a single string in the column Inference.Network:
tbl <- get_table( xkr4, index_name = "diseases" )
tbl[ tbl$Disease.ID == "MESH:D015746", "Inference.Network" ]
## [1] "Propylthiouracil|Tretinoin|Valproic Acid"
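The relation between the two layouts can be sketched in base R by collapsing one-row-per-chemical data into one row per disease. This is only an illustration on toy rows copied from the Abdominal Pain example above; it is not how CTDquerier or CTDbase builds the Inference.Network column.

```r
# Toy rows in the Batch Query layout: one row per disease-chemical pair.
bq_toy <- data.frame(
  DiseaseName = rep("Abdominal Pain", 3),
  DiseaseID = rep("MESH:D015746", 3),
  InferenceChemicalName = c("Propylthiouracil", "Tretinoin", "Valproic Acid"),
  stringsAsFactors = FALSE
)

# Collapse to one row per disease, joining the chemical names with "|".
collapsed <- aggregate(
  InferenceChemicalName ~ DiseaseID + DiseaseName,
  data = bq_toy,
  FUN = function(x) paste(unique(x), collapse = "|")
)

# The reverse step: split the joined string back into individual chemicals.
chems <- strsplit(collapsed$InferenceChemicalName, "|", fixed = TRUE)[[1]]

collapsed
chems
```

Using `fixed = TRUE` in `strsplit` matters here, because `|` would otherwise be interpreted as a regular expression alternation.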
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 CTDquerier_1.2.0 BiocStyle_2.10.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.19 pillar_1.3.0 compiler_3.5.1
## [4] BiocManager_1.30.3 dbplyr_1.2.2 bindr_0.1.1
## [7] bitops_1.0-6 tools_3.5.1 digest_0.6.18
## [10] bit_1.1-14 tibble_1.4.2 BiocFileCache_1.6.0
## [13] RSQLite_2.1.1 evaluate_0.12 memoise_1.1.0
## [16] pkgconfig_2.0.2 rlang_0.3.0.1 DBI_1.0.0
## [19] curl_3.2 yaml_2.2.0 parallel_3.5.1
## [22] xfun_0.4 httr_1.3.1 stringr_1.3.1
## [25] dplyr_0.7.7 knitr_1.20 S4Vectors_0.20.0
## [28] rappdirs_0.3.1 tidyselect_0.2.5 stats4_3.5.1
## [31] rprojroot_1.3-2 bit64_0.9-7 glue_1.3.0
## [34] R6_2.3.0 rmarkdown_1.10 bookdown_0.7
## [37] purrr_0.2.5 blob_1.1.1 magrittr_1.5
## [40] backports_1.1.2 htmltools_0.3.6 BiocGenerics_0.28.0
## [43] stringdist_0.9.5.1 assertthat_0.2.0 stringi_1.2.4
## [46] RCurl_1.95-4.11 crayon_1.3.4
1. Mattingly CJ, Colby GT, Forrest JN, Boyer JL. The Comparative Toxicogenomics Database (CTD). 2003.
2. Davis AP, Grondin CJ, Johnson RJ, et al. The Comparative Toxicogenomics Database: update 2017. 2017.