biodbHmdb 1.12.0
biodbHmdb is a biodb extension package that implements a connector to HMDB Metabolites.
We present here the different ways to search for HMDB (Wishart et al. 2012) entries with this package.
Note that the whole HMDB is downloaded locally by biodb and stored on disk, since this is the only way to access HMDB programmatically. Any search on HMDB is hence currently run on the local machine.
Install using Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbHmdb')
The first step in using biodbHmdb is to create an instance of the biodb class BiodbMain from the main biodb package. This is done by calling the constructor of the class:
mybiodb <- biodb::newInst()
During this step the configuration is set up, the cache system is initialized and extension packages are loaded.
We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.
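One way to make sure terminate() is always called, even when an error interrupts the work, is to tie the lifetime of the instance to a helper function. The sketch below is only an illustration: withBiodb() and its newInst argument are hypothetical names that are not part of biodb. Only biodb::newInst() and terminate() come from the package.

```r
# Hypothetical helper (not part of biodb): create an instance, run `fn` on it,
# and always terminate the instance afterwards.
# `newInst` defaults to the real constructor. The default is only evaluated
# when it is actually called, and a stand-in can be supplied for testing.
withBiodb <- function(fn, newInst = biodb::newInst) {
    mybiodb <- newInst()
    # on.exit() runs when the function exits, whether normally or on error.
    on.exit(mybiodb$terminate(), add = TRUE)
    fn(mybiodb)
}
```

The function passed as fn receives the instance as its only argument, and its return value is returned once the instance has been terminated.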
In biodb, the connection to a database is handled by a connector instance that you get from the factory. biodbHmdb implements a connector to a remote database, HMDB Metabolites. Here is the code to instantiate a connector:
conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
## Loading required package: biodbHmdb
For this vignette, we avoid downloading the full HMDB Metabolites database and instead use an extract containing a few entries:
dbExtract <- system.file("extdata", 'generated', "hmdb_extract.zip",
                         package="biodbHmdb")
conn$setPropValSlot('urls', 'db.zip.url', dbExtract)
To get the number of entries stored inside the database, run:
conn$getNbEntries()
## [1] 2
To get the first few entry IDs (accession numbers) from the database, run:
ids <- conn$getEntryIds(2)
ids
## [1] "HMDB0000001" "HMDB0000002"
To retrieve entries, use:
entries <- conn$getEntry(ids)
entries
## [[1]]
## Biodb HMDB Metabolites entry instance HMDB0000001.
##
## [[2]]
## Biodb HMDB Metabolites entry instance HMDB0000002.
To convert a list of entries into a dataframe, run:
x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
## accession
## 1 HMDB0000001
## 2 HMDB0000002
## secondary.accessions
## 1 HMDB00001;HMDB0004935;HMDB0006703;HMDB0006704;HMDB04935;HMDB06703;HMDB06704
## 2 HMDB00002;HMDB0060172;HMDB60172
## average.mass cas.id chebi.id chemspider.id kegg.compound.id
## 1 169.1811 332-80-9 50599 83153 C01152
## 2 74.1249 109-76-2 15725 415 C00986
## ncbi.pubchem.comp.id comp.iupac.name.syst
## 1 92105 (2S)-2-amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid
## 2 428 propane-1,3-diamine
## comp.iupac.name.trad formula
## 1 1 methylhistidine C7H11N3O2
## 2 α,ω-propanediamine C3H10N2
## inchi
## 1 InChI=1S/C7H11N3O2/c1-10-3-5(9-4-10)2-6(8)7(11)12/h3-4,6H,2,8H2,1H3,(H,11,12)/t6-/m0/s1
## 2 InChI=1S/C3H10N2/c4-2-1-3-5/h1-5H2
## inchikey monoisotopic.mass
## 1 BRMWTNUJHUMWMS-LURJTMIESA-N 169.0851
## 2 XFNJVJPLKCPIBV-UHFFFAOYSA-N 74.0844
## name
## 1 1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid;Pi-methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoate;1 Methylhistidine;1-Methyl histidine;1-Methyl-histidine;1-Methyl-L-histidine;1-MHis;1-N-Methyl-L-histidine;L-1-Methylhistidine;N1-Methyl-L-histidine;1-Methylhistidine dihydrochloride;Renal disease;Nephropathy;Non-insulin-dependent diabetes mellitus;Niddm;Adult-onset diabetes;Striated muscle;Fecal;Stool;Faecal;Faeces;Csf;Cytoplasma
## 2 1,3-Diaminopropane;1,3-Propanediamine;1,3-Propylenediamine;Propane-1,3-diamine;tn;Trimethylenediamine;1,3-diamino-N-Propane;1,3-Trimethylenediamine;3-Aminopropylamine;a,W-Propanediamine;Trimethylenediamine hydrochloride;Trimethylenediamine dihydrochloride;1,3-Diaminepropane;Leukaemia;Digestion;Flora;Gramineae;Papilionoideae;Legume;Soy;Soya;Soybean;Soya bean;Cucurbits;Gourds;Fauna;Fecal;Stool;Faecal;Faeces;Cytoplasma
## description
## 1 1-Methylhistidine, also known as 1-MHis, belongs to the class of organic compounds known as histidine and derivatives. Histidine and derivatives are compounds containing cysteine or a derivative thereof resulting from a reaction of cysteine at the amino group or the carboxy group, or from the replacement of any hydrogen of glycine by a heteroatom. 1-Methylhistidine is derived mainly from the anserine of dietary flesh sources, especially poultry. The enzyme, carnosinase, splits anserine into beta-alanine and 1-MHis. High levels of 1-MHis tend to inhibit the enzyme carnosinase and increase anserine levels. Conversely, genetic variants with deficient carnosinase activity in plasma show increased 1-MHis excretions when they consume a high meat diet. Reduced serum carnosinase activity is also found in patients with Parkinson's disease and multiple sclerosis and patients following a cerebrovascular accident. Vitamin E deficiency can lead to 1-methylhistidinuria from increased oxidative effects in skeletal muscle. 1-Methylhistidine is a biomarker for the consumption of meat, especially red meat.
## 2 1,3-Diaminopropane, also known as DAP or trimethylenediamine, belongs to the class of organic compounds known as monoalkylamines. These are organic compounds containing a primary aliphatic amine group. 1,3-Diaminopropane is a stable, flammable, and highly hygroscopic fluid. It is a polyamine that is normally quite toxic if swallowed, inhaled, or absorbed through the skin. It is a catabolic byproduct of spermidine. It is also a precursor in the enzymatic synthesis of beta-alanine. 1,3-Diaminopropane is involved in the arginine/proline metabolic pathways and the beta-alanine metabolic pathway. 1,3-Diaminopropane has been detected, but not quantified in, several different foods, such as cassava, shiitakes, oyster mushrooms, muscadine grapes, and cinnamons. This could make 1,3-diaminopropane a potential biomarker for the consumption of these foods.
## smiles comp.super.class hmdb.metabolites.id
## 1 CN1C=NC(C[C@H](N)C(O)=O)=C1 Organic acids and derivatives HMDB0000001
## 2 NCCCN Organic nitrogen compounds HMDB0000002
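As the output shows, a field with several values, such as name, is collapsed into a single string with ";" as separator. The base R sketch below selects a few columns and splits the names back into vectors. It assumes that ";" never occurs inside a single value, and it uses a small hand-built data frame as a stand-in for x (values copied and shortened from the output above) so that it runs without biodb.

```r
# Stand-in for the data frame returned by entriesToDataframe(); values are
# copied (and shortened) from the output above.
x <- data.frame(
    accession = c("HMDB0000001", "HMDB0000002"),
    formula = c("C7H11N3O2", "C3H10N2"),
    monoisotopic.mass = c(169.0851, 74.0844),
    name = c("1-Methylhistidine;Pi-methylhistidine;1-MHis",
             "1,3-Diaminopropane;1,3-Propanediamine;tn"),
    stringsAsFactors = FALSE
)

# Keep only a few columns for a compact view.
compact <- x[, c("accession", "formula", "monoisotopic.mass")]

# Split the semicolon-separated names into one character vector per entry.
names.list <- strsplit(x$name, ";", fixed = TRUE)
names(names.list) <- x$accession

# First listed name of each entry.
first.names <- vapply(names.list, function(v) v[[1]], character(1))
```

Here names.list is indexed by accession number, so names.list[["HMDB0000001"]] holds the individual names of the first entry.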
Here we use the generic biodb method searchForEntries() to search for entries by name:
id <- conn$searchForEntries(list(name='1-Methylhistidine'), max.results=1)
id
## [1] "HMDB0000001"
We limit the search result to one entry with the max.results parameter.
The first parameter is the filtering criterion, expressed as a list whose single key (in our case) is the biodb field on which we want to filter. The value is the text we want to search for.
See the documentation of searchForEntries() inside ?biodb::BiodbConn.
We can also search for several strings at once, in which case an entry is matched only if its field value contains all the specified strings:
conn$searchForEntries(list(name=c('propanoic', 'acid')), max.results=1)
## [1] "HMDB0000001"
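The "contains all the specified strings" rule can be illustrated on a single field value with a few lines of base R. This is only a sketch of the matching rule, not biodb's implementation: containsAll() is a hypothetical function, and since this vignette does not state whether biodb ignores case, case handling is left as a parameter.

```r
# Hypothetical illustration of "match all strings" on a single field value.
# This is not biodb's implementation; whether biodb ignores case is not
# stated in this vignette, so it is left as a parameter here.
containsAll <- function(value, patterns, ignore.case = FALSE) {
    if (ignore.case) {
        value <- tolower(value)
        patterns <- tolower(patterns)
    }
    # Each pattern is searched as a literal substring (no regular expression).
    all(vapply(patterns, function(p) grepl(p, value, fixed = TRUE),
               logical(1)))
}

containsAll("(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid",
            c("propanoic", "acid"))  # TRUE
containsAll("propane-1,3-diamine", c("propanoic", "acid"))  # FALSE
```

Under this rule, adding more strings can only narrow the set of matched entries.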
To look at the values of the entry, you may convert it to a data frame:
entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name'))
See table 1 for the content of this data frame.
accession | name |
---|---|
HMDB0000001 | 1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid;Pi-methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoate;1 Methylhistidine;1-Methyl histidine;1-Methyl-histidine;1-Methyl-L-histidine;1-MHis;1-N-Methyl-L-histidine;L-1-Methylhistidine;N1-Methyl-L-histidine;1-Methylhistidine dihydrochloride;Renal disease;Nephropathy;Non-insulin-dependent diabetes mellitus;Niddm;Adult-onset diabetes;Striated muscle;Fecal;Stool;Faecal;Faeces;Csf;Cytoplasma |
Searching inside the description field can be done in the same way as for the name field.
Here is a search with multiple strings to match:
id <- conn$searchForEntries(list(description=c('Parkinson', 'sclerosis')), max.results=1)
id
## [1] "HMDB0000001"
Again, you can look at the values of the entry through a data frame:
entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name', 'description'))
See table 2 for the content of this data frame.
accession | name | description |
---|---|---|
HMDB0000001 | 1-Methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoic acid;Pi-methylhistidine;(2S)-2-Amino-3-(1-methyl-1H-imidazol-4-yl)propanoate;1 Methylhistidine;1-Methyl histidine;1-Methyl-histidine;1-Methyl-L-histidine;1-MHis;1-N-Methyl-L-histidine;L-1-Methylhistidine;N1-Methyl-L-histidine;1-Methylhistidine dihydrochloride;Renal disease;Nephropathy;Non-insulin-dependent diabetes mellitus;Niddm;Adult-onset diabetes;Striated muscle;Fecal;Stool;Faecal;Faeces;Csf;Cytoplasma | 1-Methylhistidine, also known as 1-MHis, belongs to the class of organic compounds known as histidine and derivatives. Histidine and derivatives are compounds containing cysteine or a derivative thereof resulting from a reaction of cysteine at the amino group or the carboxy group, or from the replacement of any hydrogen of glycine by a heteroatom. 1-Methylhistidine is derived mainly from the anserine of dietary flesh sources, especially poultry. The enzyme, carnosinase, splits anserine into beta-alanine and 1-MHis. High levels of 1-MHis tend to inhibit the enzyme carnosinase and increase anserine levels. Conversely, genetic variants with deficient carnosinase activity in plasma show increased 1-MHis excretions when they consume a high meat diet. Reduced serum carnosinase activity is also found in patients with Parkinson’s disease and multiple sclerosis and patients following a cerebrovascular accident. Vitamin E deficiency can lead to 1-methylhistidinuria from increased oxidative effects in skeletal muscle. 1-Methylhistidine is a biomarker for the consumption of meat, especially red meat. |
When you are done with your biodb instance, you must terminate it to ensure that resources (file handles, database connections, etc.) are released:
mybiodb$terminate()
## INFO [19:43:57.829] Closing BiodbMain instance...
## INFO [19:43:57.837] Connector "hmdb.metabolites" deleted.
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] biodbHmdb_1.12.0 BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 sass_0.4.9 utf8_1.2.4
## [4] generics_0.1.3 stringi_1.8.4 RSQLite_2.3.7
## [7] hms_1.1.3 digest_0.6.37 magrittr_2.0.3
## [10] evaluate_1.0.1 bookdown_0.41 fastmap_1.2.0
## [13] blob_1.2.4 plyr_1.8.9 jsonlite_1.8.9
## [16] progress_1.2.3 DBI_1.2.3 BiocManager_1.30.25
## [19] httr_1.4.7 fansi_1.0.6 XML_3.99-0.17
## [22] jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4
## [25] chk_0.9.2 crayon_1.5.3 dbplyr_2.5.0
## [28] bit64_4.5.2 withr_3.0.2 cachem_1.1.0
## [31] yaml_2.3.10 tools_4.4.1 memoise_2.0.1
## [34] biodb_1.14.0 dplyr_1.1.4 filelock_1.0.3
## [37] curl_5.2.3 vctrs_0.6.5 R6_2.5.1
## [40] BiocFileCache_2.14.0 lifecycle_1.0.4 stringr_1.5.1
## [43] bit_4.5.0 pkgconfig_2.0.3 pillar_1.9.0
## [46] bslib_0.8.0 glue_1.8.0 Rcpp_1.0.13
## [49] lgr_0.4.4 xfun_0.48 tibble_3.2.1
## [52] tidyselect_1.2.1 knitr_1.48 htmltools_0.5.8.1
## [55] rmarkdown_2.28 compiler_4.4.1 prettyunits_1.2.0
## [58] askpass_1.2.1 openssl_2.2.2
Wishart, David S., Timothy Jewison, An Chi Guo, Michael Wilson, Craig Knox, Yifeng Liu, Yannick Djoumbou, et al. 2012. “HMDB 3.0—The Human Metabolome Database in 2013.” Nucleic Acids Research 41 (D1): D801–D807. https://doi.org/10.1093/nar/gks1065.