UCSCRepeatMasker 3.15.2
The UCSCRepeatMasker package provides metadata for AnnotationHub resources associated with UCSC RepeatMasker annotations. The original data can be downloaded from UCSC URLs of the form https://hgdownload.soe.ucsc.edu/goldenPath/XXXX/database/rmsk.txt.gz, where XXXX is the code of the corresponding UCSC genome version (e.g., hg38).
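As a minimal sketch of that URL pattern, the helper below fills in the genome code. The function name rmsk_url is hypothetical (it is not part of the UCSCRepeatMasker package), and the code only builds the string; it does not check that a file exists for the given genome.

```r
# Hypothetical helper: build the UCSC download URL for one genome code.
rmsk_url <- function(genome) {
  stopifnot(is.character(genome), length(genome) == 1L, nzchar(genome))
  sprintf("https://hgdownload.soe.ucsc.edu/goldenPath/%s/database/rmsk.txt.gz",
          genome)
}

# Print the URL for the hg38 genome version.
print(rmsk_url("hg38"))
```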
Details about how those original data were processed into AnnotationHub resources can be found in the source file UCSCRepeatMasker/scripts/make-data_UCSCRepeatMasker.R, while details on how the metadata for those resources has been generated can be found in the source file UCSCRepeatMasker/scripts/make-metadata_UCSCRepeatMasker.R.
UCSC RepeatMasker annotations can be retrieved using AnnotationHub, a web resource that provides a central location where genomic files (e.g., VCF, BED, WIG) and other resources from standard locations (e.g., UCSC, Ensembl) and distributed sites can be found. The Bioconductor AnnotationHub package is the client for this web resource; it creates and manages a local cache of the files retrieved by the user, which helps with quick and reproducible access.
For example, to list the available UCSC RepeatMasker annotations for the human genome, we should first load the AnnotationHub package:
library(AnnotationHub)
and then query the annotation hub as follows:
ah <- AnnotationHub()
query(ah, c("RepeatMasker", "Homo sapiens"))
## AnnotationHub with 2 records
## # snapshotDate(): 2022-01-31
## # $dataprovider: UCSC
## # $species: Homo sapiens
## # $rdataclass: GRanges
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH99002"]]'
##
## title
## AH99002 | UCSC RepeatMasker annotations (Mar2020) for Human (hg19)
## AH99003 | UCSC RepeatMasker annotations (Sep2021) for Human (hg38)
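When a query returns several records, the identifier can be picked programmatically rather than copied by hand. The sketch below works offline on a plain data frame that mirrors the two records listed above. The data frame and the helper name pick_id are illustrative assumptions, and the genome code is parsed from the trailing parentheses of each title. With a live hub, the genome column reported among the additional mcols() should carry the same information.

```r
# Stand-in for the query result, copied from the listing above.
records <- data.frame(
  id = c("AH99002", "AH99003"),
  title = c("UCSC RepeatMasker annotations (Mar2020) for Human (hg19)",
            "UCSC RepeatMasker annotations (Sep2021) for Human (hg38)"),
  stringsAsFactors = FALSE
)

# Parse the genome code from the last parenthesised token of each title.
records$genome <- sub(".*\\(([^()]+)\\)$", "\\1", records$title)

# Hypothetical helper: return the record identifier(s) for one genome code.
pick_id <- function(records, genome) {
  records$id[records$genome == genome]
}

print(pick_id(records, "hg38"))
```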
We can retrieve the desired resource, e.g., UCSC RepeatMasker annotations for hg38, using the following syntax:
rmskhg38 <- ah[["AH99003"]]
rmskhg38
## GRanges object with 5633664 ranges and 4 metadata columns:
##                        seqnames        ranges strand |   swScore     repName
##                           <Rle>     <IRanges>  <Rle> | <integer> <character>
##         [1]                chr1   10001-10468      + |       463   (TAACCC)n
##         [2]                chr1   15798-15849      + |        18   (TGCTCC)n
##         [3]                chr1   16713-16744      + |        18      (TGG)n
##         [4]                chr1   18907-19048      + |       239         L2a
##         [5]                chr1   19972-20405      + |       994          L3
##         ...                 ...           ...    ... .       ...         ...
##   [5633660] chrX_KV766199v1_alt 179150-179234      - |       255    MIR1_Amn
##   [5633661] chrX_KV766199v1_alt 184474-184785      - |      2039       AluJb
##   [5633662] chrX_KV766199v1_alt 186964-187271      - |       386      MLT1G3
##   [5633663] chrX_KV766199v1_alt 187486-187569      - |       270      MLT1G3
##   [5633664] chrX_KV766199v1_alt 187597-187822      - |      1301       L1MA8
##                  repClass     repFamily
##               <character>   <character>
##         [1] Simple_repeat Simple_repeat
##         [2] Simple_repeat Simple_repeat
##         [3] Simple_repeat Simple_repeat
##         [4]          LINE            L2
##         [5]          LINE           CR1
##         ...           ...           ...
##   [5633660]          SINE           MIR
##   [5633661]          SINE           Alu
##   [5633662]           LTR     ERVL-MaLR
##   [5633663]           LTR     ERVL-MaLR
##   [5633664]          LINE            L1
##   -------
##   seqinfo: 640 sequences (1 circular) from hg38 genome
Note that the data are returned as a GRanges object. Please consult the vignettes from the GenomicRanges package for details on how to manipulate this type of object. The contents of the 4 metadata columns are described at the UCSC Genome Browser web page for the RepeatMasker database schema. Please consult the credits and references sections on that page for information on how to cite these data.
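To give a flavour of such manipulations, here is a small sketch that rebuilds the first five ranges printed above as a toy GRanges object and summarises them by repeat class. The object name toy and the chosen summaries are illustrative assumptions. Since the full object has the same metadata columns, the same expressions should apply to it.

```r
library(GenomicRanges)

# Toy object with the first five ranges from the printed output above.
toy <- GRanges(
  seqnames = rep("chr1", 5),
  ranges = IRanges(start = c(10001, 15798, 16713, 18907, 19972),
                   end = c(10468, 15849, 16744, 19048, 20405)),
  strand = rep("+", 5),
  swScore = c(463L, 18L, 18L, 239L, 994L),
  repName = c("(TAACCC)n", "(TGCTCC)n", "(TGG)n", "L2a", "L3"),
  repClass = c("Simple_repeat", "Simple_repeat", "Simple_repeat",
               "LINE", "LINE"),
  repFamily = c("Simple_repeat", "Simple_repeat", "Simple_repeat",
                "L2", "CR1")
)

# Number of annotated repeats per class.
class_counts <- table(toy$repClass)

# Keep only the LINE elements.
lines_only <- toy[toy$repClass == "LINE"]

# Total number of annotated base pairs per class.
bp_by_class <- sapply(split(width(toy), toy$repClass), sum)

print(class_counts)
print(bp_by_class)
```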
The GRanges object contains further metadata, accessible with the metadata() method as follows:
metadata(rmskhg38)
## $srcurl
## [1] "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz"
##
## $srcVersion
## [1] "Sep2021"
##
## $citation
## A. Smit, R. Hubley, P. Green (1996-2010). _RepeatMasker Open-3.0_.
## <URL: https://www.repeatmasker.org>.
##
## $gdesc
## | organism: Homo sapiens (Human)
## | genome: hg38
## | provider: UCSC
## | release date: Jun. 2013
## | ---
## | seqlengths:
## | chr1 chr2 ... chrX_KV766199v1_alt
## | 248956422 242193529 ... 188004
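These metadata fields can serve as a quick provenance check. The sketch below uses a plain list as a stand-in for metadata(rmskhg38), with values copied from the output above, and extracts the genome code from the source URL. The helper name genome_from_url is hypothetical.

```r
# Stand-in for metadata(rmskhg38), with values copied from the output above.
md <- list(
  srcurl = "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz",
  srcVersion = "Sep2021"
)

# Hypothetical helper: extract the genome code from a UCSC goldenPath URL.
genome_from_url <- function(url) {
  sub("^.*/goldenPath/([^/]+)/database/.*$", "\\1", url)
}

# Stop early if the annotations do not come from the expected genome.
stopifnot(genome_from_url(md$srcurl) == "hg38")

message("Source file ", basename(md$srcurl), ", version ", md$srcVersion,
        ", genome ", genome_from_url(md$srcurl))
```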
sessionInfo()
## R Under development (unstable) (2022-01-05 r81451)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GenomicRanges_1.47.6 GenomeInfoDb_1.31.4 IRanges_2.29.1
## [4] S4Vectors_0.33.10 AnnotationHub_3.3.8 BiocFileCache_2.3.4
## [7] dbplyr_2.1.1 BiocGenerics_0.41.2 BiocStyle_2.23.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8 png_0.1-7
## [3] Biostrings_2.63.1 assertthat_0.2.1
## [5] digest_0.6.29 utf8_1.2.2
## [7] mime_0.12 R6_2.5.1
## [9] RSQLite_2.2.9 evaluate_0.14
## [11] httr_1.4.2 pillar_1.7.0
## [13] zlibbioc_1.41.0 rlang_1.0.1
## [15] curl_4.3.2 jquerylib_0.1.4
## [17] blob_1.2.2 rmarkdown_2.11
## [19] stringr_1.4.0 RCurl_1.98-1.6
## [21] bit_4.0.4 shiny_1.7.1
## [23] compiler_4.2.0 httpuv_1.6.5
## [25] xfun_0.29 pkgconfig_2.0.3
## [27] htmltools_0.5.2 tidyselect_1.1.1
## [29] KEGGREST_1.35.0 GenomeInfoDbData_1.2.7
## [31] tibble_3.1.6 interactiveDisplayBase_1.33.0
## [33] bookdown_0.24 fansi_1.0.2
## [35] withr_2.4.3 crayon_1.4.2
## [37] dplyr_1.0.8 later_1.3.0
## [39] bitops_1.0-7 rappdirs_0.3.3
## [41] jsonlite_1.7.3 xtable_1.8-4
## [43] lifecycle_1.0.1 DBI_1.1.2
## [45] magrittr_2.0.2 cli_3.1.1
## [47] stringi_1.7.6 cachem_1.0.6
## [49] XVector_0.35.0 promises_1.2.0.1
## [51] bslib_0.3.1 ellipsis_0.3.2
## [53] filelock_1.0.2 generics_0.1.2
## [55] vctrs_0.3.8 tools_4.2.0
## [57] bit64_4.0.5 Biobase_2.55.0
## [59] glue_1.6.1 purrr_0.3.4
## [61] BiocVersion_3.15.0 fastmap_1.1.0
## [63] yaml_2.2.2 AnnotationDbi_1.57.1
## [65] BiocManager_1.30.16 memoise_2.0.1
## [67] knitr_1.37 sass_0.4.0