Uniparental disomy (UPD) is a rare genetic condition where an individual inherits both copies of a chromosome from one parent, as opposed to the typical inheritance of one copy from each parent. The extent of its contribution as a causative factor in rare genetic diseases remains largely unexplored. UPDs can lead to disease either by inheriting a carrier pathogenic mutation as homozygous from a carrier parent or by causing errors in genetic imprinting. Nevertheless, there are currently no standardized methods available for the detection and characterization of these events.
We have developed UPDhmm
R/Bioconductor package. The package provides
a tool method to detect, classify and stablish the location
of uniparental disomy events.
“UPDhmm
relies on a Hidden Markov Model (HMM) to identify regions with UPD.
In our HMM model, observations are the combination of the genotypes from the
father, mother and proband for every genomic variant in the input data. The
hidden states of the model represent five inheritance patterns: normal
(mendelian inheritance), maternal isodisomy, paternal isodisomy, maternal
heterodisomy and paternal heterodisomy. Emission probabilities were defined
based on the inheritance patterns.
Viterbi algorithm was employed to infer the most likely combination of hidden
states underlying a sequence of observations, thus defining the most likely
inheritance pattern for each genetic variant. UPDhmm
reports segments in the
genome having an inheritance pattern different than normal, and thus, likely
harbouring a UPD.”
library(UPDhmm)
library(dplyr)
The input of the package is a multisample vcf file with the following requirements:
The UPDhmm package includes one example dataset, adapted from GIB This dataset serves as a practical illustration and can be utilized for testing and familiarizing users with the functionality of the package.
After reading the VCF file, the vcfCheck()
function is employed for
preprocessing the input data. This function facilitates reading the VCF in the
suitable format for the UPDhmm package.
library(UPDhmm)
library(VariantAnnotation)
file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz")
vcf <- readVcf(file)
processedVcf <- vcfCheck(
vcf,
proband = "NA19675", mother = "NA19678", father = "NA19679"
)
The principal function of UPDhmm
package, calculateEvents()
, is the central
function for identifying Uniparental Disomy (UPD) events. It takes the output
from the previous vcfCheck()
function and systematically analyzes genomic
data, splitting the VCF into chromosomes and applying the Viterbi algorithm.
calculateEvents()
function encompasses a serie of subprocesses for
identifying Uniparental disomies (UPDs):
(1) Split vcf into chromosomes
(2) Apply Viterbi algorithm to every chromosome
(3) Transform the inferred hidden states to coordinates data.frame
(4) Create blocks of contiguous variants with the same state
(5) Calculate the statistics (log-likelihood and p-value).
updEvents <- calculateEvents(largeCollapsedVcf = processedVcf)
head(updEvents)
seqnames start end group n_snps log_likelihood p_value
1 3 100374740 197574936 het_mat 47 75.21933 4.212224e-18
2 6 32489853 32490000 iso_mat 6 38.63453 5.110683e-10
3 6 32609207 32609341 iso_mat 11 95.23543 1.690378e-22
4 11 55322539 55587117 iso_mat 7 40.02082 2.512703e-10
5 14 105408955 105945891 het_fat 30 20.84986 4.967274e-06
6 15 22382655 23000272 iso_mat 12 48.33859 3.586255e-12
n_mendelian_error
1 2
2 5
3 7
4 3
5 2
6 3
The calculateEvents
function returns a data.frame
containing all
detected events in the provided trio. If no events are found, the function
will return an empty data.frame.
Column name | Description |
---|---|
seqnames |
Chromosome |
start |
Start position of the block |
end |
End position of the block |
n_snps |
Number of variants within the event |
group |
Predicted state |
log_likelihood |
log likelihood ratio |
p_value |
p-value |
n_mendelian_error |
Count of Mendelian errors (genotypic combinations that are inconsistent with Mendelian inheritance principles) |
To visualize the results, the karyoploteR
package can be used. Here,
a custom function is provided to facilitate easy implementation with the
output format results. This function enables the visualization of the detected
blocks by the calculateEvents
function.
In this example, we can observe how the package detects the simulated event on chromosome 3, as well as the specific type of event. In the plot, the autosomes chromosomes are represented, and we can see that the entire q arm of chromosome 3 is colored. With the legend, we can identify that the event is a maternal heterodisomy.
library(karyoploteR)
library(regioneR)
plotUPDKp <- function(updEvents) {
updEvents$seqnames <- paste0("chr", updEvents$seqnames)
het_fat <- toGRanges(subset(
updEvents,
group == "het_fat"
)[, c("seqnames", "start", "end")])
het_mat <- toGRanges(subset(
updEvents,
group == "het_mat"
)[, c("seqnames", "start", "end")])
iso_fat <- toGRanges(subset(
updEvents,
group == "iso_fat"
)[, c("seqnames", "start", "end")])
iso_mat <- toGRanges(subset(
updEvents,
group == "iso_mat"
)[, c("seqnames", "start", "end")])
kp <- plotKaryotype(genome = "hg19")
kpPlotRegions(kp, het_fat, col = "#AAF593")
kpPlotRegions(kp, het_mat, col = "#FFB6C1")
kpPlotRegions(kp, iso_fat, col = "#A6E5FC")
kpPlotRegions(kp, iso_mat, col = "#E9B864")
colors <- c("#AAF593", "#FFB6C1", "#A6E5FC", "#E9B864")
legend("topright",
legend = c("Het_Fat", "Het_Mat", "Iso_Fat", "Iso_Mat"),
fill = colors
)
}
plotUPDKp(updEvents)
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] karyoploteR_1.32.0 regioneR_1.38.0
[3] VariantAnnotation_1.52.0 Rsamtools_2.22.0
[5] Biostrings_2.74.0 XVector_0.46.0
[7] SummarizedExperiment_1.36.0 Biobase_2.66.0
[9] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
[11] IRanges_2.40.0 S4Vectors_0.44.0
[13] MatrixGenerics_1.18.0 matrixStats_1.4.1
[15] BiocGenerics_0.52.0 dplyr_1.1.4
[17] UPDhmm_1.2.0
loaded via a namespace (and not attached):
[1] DBI_1.2.3 bitops_1.0-9 gridExtra_2.3
[4] rlang_1.1.4 magrittr_2.0.3 biovizBase_1.54.0
[7] compiler_4.4.1 RSQLite_2.3.7 GenomicFeatures_1.58.0
[10] png_0.1-8 vctrs_0.6.5 ProtGenerics_1.38.0
[13] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
[16] fastmap_1.2.0 backports_1.5.0 utf8_1.2.4
[19] rmarkdown_2.28 UCSC.utils_1.2.0 bit_4.5.0
[22] xfun_0.48 zlibbioc_1.52.0 cachem_1.1.0
[25] jsonlite_1.8.9 blob_1.2.4 highr_0.11
[28] DelayedArray_0.32.0 BiocParallel_1.40.0 parallel_4.4.1
[31] cluster_2.1.6 R6_2.5.1 stringi_1.8.4
[34] RColorBrewer_1.1-3 bezier_1.1.2 rtracklayer_1.66.0
[37] rpart_4.1.23 Rcpp_1.0.13 knitr_1.48
[40] base64enc_0.1-3 Matrix_1.7-1 nnet_7.3-19
[43] tidyselect_1.2.1 rstudioapi_0.17.1 dichromat_2.0-0.1
[46] abind_1.4-8 yaml_2.3.10 codetools_0.2-20
[49] curl_5.2.3 lattice_0.22-6 tibble_3.2.1
[52] KEGGREST_1.46.0 evaluate_1.0.1 foreign_0.8-87
[55] pillar_1.9.0 BiocManager_1.30.25 checkmate_2.3.2
[58] generics_0.1.3 RCurl_1.98-1.16 ensembldb_2.30.0
[61] ggplot2_3.5.1 munsell_0.5.1 scales_1.3.0
[64] BiocStyle_2.34.0 glue_1.8.0 lazyeval_0.2.2
[67] Hmisc_5.2-0 tools_4.4.1 BiocIO_1.16.0
[70] data.table_1.16.2 BSgenome_1.74.0 GenomicAlignments_1.42.0
[73] XML_3.99-0.17 grid_4.4.1 AnnotationDbi_1.68.0
[76] colorspace_2.1-1 GenomeInfoDbData_1.2.13 htmlTable_2.4.3
[79] restfulr_0.0.15 Formula_1.2-5 cli_3.6.3
[82] HMM_1.0.1 fansi_1.0.6 S4Arrays_1.6.0
[85] AnnotationFilter_1.30.0 gtable_0.3.6 digest_0.6.37
[88] SparseArray_1.6.0 rjson_0.2.23 htmlwidgets_1.6.4
[91] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
[94] httr_1.4.7 bit64_4.5.2 bamsignals_1.38.0