library(NanoMethViz)
In order to use this package, your data must be converted from the output of methylation calling software to a tabix indexed bgzipped format. The data needs to be sorted by genomic position to respect the requirements of the samtools tabix indexing tool. On Linux and macOS systems this is done using the bash sort
utility, which is memory efficient, but on Windows this is done by loading the entire table and sorting within R.
We currently support output from
The conversion can be done using the create_tabix_file()
function. We provide example data of nanopolish output within the package, we can look inside to see how the data looks coming out of nanopolish
methy_calls <- system.file(package = "NanoMethViz",
c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))
# have a look at the first 10 rows of methy_data
methy_calls_example <- read.table(
methy_calls[1], sep = "\t", header = TRUE, nrows = 6)
methy_calls_example
## chromosome strand start end read_name
## 1 chr1 - 127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
## 2 chr11 - 115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
## 3 chr11 + 69150806 69150814 34f9ee3e-4b27-4d2d-a203-4067f0662044
## 4 chr1 + 170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
## 5 chrY - 4082060 4082060 f68940f6-4236-4f0f-9af7-a81b5c2911b6
## 6 chr8 + 120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
## log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
## 1 -5.91 -100.38 -94.47 1
## 2 -8.07 -115.21 -107.13 1
## 3 -1.65 -183.12 -181.47 1
## 4 2.74 -112.14 -114.88 1
## 5 -1.78 -135.09 -133.32 1
## 6 5.02 -129.31 -134.33 1
## num_motifs sequence
## 1 1 CATTACGTTTC
## 2 1 AACTTCGTTGA
## 3 2 GGTCACGGGAATCCGGTTC
## 4 1 AGAAGCGCTAA
## 5 1 CTCACCGTATA
## 6 1 TCTGACGTTGA
We then create a temporary path to store a converted file, this will be deleted once you exit your R session. Once create_tabix_file()
is run, it will create a .bgz file along with its tabix index. Because we have a small amount of data, we can read in a small portion of it for inspection, do not do this with large datasets as it decompresses all the data and will take very long to run.
To import data from Megalodon’s modification calls, the per-read modified bases file must be generated. This can be done by either adding --write-mods-text
argument to Megalodon run or using the megalodon_extras per_read_text modified_bases
utility.
methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")
# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)
# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)
methy_data
## sample chr pos strand statistic read_name
## 1 sample2 chr1 5141050 - 6.93 3818f2e2-d520-4305-bbab-efad891f67f2
## 2 sample1 chr1 6283067 - 1.05 36e3c55f-c41f-4bd6-b371-54368d013008
## 3 sample1 chr1 7975278 - 1.39 6f6cbc59-af4c-4dfa-8e48-ef4ac4eeb13b
## 4 sample1 chr1 10230292 - 2.19 fbe53b38-e264-4c7a-824e-2651c22f8ea6
## 5 sample1 chr1 13127127 - 2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
## 6 sample1 chr1 13127134 - 2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
Now methy_tabix
will be the path to a tabix object that is ready for use with NanoMethViz. Please head over to the “Introduction” vignette to see how to use this data for visualisation!