Importing Data

In order to use this package, your data must be converted from the output of methylation calling software to a tabix indexed bgzipped format. The data needs to be sorted by genomic position to respect the requirements of the samtools tabix indexing tool. On Linux and macOS systems this is done using the bash sort utility, which is memory efficient, but on Windows this is done by loading the entire table and sorting within R.

Data example

The conversion can be done using the create_tabix_file() function. We provide example data of nanopolish output within the package, we can look inside to see how the data looks coming out of nanopolish

methy_calls <- system.file(package = "NanoMethViz",
    c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))

# have a look at the first 10 rows of methy_data
methy_calls_example <- read.table(
    methy_calls[1], sep = "\t", header = TRUE, nrows = 6)

methy_calls_example

##   chromosome strand     start       end                            read_name
## 1       chr1      - 127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
## 2      chr11      - 115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
## 3      chr11      +  69150806  69150814 34f9ee3e-4b27-4d2d-a203-4067f0662044
## 4       chr1      + 170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
## 5       chrY      -   4082060   4082060 f68940f6-4236-4f0f-9af7-a81b5c2911b6
## 6       chr8      + 120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
##   log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
## 1         -5.91            -100.38               -94.47                   1
## 2         -8.07            -115.21              -107.13                   1
## 3         -1.65            -183.12              -181.47                   1
## 4          2.74            -112.14              -114.88                   1
## 5         -1.78            -135.09              -133.32                   1
## 6          5.02            -129.31              -134.33                   1
##   num_motifs            sequence
## 1          1         CATTACGTTTC
## 2          1         AACTTCGTTGA
## 3          2 GGTCACGGGAATCCGGTTC
## 4          1         AGAAGCGCTAA
## 5          1         CTCACCGTATA
## 6          1         TCTGACGTTGA

We then create a temporary path to store a converted file, this will be deleted once you exit your R session. Once create_tabix_file() is run, it will create a .bgz file along with its tabix index. Because we have a small amount of data, we can read in a small portion of it for inspection, do not do this with large datasets as it decompresses all the data and will take very long to run.

Megalodon Data

To import data from Megalodon’s modification calls, the per-read modified bases file must be generated. This can be done by either adding --write-mods-text argument to Megalodon run or using the megalodon_extras per_read_text modified_bases utility.

Importing data

methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)

methy_data

##    sample  chr      pos strand statistic                            read_name
## 1 sample2 chr1  5141050      -      6.93 3818f2e2-d520-4305-bbab-efad891f67f2
## 2 sample1 chr1  6283067      -      1.05 36e3c55f-c41f-4bd6-b371-54368d013008
## 3 sample1 chr1  7975278      -      1.39 6f6cbc59-af4c-4dfa-8e48-ef4ac4eeb13b
## 4 sample1 chr1 10230292      -      2.19 fbe53b38-e264-4c7a-824e-2651c22f8ea6
## 5 sample1 chr1 13127127      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
## 6 sample1 chr1 13127134      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b

Now methy_tabix will be the path to a tabix object that is ready for use with NanoMethViz. Please head over to the “Introduction” vignette to see how to use this data for visualisation!

Importing Data

Shian Su

Data example

Megalodon Data

Importing data