1 Introduction

Recent studies suggest that differences in cell-type proportions may be correlated with certain phenotypes, such as cancer. Demand for computational methods that predict cell-type proportions has grown accordingly. We therefore developed deconvR, a collection of functions for deconvoluting bulk sample(s) using an atlas of reference omic signature profiles and a user-selected model. We also wanted to give users the option to extend their reference atlas: users can create a new reference atlas with findSignatures or extend an existing atlas by adding more cell types. Additionally, we included BSmeth2Probe to simplify mapping whole-genome bisulfite sequencing (WGBS) data to probe IDs. Users can map WGBS methylation data (in methylKit or GRanges object format) to probe IDs, and the result of this mapping can be used as the bulk samples in the deconvolution. The package also includes a comprehensive DNA methylation atlas of 25 different cell types for use in the main function, deconvolute. deconvolute lets the user specify the desired deconvolution model (non-negative least squares regression, support vector regression, quadratic programming, or robust linear regression) and returns the predicted cell-type proportions of the bulk methylation profiles as a data frame, together with partial R-squared values for each sample.
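
To make the idea of reference-based deconvolution concrete, here is a minimal base R sketch of the non-negative least squares variant. It is an illustration under stated assumptions, not deconvR's internal code: the toy reference matrix, the cell-type names, and the use of optim with a lower bound of zero are all choices made for this example.

```r
# Toy illustration of reference-based deconvolution (not deconvR's internal code).
set.seed(1)
n_loci  <- 200
n_types <- 3

# Hypothetical reference atlas: rows are loci, columns are cell types.
reference <- matrix(runif(n_loci * n_types), nrow = n_loci,
                    dimnames = list(NULL, c("typeA", "typeB", "typeC")))
true_prop <- c(typeA = 0.6, typeB = 0.3, typeC = 0.1)

# A bulk profile is modelled as a proportion-weighted mix of the references,
# plus a little measurement noise.
bulk <- as.vector(reference %*% true_prop) + rnorm(n_loci, sd = 0.01)

# Non-negative least squares via box-constrained optimisation (lower bound 0).
rss  <- function(p) sum((bulk - reference %*% p)^2)
grad <- function(p) as.vector(-2 * t(reference) %*% (bulk - reference %*% p))
fit  <- optim(par = rep(1 / n_types, n_types), fn = rss, gr = grad,
              method = "L-BFGS-B", lower = 0)

# Rescale so that the estimated proportions sum to one.
est_prop <- setNames(fit$par / sum(fit$par), colnames(reference))
round(est_prop, 2)
```

The estimated proportions should land close to true_prop, which is the behaviour one hopes for from any of the models listed above when the reference atlas describes the bulk sample well.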

As another option, users can generate a simulated table of a desired number of samples, with either user-specified or random origin proportions, using simulateCellMix. In addition to the simulated samples, simulateCellMix returns a second data frame called proportions, which contains the actual cell-type proportions of each simulated sample. It can be used to test the accuracy of the deconvolution by comparing these actual proportions with the proportions predicted by deconvolute.
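
The comparison between actual and predicted proportions can be sketched in base R as follows. Both data frames here are hypothetical stand-ins with invented values: actual plays the role of the proportions data frame from simulateCellMix, and predicted plays the role of the proportions returned by deconvolute. The choice of RMSE and Pearson correlation as accuracy measures is also an assumption of this example.

```r
# Hypothetical actual proportions (stand-in for the simulateCellMix output).
actual <- data.frame(typeA = c(0.60, 0.20),
                     typeB = c(0.30, 0.50),
                     typeC = c(0.10, 0.30),
                     row.names = c("Sample 1", "Sample 2"))

# Hypothetical predicted proportions (stand-in for the deconvolute output).
predicted <- data.frame(typeA = c(0.58, 0.25),
                        typeB = c(0.32, 0.45),
                        typeC = c(0.10, 0.30),
                        row.names = c("Sample 1", "Sample 2"))

# Align columns by name before comparing, in case the order differs.
pred_mat <- as.matrix(predicted[, colnames(actual)])
act_mat  <- as.matrix(actual)

# Root mean squared error per sample, and an overall Pearson correlation.
rmse_per_sample <- sqrt(rowMeans((pred_mat - act_mat)^2))
pearson <- cor(as.vector(act_mat), as.vector(pred_mat))

rmse_per_sample
pearson
```

A low RMSE and a correlation near one indicate that the chosen model recovers the simulated mixture well.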

deconvolute returns partial R-squared values so that users can check whether deconvolution offers an advantage over the basic bimodal profile. For methylation data, the reference matrix usually follows a bimodal distribution, and the row-wise average of the methylation matrix may already resemble the bulk methylation profile being deconvoluted. If the deconvolution is advantageous, the partial R-squared is expected to be high.
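
One way to picture this check is as a comparison of two nested linear models: a reduced model that explains the bulk profile with the row-wise average of the reference alone, and a full model that uses the individual cell-type columns. The sketch below assumes that formulation and uses toy data; the exact computation inside deconvolute may differ.

```r
set.seed(2)
n_loci <- 300

# Hypothetical reference atlas and a bulk sample dominated by one cell type.
reference <- matrix(runif(n_loci * 3), ncol = 3,
                    dimnames = list(NULL, c("typeA", "typeB", "typeC")))
bulk <- as.vector(reference %*% c(0.7, 0.2, 0.1)) + rnorm(n_loci, sd = 0.02)

# Reduced model: only the row-wise average of the reference.
row_mean <- rowMeans(reference)
reduced  <- lm(bulk ~ row_mean)

# Full model: every cell type as its own predictor.
full <- lm(bulk ~ reference)

# Partial R-squared: share of the reduced model's residual variation
# that is explained by moving to the full model.
sse_reduced <- sum(residuals(reduced)^2)
sse_full    <- sum(residuals(full)^2)
partial_r2  <- (sse_reduced - sse_full) / sse_reduced
partial_r2
```

When the cell types contribute unequally, as in this toy mixture, the average profile alone leaves a lot unexplained and the partial R-squared is high.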

2 Installation

The deconvR package can be installed from Bioconductor with:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("deconvR")

3 Data

3.1 Comprehensive Human Methylome Reference Atlas

The comprehensive human methylome reference atlas created by Moss et al. (2018) can be used as the reference atlas parameter for several functions in this package. The atlas was modified to remove duplicate CpG loci before being included in the package as a data frame. The data frame is composed of 25 human cell types and roughly 6000 CpG loci, identified by their Illumina probe ID. For each cell type and CpG locus, a methylation value between 0 and 1 is provided. This value represents the fraction of methylated bases at the CpG locus. The atlas therefore provides a unique methylation pattern for each cell type and can be used directly as reference in deconvolute and simulateCellMix, and as atlas in findSignatures. Below is an example data frame to illustrate the atlas format.

Reference: Moss, J. et al. (2018). Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nature Communications, 9(1), 1-12. https://doi.org/10.1038/s41467-018-07466-6

library(deconvR) 

data("HumanCellTypeMethAtlas")
head(HumanCellTypeMethAtlas[,1:5])
##          IDs Monocytes_EPIC B.cells_EPIC CD4T.cells_EPIC NK.cells_EPIC
## 1 cg08169020         0.8866       0.2615          0.0149        0.0777
## 2 cg25913761         0.8363       0.2210          0.2816        0.4705
## 3 cg26955540         0.7658       0.0222          0.1492        0.4005
## 4 cg25170017         0.8861       0.5116          0.1021        0.4363
## 5 cg12827637         0.5212       0.3614          0.0227        0.2120
## 6 cg19442545         0.2013       0.1137          0.0608        0.0410
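
When preparing a custom atlas, it can help to confirm that it follows the same layout: an ID column first, unique IDs, and numeric methylation values between 0 and 1. The following base R sketch runs such checks on a small data frame built from the first rows printed above; the checks themselves are a suggestion for this example and are not part of deconvR.

```r
# A small atlas in the same layout as the rows printed above.
atlas <- data.frame(
  IDs            = c("cg08169020", "cg25913761", "cg26955540"),
  Monocytes_EPIC = c(0.8866, 0.8363, 0.7658),
  B.cells_EPIC   = c(0.2615, 0.2210, 0.0222),
  stringsAsFactors = FALSE
)

# Everything except the first (ID) column should hold methylation values.
values <- as.matrix(atlas[, -1])

checks <- c(
  ids_unique     = !anyDuplicated(atlas$IDs),          # no duplicate loci
  values_numeric = is.numeric(values),                 # numeric entries only
  values_in_0_1  = all(values >= 0 & values <= 1)      # valid fractions
)
checks
```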

3.2 Illumina Infinium MethylationEPIC v1.0 B5 Manifest Probes (hg38)

The GRanges object IlluminaMethEpicB5ProbeIDs contains the Illumina probe IDs of 400000 genomic loci (identified using the “seqnames”, “ranges”, and “strand” values). This object is based on the Infinium MethylationEPIC v1.0 B5 Manifest data. Unnecessary columns were removed and rows were truncated to reduce file size before the file was converted to a GRanges object. It can be used directly as probe_id_locations in BSmeth2Probe.

data("IlluminaMethEpicB5ProbeIDs")
head(IlluminaMethEpicB5ProbeIDs)
## GRanges object with 6 ranges and 1 metadata column:
##       seqnames              ranges strand |          ID
##          <Rle>           <IRanges>  <Rle> | <character>
##   [1]    chr19     5236005-5236006      + |  cg07881041
##   [2]    chr20   63216298-63216299      + |  cg18478105
##   [3]     chr1     6781065-6781066      + |  cg23229610
##   [4]     chr2 197438742-197438743      - |  cg03513874
##   [5]     chrX   24054523-24054524      + |  cg09835024
##   [6]    chr14   93114794-93114795      - |  cg05451842
##   -------
##   seqinfo: 25 sequences from an unspecified genome; no seqlengths

4 Example Workflow For Whole Genome Bisulfite Sequencing Data

4.1 Expanding Reference Atlas

As mentioned in the introduction, users can extend their reference atlas to incorporate new data, or combine different reference atlases to construct a more comprehensive one. This can be done using the findSignatures function. Since we don’t have an additional reference atlas, this example adds simulated data to the reference atlas as a new cell type. First, ensure that atlas (the signature matrix to be extended) and samples (the new data to be added to the signature matrix) comply with the function requirements. The samples format is illustrated below.

samples <- simulateCellMix(3,reference = HumanCellTypeMethAtlas)$simulated
head(samples)
##          IDs  Sample 1  Sample 2   Sample 3
## 1 cg08169020 0.0431364 0.2269164 0.10438891
## 2 cg25913761 0.2184536 0.2936846 0.25213488
## 3 cg26955540 0.0935334 0.2086873 0.14478635
## 4 cg25170017 0.0619150 0.2791843 0.15638519
## 5 cg12827637 0.1223568 0.1587402 0.11029666
## 6 cg19442545 0.0163416 0.0567639 0.03550804

sampleMeta should include all sample names in samples, and specify the origins they should be mapped to when added to atlas.

library(data.table)

sampleMeta <- data.table("Experiment_accession" = colnames(samples)[-1],
                         "Biosample_term_name" = "new cell type")
head(sampleMeta)
##    Experiment_accession Biosample_term_name
##                  <char>              <char>
## 1:             Sample 1       new cell type
## 2:             Sample 2       new cell type
## 3:             Sample 3       new cell type
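
Before calling findSignatures, the requirement that sampleMeta covers every sample can be checked explicitly. The base R sketch below uses small stand-in tables with the same column names as above; the check is a suggestion for this example rather than something findSignatures is documented to do.

```r
# Stand-in for the samples table: an ID column followed by one column per sample.
samples <- data.frame(IDs = c("cg08169020", "cg25913761"),
                      "Sample 1" = c(0.0431364, 0.2184536),
                      "Sample 2" = c(0.2269164, 0.2936846),
                      check.names = FALSE, stringsAsFactors = FALSE)

# Stand-in for the metadata table, using the column names shown above.
sampleMeta <- data.frame(Experiment_accession = c("Sample 1", "Sample 2"),
                         Biosample_term_name  = "new cell type",
                         stringsAsFactors = FALSE)

# Every sample column (all but the ID column) needs a matching metadata row.
sample_names <- colnames(samples)[-1]
missing_meta <- setdiff(sample_names, sampleMeta$Experiment_accession)

if (length(missing_meta) > 0) {
  stop("No metadata for: ", paste(missing_meta, collapse = ", "))
}
```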

Use findSignatures to extend the matrix.

extended_matrix <- findSignatures(samples = samples, 
                                 sampleMeta = sampleMeta, 
                                 atlas = HumanCellTypeMethAtlas,
                                 IDs = "IDs")
## CELL TYPES IN EXTENDED ATLAS: 
## new cell type 
## Monocytes_EPIC 
## B.cells_EPIC 
## CD4T.cells_EPIC 
## NK.cells_EPIC 
## CD8T.cells_EPIC 
## Neutrophils_EPIC 
## Erythrocyte_progenitors 
## Adipocytes 
## Cortical_neurons 
## Hepatocytes 
## Lung_cells 
## Pancreatic_beta_cells 
## Pancreatic_acinar_cells 
## Pancreatic_duct_cells 
## Vascular_endothelial_cells 
## Colon_epithelial_cells 
## Left_atrium 
## Bladder 
## Breast 
## Head_and_neck_larynx 
## Kidney 
## Prostate 
## Thyroid 
## Upper_GI 
## Uterus_cervix
head(extended_matrix)
##          IDs new_cell_type Monocytes_EPIC B.cells_EPIC CD4T.cells_EPIC
## 1 cg08169020    0.12481389         0.8866       0.2615          0.0149
## 2 cg25913761    0.25475770         0.8363       0.2210          0.2816
## 3 cg26955540    0.14900235         0.7658       0.0222          0.1492
## 4 cg25170017    0.16582817         0.8861       0.5116          0.1021
## 5 cg12827637    0.13046454         0.5212       0.3614          0.0227
## 6 cg19442545    0.03620451         0.2013       0.1137          0.0608
##   NK.cells_EPIC CD8T.cells_EPIC Neutrophils_EPIC Erythrocyte_progenitors
## 1        0.0777          0.0164           0.8680                  0.9509
## 2        0.4705          0.3961           0.8293                  0.2385
## 3        0.4005          0.3474           0.7915                  0.1374
## 4        0.4363          0.0875           0.7042                  0.9447
## 5        0.2120          0.0225           0.5368                  0.4667
## 6        0.0410          0.0668           0.1952                  0.1601
##   Adipocytes Cortical_neurons Hepatocytes Lung_cells Pancreatic_beta_cells
## 1     0.0336           0.0168      0.0340     0.0416              0.038875
## 2     0.3578           0.3104      0.2389     0.2250              0.132000
## 3     0.1965           0.0978      0.0338     0.0768              0.041725
## 4     0.0842           0.2832      0.2259     0.0544              0.111750
## 5     0.0287           0.1368      0.0307     0.1607              0.065975
## 6     0.0364           0.0222      0.1574     0.0122              0.003825
##   Pancreatic_acinar_cells Pancreatic_duct_cells Vascular_endothelial_cells
## 1                  0.0209                0.0130                     0.0323
## 2                  0.2249                0.1996                     0.3654
## 3                  0.0314                0.0139                     0.2382
## 4                  0.0309                0.0217                     0.0972
## 5                  0.0370                0.0230                     0.0798
## 6                  0.0378                0.0347                     0.0470
##   Colon_epithelial_cells Left_atrium Bladder Breast Head_and_neck_larynx Kidney
## 1                 0.0163      0.0386  0.0462 0.0264               0.0470 0.0269
## 2                 0.2037      0.2446  0.2054 0.1922               0.2045 0.1596
## 3                 0.0193      0.1134  0.1269 0.1651               0.1523 0.1034
## 4                 0.0187      0.0674  0.0769 0.0691               0.0704 0.0604
## 5                 0.0193      0.0432  0.0459 0.0228               0.0687 0.0234
## 6                 0.0193      0.0287  0.0246 0.0081               0.0098 0.0309
##   Prostate Thyroid Upper_GI Uterus_cervix
## 1   0.0353  0.0553   0.0701        0.0344
## 2   0.1557  0.1848   0.1680        0.2026
## 3   0.0686  0.0943   0.1298        0.1075
## 4   0.0369  0.0412   0.0924        0.0697
## 5   0.0508  0.0726   0.0759        0.0196
## 6   0.0055  0.0188   0.0090        0.0166
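
The new_cell_type column printed above is numerically consistent with the row-wise mean of the three simulated samples mapped to that cell type. The short check below reproduces this from the printed values; it is an observation about this output, not a description of the internals of findSignatures.

```r
# First three rows of the samples table printed earlier.
samples <- data.frame(
  IDs        = c("cg08169020", "cg25913761", "cg26955540"),
  "Sample 1" = c(0.0431364, 0.2184536, 0.0935334),
  "Sample 2" = c(0.2269164, 0.2936846, 0.2086873),
  "Sample 3" = c(0.10438891, 0.25213488, 0.14478635),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Row-wise mean across the three samples.
row_means <- unname(rowMeans(samples[, -1]))

# Values of the new_cell_type column printed in the extended matrix.
printed <- c(0.12481389, 0.25475770, 0.14900235)

all.equal(row_means, printed, tolerance = 1e-6)
```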

WGBS methylation data first needs to be mapped to probes using BSmeth2Probe before being deconvoluted. The methylation data WGBS_data in BSmeth2Probe may be either a GRanges object or a methylKit object.

Format of WGBS_data as GRanges object:

load(system.file("extdata", "WGBS_GRanges.rda",
                 package = "deconvR"))
head(WGBS_GRanges)
## GRanges object with 6 ranges and 1 metadata column:
##       seqnames    ranges strand |   sample1
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chr1     47176      + |  0.833333
##   [2]     chr1     47177      - |  0.818182
##   [3]     chr1     47205      + |  0.681818
##   [4]     chr1     47206      - |  0.583333
##   [5]     chr1     47362      + |  0.416667
##   [6]     chr1     49271      + |  0.733333
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

or as methylKit object:

head(methylKit::methRead(system.file("extdata", "test1.myCpG.txt", 
                                     package = "methylKit"), 
                         sample.id="test", assembly="hg18", 
                         treatment=1, context="CpG", mincov = 0))
##     chr   start     end strand coverage numCs numTs
## 1 chr21 9764513 9764513      -       12     0    12
## 2 chr21 9764539 9764539      -       12     3     9
## 3 chr21 9820622 9820622      +       13     0    13
## 4 chr21 9837545 9837545      +       11     0    11
## 5 chr21 9849022 9849022      +      124    90    34
## 6 chr21 9853296 9853296      +       17    10     7

probe_id_locations contains the information needed to map genomic loci to probe IDs.

data("IlluminaMethEpicB5ProbeIDs")
head(IlluminaMethEpicB5ProbeIDs)
## GRanges object with 6 ranges and 1 metadata column:
##       seqnames              ranges strand |          ID
##          <Rle>           <IRanges>  <Rle> | <character>
##   [1]    chr19     5236005-5236006      + |  cg07881041
##   [2]    chr20   63216298-63216299      + |  cg18478105
##   [3]     chr1     6781065-6781066      + |  cg23229610
##   [4]     chr2 197438742-197438743      - |  cg03513874
##   [5]     chrX   24054523-24054524      + |  cg09835024
##   [6]    chr14   93114794-93114795      - |  cg05451842
##   -------
##   seqinfo: 25 sequences from an unspecified genome; no seqlengths

Use BSmeth2Probe to map WGBS data to probe IDs.

mapped_WGBS_data <- BSmeth2Probe(probe_id_locations = IlluminaMethEpicB5ProbeIDs, 
                                 WGBS_data = WGBS_GRanges,
                                 multipleMapping = TRUE,
                                 cutoff = 10)
head(mapped_WGBS_data)
##          IDs   sample1
## 1 cg00305774 1.0000000
## 2 cg00546078 0.8181818
## 3 cg00546307 0.5454545
## 4 cg00546971 0.5000000
## 5 cg00774867 0.8461538
## 6 cg00830435 0.9166667
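
Conceptually, the mapping assigns each WGBS locus to a probe whose genomic range contains it and summarises the methylation values per probe. The base R sketch below illustrates that idea on toy tables with hypothetical probe IDs. It deliberately does not reproduce the multipleMapping or cutoff arguments, and averaging per probe is an assumption of this example.

```r
# Hypothetical probe locations (two probes on chr1).
probes <- data.frame(ID    = c("cgA", "cgB"),
                     chr   = c("chr1", "chr1"),
                     start = c(100, 500),
                     end   = c(101, 501),
                     stringsAsFactors = FALSE)

# Hypothetical WGBS calls: one methylation fraction per position.
wgbs <- data.frame(chr  = "chr1",
                   pos  = c(100, 101, 300, 500),
                   meth = c(0.8, 0.6, 0.1, 0.4),
                   stringsAsFactors = FALSE)

# Pair every WGBS position with every probe on the same chromosome,
# then keep only the pairs where the position falls inside the probe range.
pairs <- merge(wgbs, probes, by = "chr")
hits  <- pairs[pairs$pos >= pairs$start & pairs$pos <= pairs$end, ]

# Summarise to one value per probe (here: the mean).
mapped <- aggregate(meth ~ ID, data = hits, FUN = mean)
mapped
```

The position at 300 overlaps no probe and is dropped, which mirrors why the mapped table contains only loci that have a probe ID.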

This mapped data can now be used in deconvolute. Here we deconvolute it using HumanCellTypeMethAtlas as the reference atlas.

deconvolution <- deconvolute(reference = HumanCellTypeMethAtlas, 
                             bulk = mapped_WGBS_data)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9963  0.9963  0.9963  0.9963  0.9963  0.9963
deconvolution$proportions
##         Monocytes_EPIC B.cells_EPIC CD4T.cells_EPIC NK.cells_EPIC
## sample1              0            0               0             0
##         CD8T.cells_EPIC Neutrophils_EPIC Erythrocyte_progenitors Adipocytes
## sample1               0                0                       0  0.5011789
##         Cortical_neurons Hepatocytes Lung_cells Pancreatic_beta_cells
## sample1        0.2917357           0          0                     0
##         Pancreatic_acinar_cells Pancreatic_duct_cells
## sample1                       0                     0
##         Vascular_endothelial_cells Colon_epithelial_cells Left_atrium Bladder
## sample1                          0            0.001012524           0       0
##         Breast Head_and_neck_larynx Kidney Prostate Thyroid Upper_GI
## sample1      0            0.2060729      0        0       0        0
##         Uterus_cervix
## sample1             0
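
Most cell types receive a proportion of zero, so it is often easier to read the result after dropping them. The base R sketch below does this for a named vector that copies the non-zero values printed above (plus one zero entry for contrast); the object name is chosen for this example.

```r
# Selected values copied from the proportions printed above.
sample1 <- c(Adipocytes             = 0.5011789,
             Cortical_neurons       = 0.2917357,
             Colon_epithelial_cells = 0.001012524,
             Head_and_neck_larynx   = 0.2060729,
             Kidney                 = 0)

# Keep only cell types with a non-zero estimate, largest first.
nonzero <- sort(sample1[sample1 > 0], decreasing = TRUE)
nonzero

# The estimated proportions should add up to (approximately) one.
sum(nonzero)
```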

4.2 Constructing tissue specific CpG signature matrix

Alternatively, users can set tissueSpecCpGs to TRUE to construct a tissue-specific methylation signature matrix from the reference atlas. Since we don’t have tissue-specific samples, this example uses simulated samples to construct the tissue-specific signature matrix.

data("HumanCellTypeMethAtlas")
exampleSamples <- simulateCellMix(1,reference = HumanCellTypeMethAtlas)$simulated
exampleMeta <- data.table("Experiment_accession" = "example_sample",
                          "Biosample_term_name" = "example_cell_type")
colnames(exampleSamples) <- c("CpGs", "example_sample")
colnames(HumanCellTypeMethAtlas)[1] <- c("CpGs")

signatures <- findSignatures(
  samples = exampleSamples,
  sampleMeta = exampleMeta,
  atlas = HumanCellTypeMethAtlas,
  IDs = "CpGs", K = 100, tissueSpecCpGs = TRUE)
## CELL TYPES IN EXTENDED ATLAS: 
## example_cell_type 
## Monocytes_EPIC 
## B.cells_EPIC 
## CD4T.cells_EPIC 
## NK.cells_EPIC 
## CD8T.cells_EPIC 
## Neutrophils_EPIC 
## Erythrocyte_progenitors 
## Adipocytes 
## Cortical_neurons 
## Hepatocytes 
## Lung_cells 
## Pancreatic_beta_cells 
## Pancreatic_acinar_cells 
## Pancreatic_duct_cells 
## Vascular_endothelial_cells 
## Colon_epithelial_cells 
## Left_atrium 
## Bladder 
## Breast 
## Head_and_neck_larynx 
## Kidney 
## Prostate 
## Thyroid 
## Upper_GI 
## Uterus_cervix
print(head(signatures[[2]]))
## $hyper
## cg08169020 cg26955540 cg25170017 cg12827637 cg15838173 cg04858631 cg19442545 
## 0.22338188 0.16855778 0.16104420 0.15804442 0.14506709 0.14388430 0.14138040 
## cg10560079 cg00982136 cg26460530 cg22017733 cg13677741 cg01424562 cg24082121 
## 0.13843754 0.13349591 0.12745647 0.12549185 0.12254703 0.12016754 0.12000360 
## cg09642825 cg15376996 cg14789659 cg11895835 cg06340704 cg05507654 cg19300307 
## 0.11965228 0.11526537 0.11470340 0.11298549 0.11275413 0.11235319 0.11232673 
## cg12474798 cg14785464 cg27189395 cg25913761 cg26033520 cg18856478 cg24275356 
## 0.11221055 0.11210741 0.11208867 0.11206473 0.11147062 0.11067015 0.11027369 
## cg23220823 cg08857351 cg00936790 cg20684110 cg10509607 cg26757820 cg05056497 
## 0.11008164 0.10995840 0.10972635 0.10960602 0.10917454 0.10897608 0.10886508 
## cg08474651 cg27395200 cg05258935 cg06167719 cg14057303 cg22897141 cg03254916 
## 0.10864348 0.10856998 0.10789424 0.10784143 0.10776217 0.10752754 0.10732893 
## cg23247274 cg09267773 cg14855367 cg04913246 cg16416715 cg22879098 cg26953232 
## 0.10721950 0.10714356 0.10686506 0.10685934 0.10667075 0.10660076 0.10614420 
## cg10361922 cg26047334 cg08063160 cg08257293 cg02962602 cg25158622 cg08425810 
## 0.10586413 0.10580519 0.10569091 0.10558466 0.10548360 0.10538325 0.10489848 
## cg11975790 cg03711944 cg08832851 cg05043794 cg27388962 cg16278496 cg22138735 
## 0.10471929 0.10443732 0.10436841 0.10395375 0.10384045 0.10272622 0.10245963 
## cg14845962 cg04055490 cg17847861 cg16127573 cg09577804 cg18675610 cg27312961 
## 0.10215285 0.10181838 0.10176296 0.10163856 0.10144838 0.10138791 0.10131960 
## cg08367326 cg15014975 cg19331221 cg08846870 cg03063309 cg01726273 cg07218880 
## 0.10116402 0.10071127 0.10060543 0.10015174 0.09979522 0.09973667 0.09964025 
## cg07865091 cg00769843 cg13638257 cg01377358 cg16402452 cg06398251 cg06889535 
## 0.09954861 0.09954179 0.09948695 0.09913928 0.09913203 0.09912561 0.09908088 
## cg16829306 cg04462378 cg24591770 cg20074395 cg11532431 cg10661769 cg19692929 
## 0.09898439 0.09898277 0.09894628 0.09883819 0.09880249 0.09866337 0.09864417 
## cg07438103 cg02297709 cg11025609 cg16988611 cg21181453 cg08659179 cg15185986 
## 0.09859585 0.09846172 0.09843703 0.09834226 0.09763527 0.09754829 0.09721971 
## cg15575375 cg20809067 
## 0.09718114 0.09712848 
## 
## $hypo
## cg03663120 cg20942286 cg03126713 cg03963853 cg00828556 cg24788483 cg11186858 
##  0.3759395  0.2983940  0.2720098  0.2706075  0.2700396  0.2695410  0.2673300 
## cg22528270 cg13931640 cg05612654 cg15310871 cg13500029 cg15871206 cg10480329 
##  0.2646031  0.2644911  0.2632022  0.2600270  0.2590017  0.2577089  0.2505648 
## cg11231069 cg12655112 cg03549146 cg12458039 cg05923857 cg20610950 cg07737292 
##  0.2492501  0.2394918  0.2391437  0.2390970  0.2369206  0.2347220  0.2332453 
## cg10456459 cg03313271 cg06988336 cg23952578 cg16636767 cg06517984 cg11597902 
##  0.2290243  0.2244608  0.2219031  0.2152414  0.2152290  0.2144171  0.2120307 
## cg27334271 cg14976569 cg11327657 cg06297318 cg04851465 cg22259797 cg15633603 
##  0.2118326  0.2114394  0.2108312  0.2101803  0.2100339  0.2076416  0.2075013 
## cg08425796 cg13403369 cg18990407 cg04354689 cg06373940 cg22185977 cg06125903 
##  0.2068291  0.2058475  0.2052697  0.2033874  0.2032562  0.2027773  0.2023323 
## cg20107506 cg07033722 cg10718056 cg09322573 cg12866960 cg25517015 cg26538782 
##  0.2017578  0.2013592  0.2010267  0.2004806  0.1992974  0.1990322  0.1986189 
## cg27366072 cg03310874 cg01879591 cg02796279 cg13093111 cg19502671 cg06978145 
##  0.1983931  0.1980183  0.1968446  0.1964198  0.1956971  0.1949187  0.1945433 
## cg08538581 cg19380303 cg20429104 cg10967114 cg23403750 cg26783127 cg11153071 
##  0.1941764  0.1936788  0.1935501  0.1935422  0.1930137  0.1927536  0.1926248 
## cg06721411 cg01024962 cg23755933 cg05445326 cg06585734 cg20966357 cg26923863 
##  0.1921887  0.1913396  0.1912708  0.1906347  0.1904134  0.1902425  0.1900587 
## cg26305504 cg12555086 cg07918933 cg04217515 cg27340480 cg04836151 cg04664897 
##  0.1898573  0.1897987  0.1894145  0.1891411  0.1890733  0.1890176  0.1889488 
## cg11802666 cg02244028 cg23334433 cg26889953 cg18835472 cg00009088 cg26298914 
##  0.1887986  0.1884375  0.1882457  0.1882376  0.1878145  0.1874927  0.1873197 
## cg17117981 cg25596405 cg08400494 cg13980609 cg14189391 cg17086773 cg22662844 
##  0.1867530  0.1865549  0.1864200  0.1862970  0.1853721  0.1853588  0.1850977 
## cg15288326 cg08428868 cg24250902 cg04586126 cg22875823 cg00235484 cg22666015 
##  0.1849950  0.1848289  0.1846174  0.1845621  0.1841353  0.1825761  0.1824446 
## cg16429499 cg15878616 
##  0.1821681  0.1816098
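
The output above lists K = 100 hypermethylated and 100 hypomethylated loci. One simple way to think about such a selection is to rank loci by how far the cell type of interest sits above or below the average of all other cell types. The base R sketch below implements that ranking on toy data; the scoring rule is an assumption made for illustration and may not match the criterion used by findSignatures.

```r
set.seed(3)

# Hypothetical atlas: 20 loci, one target cell type and three others.
atlas <- matrix(runif(20 * 4), nrow = 20,
                dimnames = list(sprintf("cg%02d", 1:20),
                                c("target", "other1", "other2", "other3")))
K <- 3

# Difference between the target and the mean of the remaining cell types.
others       <- atlas[, colnames(atlas) != "target"]
diff_to_rest <- atlas[, "target"] - rowMeans(others)

# Top K loci where the target is more methylated (hyper) or less (hypo);
# hypo scores are reported as positive magnitudes.
hyper <- head(sort(diff_to_rest, decreasing = TRUE), K)
hypo  <- head(sort(-diff_to_rest, decreasing = TRUE), K)

list(hyper = hyper, hypo = hypo)
```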

4.3 Constructing tissue specific DMPs

Alternatively, users can set tissueSpecDMPs to TRUE to obtain tissue-specific differentially methylated positions (DMPs) from the reference atlas. Since we don’t have tissue-specific samples, this example uses simulated samples. Note that tissueSpecCpGs and tissueSpecDMPs can’t both be TRUE at the same time.

data("HumanCellTypeMethAtlas")
exampleSamples <- simulateCellMix(1,reference = HumanCellTypeMethAtlas)$simulated
exampleMeta <- data.table("Experiment_accession" = "example_sample",
                          "Biosample_term_name" = "example_cell_type")
colnames(exampleSamples) <- c("CpGs", "example_sample")
colnames(HumanCellTypeMethAtlas)[1] <- c("CpGs")

signatures <- findSignatures(
  samples = exampleSamples,
  sampleMeta = exampleMeta,
  atlas = HumanCellTypeMethAtlas,
  IDs = "CpGs", tissueSpecDMPs = TRUE)
## CELL TYPES IN EXTENDED ATLAS: 
## example_cell_type 
## Monocytes_EPIC 
## B.cells_EPIC 
## CD4T.cells_EPIC 
## NK.cells_EPIC 
## CD8T.cells_EPIC 
## Neutrophils_EPIC 
## Erythrocyte_progenitors 
## Adipocytes 
## Cortical_neurons 
## Hepatocytes 
## Lung_cells 
## Pancreatic_beta_cells 
## Pancreatic_acinar_cells 
## Pancreatic_duct_cells 
## Vascular_endothelial_cells 
## Colon_epithelial_cells 
## Left_atrium 
## Bladder 
## Breast 
## Head_and_neck_larynx 
## Kidney 
## Prostate 
## Thyroid 
## Upper_GI 
## Uterus_cervix
print(head(signatures[[2]]))
##            intercept        f         pval         qval
## cg10480329 -3.287511 177.8860 1.360899e-12 8.203499e-09
## cg06297318 -3.496336 157.5137 4.911533e-12 1.480336e-08
## cg18835472 -3.350244 144.3110 1.222267e-11 2.455942e-08
## cg27366072 -2.858145 133.7777 2.667956e-11 3.412020e-08
## cg05923857 -3.515739 133.0089 2.830143e-11 3.412020e-08
## cg15633603 -1.496293 125.5673 5.089159e-11 5.112909e-08
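
A DMP table like this is usually filtered before further use. The base R sketch below rebuilds the six rows printed above and keeps the loci below a q-value cutoff; the cutoff of 3e-08 is arbitrary and chosen only so that the filter visibly removes some rows.

```r
# The six rows printed above, rebuilt as a plain data frame.
dmps <- data.frame(
  intercept = c(-3.287511, -3.496336, -3.350244, -2.858145, -3.515739, -1.496293),
  f         = c(177.8860, 157.5137, 144.3110, 133.7777, 133.0089, 125.5673),
  pval      = c(1.360899e-12, 4.911533e-12, 1.222267e-11,
                2.667956e-11, 2.830143e-11, 5.089159e-11),
  qval      = c(8.203499e-09, 1.480336e-08, 2.455942e-08,
                3.412020e-08, 3.412020e-08, 5.112909e-08),
  row.names = c("cg10480329", "cg06297318", "cg18835472",
                "cg27366072", "cg05923857", "cg15633603")
)

# Illustrative cutoff only; choose a threshold suited to your own analysis.
q_cutoff <- 3e-08
selected <- dmps[dmps$qval < q_cutoff, ]

# Strongest effects first, ranked by the F statistic.
selected <- selected[order(selected$f, decreasing = TRUE), ]
rownames(selected)
```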

5 Example Workflow For RNA Sequencing Data

RNA-seq data can also be used for deconvolution with the deconvR package. Note that the IDs column must contain gene names for the deconvR functions to run. With that in place, you can simulate bulk RNA-seq data with simulateCellMix and extend an RNA-seq reference atlas with findSignatures.
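
Expression data are often stored as a matrix with gene names as row names, so the first step is usually to move those names into an IDs column. The base R sketch below shows one way to do this; the gene names, cell-type names, and values are hypothetical.

```r
# Hypothetical expression matrix: genes in rows, cell types in columns.
expr <- matrix(c(10, 0, 5,
                 3, 8, 1),
               nrow = 3,
               dimnames = list(c("GENE1", "GENE2", "GENE3"),
                               c("typeA", "typeB")))

# Move the gene names into a leading IDs column, as used for the
# methylation atlas earlier in this document.
rna_reference <- data.frame(IDs = rownames(expr), expr,
                            check.names = FALSE, stringsAsFactors = FALSE)
rownames(rna_reference) <- NULL
rna_reference
```

The resulting data frame has the same layout as the methylation atlas shown earlier: an IDs column followed by one column per cell type.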

6 sessionInfo

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] dplyr_1.1.4       doParallel_1.0.17 iterators_1.0.14  foreach_1.5.2    
## [5] deconvR_1.12.0    data.table_1.16.2 knitr_1.48        BiocStyle_2.34.0 
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.4.1               BiocIO_1.16.0              
##   [3] bitops_1.0-9                tibble_3.2.1               
##   [5] R.oo_1.26.0                 preprocessCore_1.68.0      
##   [7] XML_3.99-0.17               lifecycle_1.0.4            
##   [9] lattice_0.22-6              MASS_7.3-61                
##  [11] base64_2.0.2                scrime_1.3.5               
##  [13] magrittr_2.0.3              minfi_1.52.0               
##  [15] limma_3.62.0                sass_0.4.9                 
##  [17] rmarkdown_2.28              jquerylib_0.1.4            
##  [19] yaml_2.3.10                 robslopes_1.1.3            
##  [21] doRNG_1.8.6                 askpass_1.2.1              
##  [23] minqa_1.2.8                 DBI_1.2.3                  
##  [25] RColorBrewer_1.1-3          abind_1.4-8                
##  [27] zlibbioc_1.52.0             quadprog_1.5-8             
##  [29] GenomicRanges_1.58.0        purrr_1.0.2                
##  [31] R.utils_2.12.3              BiocGenerics_0.52.0        
##  [33] RCurl_1.98-1.16             GenomeInfoDbData_1.2.13    
##  [35] IRanges_2.40.0              S4Vectors_0.44.0           
##  [37] rentrez_1.2.3               genefilter_1.88.0          
##  [39] annotate_1.84.0             DelayedMatrixStats_1.28.0  
##  [41] codetools_0.2-20            DelayedArray_0.32.0        
##  [43] xml2_1.3.6                  tidyselect_1.2.1           
##  [45] UCSC.utils_1.2.0            lme4_1.1-35.5              
##  [47] beanplot_1.3.1              matrixStats_1.4.1          
##  [49] stats4_4.4.1                illuminaio_0.48.0          
##  [51] GenomicAlignments_1.42.0    jsonlite_1.8.9             
##  [53] multtest_2.62.0             e1071_1.7-16               
##  [55] survival_3.7-0              bbmle_1.0.25.1             
##  [57] tools_4.4.1                 rsq_2.7                    
##  [59] Rcpp_1.0.13                 glue_1.8.0                 
##  [61] SparseArray_1.6.0           xfun_0.48                  
##  [63] qvalue_2.38.0               MatrixGenerics_1.18.0      
##  [65] GenomeInfoDb_1.42.0         HDF5Array_1.34.0           
##  [67] numDeriv_2016.8-1.1         BiocManager_1.30.25        
##  [69] fastmap_1.2.0               boot_1.3-31                
##  [71] rhdf5filters_1.18.0         fansi_1.0.6                
##  [73] openssl_2.2.2               digest_0.6.37              
##  [75] R6_2.5.1                    colorspace_2.1-1           
##  [77] gtools_3.9.5                RSQLite_2.3.7              
##  [79] R.methodsS3_1.8.2           utf8_1.2.4                 
##  [81] tidyr_1.3.1                 generics_0.1.3             
##  [83] rtracklayer_1.66.0          class_7.3-22               
##  [85] httr_1.4.7                  S4Arrays_1.6.0             
##  [87] pkgconfig_2.0.3             gtable_0.3.6               
##  [89] blob_1.2.4                  siggenes_1.80.0            
##  [91] XVector_0.46.0              htmltools_0.5.8.1          
##  [93] bookdown_0.41               scales_1.3.0               
##  [95] Biobase_2.66.0              png_0.1-8                  
##  [97] deming_1.4-1                tzdb_0.4.0                 
##  [99] reshape2_1.4.4              rjson_0.2.23               
## [101] nloptr_2.1.1                coda_0.19-4.1              
## [103] nlme_3.1-166                curl_5.2.3                 
## [105] bdsmatrix_1.3-7             bumphunter_1.48.0          
## [107] proxy_0.4-27                cachem_1.1.0               
## [109] rhdf5_2.50.0                stringr_1.5.1              
## [111] AnnotationDbi_1.68.0        restfulr_0.0.15            
## [113] GEOquery_2.74.0             pillar_1.9.0               
## [115] grid_4.4.1                  reshape_0.8.9              
## [117] vctrs_0.6.5                 Deriv_4.1.6                
## [119] xtable_1.8-4                evaluate_1.0.1             
## [121] readr_2.1.5                 GenomicFeatures_1.58.0     
## [123] mvtnorm_1.3-1               cli_3.6.3                  
## [125] locfit_1.5-9.10             compiler_4.4.1             
## [127] Rsamtools_2.22.0            rlang_1.1.4                
## [129] crayon_1.5.3                rngtools_1.5.2             
## [131] nor1mix_1.3-3               mclust_6.1.1               
## [133] emdbook_1.3.13              plyr_1.8.9                 
## [135] stringi_1.8.4               mcr_1.3.3.1                
## [137] BiocParallel_1.40.0         nnls_1.6                   
## [139] assertthat_0.2.1            munsell_0.5.1              
## [141] Biostrings_2.74.0           Matrix_1.7-1               
## [143] hms_1.1.3                   sparseMatrixStats_1.18.0   
## [145] bit64_4.5.2                 ggplot2_3.5.1              
## [147] Rhdf5lib_1.28.0             KEGGREST_1.46.0            
## [149] methylKit_1.32.0            statmod_1.5.0              
## [151] SummarizedExperiment_1.36.0 fastseg_1.52.0             
## [153] memoise_2.0.1               bslib_0.8.0                
## [155] bit_4.5.0
stopCluster(cl)