Title: | Clustering Single-Cell Multimodal Omics Data with Joint Graph Regularized Single-Cell Kullback-Leibler Sparse Non-Negative Matrix Factorization |
---|---|
Description: | Methods to perform Joint graph Regularized Single-Cell Kullback-Leibler Sparse Non-negative Matrix Factorization (jrSiCKLSNMF, pronounced "junior sickles NMF") on quality controlled multi-assay single-cell omics count data, specifically dual-assay scRNA-seq and scATAC-seq data. 'jrSiCKLSNMF' extracts meaningful latent factors that are shared across omics views. These factors enable accurate cell-type clustering and facilitate visualizations. Also includes methods for mini-batch updates and other adaptations for larger datasets. |
Authors: | Dorothy Ellis [aut, cre] |
Maintainer: | Dorothy Ellis <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 2.0.0 |
Built: | 2025-02-18 02:42:40 UTC |
Source: | https://github.com/ellisdoro/jrsicklsnmf |
Add any type of metadata to an object of class SickleJr. Metadata
are stored in list format under the name specified in metadataname
in the metadata slot.
AddSickleJrMetadata(SickleJr, metadata, metadataname)
SickleJr |
An object of class SickleJr holding at least one count matrix of omics data |
metadata |
Metadata to add to the SickleJr object; there are no restrictions on type |
metadataname |
A string input that indicates the desired name for the added metadata. |
An object of class SickleJr with added metadata
SimSickleJrSmall<-AddSickleJrMetadata(SimSickleJrSmall, SimData$cell_type,"cell_types_full_data")
Generate graph Laplacians for graph regularization of
jrSiCKLSNMF from the list of raw count matrices using a KNN graph. Note that this
is only appropriate when the number of features is considerably greater
than the number of cells in all modalities. If this is not the case, please use
BuildSNNGraphLaplacians or any other method of graph construction that does not
rely on the Euclidean distance, and store the graph Laplacians for each modality
as a list in the graph.laplacian.list slot.
BuildKNNGraphLaplacians(SickleJr, k = 20)
SickleJr |
An object of class SickleJr |
k |
Number of nearest neighbors to use for the KNN graph: defaults to 20 |
An object of class SickleJr with a list of graph Laplacians in sparse matrix format
added to the graph.laplacian.list
slot
SimSickleJrSmall<-BuildKNNGraphLaplacians(SimSickleJrSmall)
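The sketch below illustrates the kind of feature-feature graph Laplacian this step produces. It assumes binary edge weights, a symmetrized neighbor graph, and the unnormalized Laplacian L = D - A (consistent with jrSiCKLSNMF accepting separate adjacency and degree matrices); the package's exact weighting may differ, and knn_graph_laplacian is a hypothetical name, not a package function.

```r
# Hypothetical helper (not part of jrSiCKLSNMF): unnormalized graph
# Laplacian L = D - A of a symmetric KNN graph over the rows (features)
# of a features-by-cells count matrix, using Euclidean distance.
knn_graph_laplacian <- function(X, k = 20) {
  n <- nrow(X)
  stopifnot(k < n)
  dmat <- as.matrix(dist(X, method = "euclidean"))
  diag(dmat) <- Inf                          # a feature is not its own neighbor
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    A[i, order(dmat[i, ])[seq_len(k)]] <- 1  # the k closest features
  }
  A <- pmax(A, t(A))                         # keep an edge if either end lists it
  diag(rowSums(A)) - A                       # degree matrix minus adjacency
}

set.seed(1)
X <- matrix(rpois(50 * 10, lambda = 3), nrow = 50)  # 50 features, 10 cells
L <- knn_graph_laplacian(X, k = 5)
```

Each row of L sums to zero, and every feature has degree at least k after symmetrization.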
Generate graph Laplacians for graph regularization of jrSiCKLSNMF from the list of raw count matrices using an SNN graph. The SNN graph is more robust than the KNN graph when the number of cells exceeds the number of features.
BuildSNNGraphLaplacians(SickleJr, k = 20)
SickleJr |
An object of class SickleJr |
k |
Number of nearest neighbors used to construct the SNN graph: defaults to 20 |
An object of class SickleJr with list of graph Laplacians in sparse
matrix format added to its graph.laplacian.list
slot
SimSickleJrSmall<-BuildSNNGraphLaplacians(SimSickleJrSmall)
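A shared-nearest-neighbor (SNN) graph reweights each pair of features by how many neighbors they have in common. The sketch assumes Jaccard weights over neighborhoods of size k + 1 that include the feature itself, a common SNN convention; it is not necessarily the weighting BuildSNNGraphLaplacians uses, and snn_graph_laplacian is a hypothetical name.

```r
# Hypothetical helper (not part of jrSiCKLSNMF): graph Laplacian of an SNN
# graph over the rows (features) of a features-by-cells matrix, with
# Jaccard edge weights between k-nearest-neighbor sets.
snn_graph_laplacian <- function(X, k = 20) {
  n <- nrow(X)
  stopifnot(k < n)
  dmat <- as.matrix(dist(X, method = "euclidean"))
  M <- matrix(0, n, n)
  for (i in seq_len(n)) {
    d_i <- dmat[i, ]
    d_i[i] <- -1                             # rank the feature itself first
    M[i, order(d_i)[seq_len(k + 1)]] <- 1    # itself plus its k nearest features
  }
  shared <- M %*% t(M)                # shared[i, j] = size of N(i) intersect N(j)
  size <- k + 1                       # every neighborhood has k + 1 members
  W <- shared / (2 * size - shared)   # Jaccard index: intersection / union
  diag(W) <- 0                        # drop self-loops
  diag(rowSums(W)) - W
}

set.seed(2)
X <- matrix(rpois(40 * 60, lambda = 3), nrow = 40)  # 40 features, 60 cells
L_snn <- snn_graph_laplacian(X, k = 5)
```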
Perform UMAP on the H matrix alone (default) or within a modality by
using UMAP on the W^v H corresponding to modality v.
CalculateUMAPSickleJr( SickleJr, umap.settings = umap::umap.defaults, modality = NULL )
SickleJr |
An object of class SickleJr |
umap.settings |
Optional settings for UMAP; defaults to umap::umap.defaults |
modality |
A number corresponding to the desired modality v; if set, will perform UMAP on W^v H |
An object of class SickleJr with UMAP output based on the matrix alone or within a modality added to its
umap
slot
#Since this example has only 10 observations,
#we need to modify the number of neighbors from the default of 15
umap.settings=umap::umap.defaults
umap.settings$n_neighbors=2
SimSickleJrSmall<-CalculateUMAPSickleJr(SimSickleJrSmall, umap.settings=umap.settings)
SimSickleJrSmall<-CalculateUMAPSickleJr(SimSickleJrSmall, umap.settings=umap.settings,modality=1)
SimSickleJrSmall<-CalculateUMAPSickleJr(SimSickleJrSmall, umap.settings=umap.settings,modality=2)
Perform k-means clustering, spectral clustering, clustering based on the
index of the maximum latent factor, or Louvain community detection on the H matrix.
Defaults to k-means.
ClusterSickleJr( SickleJr, numclusts, method = "kmeans", neighbors = 20, louvainres = 0.3 )
SickleJr |
An object of class SickleJr |
numclusts |
Number of clusters; can be NULL when method is "max" or "louvain" |
method |
String holding the clustering method: can choose "kmeans" for k-means clustering, "spectral" for spectral clustering, "louvain" for Louvain community detection or "max" for clustering based on the maximum row value; note that "max" is only appropriate for jrSiCKLSNMF with L2 norm row regularization |
neighbors |
Number indicating the number of neighbors to use to generate the graphs for spectral clustering and Louvain community detection: both of these methods require the construction of a graph first (here we use KNN); defaults to 20 and is unused when the clustering method is "kmeans" or "max" |
louvainres |
Numeric containing the resolution parameter for Louvain community detection; unused for all other methods |
An object of class SickleJr with added clustering information
SimSickleJrSmall<-ClusterSickleJr(SimSickleJrSmall,3)
SimSickleJrSmall<-ClusterSickleJr(SimSickleJrSmall,method="louvain",neighbors=5)
SimSickleJrSmall<-ClusterSickleJr(SimSickleJrSmall,method="spectral",neighbors=5,numclusts=3)
#DO NOT DO THIS FOR REAL DATA; this is just to illustrate max clustering
SimSickleJrSmall<-SetLambdasandRegs(SimSickleJrSmall,HrowReg="L2Norm")
SimSickleJrSmall<-ClusterSickleJr(SimSickleJrSmall,method="max")
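Clustering by the maximum latent factor amounts to taking, for each cell, the index of its largest entry in H. The sketch assumes H is stored cells-by-factors, as in the H slot; cells_to_max_factor is a hypothetical name. Because multiplying a row by a positive constant never changes which entry is largest, the L2 row constraint presumably matters through how it shapes H during fitting rather than through any rescaling afterwards.

```r
# Hypothetical helper: H is a cells-by-factors matrix (the layout of the
# H slot); returns, for each cell, the index of its largest latent factor.
cells_to_max_factor <- function(H) {
  max.col(H, ties.method = "first")
}

H <- matrix(c(0.9, 0.1, 0.0,
              0.2, 0.7, 0.1,
              0.0, 0.3, 0.6,
              0.5, 0.4, 0.1), nrow = 4, byrow = TRUE)
cells_to_max_factor(H)
# [1] 1 2 3 1
```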
Using a list of sparse count matrices, create an object of class SickleJr and specify the names of these count matrices.
CreateSickleJr(count.matrices, names = NULL)
count.matrices |
A list of quality-controlled count matrices with pre-filtered features where each modality corresponds to each matrix in the list |
names |
Optional parameter with names for the count matrices in vector format |
An object of class SickleJr with sparse count matrices added to the count.matrices
slot
ExampleSickleJr<-CreateSickleJr(SimData$Xmatrices)
A wrapper for the clValid
and fviz_nbclust
functions to perform clustering diagnostics
DetermineClusters( SickleJr, numclusts = 2:20, clusteringmethod = "kmeans", diagnosticmethods = c("wss", "silhouette", "gap_stat"), clValidvalidation = "internal", createDiagnosticplots = TRUE, runclValidDiagnostics = TRUE, printPlots = TRUE, printclValid = TRUE, subset = FALSE, subsetsize = 1000, seed = NULL )
SickleJr |
An object of class SickleJr |
numclusts |
A vector of integers indicating the number of clusters to test |
clusteringmethod |
String holding the clustering method: defaults to k-means; since the other methods are not implemented in jrSiCKLSNMF, it is recommended to use k-means. |
diagnosticmethods |
Vector of strings indicating which methods to plot. Defaults to all three available methods: wss, silhouette, and gap_stat |
clValidvalidation |
String containing validation method to use for clValid. Defaults to internal. |
createDiagnosticplots |
Boolean indicating whether to create diagnostic plots for cluster size |
runclValidDiagnostics |
Boolean indicating whether to calculate the diagnostics from clValid |
printPlots |
Boolean indicating whether to print the diagnostic plots |
printclValid |
Boolean indicating whether to print the diagnostic results from clValid |
subset |
Boolean indicating whether to calculate the diagnostics on a subset of the data rather than on the whole dataset. |
subsetsize |
Numeric value indicating size of the subset |
seed |
Numeric value holding the random seed |
An object of class SickleJr with cluster diagnostics added to its clusterdiagnostics
slot
#Since these data are too small, the clValid diagnostics do not run
#properly. See the vignette for an example with the clValid diagnostics
SimSickleJrSmall<-DetermineClusters(SimSickleJrSmall,numclusts=2:5,runclValidDiagnostics=FALSE)
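The "wss" diagnostic plots the total within-cluster sum of squares against the number of clusters. A minimal base-R sketch of that curve, computed with stats::kmeans on a cells-by-factors matrix, is shown below; wss_curve and its nstart and seed choices are assumptions, and DetermineClusters itself delegates this plot to fviz_nbclust.

```r
# Hypothetical helper: total within-cluster sum of squares (WSS) from
# k-means for each candidate number of clusters.
wss_curve <- function(H, numclusts = 2:6, seed = 1, nstart = 10) {
  sapply(numclusts, function(k) {
    set.seed(seed)                                # reproducible k-means starts
    kmeans(H, centers = k, nstart = nstart)$tot.withinss
  })
}

# Toy "H": three well-separated groups of 20 cells in 2 latent factors
set.seed(42)
H <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
           matrix(rnorm(40, mean = 5),  ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
wss <- wss_curve(H, numclusts = 2:6)
```

The elbow of wss versus numclusts is the candidate number of clusters; on these toy data the drop should flatten after three clusters.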
This generates v+1 plots, where v is the number of data modalities, of the approximate singular values generated by IRLBA. There is one plot for each modality and a final plot that concatenates all of the modalities together. Choose the largest elbow value among these plots.
DetermineDFromIRLBA(SickleJr, d = 50)
SickleJr |
An object of class SickleJr |
d |
Number of desired factors; it is important to select a number that allows you to see a clear elbow: defaults to 50. |
An object of class SickleJr with plots for IRLBA diagnostics added to its plots
slot
SimSickleJrSmall<-DetermineDFromIRLBA(SimSickleJrSmall,d=5)
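The values behind these plots can be sketched in base R. The example uses base::svd as a stand-in for IRLBA (which approximates the leading singular values of large sparse matrices) and assumes the concatenation stacks the features of all modalities for the same cells; singular_value_profiles is a hypothetical name.

```r
# Hypothetical helper: leading d singular values of each modality and of
# all modalities stacked row-wise (features stacked, cells shared).
# base::svd stands in for IRLBA here.
singular_value_profiles <- function(Xlist, d = 10) {
  mats <- c(Xlist, list(do.call(rbind, Xlist)))
  names(mats) <- c(paste0("modality", seq_along(Xlist)), "concatenated")
  lapply(mats, function(X) {
    sv <- svd(X, nu = 0, nv = 0)$d           # all singular values, decreasing
    sv[seq_len(min(d, length(sv)))]
  })
}

set.seed(7)
Xlist <- list(matrix(rpois(30 * 12, 2), nrow = 30),   # 30 genes x 12 cells
              matrix(rpois(50 * 12, 1), nrow = 50))   # 50 peaks x 12 cells
profiles <- singular_value_profiles(Xlist, d = 5)
# plot(profiles$concatenated, type = "b") draws one of the v + 1 elbow plots
```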
Create the W^v matrices and H matrix via non-negative double singular
value decomposition (NNDSVD) or randomization. For randomization, the algorithm runs for 10 rounds
for the desired number of random initializations and picks the W^v matrices and H matrix with
the lowest achieved loss.
GenerateWmatricesandHmatrix( SickleJr, d = 10, random = FALSE, numberReps = 100, seed = 5, minibatch = FALSE, batchsize = -1, random_W_updates = FALSE, subsample = 1:dim(SickleJr@count.matrices[[1]])[2], usesvd = FALSE )
SickleJr |
An object of class SickleJr |
d |
Number of latent factors to use: defaults to 10 |
random |
Boolean indicating whether to use random initialization (TRUE) or NNDSVD (FALSE): defaults to FALSE |
numberReps |
Number of random initializations to use: default is 100 |
seed |
Random seed for reproducibility of random initializations |
minibatch |
Indicates whether or not to use the mini-batch algorithm |
batchsize |
Size of batches for mini-batch NMF |
random_W_updates |
Indicates whether to only update each |
subsample |
A vector of values to use for subsampling; only appropriate when determining proper values for d. |
usesvd |
Indicates whether to use |
An object of class SickleJr with the W^v matrices and H matrix added.
SimSickleJrSmall<-SetLambdasandRegs(SimSickleJrSmall, lambdaWlist=list(10,50),lambdaH=500,HrowReg="None",WcolReg="None")
SimSickleJrSmall<-GenerateWmatricesandHmatrix(SimSickleJrSmall,d=5,usesvd=TRUE)
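NNDSVD (Boutsidis and Gallopoulos, 2008) builds a non-negative starting point from the truncated SVD: each singular triplet is split into its positive and negative parts, and the dominant part is kept. The single-matrix sketch below follows that published recipe; it is not the package's implementation, nndsvd is a hypothetical name, and H is returned factors-by-cells, i.e. the transpose of the H slot layout.

```r
# Hypothetical helper: basic NNDSVD initialization of A (features x cells)
# with k factors; returns non-negative W (features x k) and H (k x cells).
nndsvd <- function(A, k) {
  stopifnot(all(A >= 0), k <= min(dim(A)))
  s <- svd(A, nu = k, nv = k)
  norm2 <- function(z) sqrt(sum(z^2))
  W <- matrix(0, nrow(A), k)
  H <- matrix(0, k, ncol(A))
  # Leading singular vectors of a non-negative matrix can be taken non-negative
  W[, 1] <- sqrt(s$d[1]) * abs(s$u[, 1])
  H[1, ] <- sqrt(s$d[1]) * abs(s$v[, 1])
  if (k >= 2) for (j in 2:k) {
    x <- s$u[, j]; y <- s$v[, j]
    xp <- pmax(x, 0); xn <- pmax(-x, 0)      # positive / negative parts
    yp <- pmax(y, 0); yn <- pmax(-y, 0)
    mp <- norm2(xp) * norm2(yp)
    mn <- norm2(xn) * norm2(yn)
    # Keep whichever sign pattern carries more mass
    if (mp >= mn) { u <- xp; v <- yp; sigma <- mp } else { u <- xn; v <- yn; sigma <- mn }
    if (sigma > 0) {
      W[, j] <- sqrt(s$d[j] * sigma) * u / norm2(u)
      H[j, ] <- sqrt(s$d[j] * sigma) * v / norm2(v)
    }
  }
  list(W = W, H = H)
}

set.seed(5)
A <- matrix(rpois(20 * 15, lambda = 3), nrow = 20)
init <- nndsvd(A, k = 4)
```

Entries that NNDSVD sets to zero stay at zero under multiplicative updates, which is why variants such as NNDSVDa fill them with small positive values.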
Perform joint non-negative matrix factorization (NMF) across
multiple modalities of single-cell data.
To measure the discrepancy between two distributions, one can use the
Poisson Kullback-Leibler divergence (diffFunc="klp"),
the Itakura-Saito divergence,
or the Frobenius norm (diffFunc="fr").
It is also possible to set graph regularization constraints on W^v
and either a sparsity constraint on H or an L2 norm constraint on the rows of H.
This function passes by reference and updates the variables W^v and H,
and it does not require data to be in an object of type SickleJr.
RunjrSiCKLSNMF calls this function.
If your data are in an object of class SickleJr,
please use the RunjrSiCKLSNMF function instead.
jrSiCKLSNMF( datamatL, WL, H, AdjL, DL, lambdaWL, lambdaH, initsamp, suppress_warnings, diffFunc, Hconstraint = "None", Wconstraint = "None", differr = 1e-06, rounds = 1000L, display_progress = TRUE, minibatch = TRUE, batchsize = 100L, random_W_updates = TRUE, minrounds = 100L, numthreads = 1L )
datamatL |
An R list where each entry contains a normalized, sparse
|
WL |
An R list containing initialized values of the |
H |
A matrix containing initialized values for the shared
|
AdjL |
An R list containing all of the adjacency matrices for the
feature-feature similarity graphs in sparse format; note that
|
DL |
An R list containing all of the degree matrices of the
feature-feature similarity graphs; note that |
lambdaWL |
A list of the |
lambdaH |
A double containing the desired value for
|
initsamp |
A vector of randomly selected rows of |
suppress_warnings |
A Boolean that indicates whether warnings should be suppressed |
diffFunc |
A string indicating what type of divergence to use; set to
the Poisson Kullback-Leibler divergence
( |
Hconstraint |
A string that indicates whether you want to set an L2
norm constraint on the rows of |
Wconstraint |
A string that indicates whether you want to set an L2
norm constraint on the columns of |
differr |
A double containing the tolerance |
rounds |
A double containing the number of rounds |
display_progress |
A Boolean indicating whether to display the progress bar |
minibatch |
A Boolean indicating whether to use the mini-batch version of the algorithm |
batchsize |
Size of batches for mini-batch updates |
random_W_updates |
A Boolean indicating whether to update
|
minrounds |
A minimum number of rounds for the algorithm to run: most useful for the mini-batch algorithm |
numthreads |
Number of threads to use if running in parallel |
An R list containing values for the objective function.
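To make the Poisson Kullback-Leibler divergence concrete, the sketch below fits a single modality with the classic Lee-Seung multiplicative updates for the generalized KL divergence. It deliberately omits the graph regularization on W^v, the sparsity or L2 constraints on H, the sharing of H across modalities, and mini-batching, so it is a baseline illustration rather than the jrSiCKLSNMF update rule; kl_nmf and kl_div are hypothetical names, and H is factors-by-cells here.

```r
# Generalized KL divergence D(X || WH), treating 0 * log(0) as 0
kl_div <- function(X, WH) {
  sum(ifelse(X > 0, X * log(X / WH), 0) - X + WH)
}

# Hypothetical helper: Lee-Seung multiplicative updates for KL-NMF,
# X (features x cells) ~ W (features x d) %*% H (d x cells).
kl_nmf <- function(X, W, H, rounds = 200, eps = 1e-10) {
  for (r in seq_len(rounds)) {
    WH <- W %*% H + eps
    # H_aj <- H_aj * sum_i W_ia X_ij / (WH)_ij / sum_i W_ia
    H <- H * (t(W) %*% (X / WH)) / (colSums(W) + eps)
    WH <- W %*% H + eps
    # W_ia <- W_ia * sum_j H_aj X_ij / (WH)_ij / sum_j H_aj
    W <- W * sweep((X / WH) %*% t(H), 2, rowSums(H) + eps, "/")
  }
  list(W = W, H = H)
}

set.seed(11)
X  <- matrix(rpois(30 * 20, lambda = 4), nrow = 30)
W0 <- matrix(runif(30 * 3), nrow = 30)
H0 <- matrix(runif(3 * 20), nrow = 3)
fit <- kl_nmf(X, W0, H0, rounds = 200)
```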
This wrapper function calculates the jrSiCKLSNMF loss function from R code.
lossCalcRWrapper(datamatL, WL, H, AdjL, DL, lambdaWL, lambdaH, diffFunc)
datamatL |
An R list where each entry contains a normalized, sparse
|
WL |
An R list containing initialized values of the |
H |
A matrix containing initialized values for the shared
|
AdjL |
An R list containing all of the adjacency matrices for the
feature-feature similarity graphs in sparse format; note that
|
DL |
An R list containing all of the degree matrices of the
feature-feature similarity graphs; note that |
lambdaWL |
A list of the |
lambdaH |
A double containing the desired value for
|
diffFunc |
A vector of strings indicating what type of divergence to
use; set to the Poisson Kullback-Leibler divergence
( |
A double containing the value of the jrSiCKLSNMF loss function
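The sketch below shows how a loss of this kind can be assembled from the same ingredients (data matrices, W^v, H, adjacency and degree matrices, lambdaW and lambdaH). The objective form, KL divergence plus lambdaW times tr(W^T L W) per modality plus lambdaH times the sum of H, is an assumption based on the graph regularization on W^v and the sparsity constraint on H described above; joint_loss is hypothetical and may not match lossCalcRWrapper term for term. kl_div is repeated so the block runs on its own.

```r
# Generalized KL divergence, treating 0 * log(0) as 0
kl_div <- function(X, WH) sum(ifelse(X > 0, X * log(X / WH), 0) - X + WH)

# Hypothetical sketch of an assumed jrSiCKLSNMF-style objective:
#   sum_v [ KL(X_v || W_v H^T) + lambdaW_v * tr(W_v^T (D_v - A_v) W_v) ]
#     + lambdaH * sum(H)
# X_v: features x cells, W_v: features x d, H: cells x d (H slot layout),
# A_v / D_v: feature-feature adjacency and degree matrices.
joint_loss <- function(Xlist, Wlist, H, Alist, Dlist, lambdaW, lambdaH,
                       eps = 1e-10) {
  total <- 0
  for (v in seq_along(Xlist)) {
    W <- Wlist[[v]]
    L <- Dlist[[v]] - Alist[[v]]                  # graph Laplacian
    graph_term <- sum(diag(t(W) %*% L %*% W))     # tr(W^T L W)
    total <- total + kl_div(Xlist[[v]], W %*% t(H) + eps) +
      lambdaW[[v]] * graph_term
  }
  total + lambdaH * sum(H)
}

# Toy example: one modality, 3 features on a path graph, 4 cells, d = 2
W1 <- matrix(c(1, 2, 3, 0.5, 0.5, 1), nrow = 3)
H  <- matrix(c(1, 0.2, 0.3, 2, 0.5, 1, 1, 0.1), nrow = 4)
X1 <- W1 %*% t(H)                                 # exact factorization
A1 <- matrix(c(0, 1, 0, 1, 0, 1, 0, 1, 0), nrow = 3)
D1 <- diag(rowSums(A1))
loss_plain <- joint_loss(list(X1), list(W1), H, list(A1), list(D1), list(0), 0)
loss_graph <- joint_loss(list(X1), list(W1), H, list(A1), list(D1), list(10), 0)
```

With an exact factorization and zero penalties the loss is essentially zero; the graph term is non-negative because D - A is positive semidefinite for a non-negative adjacency matrix.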
To ensure sufficient convergence of the loss for jrSiCKLSNMF with mini-batch updates, we plot the loss vs the number of iterations for the mini-batch algorithm. After a certain number of iterations, the loss should appear to oscillate around a value. Before continuing with downstream analyses, please ensure that the loss exhibits this sort of behavior. For the mini-batch algorithm, it is not possible to use the convergence criteria used for the batch version of the algorithm.
MinibatchDiagnosticPlot(SickleJr)
SickleJr |
An object of class SickleJr |
An object of class SickleJr with mini-batch diagnostic plots added to the plots
slot.
SimSickleJrSmall<-MinibatchDiagnosticPlot(SimSickleJrSmall)
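The visual check described above can be supplemented with a simple numeric one. The hypothetical helper below (not a package function) compares the mean loss over the last window of iterations with the mean over the preceding window; a small relative change is consistent with the loss oscillating around a stable value. The window size, tolerance, and simulated loss curve are all assumptions for illustration.

```r
# Hypothetical helper: relative change between the mean loss over the last
# `window` iterations and the mean over the window before it.
plateau_check <- function(loss, window = 50, tol = 1e-3) {
  n <- length(loss)
  stopifnot(n >= 2 * window)
  recent   <- mean(loss[(n - window + 1):n])
  previous <- mean(loss[(n - 2 * window + 1):(n - window)])
  rel_change <- abs(previous - recent) / abs(previous)
  list(rel_change = rel_change, plateaued = rel_change < tol)
}

# Simulated mini-batch loss: fast decay plus noise around 1000
set.seed(3)
iters <- 1:400
loss <- 1000 + 5000 * exp(-iters / 20) + rnorm(400, sd = 1)
plateau_check(loss)$plateaued          # late iterations: oscillating plateau
plateau_check(loss[1:100])$plateaued   # early iterations: still decreasing
```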
Normalize the count data within each modality. The default
normalization, which should be used with the KL divergence, is median
library size normalization. To perform median library size normalization,
each count within a cell is divided by its library size (i.e. the counts
within a column are divided by the column sum). Then, all values are
multiplied by the median library size (i.e. the median column sum). To use
the Frobenius norm, set diffFunc="fr" to log normalize your count data and
supply the desired scaleFactor. You may also use a different form of
normalization and store the results in the normalized.count.matrices slot.
NormalizeCountMatrices(SickleJr, diffFunc = "klp", scaleFactor = NULL)
SickleJr |
An object of class SickleJr |
diffFunc |
A vector of strings that determines the statistical "distance" to use within each modality; set to "klp" when using the Poisson KL divergence or to "fr" when using the Frobenius norm: default is KL divergence for all modalities; this also determines the type of normalization |
scaleFactor |
A single numeric value (if using the same scale factor for each modality)
or a list of numeric values (if using different scale factors in different
modalities) to use as scale factors for log normalization |
An object of class SickleJr with a list of sparse, normalized data
matrices added to its normalized.count.matrices
slot
SimSickleJrSmall<-NormalizeCountMatrices(SimSickleJrSmall)
SimSickleJrSmall<-NormalizeCountMatrices(SimSickleJrSmall, diffFunc="fr",scaleFactor=1e6)
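Median library size normalization as described above takes only a few lines of base R. The log-normalization helper assumes the common form log(1 + scaleFactor * count / library size), which is not necessarily what NormalizeCountMatrices computes; both helper names are hypothetical, and cells with zero total counts are assumed to have been removed during quality control.

```r
# Hypothetical helper: divide each cell (column) by its library size,
# then multiply everything by the median library size.
median_libsize_normalize <- function(X) {
  libsize <- colSums(X)
  sweep(X, 2, libsize, "/") * median(libsize)
}

# Hypothetical helper (assumed form): log(1 + scaleFactor * x / libsize)
log_normalize <- function(X, scaleFactor = 1e6) {
  log1p(sweep(X, 2, colSums(X), "/") * scaleFactor)
}

X <- matrix(c(1, 2, 3,
              4, 5, 6,
              0, 1, 1), nrow = 3)   # columns are cells: sizes 6, 15, 2
Xn <- median_libsize_normalize(X)
colSums(Xn)
# [1] 6 6 6
```

After normalization every cell has the median library size (here 6), so counts are comparable across cells while staying on the original count scale.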
Generate plots of the lowest achieved loss after a pre-specified number of iterations (default 100) of the jrSiCKLSNMF algorithm for each number of latent factors (defaults to 2:20). This operates similarly to a scree plot, so please select the number of latent factors that corresponds to the elbow of the plot. This method is not appropriate for larger datasets (more than 1000 cells).
PlotLossvsLatentFactors( SickleJr, rounds = 100, differr = 1e-04, d_vector = c(2:20), parallel = FALSE, nCores = detectCores() - 1, subsampsize = NULL, minibatch = FALSE, random = FALSE, random_W_updates = FALSE, seed = NULL, batchsize = dim(SickleJr@count.matrices[[1]])[2], lossonsubset = FALSE, losssubsetsize = dim(SickleJr@count.matrices[[1]])[2], numthreads = 1 )
SickleJr |
An object of class SickleJr |
rounds |
Number of rounds to use: defaults to 100; this process is time consuming, so a high number of rounds is not recommended |
differr |
Tolerance for the percentage update in the likelihood: for these plots,
this defaults to 1e-4 |
d_vector |
Vector of numbers of latent factors d to test: defaults to 2:20 |
parallel |
Boolean indicating whether to use parallel computation |
nCores |
Number of desired cores; defaults to the number of cores of the current machine minus 1 for convenience |
subsampsize |
Size of the random subsample (defaults to NULL) |
minibatch |
Boolean indicating whether to use the mini-batch algorithm: default is FALSE |
random |
Boolean indicating whether to use random initialization to generate the W^v matrices and H matrix |
random_W_updates |
Boolean parameter for mini-batch algorithm; if |
seed |
Number representing the random seed |
batchsize |
Desired batch size; do not use if using a subsample |
lossonsubset |
Boolean indicating whether to calculate the loss on a subset rather than the full dataset; speeds up computation for larger datasets |
losssubsetsize |
Number of cells to use for the loss subset; default is total number of cells |
numthreads |
Number of threads to use if running in parallel |
An object of class SickleJr with a list of initialized W^v matrices and an
H matrix for each number of latent factors added to the WHinitials slot,
a data frame holding relevant values for plotting the elbow plot added to the
latent.factor.elbow.values slot, diagnostic plots of the loss vs. the number of
latent factors added to the plots slot, and the cell indices used to calculate
the loss on the subsample added to the lossCalcSubsample slot
SimSickleJrSmall@latent.factor.elbow.values<-data.frame(NULL,NULL)
SimSickleJrSmall<-PlotLossvsLatentFactors(SimSickleJrSmall,d_vector=c(2:6),rounds=10)
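The elbow is chosen by eye in this workflow. If an automated cross-check is wanted, one common heuristic (an assumption here, not something PlotLossvsLatentFactors computes) picks the point farthest from the straight line joining the first and last points of the rescaled curve. find_elbow is a hypothetical name and the loss values are made up for illustration.

```r
# Hypothetical helper: elbow of a decreasing curve, taken as the point
# farthest from the chord joining its first and last points after both
# axes are rescaled to [0, 1].
find_elbow <- function(d, loss) {
  x <- (d - min(d)) / (max(d) - min(d))
  y <- (loss - min(loss)) / (max(loss) - min(loss))
  n <- length(x)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # Perpendicular distance from each (x, y) to the line through the endpoints
  dist_to_chord <- abs(dy * x - dx * y + x[n] * y[1] - y[n] * x[1]) /
    sqrt(dx^2 + dy^2)
  d[which.max(dist_to_chord)]
}

find_elbow(2:10, c(100, 60, 30, 25, 22, 20, 19, 18, 17))
# [1] 4
```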
Plot the first and second dimensions of a UMAP dimension reduction and color either by clustering results or metadata.
PlotSickleJrUMAP( SickleJr, umap.modality = "H", cluster = "kmeans", title = "", colorbymetadata = NULL, legendname = NULL )
SickleJr |
An object of class SickleJr |
umap.modality |
String corresponding to the name of the UMAP of interest: defaults to "H" |
cluster |
String input that indicates which cluster to color by: defaults to "kmeans" |
title |
String input for optional plot title |
colorbymetadata |
Name of metadata column if coloring by metadata |
legendname |
String input that allows specification of a different legend name |
An object of class SickleJr with plots added to the plots
slot
SimSickleJrSmall<-PlotSickleJrUMAP(SimSickleJrSmall, title="K-Means Example")
SimSickleJrSmall<-PlotSickleJrUMAP(SimSickleJrSmall,umap.modality=1)
Wrapper function to run jrSiCKLSNMF on an object of class SickleJr.
RunjrSiCKLSNMF( SickleJr, rounds = 30000, differr = 1e-06, display_progress = TRUE, lossonsubset = FALSE, losssubsetsize = dim(SickleJr@H)[1], minibatch = FALSE, batchsize = 1000, random_W_updates = FALSE, seed = NULL, minrounds = 200, suppress_warnings = FALSE, subsample = 1:dim(SickleJr@normalized.count.matrices[[1]])[2], numthreads = detectCores() - 1 )
SickleJr |
An object of class SickleJr |
rounds |
Number of rounds: defaults to 30000 |
differr |
Tolerance for percentage change in loss between updates: defaults to 1e-6 |
display_progress |
Boolean indicating whether to display the progress bar for jrSiCKLSNMF |
lossonsubset |
Boolean indicating whether to use a subset to calculate the loss function rather than the whole dataset |
losssubsetsize |
Size of the subset of data on which to calculate the loss |
minibatch |
Boolean indicating whether to use mini-batch updates |
batchsize |
Size of batch for mini-batch updates |
random_W_updates |
Boolean indicating whether or not to use random_W_updates updates
(i.e. only update |
seed |
Number specifying desired random seed |
minrounds |
Minimum number of rounds: most helpful for the mini-batch algorithm |
suppress_warnings |
Boolean indicating whether to suppress warnings |
subsample |
A numeric vector of cell indices, used primarily when finding an appropriate number of latent factors: defaults to all cells |
numthreads |
Number of threads to use if running in parallel |
An object of class SickleJr with updated W^v matrices, an updated
H matrix, and a vector of values for the loss function added to the
Wlist, H, and loss slots, respectively
SimSickleJrSmall<-RunjrSiCKLSNMF(SimSickleJrSmall,rounds=5)
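The differr and minrounds arguments suggest a stopping rule of the following shape. The sketch assumes that "percentage change" means the relative change between consecutive loss values; should_stop is a hypothetical helper and may not match the package's exact check.

```r
# Hypothetical stopping rule: stop once the relative change in the loss
# falls below differr, but never before minrounds rounds.
should_stop <- function(loss_prev, loss_curr, round,
                        differr = 1e-6, minrounds = 200) {
  rel_change <- abs(loss_prev - loss_curr) / abs(loss_prev)
  round >= minrounds && rel_change < differr
}

# Simulated loss decaying geometrically towards 500
loss <- 500 + 1000 * 0.9^(0:999)
stop_round <- NA
for (r in 2:length(loss)) {
  if (should_stop(loss[r - 1], loss[r], round = r)) {
    stop_round <- r
    break
  }
}
```

Here the relative change drops below 1e-6 after roughly 120 rounds, but minrounds holds the run until round 200, mirroring why minrounds is most useful when the loss is noisy.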
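Provide the values for the graph regularization on W^v for each modality as a list,
provide a value for the sparsity constraint on H, and set the row regularization for H
and the column regularization for W^v.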
SetLambdasandRegs( SickleJr, lambdaWlist = list(10, 50), lambdaH = 500, HrowReg = "None", WcolReg = "None" )
SickleJr |
An object of class SickleJr |
lambdaWlist |
A list of graph regularization constraints for the W^v matrices: defaults to list(10, 50) |
lambdaH |
A numeric holding the sparsity constraint on H: defaults to 500 |
HrowReg |
A string that is equal to |
WcolReg |
A string that is equal to |
An object of class SickleJr with the lambda hyperparameter values added to its lambdaWlist
and lambdaH
slots
SimSickleJrSmall<-SetLambdasandRegs(SimSickleJrSmall, lambdaWlist=list(10,50),lambdaH=500,HrowReg="None",WcolReg="None")
SimSickleJrSmall<-SetLambdasandRegs(SimSickleJrSmall, lambdaWlist=list(3,15),lambdaH=0,HrowReg="L2Norm",WcolReg="None")
Set the W^v matrices and H matrix from pre-calculated values. Use the values calculated
in the step that determines the number of latent factors (PlotLossvsLatentFactors) as the
initial values for the jrSiCKLSNMF algorithm. If only a subset of cells was used in that
calculation, this produces an error. In this case, please use GenerateWmatricesandHmatrix
to generate new W^v matrices and a new H matrix.
SetWandHfromWHinitials(SickleJr, d)
SickleJr |
An object of class SickleJr |
d |
The number of desired latent factors |
An object of class SickleJr with the Wlist
slot and the H
slot filled from pre-calculated values.
SimSickleJrSmall<-SetWandHfromWHinitials(SimSickleJrSmall,d=5)
Defines the SickleJr class for use with jrSiCKLSNMF. This object contains all of the information required for analysis using jrSiCKLSNMF. This includes count matrices, normalized matrices, graph Laplacians, hyperparameters, diagnostic plots, and plots of cell clusters.
An object of class SickleJr
count.matrices
A list containing all of the quality controlled count matrices. Note that these count matrices should not use all features and should only include features that appear in at least 10 cells.
normalized.count.matrices
A list that holds the normalized count matrices
graph.laplacian.list
A list of the graph Laplacians to be used for graph regularization
Hregularization
A string that indicates the type of row regularization to use for H. Types include "None," "L2Norm," and "Ortho"
Wregularization
A string that indicates the type of column regularization to use for W. Types include "None," "L2Norm," and "Ortho"
diffFunc
A vector of strings that holds the name of the function used to measure the "distance" between
data matrix X and WH for each modality; can be "klp"
for the Poisson Kullback-Leibler divergence
or "fr"
for the Frobenius norm. If the length of the vector differs from the number of modalities,
only the first string is used
lambdaWlist
A list of lambda values to use as the hyperparameters for the graph regularization
of the corresponding W^v in the v-th modality
lambdaH
A numeric value corresponding to the hyperparameter of the sparsity constraint on H
Wlist
A list of the generated W^v matrices, one for each modality
H
The transpose of the shared H matrix.
WHinitials
A list that stores the initial generated matrices if all of the cells are used to
calculate the initial values when running PlotLossvsLatentFactors; these can be used
as initializations when running RunjrSiCKLSNMF to save time
lossCalcSubsample
A vector that holds the cell indices on which PlotLossvsLatentFactors
was calculated
latent.factor.elbow.values
A data frame that holds the relevant information to plot the latent factor elbow plot
minibatch
Indicator variable that states whether the algorithm should use mini-batch updates.
clusterdiagnostics
List of the cluster diagnostic results for the SickleJr object. Includes diagnostic plots from fviz_nbclust
and diagnostics from clValid
clusters
List of results of different clustering methods performed on the SickleJr object
metadata
List of metadata
loss
Vector of the value for the loss function
umap
List of different UMAP-based dimension reductions using umap
plots
Holds various ggplot
results for easy access of diagnostics and cluster visualizations
A simulated dataset with multiplicative noise for the
scRNA-seq variability parameter in SPARSim for the simulated scRNA-seq data and
with
additive noise to the expression levels of the scATAC-seq data
for data simulated via SimATAC. The simulated matrices are located in SimData$Xmatrices
and the identities for the cell types are contained in SimData$cell_type. This corresponds
to the Xmatrix data found in both XandLmatrices25/XandindividLKNNLmatrices1Sparsity5.RData
and XandBulkLmatrix25/XandBulkLKNNmatrices1Sparsity5.RData on our GitHub repository
ellisdoro/jrSiCKLSNMF_Simulations.
data(SimData)
A list made up of two items. The first is a list of 2 simulated sparse matrices and the second is a vector containing cell identities.
A list of 2 sparse matrices, each containing a different simulated omics modality measured on the same set of single cells: the first entry in the list corresponds to simulated scRNA-seq data and has 1000 genes and 300 cells; the second entry in the list corresponds to simulated scATAC-seq data and has 5910 peaks and 300 cells.
A vector containing the cell-type identities of the simulated data
A small SickleJr object containing a subset of data from the
SimData data object. Contains the completed analysis from the
'Getting Started' vignette for a small subset of 10 cells with 150 genes and
700 peaks. The clusters derived from this dataset are not accurate; this dataset
is intended for use with code examples.
data(SimSickleJrSmall)
A SickleJr object containing a completed analysis using jrSiCKLSNMF
Contains a list of 2 sparse matrices, each containing a different simulated omics modality measured on the same set of single cells
The normalized versions of the count matrices
contained in slot count.matrices
A list of sparse matrices containing the graph Laplacians corresponding to the KNN feature-feature similarity graphs constructed for each omics modality
A string indicating the row regularization: here it
is set to "None"
A string specifying the function to measure the discrepancy between the normalized data and the fitted matrices: here, it is set to "klp" for the Poisson Kullback-Leibler divergence
A list holding the graph regularization parameters: here, they are 10 and 50
A numeric indicating the value for the sparsity parameter. Here it equals 500
A list holding the fitted W^v matrices
A matrix holding H
A list of initial values for W^v and H
A vector containing a subset on which to calculate the loss
A data frame holding the loss and the number of latent factors, used for diagnostic plots
A Boolean indicating whether or not to use the mini-batch algorithm: FALSE
here
Diagnostic plots and results
A list holding the "kmeans"
clustering results
A list holding metadata; here this is just cell type information
A list holding a vector called "Loss"
A list holding various UMAP approximations
A list holding ggplots corresponding to different diagnostics and visualizations