Version: | 1.0.13 |
Date: | 2023-9-14 |
Title: | Distance Based Cell Lineage Reconstruction |
Author: | Il-Youp Kwak [aut, cre], Wuming Gong [aut] |
Maintainer: | Il-Youp Kwak <ikwak2@cau.ac.kr> |
License: | GPL-3 |
VignetteBuilder: | knitr |
LinkingTo: | Rcpp, RcppArmadillo |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 4.1.0), tensorflow(≥ 2.2.0) |
Encoding: | UTF-8 |
Imports: | BiocParallel, dplyr, Matrix, matrixStats, ape, phangorn, Rcpp, igraph, methods, purrr, stringr, tidyr, rBayesianOptimization, rlang, BiocGenerics |
Suggests: | knitr, rmarkdown, markdown |
Description: | R codes for distance based cell lineage reconstruction. Our methods won both sub-challenges 2 and 3 of the Allen Institute Cell Lineage Reconstruction DREAM Challenge in 2020. References: Gong et al. (2021) <doi:10.1016/j.cels.2021.05.008>, Gong et al. (2022) <doi:10.1186/s12859-022-04633-x>. |
URL: | https://github.com/ikwak2/DCLEAR |
NeedsCompilation: | yes |
Packaged: | 2023-09-14 07:01:28 UTC; ikwak2 |
Repository: | CRAN |
Date/Publication: | 2023-09-14 07:32:35 UTC |
DCLEAR: A package for DCLEAR: Distance based Cell LinEAge Reconstruction
Description
Distance based methods for inferring lineage treess from single cell data
WH
Description
implementation of weighted hamming algorithm
Usage
WH(x, InfoW, dropout = FALSE)
Arguments
x |
Sequence object of 'phyDat' type. |
InfoW |
Weight vector for the calculation of weighted hamming distance |
dropout |
Different weighting strategy is taken to consider interval dropout with dropout = 'TRUE'. Default is, dropout = 'FALSE'. |
Value
Calculated distance matrix of input sequences. The result is a 'dist' class object.
Author(s)
Il-Youp Kwak
Examples
set.seed(1)
library(phangorn)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10 ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_h = dist.hamming(sD$seqs)
tree_h= NJ(D_h)
RF.dist(tree_h, sD$tree, normalize = TRUE)
## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = WH(sD$seqs, InfoW)
tree_wh= NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)
## RF score with weighted hamming, cosidering dropout situation
nfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = WH(sD$seqs, InfoW, dropout=TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)
Train weights for WH
Description
Train weights for WH and output weight vector
Usage
WH_train(X, loc0 = 2, locDropout = 1, locMissing = FALSE)
Arguments
X |
a list of k number of input data, X[[1]] ... X[[k]]. The ith data have sequence information as phyDat format in X[[i]][[1]], and tree information in X[[i]][[2]] as phylo format. |
loc0 |
weight location of initial state |
locDropout |
weight location of dropout state |
locMissing |
weight location of missing state, FALSE if there is no missing values |
Value
a weight vector
Author(s)
Il-Youp Kwak (ikwak2@cau.ac.kr)
Train weights for WH, and output distance object
Description
Train weights for WH using the given data, and fit the distance matrix for a input sequence.
Usage
WH_train_fit(x, X)
Arguments
x |
input data in phyDat format |
X |
a list of k number of input data, X[[1]] ... X[[k]]. The ith data have sequence information as phyDat format in X[[i]][[1]], and tree information in X[[i]][[2]] as phylo format. |
Value
a dist object
Author(s)
Il-Youp Kwak (ikwak2@cau.ac.kr)
add_deletion
Description
Add deletion
Usage
add_deletion(x, tree, mutation_site, config)
Arguments
x |
a character matrix |
tree |
a matrix representing the lineage tree |
mutation_site |
a binary matrix for mutation site |
config |
a lineage_tree_config object |
Value
a character matrix with deletions
add_dropout
Description
Add dropout events
Usage
add_dropout(x, config)
Arguments
x |
a character matrix |
config |
a lineage_tree_config object |
Value
a character matrix with dropout events
Generic function for as_igraph
Description
Generic function for as_igraph
Usage
as_igraph(x, ...)
Arguments
x |
a phylo object |
... |
additional parameters |
as_igraph
Description
Convert an phylo object to an igraph object, while keeping the weight (in contrast to igraph::as.igraph)
Usage
## S4 method for signature 'data.frame'
as_igraph(x, config)
Arguments
x |
a phylo object |
config |
a 'lineage_tree_config' object |
Value
an igraph object
as_igraph
Description
Convert an phylo object to an igraph object, while keeping the weight (in contrast to igraph::as.igraph)
Usage
## S4 method for signature 'phylo'
as_igraph(x)
Arguments
x |
a phylo object |
Value
an igraph object
Generic function for as_lineage_tree
Description
Generic function for as_lineage_tree
Usage
as_lineage_tree(x, y, config, ...)
Arguments
x |
a phyDat object |
y |
a phylo object |
config |
a lineage_tree_config object |
... |
additional parameters |
as_lineage_tree
Description
Convert a phylo object and a phyDat object to a lineage_tree object
Usage
## S4 method for signature 'phyDat,phylo,lineage_tree_config'
as_lineage_tree(x, y, config, ...)
Arguments
x |
a phyDat object |
y |
a phylo object |
config |
a lineage_tree_config object |
... |
additional parameters |
Value
a lineage_tree object
Generic function for as_phylo
Description
Generic function for as_phylo
Usage
as_phylo(x, ...)
Arguments
x |
a graph object |
... |
additional parameters |
as_phylo
Description
Convert an igraph object to a phylo object
Usage
## S4 method for signature 'igraph'
as_phylo(x)
Arguments
x |
an igraph object |
Value
a phylo object or a igraph object
Core function of computing kmer replacement distance
Description
Compute the sequence distance matrix using inferred kmer replacement matrix
Usage
dist_kmer_replacement_inference(x, kmer_summary, k = 2)
Arguments
x |
input data in phyDat format |
kmer_summary |
a kmer_summary object |
k |
k-mers (default k=2) |
Value
a dist object
Author(s)
Wuming Gong (gongx030@umn.edu)
Generic function for dist_replacement
Description
Generic function for dist_replacement
Usage
dist_replacement(x, kmer_summary, k, ...)
Arguments
x |
a sequence object |
kmer_summary |
a kmer_summary object |
k |
k-mer length |
... |
additional parameters |
Compute the kmer replacement distance
Description
Compute the kmer replacement distance between sequences
Usage
## S4 method for signature 'phyDat,kmer_summary,integer'
dist_replacement(x, kmer_summary, k = 2, ...)
Arguments
x |
input data in phyDat format |
kmer_summary |
a kmer_summary object |
k |
k-mer length |
... |
other arguments passed to substr_kmer |
Value
a dist object
Author(s)
Wuming Gong (gongx030@umn.edu)
Compute the kmer replacement distance
Description
Compute the kmer replacement distance between sequences
Usage
## S4 method for signature 'phyDat,missing,integer'
dist_replacement(x, kmer_summary, k = 2L, ...)
Arguments
x |
input data in phyDat format |
kmer_summary |
a kmer_summary object |
k |
k-mer length |
... |
other arguments passed to substr_kmer |
Value
a dist object
Author(s)
Wuming Gong (gongx030@umn.edu)
Generic function for dist_weighted_hamming
Description
Generic function for dist_weighted_hamming
Usage
dist_weighted_hamming(x, wVec, ...)
Arguments
x |
a sequence object |
wVec |
weight vector |
... |
additional parameters |
dist_weighted_hamming
Description
implementation of weighted hamming algorithm
Usage
## S4 method for signature 'phyDat,numeric'
dist_weighted_hamming(x, wVec, dropout = FALSE)
Arguments
x |
Sequence object of 'phyDat' type. |
wVec |
Weight vector for the calculation of weighted hamming distance |
dropout |
Different weighting strategy is taken to consider interval dropout with dropout = 'TRUE'. Default is, dropout = 'FALSE'. |
Value
Calculated distance matrix of input sequences. The result is a 'dist' class object.
Author(s)
Il-Youp Kwak
Examples
library(DCLEAR)
library(phangorn)
library(ape)
set.seed(1)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10 ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_hm = dist.hamming(sD$seqs)
tree_hm = NJ(D_hm)
RF.dist(tree_hm, sD$tree, normalize = TRUE)
## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = dist_weighted_hamming(sD$seqs, InfoW, dropout = FALSE)
tree_wh = NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)
## RF score with weighted hamming, cosidering dropout situation
nfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = dist_weighted_hamming(sD$seqs, InfoW, dropout = TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)
Generic function for downsample
Description
Generic function for downsample
Usage
downsample(x, ...)
Arguments
x |
a data object |
... |
additional parameters |
downsample
Description
Sample a lineage tree
Usage
## S4 method for signature 'igraph'
downsample(x, n = 10L, ...)
Arguments
x |
a igraph object |
n |
number of leaves (tips) in the down-sampled tree |
... |
additional parameters |
Value
a phylo object
downsample
Description
Sample a lineage tree
Usage
## S4 method for signature 'lineage_tree'
downsample(x, n = 10L, ...)
Arguments
x |
a lineage_tree object |
n |
number of leaves (tips) in the down-sampled tree |
... |
additional parameters |
Value
a lineage_tree object
get_distance_prior
Description
prior distribution of distance
Usage
get_distance_prior(x)
Arguments
x |
a kmer_summary object |
Value
a probabilistic vector of the distribution of nodal distances
Author(s)
Wuming Gong (gongx030@umn.edu)
Generic function for get_leaves
Description
Generic function for get_leaves
Usage
get_leaves(x, ...)
Arguments
x |
a lineage_tree object |
... |
additional parameters |
get_leaves
Description
Get the leaf sequences
Usage
## S4 method for signature 'lineage_tree'
get_leaves(x, ...)
Arguments
x |
a lineage_tree object |
... |
additional parameters |
Value
a phyDat object
get_node_names
Description
Convenient function for get node names
Usage
get_node_names(x)
Arguments
x |
node id |
Value
node names
Author(s)
Wuming Gong (gongx030@umn.edu)
get_replacement_probability
Description
Compute p(A,B|d), the conditional probability of seeing a replacement of from kmer A to B or vice versa
Usage
get_replacement_probability(x)
Arguments
x |
a kmer_summary object |
Value
an 3D probabilistic array (kmers by kmers by distances)
Author(s)
Wuming Gong (gongx030@umn.edu)
get_sequence
Description
Get sequencees
Usage
get_sequence(x, tree, outcome, config)
Arguments
x |
a character matrix |
tree |
a matrix representing the lineage tree |
outcome |
a character matrix |
config |
a lineage_tree_config object |
Value
a character matrix
get_transition_probability
Description
Compute p(A,X|B,Y,d), the conditional probability of seeing a replacement from A to B given the previous replacement B from Y at nodal distance d
Usage
get_transition_probability(x)
Arguments
x |
a kmer_summary object |
Value
an 3D probabilistic array (kmers by kmers by distances)
Author(s)
Wuming Gong (gongx030@umn.edu)
Lineage data
Description
Lineage data
Usage
data(lineages)
Format
An object of class list
of length 100.
Examples
data(lineages)
positional_mutation_prob
Description
Convenient function for get node names
Usage
positional_mutation_prob(x, config)
Arguments
x |
a phyDat object |
config |
a lineage_tree_config object |
Value
a positional mutation probability matrix
Generic function for process_sequence
Description
Generic function for process_sequence
Usage
process_sequence(x, ...)
Arguments
x |
a sequence object |
... |
additional parameters |
Process sequences
Description
Process sequences
Usage
## S4 method for signature 'phyDat'
process_sequence(
x,
division = 16L,
dropout_character = "*",
default_character = "0",
deletion_character = "-"
)
Arguments
x |
input data in phyDat format |
division |
cell division |
dropout_character |
Dropout character (default: '*') |
default_character |
Default character (default: '0') |
deletion_character |
Deletion character (default: '-') |
Value
a 'lineage_tree_config' object
Author(s)
Wuming Gong (gongx030@umn.edu)
Generic function for prune
Description
Generic function for prune
Usage
prune(x, ...)
Arguments
x |
a lineage_tree object |
... |
additional parameters |
prune
Description
Trim a full lineage tree into phylogenetic tree
Usage
## S4 method for signature 'igraph'
prune(x, weighted = TRUE, ...)
Arguments
x |
an igraph object |
weighted |
whether or not keep the edge weight (default: TRUE) |
... |
additional parameters |
Value
an igraph object
prune
Description
Trim a full lineage tree into phylogenetic tree
Usage
## S4 method for signature 'lineage_tree'
prune(x, ...)
Arguments
x |
a lineage_tree object |
... |
additional parameters passed to as_phylo() |
Value
a lineage_tree object
random_tree
Description
Simulate a random lineage tree
Usage
random_tree(n_samples, division = 16L)
Arguments
n_samples |
number of samples to simulate |
division |
number of cell division |
Value
a data frame
Author(s)
Wuming Gong (gongx030@umn.edu)
rbind
Description
Concatenate multiple phyDat objects
Usage
## S4 method for signature 'phyDat'
rbind(..., deparse.level = 1)
Arguments
... |
a list of phyDat objects |
deparse.level |
see definition in generic rbind |
Value
a phyDat object
sample_mutation_outcome
Description
Sample mutation outcome
Usage
sample_mutation_outcome(x, mp = NULL, config)
Arguments
x |
an igraph object |
mp |
a mutation site matrix |
config |
a lineage_tree_config object |
Value
a outcome matrix
sample_mutation_site
Description
Sample mutation site
Usage
sample_mutation_site(tree, config)
Arguments
tree |
a data frame |
config |
a lineage_tree_config object |
Value
a mutation site matrix
sample_outcome_prob
Description
Sampling outcome probability based on a gamma distribution
Usage
sample_outcome_prob(config, num_states = 20L, shape = 0.1, scale = 2)
Arguments
config |
a lineage_tree_config object |
num_states |
number of states used in simulation. |
shape |
shape parameter in gamma distribution |
scale |
scale parameter in gamma distribution |
Value
a probability vector for each alphabet
Author(s)
Wuming Gong (gongx030@umn.edu)
score_simulation
Description
Compare two sets of sequences
Usage
score_simulation(x, y, config)
Arguments
x |
a character matrix |
y |
a character matrix |
config |
a lineage_tree_config object |
Value
numeric scores
sim_seqdata
Description
Generate singe cell barcode data set with tree shaped lineage information
Usage
sim_seqdata(
sim_n = 200,
m = 200,
mu_d = 0.03,
d = 15,
n_s = 23,
outcome_prob = NULL,
p_d = 0.003
)
Arguments
sim_n |
Number of cell samples to simulate. |
m |
Number of targets. |
mu_d |
Mutation rate. (a scalar or a vector) |
d |
Number of cell divisions. |
n_s |
Number of possible outcome states |
outcome_prob |
Outcome probability vector (default is NULL) |
p_d |
Dropout probability |
Value
The result is a list containing two objects, 'seqs' and 'tree'. The 'seqs' is 'phyDat' object of 'sim_n' number of simulated barcodes corresponding to each cell, and The 'tree' is a 'phylo' object, a ground truth tree structure for the simulated data.
Author(s)
Il-Youp Kwak
Examples
library(DCLEAR)
library(phangorn)
library(ape)
set.seed(1)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10 ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_hm = dist.hamming(sD$seqs)
tree_hm = NJ(D_hm)
RF.dist(tree_hm, sD$tree, normalize = TRUE)
## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = dist_weighted_hamming(sD$seqs, InfoW, dropout=FALSE)
tree_wh = NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)
## RF score with weighted hamming, cosidering dropout situation
nfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = dist_weighted_hamming(sD$seqs, InfoW, dropout = TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)
Generic function for simulate
Description
Generic function for simulate
Usage
simulate(config, x, ...)
Arguments
config |
a lineage_tree_config object |
x |
a sequence object |
... |
additional parameters |
simulate
Description
Simulate a cell lineage tree Adoped from https://github.com/elifesciences-publications/CRISPR_recorders_sims/blob/master/MATLAB_sims/GESTALT_30hr_1x_simulation.m
Usage
## S4 method for signature 'lineage_tree_config,missing'
simulate(config, x, n_samples = 200, ...)
Arguments
config |
simulation configuration; a lineage_tree_config object |
x |
missing |
n_samples |
number of samples to simulate |
... |
additional parameters |
Value
a lineage_tree object
Author(s)
Wuming Gong (gongx030@umn.edu)
simulate
Description
Simulate a cell lineage tree based on a set of sequences
Usage
## S4 method for signature 'lineage_tree_config,phyDat'
simulate(config, x, n_samples = 200L, k = 50, greedy = TRUE, ...)
Arguments
config |
simulation configuration; a lineage_tree_config object |
x |
a sequence object |
n_samples |
number of samples to simulate |
k |
Number of trials |
greedy |
Whether ot not use a greedy search |
... |
additional parameters |
Value
a lineage_tree object
Author(s)
Wuming Gong (gongx030@umn.edu)
simulate_core
Description
Simulate a cell lineage tree Adoped from https://github.com/elifesciences-publications/CRISPR_recorders_sims/blob/master/MATLAB_sims/GESTALT_30hr_1x_simulation.m
Usage
simulate_core(config, tree, mutation_site, outcome)
Arguments
config |
simulation configuration; a lineage_tree_config object |
tree |
a matrix representing the lineage tree |
mutation_site |
a binary matrix indicating the mutation sites |
outcome |
a character matrix |
Value
a 'lineage_tree' object
Generic function for substr_kmer
Description
Generic function for substr_kmer
Usage
substr_kmer(x, ...)
Arguments
x |
a kmer object |
... |
additional parameters |
Subseting a kmer_summary object
Description
Summarize the short k-mer summary from the long k-mer summary
Usage
## S4 method for signature 'kmer_summary'
substr_kmer(x, k = 2)
Arguments
x |
a kmer_summary object |
k |
k-mer length(default: 2) |
Value
a new kmer_summary object
Author(s)
Wuming Gong (gongx030@umn.edu)
Generic function for subtract
Description
Generic function for subtract
Usage
subtract(x, y, ...)
Arguments
x |
a lineage_tree object |
y |
a lineage_tree object |
... |
additional parameters |
subtract
Description
Subtract a subtree from a large tree
Usage
## S4 method for signature 'lineage_tree,lineage_tree'
subtract(x, y, ...)
Arguments
x |
a lineage_tree object |
y |
a lineage_tree object |
... |
additional parameters |
Value
a lineage_tree object
Generic function for subtree
Description
Generic function for subtree
Usage
subtree(x, ...)
Arguments
x |
a lineage_tree object |
... |
additional parameters |
subtree
Description
Extract a subtree with specific leaves
Usage
## S4 method for signature 'lineage_tree'
subtree(x, leaves = NULL, ...)
Arguments
x |
a lineage_tree object |
leaves |
leaves of the extracted tree |
... |
additional parameters |
Value
a lineage_tree object
subtree
Description
Extract a subtree with specific leaves
Usage
## S4 method for signature 'phylo'
subtree(x, leaves = NULL, ...)
Arguments
x |
a phylo object |
leaves |
leaves of the extracted tree |
... |
additional parameters |
Value
a pylo object
Generic function for summarize_kmer
Description
Generic function for summarize_kmer
Usage
summarize_kmer(x, ...)
Arguments
x |
a sequence object |
... |
additional parameters |
summarize_kmer
Description
Summarize kmer distributions with input sequences
Usage
## S4 method for signature 'phyDat'
summarize_kmer(
x,
division = 16L,
k = 2,
reps = 20L,
n_samples = 200L,
n_nodes = 100L,
n_targets
)
Arguments
x |
input data as a phyDat object |
division |
number of cell division |
k |
k-mer (default = 2) |
reps |
number of simulated trees |
n_samples |
number of samples to simulate |
n_nodes |
number of nodes to sample (including both leaves and internval nodes) |
n_targets |
sequence length. If this argument is missing, the length of the input sequences will be used. |
Value
a kmer_summary object
Author(s)
Wuming Gong (gongx030@umn.edu)
summarize_kmer_core
Description
Summarize kmer distributions (core function)
Usage
summarize_kmer_core(
k = 2,
reps = 20L,
n_samples = 200L,
n_nodes = 100L,
config = NULL
)
Arguments
k |
k-mer (default = 2) |
reps |
number of simulated trees |
n_samples |
number of samples to simulate |
n_nodes |
number of nodes to sample (including both leaves and internval nodes) |
config |
lineage tree configuration (a lineage_tree_config object) |
Value
a kmer_summary object
Author(s)
Wuming Gong (gongx030@umn.edu)