| Type: | Package | 
| Title: | CRISPR Pooled Screen Analysis using Beta-Binomial Test | 
| Version: | 1.3.6 | 
| Date: | 2025-10-01 | 
| Description: | Provides functions for hit gene identification and quantification of sgRNA (single-guided RNA) abundances for CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) pooled screen data analysis. Details are in Jeong et al. (2019) <doi:10.1101/gr.245571.118> and Baggerly et al. (2003) <doi:10.1093/bioinformatics/btg173>. | 
| Depends: | R (≥ 3.5.0) | 
| License: | MIT + file LICENSE | 
| LazyData: | true | 
| Imports: | Rcpp, metap, magrittr, dplyr, tibble, stringr, ggplot2, tidyr, glue, pheatmap, tools, readr, parallel, R.utils | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Suggests: | testthat, knitr, rmarkdown | 
| RoxygenNote: | 7.3.2 | 
| Encoding: | UTF-8 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-10-01 13:13:27 UTC; hyun-hwanjeong | 
| Author: | Hyun-Hwan Jeong [aut, cre] | 
| Maintainer: | Hyun-Hwan Jeong <hyunhwanjeong@icloud.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-10-01 13:40:12 UTC | 
A benchmark CRISPRn pooled screen data from Evers et al.
Description
A benchmark CRISPRn pooled screen data from Evers et al.
Usage
data(Evers_CRISPRn_RT112)
Format
The data object is a list and contains below information:
- count
- The count matrix from Evers et al.'s paper and contains the CRISPRn screening result using RT112 cell-line. It contains three different replicates for T0 (before) and contains different three replicates for T1 (after). 
- egenes
- The list of 46 essential genes used in Evers et al.'s study. 
- ngenes
- The list of 47 non-essential genes used in Evers et al.'s study. 
- design
- The data.frame contains study design. 
- sg_stat
- The data.frame contains the sgRNA-level statistics. 
- gene_stat
- The data.frame contains the gene-level statistics. 
Source
https://pubmed.ncbi.nlm.nih.gov/27111720/
A benchmark CRISPRn pooled screen data from Sanson et al.
Description
A benchmark CRISPRn pooled screen data from Sanson et al.
Usage
data(Sanson_CRISPRn_A375)
Format
The data object is a list and contains below information:
- count
- The count matrix from Sanson et al.'s paper and contains the CRISPRn screening result using A375 cell-line. It contains a sample of plasimd, and three biological replicates after three weeks. 
- egenes
- The list of 1,580 essential genes used in Sanson et al.'s study. 
- ngenes
- The list of 927 non-essential genes used in Sanson et al.'s study. 
- design
- The data.frame contains study design. 
Source
https://pubmed.ncbi.nlm.nih.gov/30575746/
A function to calculate the mappabilities of each NGS sample.
Description
A function to calculate the mappabilities of each NGS sample.
Usage
calc_mappability(count_obj, df_design)
Arguments
| count_obj | A list object is created by 'run_sgrna_quant'. | 
| df_design | The table contains a study design. | 
Examples
library(CB2)
library(magrittr)
library(tibble)
library(dplyr)
library(glue)
FASTA <- system.file("extdata", "toydata", "small_sample.fasta", package = "CB2")
ex_path <- system.file("extdata", "toydata", package = "CB2")
df_design <- tribble(
  ~group, ~sample_name,
  "Base", "Base1",  
  "Base", "Base2", 
  "High", "High1",
  "High", "High2") %>% 
    mutate(fastq_path = glue("{ex_path}/{sample_name}.fastq"))
cb2_count <- run_sgrna_quant(FASTA, df_design)
calc_mappability(cb2_count, df_design)
A C++ function to perform a parameter estimation for the sgRNA-level test. It will estimate two different parameters 'phat' and 'vhat,' and we assume input count data follows the beta-binomial distribution. Dr. Keith Baggerly initially implemented this code in Matlab, and it has been rewritten it in C++ for the speed-up.
Description
A C++ function to perform a parameter estimation for the sgRNA-level test. It will estimate two different parameters 'phat' and 'vhat,' and we assume input count data follows the beta-binomial distribution. Dr. Keith Baggerly initially implemented this code in Matlab, and it has been rewritten it in C++ for the speed-up.
Usage
fit_ab(xvec, nvec)
Arguments
| xvec | a matrix contains sgRNA read counts. | 
| nvec | a vector contains the library size. | 
A function to normalize sgRNA read counts.
Description
A function to normalize sgRNA read counts.
Usage
get_CPM(sgcount)
Arguments
| sgcount | The input table contains read counts of sgRNAs for each sample A function to calculate the CPM (Counts Per Million) (required) | 
Value
a normalized CPM table will be returned
Examples
library(CB2)
data(Evers_CRISPRn_RT112)
get_CPM(Evers_CRISPRn_RT112$count)
A function to join a count table and a design table.
Description
A function to join a count table and a design table.
Usage
join_count_and_design(sgcount, df_design)
Arguments
| sgcount | The input matrix contains read counts of sgRNAs for each sample. | 
| df_design | The table contains a study design. | 
Value
A tall-thin and combined table of the sgRNA read counts and study design will be returned.
Examples
library(CB2)
data(Evers_CRISPRn_RT112) 
head(join_count_and_design(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design))
A function to perform gene-level test using a sgRNA-level statistics.
Description
A function to perform gene-level test using a sgRNA-level statistics.
Usage
measure_gene_stats(sgrna_stat, logFC_level = "sgRNA")
Arguments
| sgrna_stat | A data frame created by ‘measure_sgrna_stats’ | 
| logFC_level | The level of ‘logFC’ value. It can be ‘gene’ or ‘sgRNA’. | 
Value
A table contains the gene-level test result, and the table contains these columns:
- ‘gene’: Theg gene name to be tested. 
- ‘n_sgrna’: The number of sgRNA targets the gene in the library. 
- ‘cpm_a’: The mean of CPM of sgRNAs within the first group. 
- ‘cpm_b’: The mean of CPM of sgRNAs within the second group. 
- ‘logFC’: The log fold change of the gene between two groups. Taking the mean of sgRNA ‘logFC’s is default, and ‘logFC' is calculated by 'log2(cpm_b+1) - log2(cpm_a+1)’ if ‘logFC_level’ parameter is set to ‘gene’. 
- ‘p_ts’: The p-value indicates a difference between the two groups at the gene-level. 
- ‘p_pa’: The p-value indicates enrichment of the first group at the gene-level. 
- ‘p_pb’: The p-value indicates enrichment of the second group at the gene-level. 
- ‘fdr_ts’: The adjusted P-value of ‘p_ts’. 
- ‘fdr_pa’: The adjusted P-value of ‘p_pa’. 
- ‘fdr_pb’: The adjusted P-value of ‘p_pb’. 
Examples
data(Evers_CRISPRn_RT112)
measure_gene_stats(Evers_CRISPRn_RT112$sg_stat)
A function to perform a statistical test at a sgRNA-level
Description
A function to perform a statistical test at a sgRNA-level
Usage
measure_sgrna_stats(
  sgcount,
  design,
  group_a,
  group_b,
  delim = "_",
  ge_id = NULL,
  sg_id = NULL
)
Arguments
| sgcount | This data frame contains read counts of sgRNAs for the samples. | 
| design | This table contains study design. It has to contain 'group.' | 
| group_a | The first group to be tested. | 
| group_b | The second group to be tested. | 
| delim | The delimiter between a gene name and a sgRNA ID. It will be used if only rownames contains sgRNA ID. | 
| ge_id | The column name of the gene column. | 
| sg_id | The column/columns of sgRNA identifiers. | 
Value
A table contains the sgRNA-level test result, and the table contains these columns:
- ‘sgRNA’: The sgRNA identifier. 
- ‘gene’: The gene is the target of the sgRNA 
- ‘n_a’: The number of replicates of the first group. 
- ‘n_b’: The number of replicates of the second group. 
- ‘phat_a’: The proportion value of the sgRNA for the first group. 
- ‘phat_b’: The proportion value of the sgRNA for the second group. 
- ‘vhat_a’: The variance of the sgRNA for the first group. 
- ‘vhat_b’: The variance of the sgRNA for the second group. 
- ‘cpm_a’: The mean CPM of the sgRNA within the first group. 
- ‘cpm_b’: The mean CPM of the sgRNA within the second group. 
- ‘logFC’: The log fold change of sgRNA between two groups. 
- ‘t_value’: The value for the t-statistics. 
- ‘df’: The value of the degree of freedom, and will be used to calculate the p-value of the sgRNA. 
- ‘p_ts’: The p-value indicates a difference between the two groups. 
- ‘p_pa’: The p-value indicates enrichment of the first group. 
- ‘p_pb’: The p-value indicates enrichment of the second group. 
- ‘fdr_ts’: The adjusted P-value of ‘p_ts’. 
- ‘fdr_pa’: The adjusted P-value of ‘p_pa’. 
- ‘fdr_pb’: The adjusted P-value of ‘p_pb’. 
Examples
library(CB2)
data(Evers_CRISPRn_RT112)
measure_sgrna_stats(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design, "before", "after")
A function to plot the first two principal components of samples.
Description
This function will perform a principal component analysis, and it returns a ggplot object of the PCA plot.
Usage
plot_PCA(sgcount, df_design)
Arguments
| sgcount | The input matrix contains read counts of sgRNAs for each sample. | 
| df_design | The table contains a study design. | 
Value
A ggplot2 object contains a PCA plot for the input.
library(CB2) data(Evers_CRISPRn_RT112) plot_PCA(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design)
A function to show a heatmap sgRNA-level corrleations of the NGS samples.
Description
A function to show a heatmap sgRNA-level corrleations of the NGS samples.
Usage
plot_corr_heatmap(sgcount, df_design, cor_method = "pearson")
Arguments
| sgcount | The input matrix contains read counts of sgRNAs for each sample. | 
| df_design | The table contains a study design. | 
| cor_method | A string parameter of the correlation measure. One of the three - "pearson", "kendall", or "spearman" will be the string. | 
Value
A pheatmap object contains the correlation heatmap
library(CB2) data(Evers_CRISPRn_RT112) plot_corr_heatmap(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design)
A function to plot read count distribution.
Description
A function to plot read count distribution.
Usage
plot_count_distribution(sgcount, df_design, add_dots = FALSE)
Arguments
| sgcount | The input matrix contains read counts of sgRNAs for each sample. | 
| df_design | The table contains a study design. | 
| add_dots | The function will display dots of sgRNA counts if it is set to 'TRUE'. | 
Value
A ggplot2 object contains a read count distribution plot for 'sgcount'.
Examples
library(CB2)
data(Evers_CRISPRn_RT112)
cpm <- get_CPM(Evers_CRISPRn_RT112$count)
plot_count_distribution(cpm, Evers_CRISPRn_RT112$design)
A function to visualize dot plots for a gene.
Description
A function to visualize dot plots for a gene.
Usage
plot_dotplot(sgcount, df_design, gene, ge_id = NULL, sg_id = NULL)
Arguments
| sgcount | The input matrix contains read counts of sgRNAs for each sample. | 
| df_design | The table contains a study design. | 
| gene | The gene to be shown. | 
| ge_id | A name of the column contains gene names. | 
| sg_id | A name of the column contains sgRNA IDs. | 
Value
A ggplot2 object contains dot plots of sgRNA read counts for a gene.
Examples
library(CB2)
data(Evers_CRISPRn_RT112)
plot_dotplot(get_CPM(Evers_CRISPRn_RT112$count), Evers_CRISPRn_RT112$design, "RPS7")
A C++ function to quantify sgRNA abundance from NGS samples.
Description
A C++ function to quantify sgRNA abundance from NGS samples.
Usage
quant(ref_path, fastq_path, verbose = FALSE)
Arguments
| ref_path | the path of the annotation file and it has to be a FASTA formatted file. | 
| fastq_path | a list of the FASTQ files. | 
| verbose | Display some logs during the quantification if it is set to 'true'. | 
Repeat a row
Description
Repeat a row
Usage
rep_row(x, n)
Arguments
| x | A vector to repeat | 
| n | Number of repetitions | 
Value
A matrix with 'n' copies of 'x'
A function to perform a statistical test at a sgRNA-level, deprecated.
Description
A function to perform a statistical test at a sgRNA-level, deprecated.
Usage
run_estimation(
  sgcount,
  design,
  group_a,
  group_b,
  delim = "_",
  ge_id = NULL,
  sg_id = NULL
)
Arguments
| sgcount | This data frame contains read counts of sgRNAs for the samples. | 
| design | This table contains study design. It has to contain 'group.' | 
| group_a | The first group to be tested. | 
| group_b | The second group to be tested. | 
| delim | The delimiter between a gene name and a sgRNA ID. It will be used if only rownames contains sgRNA ID. | 
| ge_id | The column name of the gene column. | 
| sg_id | The column/columns of sgRNA identifiers. | 
Value
A table contains the sgRNA-level test result, and the table contains these columns:
- ‘sgRNA’: The sgRNA identifier. 
- ‘gene’: The gene is the target of the sgRNA 
- ‘n_a’: The number of replicates of the first group. 
- ‘n_b’: The number of replicates of the second group. 
- ‘phat_a’: The proportion value of the sgRNA for the first group. 
- ‘phat_b’: The proportion value of the sgRNA for the second group. 
- ‘vhat_a’: The variance of the sgRNA for the first group. 
- ‘vhat_b’: The variance of the sgRNA for the second group. 
- ‘cpm_a’: The mean CPM of the sgRNA within the first group. 
- ‘cpm_b’: The mean CPM of the sgRNA within the second group. 
- ‘logFC’: The log fold change of sgRNA between two groups. 
- ‘t_value’: The value for the t-statistics. 
- ‘df’: The value of the degree of freedom, and will be used to calculate the p-value of the sgRNA. 
- ‘p_ts’: The p-value indicates a difference between the two groups. 
- ‘p_pa’: The p-value indicates enrichment of the first group. 
- ‘p_pb’: The p-value indicates enrichment of the second group. 
- ‘fdr_ts’: The adjusted P-value of ‘p_ts’. 
- ‘fdr_pa’: The adjusted P-value of ‘p_pa’. 
- ‘fdr_pb’: The adjusted P-value of ‘p_pb’. 
A function to run a sgRNA quantification algorithm from NGS sample
Description
A function to run a sgRNA quantification algorithm from NGS sample
Usage
run_sgrna_quant(lib_path, design, map_path = NULL, ncores = 1, verbose = FALSE)
Arguments
| lib_path | The path of the FASTA file. | 
| design | A table contains the study design. It must contain 'fastq_path' and 'sample_name.' | 
| map_path | The path of file contains gene-sgRNA mapping. | 
| ncores | The number that indicates how many processors will be used with a parallelization. The parallelization will be enabled if users do not set the parameter as '-1“ (it means the full physical cores will be used) or greater than '1'. | 
| verbose | Display some logs during the quantification if it is set to 'TRUE' | 
Value
It will return a list, and the list contains three elements. The first element (‘count’) is a data frame contains the result of the quantification for each sample. The second element (‘total’) is a numeric vector contains the total number of reads of each sample. The last element (‘sequence’) a data frame contains the sequence of each sgRNA in the library.
Examples
library(CB2)
library(magrittr)
library(tibble)
library(dplyr)
library(glue)
FASTA <- system.file("extdata", "toydata", "small_sample.fasta", package = "CB2")
ex_path <- system.file("extdata", "toydata", package = "CB2")
df_design <- tribble(
  ~group, ~sample_name,
  "Base", "Base1",  
  "Base", "Base2", 
  "High", "High1",
  "High", "High2") %>% 
    mutate(fastq_path = glue("{ex_path}/{sample_name}.fastq"))
cb2_count <- run_sgrna_quant(FASTA, df_design)