Title: | High-Dimensional Mean Comparison with Projection and Cross-Fitting |
Version: | 1.2 |
Date: | 2025-05-02 |
Description: | Provides interpretable high-dimensional mean comparison methods (HMC). For example, users can apply these methods to assess the difference in gene expression between two treatment groups. It is not a gene-by-gene comparison. Instead, the methods focus on the interplay between features and identify those that are predictive of the group label. The tests are valid frequentist procedures and yield sparse estimates indicating which features contribute to the group differences. |
License: | GPL-2 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | glmnet, irlba, PMA, MASS, stats, grpreg |
URL: | https://github.com/terrytianyuzhang/HMC/tree/main/HMC_package |
NeedsCompilation: | no |
Packaged: | 2025-05-02 16:41:07 UTC; tianyuzhang |
Author: | Tianyu Zhang [aut, cre, cph] |
Maintainer: | Tianyu Zhang <tianyuz3@andrew.cmu.edu> |
Repository: | CRAN |
Date/Publication: | 2025-05-02 17:00:02 UTC |
Anchored test for two-sample mean comparison.
Description
Anchored test for two-sample mean comparison.
Usage
anchored_lasso_testing(
sample_1,
sample_2,
pca_method = "sparse_pca",
mean_method = "lasso",
lasso_tuning_method = "min",
num_latent_factor = 1,
n_folds = 5,
verbose = TRUE
)
Arguments
sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
pca_method |
Method used to estimate the principal components. The default is "sparse_pca", which uses sparse PCA from the PMA package. Other choices are "dense_pca" (regular PCA) and "hard" (hard-thresholding PCA, which also induces sparsity). |
mean_method |
Method used to estimate the discriminant direction. The default is "lasso" (logistic Lasso). It can also take the value "lasso_no_truncation". |
lasso_tuning_method |
Method for tuning the Lasso penalty hyperparameter. The default is "min", the minimizer of the cross-validation error; users can also use "1se" for sparser solutions. |
num_latent_factor |
The principal component at which the Lasso coefficient is anchored. The default is 1 (PC1). |
n_folds |
Number of splits used for cross-fitting. The default is 5; if computation time allows, try 10. |
verbose |
Print information to the console. Default is TRUE. |
Value
A list of test statistics.
test_statistics |
Test statistics. Each entry corresponds to the test result for one principal component. |
standard_error |
Estimated standard error of test_statistics_before_studentization. |
test_statistics_before_studentization |
Same as test_statistics but before studentization, so the entries are not scaled to unit variance. |
split_data |
Intermediate quantities needed for further assessment and interpretation of the test results. |
Examples
sample_size_1 <- sample_size_2 <- 300
true_mean_1 <- matrix(c(rep(1, 10), rep(0, 90)), ncol = 1)
true_mean_2 <- matrix(c(rep(1.5, 10), rep(0, 90)), ncol = 1)
sample_1 <- data.frame(MASS::mvrnorm(sample_size_1,
mu = true_mean_1,
Sigma = diag(1, 100)))
sample_2 <- data.frame(MASS::mvrnorm(sample_size_2,
mu = true_mean_2,
Sigma = diag(1, 100)))
result <- anchored_lasso_testing(sample_1, sample_2)
result$test_statistics
## The test statistic; it should approximately follow N(0, 1) when there is no difference between the groups.
summarize_feature_name(result)
# Summarize which features contribute to the discriminant vectors (i.e., the logistic Lasso coefficients)
extract_lasso_coef(result) # extract the estimated discriminant coefficients
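Because the studentized statistic should be approximately standard normal under the null, it can be turned into a two-sided p-value. The following is a minimal base-R sketch; `two_sided_p_value` and the example numbers are hypothetical and are not part of the package.

```r
# Convert a studentized statistic into a two-sided p-value,
# assuming it is approximately N(0, 1) under the null hypothesis.
two_sided_p_value <- function(z) {
  2 * stats::pnorm(-abs(z))
}

# Hypothetical statistics for illustration (not package output).
z_example <- c(0, 1.96, -3)
p_example <- two_sided_p_value(z_example)
print(round(p_example, 4))
```

In practice, `z_example` would be replaced by `result$test_statistics`.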
Check that data has enough rows for cross-validation folds
Description
Validates that the input data has at least as many rows as the number of desired folds.
Usage
check_data_for_folds(data, n_folds)
Arguments
data |
A data frame or matrix. |
n_folds |
Integer. The number of folds to check for. |
Value
NULL (called for its side effect). Throws an error if the number of rows is too small.
Examples
check_data_for_folds(matrix(1:20, nrow = 5), n_folds = 5)
## Not run:
check_data_for_folds(matrix(1:4, nrow = 2), n_folds = 5) # This will throw an error
## End(Not run)
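The check described above can be pictured as a single comparison between the row count and the fold count. This is a sketch under that assumption; `check_rows_sketch` and its error message are hypothetical, and the package's own implementation may differ.

```r
# Hypothetical stand-in for the row-count validation described above.
check_rows_sketch <- function(data, n_folds) {
  if (nrow(data) < n_folds) {
    stop("Data has ", nrow(data), " rows but ", n_folds, " folds were requested.")
  }
  invisible(NULL)
}

# Five rows and five folds: passes silently and returns NULL.
ok <- check_rows_sketch(matrix(1:20, nrow = 5), n_folds = 5)

# Two rows and five folds: the error is caught here so the script keeps running.
failed <- inherits(
  try(check_rows_sketch(matrix(1:4, nrow = 2), n_folds = 5), silent = TRUE),
  "try-error"
)
```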
Check non-null and consistent column names across datasets
Description
Ensures all input datasets have non-null, non-empty, and identical column names.
Usage
check_non_null_and_identical_colnames(data_list)
Arguments
data_list |
A list of matrices or data frames to be checked. |
Value
NULL (called for side-effect). Throws an error if validation fails.
Examples
d1 <- data.frame(a = 1:2, b = 3:4)
d2 <- data.frame(a = 5:6, b = 7:8)
check_non_null_and_identical_colnames(list(d1, d2))
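One way to express the validation described above is to compare every dataset's column names against those of the first dataset. The sketch below assumes exactly that; `check_colnames_sketch` and its messages are hypothetical and may not match the package's behavior in every edge case.

```r
# Hypothetical stand-in: all datasets must share non-empty, identical column names.
check_colnames_sketch <- function(data_list) {
  reference <- colnames(data_list[[1]])
  for (d in data_list) {
    current <- colnames(d)
    if (is.null(current) || any(is.na(current)) || any(current == "")) {
      stop("All datasets must have non-empty column names.")
    }
    if (!identical(current, reference)) {
      stop("Column names differ across datasets.")
    }
  }
  invisible(NULL)
}

d1 <- data.frame(a = 1:2, b = 3:4)
d2 <- data.frame(a = 5:6, b = 7:8)
d3 <- data.frame(a = 1:2, c = 3:4)  # second column is named differently

same_names <- check_colnames_sketch(list(d1, d2))
mismatch <- inherits(try(check_colnames_sketch(list(d1, d3)), silent = TRUE), "try-error")
```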
Collect active features and groups based on projection directions
Description
Identifies consistently non-zero features across cross-validation folds using a voting scheme and returns active groups if a grouping vector is provided.
Usage
collect_active_features_proj(
test_result,
voting_method = c("majority_voting"),
group = NULL,
group_threshold = 1
)
Arguments
test_result |
A result object from |
voting_method |
Character. Method to determine active features. Only |
group |
Optional grouping vector with feature names. Must match the feature dimension of |
group_threshold |
Integer. Minimum number of active features required to declare a group active. Default is 1. |
Value
If group is provided, returns a list with:
- active_features
Character vector of consistently non-zero features.
- active_groups
Character vector of active groups.
If group is NULL, returns a character vector of active features only.
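The role of group_threshold can be illustrated with a small base-R sketch: count the active features in each group and keep the groups that reach the threshold. The helper `active_groups_sketch` is hypothetical, and it assumes `group` is a vector of group labels named by feature; the package may represent groups differently.

```r
# Hypothetical helper: a group is "active" when it contains at least
# `group_threshold` active features.
# Assumption: `group` holds group labels and is named by feature.
active_groups_sketch <- function(active_features, group, group_threshold = 1) {
  hits <- group[names(group) %in% active_features]
  counts <- table(hits)
  as.character(names(counts)[as.vector(counts) >= group_threshold])
}

group <- c(g1 = "A", g2 = "A", g3 = "B", g4 = "C")
active <- c("g1", "g2", "g3")

groups_at_1 <- active_groups_sketch(active, group, group_threshold = 1)
groups_at_2 <- active_groups_sketch(active, group, group_threshold = 2)
```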
Combine fold-level test statistics from cross-validation
Description
Aggregates fold-level test statistics and variances to compute an overall test statistic and p-value.
Usage
combine_folds_mean_diff(fold_data, verbose = FALSE)
Arguments
fold_data |
A list of results from |
verbose |
Logical. Whether to print diagnostic messages. Default is FALSE. |
Value
A list containing:
- p_value
Two-sided p-value for the overall test statistic.
- test_statistic
Standardized test statistic.
- fold_data
Original input list, for reference or diagnostics.
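As a rough illustration of aggregating fold-level results, the sketch below averages per-fold estimates, treats them as approximately independent when computing a standard error, and reports a two-sided normal p-value. This aggregation rule is an assumption made for illustration and may differ from the package's formula; `combine_folds_sketch` and the example numbers are hypothetical.

```r
# Hypothetical aggregation: average the fold estimates and studentize.
# Assumption: fold estimates are approximately independent with known variances.
combine_folds_sketch <- function(fold_estimates, fold_variances) {
  k <- length(fold_estimates)
  overall_estimate <- mean(fold_estimates)
  overall_se <- sqrt(sum(fold_variances)) / k   # SE of an average of k estimates
  z <- overall_estimate / overall_se
  list(test_statistic = z, p_value = 2 * stats::pnorm(-abs(z)))
}

# Made-up fold-level numbers for illustration only.
combined <- combine_folds_sketch(c(0.4, 0.6, 0.5), c(0.04, 0.05, 0.03))
```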
Compute predictive contributions of feature groups
Description
Analyzes the relative contribution of grouped features to the overall discriminant signal, based on averaged Lasso coefficients across cross-validation folds.
Usage
compute_predictive_contributions(result, group, group_threshold = 5)
Arguments
result |
A result object returned by |
group |
A grouping vector indicating group membership of features. Must be the same length as the number of features. |
group_threshold |
Integer. Minimum number of active features required in a group for it to be considered active. Default is 5. |
Details
The function identifies active groups based on cross-validated non-zero coefficients, then decomposes the total L2 norm of the average coefficient vector across groups.
Value
A data frame with two columns:
- group
Group name or label.
- score
Proportion of total predictive signal attributable to that group.
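A simple way to picture the score column is as each group's share of the squared L2 norm of the averaged coefficient vector. The sketch below uses that definition as an assumption; it skips the active-group filtering step, and `group_contribution_sketch` is a hypothetical helper rather than the package's implementation.

```r
# Hypothetical helper: share of the squared L2 norm carried by each group.
# Assumption: shares are computed from squared coefficients and sum to one.
group_contribution_sketch <- function(avg_coef, group) {
  squared <- avg_coef^2
  score <- tapply(squared, group, sum) / sum(squared)
  data.frame(group = names(score), score = as.numeric(score), row.names = NULL)
}

avg_coef <- c(0.6, 0, 0.8, 0)        # made-up averaged Lasso coefficients
group <- c("A", "A", "B", "B")       # made-up group labels
contrib <- group_contribution_sketch(avg_coef, group)
print(contrib)
```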
See Also
Debiased one-step test for two-sample mean comparison. A small p-value indicates not only that the mean vectors differ, but also which principal component the difference aligns with.
Description
Debiased one-step test for two-sample mean comparison. A small p-value indicates not only that the mean vectors differ, but also which principal component the difference aligns with.
Usage
debiased_pc_testing(
sample_1,
sample_2 = NULL,
pca_method = "sparse_pca",
mean_method = "naive",
num_latent_factor = 1,
n_folds = 5,
verbose = TRUE
)
Arguments
sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
pca_method |
Method used to estimate the principal components. The default is "sparse_pca", which uses sparse PCA from the PMA package. Other choices are "dense_pca" (regular PCA) and "hard" (hard-thresholding PCA, which also induces sparsity). |
mean_method |
Method used to estimate the mean vector. The default is "naive" (the sample mean). A hard-thresholding sparse estimator, "hard", is also available. |
num_latent_factor |
Number of principal components to be estimated and tested. The default is 1. |
n_folds |
Number of splits used for cross-fitting. The default is 5; if computation time allows, try 10. |
verbose |
Print information to the console. Default is TRUE. |
Value
A list of test statistics.
test_statistics |
Test statistics. Each entry corresponds to the test result for one principal component. |
standard_error |
Estimated standard error of test_statistics_before_studentization. |
test_statistics_before_studentization |
Same as test_statistics but before studentization, so the entries are not scaled to unit variance. |
split_data |
Intermediate quantities needed for further assessment and interpretation of the test results. |
Examples
sample_size_1 <- sample_size_2 <- 300
true_mean_1 <- matrix(c(rep(1, 10), rep(0, 90)), ncol = 1)
true_mean_2 <- matrix(c(rep(1.5, 10), rep(0, 90)), ncol = 1)
pc1 <- c(rep(1, 10), rep(0, 90))
pc1 <- pc1/norm(pc1, type = '2')
simulation_covariance <- 10 * pc1 %*% t(pc1)
simulation_covariance <- simulation_covariance + diag(1, 100)
sample_1 <- data.frame(MASS::mvrnorm(sample_size_1,
mu = true_mean_1,
Sigma = simulation_covariance))
sample_2 <- data.frame(MASS::mvrnorm(sample_size_2,
mu = true_mean_2,
Sigma = simulation_covariance))
result <- debiased_pc_testing(sample_1, sample_2)
result$test_statistics
##these are test statistics. Each one of them corresponds to one PC.
summarize_pc_name(result, latent_fator_index = 1) #shows which features contribute to PC1
extract_pc(result) # extract the estimated leading PCs.
Estimate the leading principal component
Description
Estimates the leading principal component of the input matrix using dense or sparse PCA.
Usage
estimate_leading_pc(control, pca_method = c("dense_pca", "sparse_pca"))
Arguments
control |
A matrix or data frame. Each row is a sample, and each column is a feature. |
pca_method |
Character. PCA method to use. Options are |
Details
For low-dimensional settings (at most 30 features), the method automatically switches to dense PCA.
For sparse PCA, the function uses the PMA::SPC.cv cross-validation method.
Value
A normalized numeric vector representing the leading principal component direction.
Examples
## Not run:
X <- matrix(rnorm(100), nrow = 20)
estimate_leading_pc(X, pca_method = "dense_pca")
## End(Not run)
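For the dense case, the leading principal component can be obtained from a singular value decomposition of the column-centered data. The following base-R sketch shows that idea only; `leading_pc_sketch` is hypothetical, it does not cover the sparse (PMA) path, and the package's centering and sign conventions may differ.

```r
# Hypothetical dense-PCA sketch: leading right singular vector of centered data.
leading_pc_sketch <- function(x) {
  x_centered <- scale(as.matrix(x), center = TRUE, scale = FALSE)
  v <- svd(x_centered, nu = 0, nv = 1)$v[, 1]
  v / sqrt(sum(v^2))   # normalize to unit length
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 40)   # 40 samples, 5 features
X[, 1] <- X[, 1] * 5                 # make feature 1 dominate the variance
pc <- leading_pc_sketch(X)
print(round(pc, 3))
```

The sign of a principal component is arbitrary, so only the direction up to sign is meaningful.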
The function for nuisance parameter estimation in anchored_lasso_testing().
Description
The function for nuisance parameter estimation in anchored_lasso_testing().
Usage
estimate_nuisance_parameter_lasso(
nuisance_sample_1,
nuisance_sample_2,
pca_method = "sparse_pca",
mean_method = "lasso",
lasso_tuning_method = "min",
num_latent_factor = 1,
local_environment = local_environment,
verbose = TRUE
)
Arguments
nuisance_sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
nuisance_sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
pca_method |
Method used to estimate the principal components. The default is "sparse_pca", which uses sparse PCA from the PMA package. Other choices are "dense_pca" (regular PCA) and "hard" (hard-thresholding PCA, which also induces sparsity). |
mean_method |
Method used to estimate the discriminant direction. The default is "lasso" (logistic Lasso). It can also take the value "lasso_no_truncation". |
lasso_tuning_method |
Method for tuning the Lasso penalty hyperparameter. The default is "min", the minimizer of the cross-validation error; users can also use "1se" for sparser solutions. |
num_latent_factor |
The principal component at which the Lasso coefficient is anchored. The default is 1 (PC1). |
local_environment |
An environment for hyperparameters shared between folds. |
verbose |
Print information to the console. Default is TRUE. |
Value
A list of estimated nuisance quantities.
estimate_leading_pc |
Leading principal components. |
estimate_mean_1 |
Sample mean for group 1. |
estimate_mean_2 |
Sample mean for group 2. |
estimate_lasso_beta |
Logistic Lasso regression coefficients. |
estimate_projection_direction |
Anchored projection direction. It is similar to PC1 when signal is weak but similar to estimate_optimal_direction when the signal is moderately large. |
estimate_optimal_direction |
Discriminant direction. |
The function for nuisance parameter estimation in simple_pc_testing() and debiased_pc_testing().
Description
The function for nuisance parameter estimation in simple_pc_testing() and debiased_pc_testing().
Usage
estimate_nuisance_pc(
nuisance_sample_1,
nuisance_sample_2 = NULL,
pca_method = "sparse_pca",
mean_method = "naive",
num_latent_factor = 1,
local_environment = NA
)
Arguments
nuisance_sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
nuisance_sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
pca_method |
Method used to estimate the principal components. The default is "sparse_pca", which uses sparse PCA from the PMA package. Other choices are "dense_pca" (regular PCA) and "hard" (hard-thresholding PCA, which also induces sparsity). |
mean_method |
Method used to estimate the mean vector. The default is "naive" (the sample mean). A hard-thresholding sparse estimator, "hard", is also available. |
num_latent_factor |
Number of principal components to be estimated and tested. The default is 1. |
local_environment |
An environment for hyperparameters shared between folds. |
Value
A list of estimated nuisance quantities.
estimate_leading_pc |
Leading principal components. |
estimate_mean_1 |
Sample mean for group 1. |
estimate_mean_2 |
Sample mean for group 2. |
estimate_eigenvalue |
Eigenvalue for each principal component. |
estimate_noise_variance |
Noise variance, used to construct block-diagonal estimates of the covariance matrix. |
Calculate the test statistics on the left-out samples. Called in debiased_pc_testing().
Description
Calculate the test statistics on the left-out samples. Called in debiased_pc_testing().
Usage
evaluate_influence_function_multi_factor(
cross_fitting_sample_1,
cross_fitting_sample_2 = NULL,
nuisance_collection,
num_latent_factor = 1
)
Arguments
cross_fitting_sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
cross_fitting_sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
nuisance_collection |
A collection of nuisance quantities estimated using "nuisance" samples. It is the output of estimate_nuisance_pc(). |
num_latent_factor |
Number of principal components to be considered. |
Value
A list of test statistics.
inner_product_1 |
Simple inner products for sample 1. |
inner_product_2 |
Simple inner products for sample 2. |
influence_eigenvector_each_subject_1 |
Debiased test statistics, sample 1. |
influence_eigenvector_each_subject_2 |
Debiased test statistics, sample 2. |
for_variance_subject_1 |
Statistics for variance calculation, sample 1. |
for_variance_subject_2 |
Statistics for variance calculation, sample 2. |
Calculate the test statistics on the left-out samples. Called in anchored_lasso_testing().
Description
Calculate the test statistics on the left-out samples. Called in anchored_lasso_testing().
Usage
evaluate_pca_lasso_plug_in(
cross_fitting_sample_1,
cross_fitting_sample_2,
nuisance_collection,
mean_method = "lasso"
)
Arguments
cross_fitting_sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
cross_fitting_sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
nuisance_collection |
A collection of nuisance quantities estimated using "nuisance" samples. It is the output of estimate_nuisance_parameter_lasso(). |
mean_method |
Method used to estimate the discriminant direction. The default is "lasso" (logistic Lasso). It can also take the value "lasso_no_truncation". |
Value
A list of test statistics.
influence_each_subject_1 |
Test statistics for sample 1. |
influence_each_subject_2 |
Test statistics for sample 2. |
for_variance_each_subject_1 |
Statistics for variance calculation, sample 1. |
for_variance_each_subject_2 |
Statistics for variance calculation, sample 2. |
Calculate the test statistics on the left-out samples. Called in simple_pc_testing().
Description
Calculate the test statistics on the left-out samples. Called in simple_pc_testing().
Usage
evaluate_pca_plug_in(
cross_fitting_sample_1,
cross_fitting_sample_2 = NULL,
nuisance_collection
)
Arguments
cross_fitting_sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
cross_fitting_sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
nuisance_collection |
A collection of nuisance quantities estimated using "nuisance" samples. It is the output of estimate_nuisance_pc(). |
Value
A list of test statistics.
influence_each_subject_1 |
Statistics for sample 1. |
influence_each_subject_2 |
Statistics for sample 2. |
Extract the lasso estimate from the output of anchored_lasso_testing().
Description
Extract the lasso estimate from the output of anchored_lasso_testing().
Usage
extract_lasso_coef(testing_result)
Arguments
testing_result |
The output/test result list from anchored_lasso_testing(). |
Value
A list whose elements are the estimated discriminant directions for each split; its length equals n_folds.
The discriminant vectors for each split.
Extract the principal components from the output of simple_pc_testing() and debiased_pc_testing().
Description
Extract the principal components from the output of simple_pc_testing() and debiased_pc_testing().
Usage
extract_pc(testing_result)
Arguments
testing_result |
The output/test result list from simple_pc_testing() or debiased_pc_testing(). |
Value
A list whose elements are the estimated PCs for each split; its length equals n_folds.
The PC vectors for each split.
Fit a (group) Lasso logistic regression classifier
Description
Performs Lasso or group Lasso logistic regression to distinguish between two groups of samples.
Usage
fit_lasso(
control_train,
treat_train,
lambda_type = c("lambda.min", "lambda.1se"),
classifier_method = c("lasso", "group_lasso"),
group = NULL
)
Arguments
control_train |
A matrix or data frame for the control group. Rows are samples, columns are features. |
treat_train |
A matrix or data frame for the treatment group. Rows are samples, columns are features. |
lambda_type |
Character. Type of lambda to use from cross-validation. Options are |
classifier_method |
Character. Choice of classifier. |
group |
Optional grouping vector for |
Details
The function fits a logistic regression using either glmnet for Lasso or grpreg for group Lasso.
Coefficients are soft-thresholded by the maximum coefficient times n^(-1/3), where n is the effective sample size.
Value
A numeric vector of estimated regression coefficients (excluding intercept), thresholded for small values.
Examples
## Not run:
X1 <- matrix(rnorm(100), nrow = 10)
X2 <- matrix(rnorm(100), nrow = 10)
fit_lasso(X1, X2, classifier_method = "lasso")
## End(Not run)
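The thresholding rule in the Details can be sketched in base R. The code below takes "soft-thresholded" literally (shrink every coefficient toward zero by the cutoff) and uses the largest absolute coefficient times n^(-1/3) as the cutoff; `soft_threshold_sketch` is hypothetical, and the package may apply the cutoff differently.

```r
# Hypothetical sketch of the thresholding step described in Details.
# Assumption: cutoff = max |beta| * n^(-1/3), applied as soft-thresholding.
soft_threshold_sketch <- function(beta, n) {
  cutoff <- max(abs(beta)) * n^(-1/3)
  sign(beta) * pmax(abs(beta) - cutoff, 0)
}

beta <- c(2, -0.1, 0.5, 0)   # made-up Lasso coefficients
thresholded <- soft_threshold_sketch(beta, n = 64)
print(thresholded)
```

With n = 64 the cutoff is a quarter of the largest coefficient, so only coefficients clearly above that level stay non-zero.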
Split indices into folds
Description
Randomly splits a given vector of indices into approximately equal-sized folds.
Usage
index_spliter(array, n_folds = 5)
Arguments
array |
A vector of indices (e.g., |
n_folds |
Integer. Number of folds. Default is 5. |
Value
A list of length n_folds, each containing a subset of the shuffled indices.
Examples
index_spliter(1:10, n_folds = 3)
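Shuffling the indices and dealing them out in round-robin fashion is one way to obtain folds of nearly equal size. This is a sketch under that assumption; `split_indices_sketch` is hypothetical, and the package may assign indices to folds in a different order.

```r
# Hypothetical sketch: shuffle, then assign fold labels 1..n_folds cyclically.
split_indices_sketch <- function(indices, n_folds = 5) {
  # Index by position so a length-one input is not expanded by sample().
  shuffled <- indices[sample.int(length(indices))]
  fold_id <- rep_len(seq_len(n_folds), length(shuffled))
  unname(split(shuffled, fold_id))
}

set.seed(42)
folds <- split_indices_sketch(1:10, n_folds = 3)
print(lengths(folds))
```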
High-dimensional two-sample mean comparison with anchored projection
Description
Performs a cross-validated, projection-based mean comparison between two high-dimensional groups using sparse or dense PCA and (group) Lasso classifiers.
Usage
mean_comparison_anchor(
control,
treatment,
pca_method = c("dense_pca", "sparse_pca"),
classifier_method = c("lasso", "group_lasso"),
lambda_type = "lambda.1se",
n_folds = 10,
group = NULL,
standardize_feature = TRUE,
verbose = TRUE
)
Arguments
control |
A matrix or data frame for the control group. Rows are samples; columns are features. |
treatment |
A matrix or data frame for the treatment group. Rows are samples; columns are features. |
pca_method |
Character. Method for estimating the projection direction. Options are |
classifier_method |
Character. Classifier to guide the projection. Options are |
lambda_type |
Character. Regularization parameter choice in Lasso. Options are |
n_folds |
Integer. Number of cross-validation folds. Default is 10. |
group |
Optional. A grouping vector (required for |
standardize_feature |
Logical. Whether to standardize features using pooled mean and standard deviation. Default is TRUE. |
verbose |
Logical. Whether to print messages during execution. Default is TRUE. |
Details
This function applies a projection-based method for high-dimensional mean testing. The projection direction is computed by anchoring the leading principal component with a regularized classifier (Lasso or group Lasso), and test statistics are aggregated across folds.
Value
A list with:
- p_value
Two-sided p-value for the overall test.
- test_statistic
Standardized test statistic.
- fold_data
Per-fold results, including projections and scores.
See Also
process_fold_mean_diff, combine_folds_mean_diff, estimate_leading_pc, fit_lasso
Examples
## Not run:
X <- matrix(rnorm(200 * 100), nrow = 100)
Y <- matrix(rnorm(200 * 100), nrow = 100)
result <- mean_comparison_anchor(X, Y, pca_method = "dense_pca", classifier_method = "lasso")
## End(Not run)
Normalize and split two datasets using pooled mean and standard deviation
Description
Combines two datasets, normalizes features using pooled mean and standard deviation, and returns the normalized datasets separately.
Usage
normalize_and_split(df1, df2)
Arguments
df1 |
A data frame or matrix. Typically group 1. |
df2 |
A data frame or matrix. Typically group 2. |
Value
A list with elements:
- df1
Normalized version of df1.
- df2
Normalized version of df2.
Examples
set.seed(123)
df1 <- matrix(rnorm(20), nrow = 5)
df2 <- matrix(rnorm(20), nrow = 5)
normalize_and_split(df1, df2)
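Pooled normalization means the centering and scaling constants are computed once from the stacked data and then applied to both groups. The base-R sketch below illustrates this; `normalize_pooled_sketch` is hypothetical, and details such as the handling of constant columns are not addressed.

```r
# Hypothetical sketch: standardize both groups with pooled column means and SDs.
normalize_pooled_sketch <- function(df1, df2) {
  pooled <- rbind(as.matrix(df1), as.matrix(df2))
  center <- colMeans(pooled)
  spread <- apply(pooled, 2, stats::sd)
  scaled <- sweep(sweep(pooled, 2, center, "-"), 2, spread, "/")
  n1 <- nrow(df1)
  list(df1 = scaled[seq_len(n1), , drop = FALSE],
       df2 = scaled[n1 + seq_len(nrow(df2)), , drop = FALSE])
}

set.seed(123)
a <- matrix(rnorm(20, mean = 2), nrow = 5)
b <- matrix(rnorm(20, mean = -1), nrow = 5)
normalized <- normalize_pooled_sketch(a, b)
restacked <- rbind(normalized$df1, normalized$df2)
```

Because the constants are pooled, a mean difference between the groups survives the normalization rather than being removed.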
Process one cross-validation fold for mean difference testing
Description
Computes the test statistic, variance, and projection direction for one fold in a cross-validated comparison of two groups.
Usage
process_fold_mean_diff(
fold_index,
control,
treatment,
control_split_index,
tr_split_index,
pca_method,
classifier_method,
lambda_type,
group,
verbose
)
Arguments
fold_index |
Integer index of the current fold. |
control |
Matrix or data frame for the control group (rows = samples, columns = features). |
treatment |
Matrix or data frame for the treatment group (rows = samples, columns = features). |
control_split_index |
A list of row indices for each fold of the control group. |
tr_split_index |
A list of row indices for each fold of the treatment group. |
pca_method |
Character. PCA method to use. Options are |
classifier_method |
Character. Classifier method. Options are |
lambda_type |
Character. Lambda selection method. Options are |
group |
Optional grouping vector for group lasso. |
verbose |
Logical. Whether to print progress messages. |
Value
A list containing the test statistic, its variance, scores for each group, the projection direction, and intermediate quantities.
Simple plug-in test for two-sample mean comparison.
Description
Simple plug-in test for two-sample mean comparison.
Usage
simple_pc_testing(
sample_1,
sample_2 = NULL,
pca_method = "sparse_pca",
mean_method = "naive",
num_latent_factor = 1,
n_folds = 5,
verbose = TRUE
)
Arguments
sample_1 |
Group 1 sample. Each row is a subject and each column corresponds to a feature. |
sample_2 |
Group 2 sample. Each row is a subject and each column corresponds to a feature. |
pca_method |
Method used to estimate the principal components. The default is "sparse_pca", which uses sparse PCA from the PMA package. Other choices are "dense_pca" (regular PCA) and "hard" (hard-thresholding PCA, which also induces sparsity). |
mean_method |
Method used to estimate the mean vector. The default is "naive" (the sample mean). A hard-thresholding sparse estimator, "hard", is also available. |
num_latent_factor |
Number of principal components to be estimated and tested. The default is 1. |
n_folds |
Number of splits used for cross-fitting. The default is 5; if computation time allows, try 10. |
verbose |
Print information to the console. Default is TRUE. |
Value
A list of test statistics.
test_statistics |
Test statistics. Each entry corresponds to the test result for one principal component. |
standard_error |
Estimated standard error of test_statistics_before_studentization. |
test_statistics_before_studentization |
Same as test_statistics but before studentization, so the entries are not scaled to unit variance. |
split_data |
Intermediate quantities needed for further assessment and interpretation of the test results. |
Examples
sample_size_1 <- sample_size_2 <- 300
true_mean_1 <- matrix(c(rep(1, 10), rep(0, 90)), ncol = 1)
true_mean_2 <- matrix(c(rep(1.5, 10), rep(0, 90)), ncol = 1)
pc1 <- c(rep(1, 10), rep(0, 90))
pc1 <- pc1/norm(pc1, type = '2')
simulation_covariance <- 10 * pc1 %*% t(pc1)
simulation_covariance <- simulation_covariance + diag(1, 100)
sample_1 <- data.frame(MASS::mvrnorm(sample_size_1,
mu = true_mean_1,
Sigma = simulation_covariance))
sample_2 <- data.frame(MASS::mvrnorm(sample_size_2,
mu = true_mean_2,
Sigma = simulation_covariance))
result <- simple_pc_testing(sample_1, sample_2)
result$test_statistics
##these are test statistics. Each one of them corresponds to one PC.
summarize_pc_name(result, latent_fator_index = 1) #shows which features contribute to PC1
extract_pc(result) # extract the estimated leading PCs.
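The plug-in idea can be illustrated with a single split in base R: estimate the leading PC on one half of each group, then compare the projected means of the held-out halves. This is a simplified sketch, not the package's procedure; it uses dense PCA, one split instead of n_folds, and made-up simulation settings, and all variable names are hypothetical.

```r
set.seed(2025)
n <- 200; p <- 20
loading <- c(rep(1, 5), rep(0, p - 5)) / sqrt(5)   # made-up leading direction
shift <- c(rep(0.5, 5), rep(0, p - 5))             # made-up mean difference

# Spiked covariance: a strong common factor along `loading` plus unit noise.
sample_1 <- matrix(rnorm(n * p), nrow = n) + outer(rnorm(n, sd = 3), loading)
sample_2 <- matrix(rnorm(n * p), nrow = n) + outer(rnorm(n, sd = 3), loading)
sample_2 <- sweep(sample_2, 2, shift, "+")

# One split: the first half estimates nuisance quantities, the second half tests.
nuisance_rows <- seq_len(n / 2)
test_rows <- setdiff(seq_len(n), nuisance_rows)

# Nuisance step: leading PC of the group-centered, stacked nuisance data.
centered <- rbind(scale(sample_1[nuisance_rows, ], scale = FALSE),
                  scale(sample_2[nuisance_rows, ], scale = FALSE))
pc1 <- svd(centered, nu = 0, nv = 1)$v[, 1]

# Testing step: project held-out samples onto pc1 and studentize the difference.
proj_1 <- as.vector(sample_1[test_rows, ] %*% pc1)
proj_2 <- as.vector(sample_2[test_rows, ] %*% pc1)
difference <- mean(proj_1) - mean(proj_2)
standard_error <- sqrt(var(proj_1) / length(proj_1) + var(proj_2) / length(proj_2))
z <- difference / standard_error
print(z)
```

Estimating the direction and evaluating the statistic on disjoint rows is what keeps the normal approximation plausible; repeating this over several folds is the cross-fitting step.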
Summarize the features (e.g. genes) that contribute to the test result, i.e. those features consistently show up in Lasso vectors.
Description
Summarize the features (e.g. genes) that contribute to the test result, i.e. those features consistently show up in Lasso vectors.
Usage
summarize_feature_name(testing_result, method = "majority voting")
Arguments
testing_result |
The output/test result list from anchored_lasso_testing(). |
method |
How to combine the feature lists across different splits. The default is 'majority voting': features that show up in more than 50% of the splits are considered active/useful. Other options are 'union' (all features pooled together) and 'intersection' (only features that show up in all splits). |
Value
A list of the names of features that contribute to the test result; the original input data must have column names. An empty list suggests there is little detectable difference between the two groups.
Feature names that consistently show up in the discriminant vectors.
Summarize the features (e.g. genes) that contribute to the test result, i.e. those features consistently show up in the sparse principal components.
Description
Summarize the features (e.g. genes) that contribute to the test result, i.e. those features consistently show up in the sparse principal components.
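The three combination rules can be sketched in base R by computing, for each feature, the share of splits in which it appears. `vote_features_sketch` and the example splits are hypothetical; the package's handling of ties and empty inputs may differ.

```r
# Hypothetical sketch of 'majority voting', 'union', and 'intersection'.
vote_features_sketch <- function(feature_lists, method = "majority voting") {
  all_features <- unique(unlist(feature_lists))
  # Share of splits in which each feature appears.
  share <- vapply(
    all_features,
    function(f) mean(vapply(feature_lists, function(s) f %in% s, logical(1))),
    numeric(1)
  )
  switch(method,
         "majority voting" = all_features[share > 0.5],
         "union" = all_features,
         "intersection" = all_features[share == 1],
         stop("Unknown method: ", method))
}

# Made-up active-feature sets from four splits.
splits <- list(c("g1", "g2"), c("g1", "g3"), c("g1", "g2"), c("g1"))
majority <- vote_features_sketch(splits)
pooled <- vote_features_sketch(splits, method = "union")
common <- vote_features_sketch(splits, method = "intersection")
```

Here "g2" appears in exactly half of the splits, so the strict "more than 50%" rule excludes it.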
Usage
summarize_pc_name(
testing_result,
latent_fator_index = 1,
method = "majority voting"
)
Arguments
testing_result |
The output/test result list from simple_pc_testing() or debiased_pc_testing(). |
latent_fator_index |
Which principal component the algorithm should summarize. The default is 1 (PC1). |
method |
How to combine the feature lists across different splits. The default is 'majority voting': features that show up in more than 50% of the splits are considered active/useful. Other options are 'union' (all features pooled together) and 'intersection' (only features that show up in all splits). |
Value
A list of the names of features that contribute to the test result; the original input data must have column names.
Feature names that consistently show up in the estimated PC vectors.
Validate and convert input data
Description
Checks whether the input is a matrix or data frame, and converts it to a matrix if valid.
Usage
validate_and_convert_data(data, name)
Arguments
data |
A matrix or data frame. |
name |
A string used in error messages to identify the variable name. |
Value
A numeric matrix.
Examples
validate_and_convert_data(data.frame(x = 1:3, y = 4:6), "example_data")
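The validation step amounts to a type check followed by a conversion. The sketch below shows that pattern; `validate_sketch` and its error message are hypothetical, and the package may perform additional checks.

```r
# Hypothetical sketch: accept matrices and data frames, reject everything else.
validate_sketch <- function(data, name) {
  if (!is.matrix(data) && !is.data.frame(data)) {
    stop(name, " must be a matrix or data frame.")
  }
  as.matrix(data)
}

converted <- validate_sketch(data.frame(x = 1:3, y = 4:6), "example_data")

# A plain vector is rejected; the error is caught so the script keeps running.
rejected <- inherits(try(validate_sketch(1:3, "bad_input"), silent = TRUE), "try-error")
```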