Type: Package
Title: Detection of Statistically Significant Combinations of SNPs in Association Mapping
Version: 0.6.1
Description: A significant pattern mining-based toolbox for region-based genome-wide association studies and higher-order epistasis analyses, implementing the methods described in Llinares-López et al. (2017) <doi:10.1093/bioinformatics/btx071>.
Depends: R (≥ 3.0.2)
Imports: methods, Rcpp
LinkingTo: Rcpp
Encoding: UTF-8
LazyData: true
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
NeedsCompilation: yes
RoxygenNote: 6.0.1
SystemRequirements: C++11
Suggests: testthat, knitr, rmarkdown
Author: Felipe Llinares-López [aut, cph], Laetitia Papaxanthos [aut, cph], Damian Roqueiro [aut, cph], Matthew Baker [ctr], Mikołaj Rybiński [ctr], Uwe Schmitt [ctr], Dean Bodenham [aut, cre, cph], Karsten Borgwardt [aut, fnd, cph]
Maintainer: Dean Bodenham <deanbodenhambsse@gmail.com>
VignetteBuilder: knitr
Packaged: 2020-05-04 18:15:36 UTC; dean
Repository: CRAN
Date/Publication: 2020-05-05 18:10:02 UTC

Constructor for CASMAP class object.

Description

Constructor for CASMAP class object.

Details

Constructor for a CASMAP class object. The mode parameter must be set by the user; please see the examples.

Fields

mode

Either 'regionGWAS' or 'higherOrderEpistasis'.

alpha

A numeric value setting the Family-wise Error Rate (FWER). Must be strictly between 0 and 1. Default value is 0.05.

max_comb_size

A numeric specifying the maximum length of combinations. For example, if set to 4, then only combinations of size between 1 and 4 (inclusive) will be considered. To consider combinations of arbitrary (maximal) length, use value 0, which is the default value.
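In regionGWAS mode the candidate regions are contiguous intervals of features (the Llinares-Lopez et al. references speak of intervals of genetic heterogeneity), so max_comb_size caps the interval length. The sketch below is illustrative only: countCandidateIntervals is a hypothetical helper, not part of the CASMAP API. It counts how many intervals remain among p features for a given limit, treating 0 as "no limit" in the same way as max_comb_size.

```r
# Hypothetical helper (not part of CASMAP): number of contiguous intervals
# of length 1..max_len among p features. max_len = 0 means "no limit",
# mirroring the max_comb_size convention.
countCandidateIntervals <- function(p, max_len = 0) {
  if (max_len == 0 || max_len > p) max_len <- p
  lens <- seq_len(max_len)
  # an interval of length l can start at p - l + 1 different positions
  sum(p - lens + 1)
}

countCandidateIntervals(1000, 4)  # 1000 + 999 + 998 + 997 = 3994
countCandidateIntervals(1000)     # p * (p + 1) / 2 = 500500
```

For the 1000-feature example data set, the number of candidate intervals grows from 3994 to 500500 as the limit is removed.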

Base methods, for both modes

readFiles

Reads the data, labels and (optionally) covariates files. The parameters are genotype_file for the data, phenotype_file for the labels and the optional covariate_file for the covariates. The option plink_file_root is not supported in the current version, but will be supported in future versions.

setMode

Sets or changes the mode. Note that any data files will then need to be read in again using the readFiles command.

setTargetFWER

Sets or changes the target Family-wise Error Rate (FWER). Takes a numeric parameter alpha, strictly between 0 and 1.

execute

Executes the algorithm, once the data files have been read. Depending on the size of the data files, this may take a long time.

getSummary

Returns a data frame with a summary of the results from the execution, but not any significant regions/itemsets. See getSignificantRegions, getSignificantInteractions, and getSignificantClusterRepresentatives.

writeSummary

Writes the information from getSummary directly to file.

regionGWAS Methods

getSignificantRegions

Returns a data frame with the significant regions. Only valid when mode='regionGWAS'.

getSignificantClusterRepresentatives

Returns a data frame with the representatives of the significant clusters. This will be a subset of the regions returned from getSignificantRegions. Only valid when mode='regionGWAS'.

writeSignificantRegions

Writes the data from getSignificantRegions to file, which must be specified in the parameter path. Only valid when mode='regionGWAS'.

writeSignificantClusterRepresentatives

Writes the data from getSignificantClusterRepresentatives to file, which must be specified in the parameter path. Only valid when mode='regionGWAS'.

higherOrderEpistasis Methods

getSignificantInteractions

Returns a data frame with the significant interactions. Only valid when mode='higherOrderEpistasis'.

writeSignificantInteractions

Writes the data from getSignificantInteractions to file, which must be specified in the parameter path. Only valid when mode='higherOrderEpistasis'.

References

A. Terada, M. Okada-Hatakeyama, K. Tsuda and J. Sese, Statistical significance of combinatorial regulations, Proceedings of the National Academy of Sciences (2013) 110 (32): 12996-13001

F. Llinares-Lopez, D. G. Grimm, D. Bodenham, U. Gieraths, M. Sugiyama, B. Rowan and K. Borgwardt, Genome-wide detection of intervals of genetic heterogeneity associated with complex traits, ISMB 2015, Bioinformatics (2015) 31 (12): i240-i249

L. Papaxanthos, F. Llinares-Lopez, D. Bodenham and K. Borgwardt, Finding significant combinations of features in the presence of categorical covariates, Advances in Neural Information Processing Systems 29 (NIPS 2016), 2271-2279.

F. Llinares-Lopez, L. Papaxanthos, D. Bodenham, D. Roqueiro and K. Borgwardt, Genome-wide genetic heterogeneity discovery with categorical covariates, Bioinformatics (2017) 33 (12): 1820-1828.

Examples


## An example using the "regionGWAS" mode
fastcmh <- CASMAP(mode="regionGWAS")      # initialise object

datafile <- getExampleDataFilename()      # file name of example data
labelsfile <- getExampleLabelsFilename()  # file name of example labels
covfile <- getExampleCovariatesFilename() # file name of example covariates

# read the data, labels and covariate files
fastcmh$readFiles(genotype_file=datafile,
                  phenotype_file=labelsfile,
                  covariate_file=covfile)

# execute the algorithm (this may take some time)
fastcmh$execute()

# get the summary results
summary_results <- fastcmh$getSummary()

# get the significant regions
sig_regions <- fastcmh$getSignificantRegions()

# get the clustered representatives for the significant regions
sig_cluster_rep <- fastcmh$getSignificantClusterRepresentatives()


## Another example of regionGWAS
fais <- CASMAP(mode="regionGWAS")      # initialise object

# read the data and labels, but no covariates
fais$readFiles(genotype_file=getExampleDataFilename(),
               phenotype_file=getExampleLabelsFilename())


## Another example, doing higher order epistasis search
facs <- CASMAP(mode="higherOrderEpistasis")      # initialise object


Global variables environment

Description

An environment to store a few global variables. Internal.

Usage

CASMAPenv

Format

An object of class environment of length 3.


Approximate fast significant interval search

Description

Class for approximate significant intervals search with Tarone correction for bounding intermediate FWERs.
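For readers unfamiliar with Tarone's correction (used in Terada et al., 2013, listed in the References above), a minimal sketch of the basic idea follows. It assumes each candidate pattern i has a known minimum attainable p-value psi[i]; the corrected significance threshold is alpha/k for the smallest k such that at most k patterns have psi[i] <= alpha/k. The function name taroneThreshold is hypothetical, and this is not the package's C++ implementation, which bounds intermediate FWERs during the search rather than working on a precomputed vector.

```r
# Hypothetical illustration of Tarone's correction (not the CASMAP code).
# psi:   minimum attainable p-value of each candidate pattern
# alpha: target family-wise error rate
taroneThreshold <- function(psi, alpha = 0.05) {
  for (k in seq_along(psi)) {
    # number of patterns that could possibly reach significance at alpha / k
    m_k <- sum(psi <= alpha / k)
    if (m_k <= k) {
      return(list(k = k, delta = alpha / k))
    }
  }
  # only reached when psi is empty: nothing is testable, no correction needed
  list(k = 0L, delta = alpha)
}

psi <- c(1e-6, 1e-3, 0.5, 0.9)
res <- taroneThreshold(psi, alpha = 0.05)
res$k      # 2
res$delta  # 0.025
```

Patterns whose minimum attainable p-value exceeds the threshold can never become significant, so they do not count towards the multiple-testing correction; here only two of the four patterns are testable.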


Internal class for search for significant regions

Description

Please use the CASMAP constructor.


Fast significant interval search with categorical covariates

Description

Internal class, please use CASMAP constructor.


Significant itemsets search with categorical covariates

Description

Internal class, please use CASMAP constructor.


Check if a variable is boolean or not

Description

Checks if a variable is boolean; if not, throws an error, otherwise returns its boolean value.

Usage

checkIsBoolean(var, name)

Arguments

var

The variable to be checked (if boolean).

name

The name of the variable to appear in any error message.

Value

If var is neither boolean nor NA, throws an error. If var is NA, returns FALSE. Otherwise returns the boolean value of var.
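A minimal sketch of the behaviour described under Value, written as a hypothetical stand-alone function (checkIsBooleanSketch) rather than the package's own code; the exact error text is an assumption.

```r
# Hypothetical re-implementation of the documented checkIsBoolean behaviour.
checkIsBooleanSketch <- function(var, name) {
  # a single NA (of any type) is treated as FALSE
  if (length(var) == 1 && is.na(var)) return(FALSE)
  # anything that is not a single TRUE/FALSE value is rejected
  if (!is.logical(var) || length(var) != 1) {
    stop(paste0(name, " must be TRUE or FALSE"))
  }
  var
}

checkIsBooleanSketch(TRUE, "verbose")    # TRUE
checkIsBooleanSketch(NA, "verbose")      # FALSE
# checkIsBooleanSketch("yes", "verbose") # stops with an error
```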


Get the path to the example covariates file for regionGWAS mode

Description

Path to CASMAP_example_covariates_1.txt in inst/extdata. The covariate categories for the data set CASMAP_example_data_1.txt, the path to which is given by getExampleDataFilename.

Usage

getExampleCovariatesFilename()

Format

A single column of 100 entries, each of which is 0 or 1 (the same format as the labels file).

Details

Path to the file containing the covariates, for reading into a CASMAP object using the readFiles function.

See Also

getExampleDataFilename, getExampleLabelsFilename

Examples

covfile <- getExampleCovariatesFilename()

Get the path to the example data file for regionGWAS mode

Description

Path to CASMAP_example_data_1.txt in inst/extdata. A dataset containing binary samples for the regionGWAS method. There are accompanying labels and covariates datasets.

Usage

getExampleDataFilename()

Format

A matrix of 0s and 1s, with 1000 rows (features) and 100 columns (samples). In other words, each column is a sample, and each sample has 1000 binary features.

Details

Path to the file containing the data, for reading into a CASMAP object using the readFiles function. Note that the significant region is [99, 102].

See Also

getExampleLabelsFilename, getExampleCovariatesFilename

Examples

datafile <- getExampleDataFilename()
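The orientation described under Format (rows are features, columns are samples) is the transpose of the usual samples-by-features layout in R. The sketch below assumes the file is plain whitespace-separated text without a header row; that is an assumption, so check the example file itself. It writes a toy matrix in that orientation to a temporary file and reads it back with base R.

```r
# Toy data in the assumed layout: 5 features (rows) x 4 samples (columns)
set.seed(1)
toy <- matrix(rbinom(20, size = 1, prob = 0.3), nrow = 5, ncol = 4)

path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE, col.names = FALSE)

# read it back; header = FALSE because the assumed format has no header row
x <- as.matrix(read.table(path, header = FALSE))
dimnames(x) <- NULL

nrow(x)  # number of features: 5
ncol(x)  # number of samples: 4
samples_by_features <- t(x)  # transpose if a samples-by-features matrix is needed
```

With CASMAP installed, the same read.table call could be pointed at getExampleDataFilename(); if the whitespace assumption holds, the result should have 1000 rows and 100 columns.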

Get the path to the example labels file for regionGWAS mode

Description

Path to CASMAP_example_labels_1.txt in inst/extdata. A dataset containing the binary labels for the data in the file CASMAP_example_data_1.txt, the path to which is given by getExampleDataFilename.

Usage

getExampleLabelsFilename()

Format

A single column of 100 labels, each of which is either 0 or 1.

Details

Path to the file containing the labels, for reading into a CASMAP object using the readFiles function.

See Also

getExampleDataFilename, getExampleCovariatesFilename

Examples

labelsfile <- getExampleLabelsFilename()

Get the path to the example significant intervals file

Description

Path to the example significant regions file in inst/extdata.

Usage

getExampleSignificantRegionsFilename()

Examples

sigregfile <- getExampleSignificantRegionsFilename()

Gets the higherOrderEpistasis string

Description

A getter for the global higherOrderEpistasis value, a string for the mode parameter.

Usage

getHigherOrderEpistasisString()

Gets the minModeLength

Description

A getter for the global minModeLength value, the minimum number of characters required for the mode parameter (should be 3).

Usage

getMinModeLength()

Get the function name

Description

Uses match.call and as.character.

Usage

getParentFunctionName()
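A minimal sketch of the idea, using a hypothetical function getParentFunctionNameSketch. Note that it swaps in sys.call(-1) to fetch the caller's call, whereas the package documents using match.call; only the as.character step is shared.

```r
# Hypothetical sketch (not the package's code): return the name of the
# function that called getParentFunctionNameSketch().
getParentFunctionNameSketch <- function() {
  caller <- sys.call(-1)                      # the call one frame up
  if (is.null(caller)) return(NA_character_)  # called from top level
  as.character(caller[[1]])
}

readFilesDemo <- function() getParentFunctionNameSketch()
readFilesDemo()  # "readFilesDemo"
```

This only handles plain calls such as f(); for a namespaced call such as pkg::f(), caller[[1]] is itself a call and as.character returns several strings.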

Gets the regionGWAS string

Description

A getter for the global regionGWAS value, a string for the mode parameter.

Usage

getRegionGWASString()

Checks if substring is part of higherOrderEpistasis

Description

Uses grep to search through a vector of strings.

Usage

isHigherOrderEpistasisString(x)

Arguments

x

The string which will be compared to 'higherOrderEpistasis'

Details

Uses grep to search for an exact match.

Value

TRUE if the string is a substring of 'higherOrderEpistasis', otherwise returns FALSE.


A method to check value is numeric and in open interval

Description

Checks if a value is numeric and strictly between two other values.

Usage

isInOpenInterval(x, lower = 0, upper = 1)

Arguments

x

Value to be checked. Needs to be numeric.

lower

Lower bound. Default value is 0.

upper

Upper bound. Default value is 1.

Value

If numeric, and strictly greater than lower and strictly smaller than upper, then return TRUE. Else return FALSE.
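This is the check that alpha must pass ("strictly between 0 and 1"). A minimal sketch of the documented behaviour, as a hypothetical re-implementation (isInOpenIntervalSketch); requiring x to be a single non-missing value is an added assumption.

```r
# Hypothetical re-implementation: TRUE only for a numeric x with lower < x < upper.
isInOpenIntervalSketch <- function(x, lower = 0, upper = 1) {
  # assumption: x must be a single, non-missing number
  is.numeric(x) && length(x) == 1 && !is.na(x) && x > lower && x < upper
}

isInOpenIntervalSketch(0.05)   # TRUE, a valid alpha
isInOpenIntervalSketch(0)      # FALSE, boundary excluded
isInOpenIntervalSketch(1)      # FALSE, boundary excluded
isInOpenIntervalSketch("0.5")  # FALSE, not numeric
```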


Checks if substring is part of regionGWAS

Description

Uses grepl to compare strings, ignoring case.

Usage

isRegionGWASString(x)

Arguments

x

The string which will be compared to 'regionGWAS'

Details

Uses grepl to search for an exact match. Case will be ignored.

Value

TRUE if the string is a substring of 'regionGWAS', otherwise returns FALSE.
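A minimal sketch of a case-insensitive substring test of the kind described for isRegionGWASString and isHigherOrderEpistasisString. The name isModeSubstringSketch is hypothetical; using fixed = TRUE (so that x is not interpreted as a regular expression) together with tolower() is a design choice of this sketch, not necessarily what the package does.

```r
# Hypothetical sketch: TRUE if x occurs in target, ignoring case.
# tolower() on both sides replaces ignore.case, which grepl() ignores
# when fixed = TRUE.
isModeSubstringSketch <- function(x, target) {
  grepl(tolower(x), tolower(target), fixed = TRUE)
}

isModeSubstringSketch("region", "regionGWAS")               # TRUE
isModeSubstringSketch("REGIONgwas", "regionGWAS")           # TRUE
isModeSubstringSketch("higher", "regionGWAS")               # FALSE
isModeSubstringSketch("epistasis", "higherOrderEpistasis")  # TRUE
```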


Internal function

Description

Internal function

Usage

lib_delete_search_chi(inst)

Internal function

Description

Internal function

Usage

lib_delete_search_e(inst)

Internal function

Description

Internal function

Usage

lib_delete_search_facs(inst)

Internal function

Description

Internal function

Usage

lib_delete_search_fastcmh(inst)

Internal function

Description

Internal function

Usage

lib_execute_int(inst, alpha, l_max)

Internal function

Description

Internal function

Usage

lib_execute_iset(inst, alpha, l_max)

Internal function

Description

Internal function

Usage

lib_filter_intervals_write_to_file(inst, output_file)

Internal function

Description

Internal function

Usage

lib_get_filtered_intervals(inst)

Internal function

Description

Internal function

Usage

lib_get_result_facs(inst)

Internal function

Description

Internal function

Usage

lib_get_result_fais(inst)

Internal function

Description

Internal function

Usage

lib_get_result_int(inst)

Internal function

Description

Internal function

Usage

lib_get_result_iset(inst)

Internal function

Description

Internal function

Usage

lib_get_significant_intervals(inst)

Internal function

Description

Internal function

Usage

lib_get_significant_itemsets(inst)

Internal function

Description

Internal function

Usage

lib_new_search_chi()

Internal function

Description

Internal function

Usage

lib_new_search_e()

Internal function

Description

Internal function

Usage

lib_new_search_facs()

Internal function

Description

Internal function

Usage

lib_new_search_fastcmh()

Internal function

Description

Internal function

Usage

lib_profiler_write_to_file(inst, output_file)

Internal function

Description

Internal function

Usage

lib_pvals_significant_ints_write_to_file(inst, output_file)

Internal function

Description

Internal function

Usage

lib_pvals_significant_isets_write_to_file(inst, output_file)

Internal function

Description

Internal function

Usage

lib_pvals_testable_ints_write_to_file(inst, output_file)

Internal function

Description

Internal function

Usage

lib_pvals_testable_isets_write_to_file(inst, output_file)

Internal function

Description

Internal function

Usage

lib_read_covariates_file_facs(inst, cov_filename)

Internal function

Description

Internal function

Usage

lib_read_covariates_file_fastcmh(inst, cov_filename)

Internal function

Description

Internal function

Usage

lib_read_eth_files(inst, x_filename, y_filename, encoding)

Internal function

Description

Internal function

Usage

lib_read_eth_files_with_cov_facs(inst, x_filename, y_filename, covfilename,
  encoding)

Internal function

Description

Internal function

Usage

lib_read_eth_files_with_cov_fastcmh(inst, x_filename, y_filename, covfilename,
  encoding)

Internal function

Description

Internal function

Usage

lib_read_plink_files(inst, base_filename, encoding)

Internal function

Description

Internal function

Usage

lib_read_plink_files_with_cov_facs(inst, base_filename, covfilename, encoding)

Internal function

Description

Internal function

Usage

lib_read_plink_files_with_cov_fastcmh(inst, base_filename, covfilename,
  encoding)

Internal function

Description

Internal function

Usage

lib_summary_write_to_file_facs(inst, output_file)

Internal function

Description

Internal function

Usage

lib_summary_write_to_file_fais(inst, output_file)

Internal function

Description

Internal function

Usage

lib_summary_write_to_file_fastcmh(inst, output_file)

Internal function

Description

Internal function

Usage

lib_write_eth_files_int(inst, x_filename, y_filename)

Internal function

Description

Internal function

Usage

lib_write_eth_files_iset(inst, x_filename, y_filename)

Internal function

Description

Internal function

Usage

lib_write_eth_files_with_cov_facs(inst, x_filename, y_filename, covfilename)

Internal function

Description

Internal function

Usage

lib_write_eth_files_with_cov_fastcmh(inst, x_filename, y_filename, covfilename)

Internal class

Description

An internal class.


Internal class

Description

Internal class


Internal class

Description

An internal class


Internal class

Description

An internal class.


Internal class

Description

Internal class


Error message for mode

Description

Returns the appropriate error message for an incorrect mode input.

Usage

modeErrorMessage()

Error message for mode, if too short

Description

Returns the appropriate error message when the mode input is too short.

Usage

modeLengthErrorMessage()

Checks mode string is long enough

Description

Checks whether the mode string has at least the minimum number of characters.

Usage

modeNeedsMoreChars(mode)
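A minimal sketch of the length check, as a hypothetical function modeNeedsMoreCharsSketch. It assumes, from the function name, that TRUE means the mode string is too short, and it uses the minimum length of 3 documented for getMinModeLength as the default.

```r
# Hypothetical sketch: TRUE when mode is shorter than the minimum length.
modeNeedsMoreCharsSketch <- function(mode, min_length = 3) {
  nchar(mode) < min_length
}

modeNeedsMoreCharsSketch("re")      # TRUE, too short
modeNeedsMoreCharsSketch("reg")     # FALSE
modeNeedsMoreCharsSketch("region")  # FALSE
```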