Help for package Ghat

Title:

Quantifying Evolution and Selection on Complex Traits

Version:

0.2.0

Description:

Functions are provided for quantifying evolution and selection on complex traits. The package implements effective handling and analysis algorithms scaled for genome-wide data and calculates a composite statistic, denoted Ghat, which is used to test for selection on a trait. The package provides a number of simple examples for handling and analysing the genome data and visualising the output and results. See Beissinger et al. (2018) <doi:10.1534/genetics.118.300857>.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

6.1.1

Depends:

R (≥ 3.0.0)

URL:

https://academic.oup.com/genetics/article/209/1/321/5931021

BugReports:

https://github.com/Medhat86/Ghat/issues

Suggests:

knitr, rmarkdown

Imports:

rrBLUP

NeedsCompilation:

Packaged:

2022-11-25 10:46:34 UTC; beissinger

Author:

Medhat Mahmoud [aut], Ngoc-Thuy Ha [aut], Tim Beissinger [aut, cre]

Maintainer:

Tim Beissinger <timbeissinger@gmail.com>

Repository:

CRAN

Date/Publication:

2022-12-14 12:00:08 UTC

Quantifying evolution and selection on complex traits

Description

G-hat: R function to estimate G-hat from allele frequency and effect size data.

Usage

Ghat(effects = effects, change = change, method = "scale",
  perms = 1000, plot = "Both", blockSize = 1000, num_eff = NULL)

Arguments

effects

Vector of allele effects.

change

Vector of changes in allele frequency (could be positive, negative or zero).

method

"vanilla" (assumes complete linkage equilibrium between markers), "trim" (excludes markers to approximate linkage equilibrium), or "scale" (scales results to reflect underlying levels of linkage disequilibrium, LD).

perms

Number of permutations to run.

plot

"Ghat", "Cor", or "Both". Which plot should be returned: the Ghat test, the correlation test, or both?

blockSize

How large should blocks for trimming be? Only required if method = "trim".

num_eff

The effective number of independent markers, to be used only in conjunction with the "scale" method above (see the ld_decay function, or use ?ld_decay for help).

Value

Ghat Ghat-value

Cor Correlation between allele effects and changes in allele frequency

p.val Two-sided p-value for evidence of selection

plot relationship between estimated allelic effects at individual SNPs and the change in allele frequency over generations
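To make the returned quantities more concrete, the following is a minimal sketch of a Ghat-style test on simulated data. It assumes the statistic is the sum over markers of effect times frequency change, as described in Beissinger et al. (2018), and uses a normal approximation to a permutation null as one plausible way to obtain a two-sided p-value. The toy vectors and every variable name are hypothetical, and the sketch does not reproduce the package's "trim" or "scale" adjustments for LD.

```r
# Toy inputs, simulated for illustration only (not taken from Maize_wqs)
set.seed(1)
n_markers <- 200
effects   <- rnorm(n_markers)                              # hypothetical allele effects
change    <- 0.05 * effects + rnorm(n_markers, sd = 0.05)  # frequency changes that loosely track effects

# Composite statistic: sum over markers of effect x frequency change
ghat_obs <- sum(effects * change)

# Permutation null: shuffle effects across markers and recompute the statistic
perms     <- 1000
ghat_null <- replicate(perms, sum(sample(effects) * change))

# Two-sided p-value from a normal approximation to the permutation distribution
# (an assumption of this sketch, not a statement about the package internals)
z     <- (ghat_obs - mean(ghat_null)) / sd(ghat_null)
p_val <- 2 * pnorm(-abs(z))

# Plain correlation between effects and frequency changes
cor_obs <- cor(effects, change)
```

Shuffling effects across markers treats the markers as exchangeable, which corresponds most closely to the "vanilla" assumption of linkage equilibrium.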

Examples

# Example 1: both SNP effects and change in allele frequency are known
maize      <- Maize_wqs[[1]]
result.adf <- Ghat(effects = maize[,1], change = maize[,2], method = "scale",
                   perms = 1000, plot = "Ghat", num_eff = 54.74819)
# Annotate the plot with the p-value of the test
mtext(paste("WQS ADF test for selection, pval =", round(result.adf$p.val, 4)))
# Print the statistic, the correlation, and the p-value
message(paste("Ghat =", result.adf$Ghat, "Cor =", result.adf$Cor,
              "P-val =", result.adf$p.val))


## Not run: 
# Example 2: SNP effects and change in allele frequency are estimated from the data
##################################################################
## step 1: run rrBLUP to estimate allele effects                ##
##################################################################

library(Ghat)
library(parallel)
library(rrBLUP)
phe                 <- Maize_wqs[[2]]
map                 <- Maize_wqs[[3]]
gen                 <- Maize_wqs[[4]]
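# Compute the filter once, before phe is modified, so that phe and gen
# drop the same rows (individuals with a missing phenotype)
keep                <- !is.na(phe[,2])
phe                 <- phe[keep,]
gen                 <- gen[keep,]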
result              <- mixed.solve(phe[,2],
                                   Z= as.matrix(gen[,2:ncol(gen)]),
                                   X= model.matrix(phe[,2]~phe[,3]),
                                   K=NULL, SE=FALSE, return.Hinv=FALSE,
                                   method="ML")
                                   
##################################################################
## step 2: calculate allele frequencies at Cycles 1 and 3       ##
##################################################################
CycleIndicator      <- as.numeric(unlist(strsplit(gen$X,
                       split="_C")) [seq(2,2*nrow(gen),2)])
Cycle1              <- gen[which(CycleIndicator == 1),]
Cycle3              <- gen[which(CycleIndicator == 3),]
CycleList           <- list(Cycle1,Cycle3)
frequencies         <- matrix(nrow=ncol(gen)-1,ncol=2)
for(i in 1:2){
  frequencies[,i]   <- colMeans(CycleList[[i]][,-1],na.rm=TRUE)/2
}
frequencies         <- as.data.frame(frequencies)
names(frequencies)  <- c("Cycle1","Cycle3")
change<-frequencies$Cycle3-frequencies$Cycle1

##################################################################
## step 3: Calculate LD Decay                                   ##
##################################################################
ld                  <- ld_decay (gen=gen, map=map,
                                 max_win_snp=2000, max.chr=10,
                                 cores=1, max_r2=0.03)

##################################################################
## step 4: Calculate Ghat                                       ##
##################################################################
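# num_eff is the effective number of independent markers
# (see eff_nmark in the value returned by ld_decay in step 3)
Ghat.adf    <- Ghat(effects=result$u, change=change, method = "scale",
                    perms=1000, plot="Ghat", num_eff = 54.74819)

message (paste("Ghat=" , Ghat.adf$Ghat,
            "Cor="  , Ghat.adf$Cor ,
            "P-val=", Ghat.adf$p.val, sep = " "))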

## End(Not run)
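Step 2 of Example 2 derives allele frequencies from genotype codes and cycle labels. The sketch below shows the same idea on a small hand-made table, assuming genotypes are coded 0, 1, or 2 copies of an allele and that each ID ends in "_C" followed by the cycle number. The table gen_toy and all other names here are hypothetical and are not part of the package.

```r
# Toy genotype table: first column holds IDs ending in "_C<cycle>",
# remaining columns hold 0/1/2 genotype codes (NA = missing call)
gen_toy <- data.frame(
  X  = c("a_C1", "b_C1", "c_C3", "d_C3"),
  m1 = c(0, 1, 2, 2),
  m2 = c(2, 2, 1, NA),
  stringsAsFactors = FALSE
)

# Extract the cycle number that follows "_C" in each ID
cycle <- as.numeric(sub(".*_C", "", gen_toy$X))

# Allele frequency = mean genotype code / 2 (two allele copies per diploid individual);
# missing calls are ignored marker by marker
freq_c1 <- colMeans(gen_toy[cycle == 1, -1], na.rm = TRUE) / 2
freq_c3 <- colMeans(gen_toy[cycle == 3, -1], na.rm = TRUE) / 2

# Change in allele frequency between the two cycles, one value per marker
change_toy <- freq_c3 - freq_c1
```

The resulting vector has one entry per marker, in the same order as the genotype columns, which is the shape the change argument of Ghat expects alongside effects.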

The Wisconsin Quality Synthetic (WQS) maize population datasets.

Description

The Wisconsin Quality Synthetic (WQS) maize population datasets.

Usage

Maize_wqs

Format

A list of 4 data sets:

Dataset-1: Data frame including all SNP effects and changes in allele frequencies between cycle 2 and 5.
Dataset-2: BLUP breeding values for Acid Detergent Fiber (ADF) involving 5 generations of selection.
Dataset-3: Map file: Each line of the Map file describes a single marker and must contain at least three columns. 1: chromosome number; 2: SNP (snp id); 3: SNP position (in base-pairs (bp)).
Dataset-4: Maize genotypes (Illumina MaizeSNP50 BeadChip); an Infinium HD assay (Illumina, Inc., San Diego, CA). 10,017 SNP markers (coded 0, 1, and 2) after filtering, distributed across the maize genome (Ganal et al. 2011).

Source

Lorenz, A. J., Beissinger, T.M., Rodrigues, R., de Leon, N. 2015. Selection for silage yield and composition did not affect genomic diversity within the Wisconsin Quality Synthetic maize population. Genes Genomes Genetics. DOI: 10.1534/g3.114.015263.

Examples

maize <- Maize_wqs[[1]]
phe   <- Maize_wqs[[2]]
map   <- Maize_wqs[[3]]
gen   <- Maize_wqs[[4]]

Evaluation of Linkage Disequilibrium Decay

Description

ld_decay: R function for calculating the effective number of independent markers

Usage

ld_decay(gen = gen, map = map, max_win_snp = 2000,
  max.chr = max.chr, cores = 1, max_r2 = max_r2)

Arguments

gen

Matrix of genotype data. Individuals in rows, genotypes (0, 1, 2) in columns.

map

Data frame including the name of each marker with its corresponding chromosome number and physical position.

max_win_snp

The maximum number of markers in each window. Sets the maximum number of markers allowed per window within a chromosome before estimating the LD. Default is 2000.

max.chr

Chromosomes above this number will be excluded from the analysis.

cores

Number of cores to use for parallelized calculation. Default is 1, which is the appropriate setting on Windows machines.

max_r2

The r^2 threshold used to calculate the effective number of independent markers.

Value

cor: Correlation matrix

ch_eff_nmark: The number of independent markers per chromosome

eff_nmark: The effective number of independent markers
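As a rough illustration of the LD measure that max_r2 is compared against, the sketch below computes pairwise r^2 between markers on simulated genotypes using base R. It only shows the squared-correlation measure and a threshold comparison. How ld_decay turns these values into an effective number of markers is not described here and is not reproduced. All names and the simulated data are hypothetical.

```r
# Simulated 0/1/2 genotypes for 100 individuals at three markers
set.seed(42)
n_ind <- 100
m1 <- rbinom(n_ind, 2, 0.5)
m2 <- m1                       # exact copy of m1: complete LD with m1
m3 <- rbinom(n_ind, 2, 0.5)    # drawn independently of m1 and m2
geno <- cbind(m1, m2, m3)

# Pairwise r^2: squared Pearson correlation between genotype columns
r2 <- cor(geno)^2

# Flag marker pairs whose r^2 falls below a threshold such as max_r2
max_r2 <- 0.03
below  <- r2[upper.tri(r2)] < max_r2   # pairs in order (m1,m2), (m1,m3), (m2,m3)
```

Pairs with r^2 near 1 carry largely redundant information, which is why the number of effectively independent markers is typically much smaller than the raw marker count.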

Examples

## Not run: 
library("parallel")
gen         <- Maize_wqs[[4]]
map         <- Maize_wqs[[3]]
Res_ld <- ld_decay (gen=gen, map=map, max_win_snp=2000,
                    max.chr=10, cores=1, max_r2=0.03)

## End(Not run)