Help for package AGHmatrix

Title:

Relationship Matrices for Diploid and Autopolyploid Species

Version:

2.1.4

Description:

Computation of A (pedigree), G (genomic-base), and H (A corrected by G) relationship matrices for diploid and autopolyploid species. Several methods are implemented considering additive and non-additive models.

Author:

Rodrigo Amadeu [aut, cre], Luis Ferrao [aut, ctb], Catherine Cellon [ctb], Leticia Lara [ctb], Marcio Resende [ctb], Ivone Oliveira [ctb], Patricio Munoz [ctb], Augusto Garcia [ctb]

Depends:

R (≥ 3.6.0)

Imports:

Matrix (≥ 1.2-7.1), zoo (≥ 1.8.6)

Suggests:

knitr, MASS, rmarkdown

VignetteBuilder:

knitr

Encoding:

UTF-8

License:

GPL-3

URL:

https://github.com/rramadeu/AGHmatrix

Maintainer:

Rodrigo Amadeu <rramadeu@gmail.com>

RoxygenNote:

7.2.3

NeedsCompilation:

Packaged:

2023-10-03 15:25:06 UTC; GNIEZ

Repository:

CRAN

Date/Publication:

2023-10-03 18:20:02 UTC

Construction of Relationship Matrix A

Description

Creates an additive relationship matrix A from a pedigree data in a 3-column way format based on ploidy level (an even number) and, if ploidy equals 4, based on proportion of parental gametes that are IBD (Identical by Descent) due to double reduction. Returns a dominance relationship matrix if dominance true (ploidy 2 only). Autopolyploid matrices based on Kerr (2012), used when 'ploidy' argument is higher than '2' and 'dominance=FALSE'. Diploid additive numerator relationship matrix built as in Henderson (1976), used when 'ploidy=2' and 'dominance=FALSE'. Diploid dominance numerator relationship matrix built as in Cockerham (1954), used when 'ploidy=2' and 'dominance=FALSE'. For details of recursive method see Mrode (2005).

Usage

Amatrix(
  data = NULL,
  ploidy = 2,
  w = 0,
  verify = TRUE,
  dominance = FALSE,
  slater = FALSE,
  ASV = FALSE,
  ...
)

Arguments

data

pedigree data name (3-column way format). Unknown value should be equal 0.

ploidy

an even number (default=2).

w

proportion of parental gametas IBD due to double reduction (default=0), only if ploidy=4.

verify

verifies pedigree file for conflictuos entries (default=TRUE).

dominance

if true, returns the dominance relationship matrix

slater

if true, returns the additive autotetraploid relationship matrix as Slater (2013)

ASV

if TRUE, transform matrix into average semivariance (ASV) equivalent (K = K / (trace(K) / (nrow(K)-1))). Details formula 2 of Fieldmann et al. (2022). Default = FALSE.

...

arguments to be passed to datatreat()

Value

Matrix with the Relationship between the individuals.

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

References

Cockerham, CC. 1954. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39, 859–882

Feldmann MJ, et al. 2022. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 (Bethesda), 12(6).

Henderson, CR. 1976. A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32, 69-83

Kerr, RJ, et al. 2012. Use of the numerator relationship matrix in genetic analysis of autopolyploid species. Theoretical and Applied Genetics 124 1271-1282

Mrode, RA. 2014. Chapter 2: Genetic Covariance Between Relatives and Chapter 9: Non-additive Animal Models in Mrode, RA. 2014. Linear models for the prediction of animal breeding values. Cabi, 3rd edition.

Slater, AT, et al. 2013. Improving the analysis of low heritability complex traits for enhanced genetic gain in potato. Theoretical and Applied Genetics 127, 809-820

Examples

data(ped.mrode)
#Computing additive relationship matrix considering diploidy (Henderson 1976):
Amatrix(ped.mrode, ploidy=2)
#Computing non-additive relationship matrix considering diploidy (Cockerham 1954):
Amatrix(ped.mrode, ploidy=2, dominance=TRUE)
#Computing additive relationship matrix considering autotetraploidy (Kerr 2012):
Amatrix(ped.mrode, ploidy=4)
#Computing additive relationship matrix considering autooctaploidy (Kerr 2012):
Amatrix(ped.mrode, ploidy=8)
#Computing additive relationship matrix considering autotetraploidy and double-
#reduction of 0.1 (Kerr 2012):
Amatrix(ped.mrode, ploidy=4, w=0.1)
#Computing additive relationship matrix considering 
#autotetraploidy and double-reduction of 0.1 (Slater 2014):
Amatrix(ped.mrode, ploidy=4, w=0.1, slater = TRUE)
#Computing additive relationship matrix considering autohexaploidy and double-
#reduction of 0.1 (Kerr 2012):
Amatrix(ped.mrode, ploidy=6, w=0.1)

Construction of pedigree-based relationship matrix with parental guessing possibility

Description

Creates an additive relationship matrix A based on a non-deterministic pedigree with 4+ columns where each column represents a possible parent. This function was built with the following designs in mind. 1) A mating design where you have equally possible parents. For example, a generation of insects derived from the mating of three insects in a cage. All the insects in this generation will have the same expected relatedness with all the possible parents (1/3). If there are only two parents in the cage, the function assumes no-inbreeding and the pedigree is deterministic (the individual is offspring of the cross between the two parents). Another example, a population of 10 open-pollinated plants where you harvest the seeds without tracking the mother. 2) When fixedParent is TRUE: a mating design where you know one parent and might know the other possible parents. For example, a polycross design where you have seeds harvested from a mother plant and possible polen donors.

Usage

AmatrixPolyCross(data = NULL, fixedParent = FALSE)

Arguments

data

pedigree data name. Unknown value should be equal 0. See example for construction.

fixedParent

if false, assumes that all the parents are equally possible parents. If true, assumes that the first parental is known and the others are equally possible parents. Default = FALSE.

Value

Matrix with the relationship between the individuals.

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

Examples

#the following pedigree has the id of the individual followed by possible parents
#if 0 is unknown
#the possible parents are filled from left to right
#in the pedigree data frame examples:
#id 1,2,3,4 have unknown parents and are assumed unrelated
#id 5 has three possible parents (1,2,3)
#id 6 has three possible parents (2,3,4)
#id 7 has two parents (deterministic case here, the parents are 3 and 4)
#id 8 has four possible parents (5,6,7,1)

pedigree = data.frame(id=1:8,
                      parent1 = c(0,0,0,0,1,2,3,5),
                      parent2 = c(0,0,0,0,2,3,4,6),
                      parent3 = c(0,0,0,0,3,4,0,7),
                      parent4 = c(0,0,0,0,0,0,0,1),
                      parent5 = 0)

print(pedigree)

AmatrixPolyCross(pedigree)

#when polyCross is set to be true:
#id 5 is offspring of parent 1 in a deterministic way and two other possible parents (2,3)
#id 6 is offspring of parent 2 in a deterministic way and two other possible parents (3,4)
#id 7 has two parents (deterministic case here, the parents are 3 and 4); as before
#id 8 is offspring of parent 5 in a deterministic way and has three other possible parents (6,7,1)

AmatrixPolyCross(pedigree,fixedParent=TRUE)

Construction of Relationship Matrix G

Description

Given a matrix (individual x markers), a method, a missing value, and a maf threshold, return a additive or non-additive relationship matrix. For diploids, the methods "Yang" and "VanRaden" for additive relationship matrices, and "Su" and "Vitezica" for non-additive relationship matrices are implemented. For autopolyploids, the method "VanRaden" for additive relationship, method "Slater" for full-autopolyploid model including non-additive effects, and pseudo-diploid parametrization are implemented. Weights are implemented for "VanRaden" method as described in Liu (2020).

Usage

Gmatrix(
  SNPmatrix = NULL,
  method = "VanRaden",
  missingValue = -9,
  maf = 0,
  thresh.missing = 0.5,
  verify.posdef = FALSE,
  ploidy = 2,
  pseudo.diploid = FALSE,
  integer = TRUE,
  ratio = FALSE,
  impute.method = "mean",
  rmv.mono = FALSE,
  thresh.htzy = 0,
  ratio.check = TRUE,
  weights = NULL,
  ploidy.correction = FALSE,
  ASV = FALSE
)

Arguments

SNPmatrix

matrix (n x m), where n is is individual names and m is marker names (coded inside the matrix as 0, 1, 2, ..., ploidy, and, missingValue).

method

"Yang" or "VanRaden" for marker-based additive relationship matrix. "Su" or "Vitezica" for marker-based dominance relationship matrix. "Slater" for full-autopolyploid model including non-additive effects. "Endelman" for autotetraploid dominant (digentic) relationship matrix. "MarkersMatrix" for a matrix with the amount of shared markers between individuals (3). Default is "VanRaden", for autopolyploids will be computed a scaled product (similar to Covarrubias-Pazaran, 2006).

missingValue

missing value in data. Default=-9.

maf

minimum allele frequency accepted to each marker. Default=0.

thresh.missing

threshold on missing data, SNPs below of this frequency value will be maintained, if equal to 1, no threshold and imputation is considered. Default = 0.50.

verify.posdef

verify if the resulting matrix is positive-definite. Default=FALSE.

ploidy

data ploidy (an even number between 2 and 20). Default=2.

pseudo.diploid

if TRUE, uses pseudodiploid parametrization of Slater (2016).

integer

if FALSE, not check for integer numbers. Default=TRUE.

ratio

if TRUE, molecular data are considered ratios and its computed the scaled product of the matrix (as in "VanRaden" method).

impute.method

"mean" to impute the missing data by the mean per marker, "mode" to impute the missing data by the mode per marker, "global.mean" to impute the missing data by the mean across all markers, "global.mode" to impute the missing data my the mode across all marker. Default = "mean".

rmv.mono

if monomorphic markers should be removed. Default=FALSE.

thresh.htzy

threshold heterozigosity, remove SNPs below this threshold. Default=0.

ratio.check

if TRUE, run Mcheck with ratio data.

weights

vector with weights for each marker. Only works if method="VanRaden". Default is a vector of 1's (equal weight).

ploidy.correction

It sets the denominator (correction) of the crossprod. Used only when ploidy > 2 for "VanRaden" and ratio models. If TRUE, it uses the sum of "Ploidy" times "Frequency" times "(1-Frequency)" of each marker as method 1 in VanRaden 2008 and Endelman (2018). When ratio=TRUE, it uses "1/Ploidy" times "Frequency" times "(1-Frequency)". If FALSE, it uses the sum of the sampling variance of each marker. Default = FALSE.

ASV

if TRUE, transform matrix into average semivariance (ASV) equivalent (K = K / (trace(K) / (nrow(K)-1))). Details formula 2 of Fieldmann et al. (2022). Default = FALSE.

Value

Matrix with the marker-bases relationships between the individuals

Author(s)

Rodrigo R Amadeu rramadeu@gmail.com, Marcio Resende Jr, Letícia AC Lara, Ivone Oliveira, and Felipe V Ferrao

References

Covarrubias-Pazaran, G. 2016. Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6):1-15.

Endelman, JB, et al., 2018. Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato. Genetics, 209(1) pp. 77-87.

Feldmann MJ, et al. 2022. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 (Bethesda), 12(6).

Liu, A, et al. 2020. Weighted single-step genomic best linear unbiased prediction integrating variants selected from sequencing data by association and bioinformatics analyses. Genet Sel Evol 52, 48.

Slater, AT, et al. 2016. Improving genetic gain with genomic selection in autotetraploid potato. The Plant Genome 9(3), pp.1-15.

Su, G, et al. 2012. Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PloS one, 7(9), p.e45293.

VanRaden, PM, 2008. Efficient methods to compute genomic predictions. Journal of dairy science, 91(11), pp.4414-4423.

Vitezica, ZG, et al. 2013. On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics, 195(4), pp.1223-1230.

Yang, J, et al. 2010. Common SNPs explain a large proportion of the heritability for human height. Nature genetics, 42(7), pp.565-569.

Examples

## Not run: 
## Diploid Example
data(snp.pine)
#Verifying if data is coded as 0,1,2 and missing value.
str(snp.pine)
#Build G matrices
Gmatrix.Yang <- Gmatrix(snp.pine, method="Yang", missingValue=-9, maf=0.05)
Gmatrix.VanRaden <- Gmatrix(snp.pine, method="VanRaden", missingValue=-9, maf=0.05)
Gmatrix.Su <- Gmatrix(snp.pine, method="Su", missingValue=-9, maf=0.05)
Gmatrix.Vitezica <- Gmatrix(snp.pine, method="Vitezica", missingValue=-9, maf=0.05)

## Autetraploid example
data(snp.sol)
#Build G matrices
Gmatrix.VanRaden <- Gmatrix(snp.sol, method="VanRaden", ploidy=4)
Gmatrix.Endelman <- Gmatrix(snp.sol, method="Endelman", ploidy=4) 
Gmatrix.Slater <- Gmatrix(snp.sol, method="Slater", ploidy=4)
Gmatrix.Pseudodiploid <- Gmatrix(snp.sol, method="VanRaden", ploidy=4, pseudo.diploid=TRUE) 

#Build G matrix with weights
Gmatrix.weighted <- Gmatrix(snp.sol, method="VanRaden", weights = runif(3895,0.001,0.1), ploidy=4)

## End(Not run)

Construction of Combined Relationship Matrix H

Description

Given a matrix A and a matrix G returns a H matrix. H matrix is the relationship matrix using combined information from the pedigree and genomic relationship matrices. First, you need to compute the matrices separated and then use them as input to build the combined H matrix. Two methods are implemented: 'Munoz' shrinks the G matrix towards the A matrix scaling the molecular relatadness by each relationship classes; 'Martini' is a modified version from Legarra et al. (2009) where combines A and G matrix using scaling factors. When method is equal 'Martini' and 'tau=1' and 'omega=1' you have the same H matrix as in Legarra et al. (2009).

Usage

Hmatrix(
  A = NULL,
  G = NULL,
  markers = NULL,
  c = 0,
  method = "Martini",
  tau = 1,
  omega = 1,
  missingValue = -9,
  maf = 0,
  ploidy = 2,
  roundVar = 3,
  ASV = FALSE
)

Arguments

A

A matrix from function Amatrix

G

G matrix from function Gmatrix

markers

matrix marker which generated the Gmatrix

c

constant value of H computation, default: c=0

method

"Martini" or "Munoz", default="Martini"

tau

to be used for Martini's method, default=1.

omega

to be used of Martini's method, default=1.

missingValue

missing value in data, default=-9.

maf

max of missing data accepted to each markerm default=0.05.

ploidy

data ploidy (an even number between 2 and 20), default=2.

roundVar

only used for Munoz's method, how many digits to consider the relationship be of same class, default=2.

ASV

if TRUE, transform matrix into average semivariance (ASV) equivalent (K = K / (trace(K) / (nrow(K)-1))). Details formula 2 of Fieldmann et al. (2022). Default = FALSE.

Value

H Matrix with the relationship between the individuals based on pedigree and corrected by molecular information

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

References

Feldmann MJ, et al. 2022. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 (Bethesda), 12(6).

Munoz, PR. 2014 Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics 198, 1759-1768

Martini, JW, et al. 2018 The effect of the H-1 scaling factors tau and omega on the structure of H in the single-step procedure. Genetics Selection Evolution 50(1), 16

Legarra, A, et al. 2009 A relationship matrix including full pedigree and genomic information. Journal of Dairy Science 92, 4656–4663

Examples

## Not run: 
data(ped.sol)
data(snp.sol)
#Computing the numerator relationship matrix 10% of double-reduction
Amat <- Amatrix(ped.sol, ploidy=4, w = 0.1)
#Computing the additive relationship matrix based on VanRaden (modified)
Gmat <- Gmatrix(snp.sol, ploidy=4, 
                maf=0.05, method="VanRaden")
Gmat <- round(Gmat,3) #to be easy to invert

#Computing H matrix (Martini)
Hmat_Martini <- Hmatrix(A=Amat, G=Gmat, method="Martini", 
                     ploidy=4, 
                     maf=0.05)
                     
#Computing H matrix (Munoz)
Hmat_Munoz <- Hmatrix(A=Amat, G=Gmat, markers = snp.sol, 
                      ploidy=4, method="Munoz",
                      roundVar=2,
                      maf=0.05)

## End(Not run)

Check and filter markers

Description

This function does different filtering on the marker matrix

Usage

Mcheck(
  SNPmatrix = NULL,
  ploidy = 2,
  missingValue = -9,
  thresh.maf = 0.05,
  thresh.missing = 0.9,
  thresh.htzy = 0,
  impute.method = "mean",
  rmv.mono = TRUE
)

Arguments

SNPmatrix

matrix (n x m), where n is is individual names and m is marker names (coded inside the matrix as 0, 1, 2, ..., ploidy, and, missingValue).

ploidy

data ploidy (an even number between 2 and 20). Default=2.

missingValue

missing value in data. Default=-9.

thresh.maf

minimum allele frequency accepted to each marker. Default=0.05.

thresh.missing

threshold on missing data, SNPs below of this frequency value will be maintained, if equal to 1, no threshold and imputation is considered. Default = 0.50.

thresh.htzy

threshold heterozigosity, remove SNPs below this threshold. Default=0.

impute.method

rmv.mono

if monomorphic markers should be removed. Default=TRUE.

Value

SNPmatrix after filtering steps.

Author(s)

Luis F V Ferrao and Rodrigo Amadeu, rramadeu@gmail.com

Examples

data(snp.pine)
M = Mcheck(snp.pine)

Organizes pedigree data in a chronological way

Description

This function organizes pedigree data in a chronological way and return 3 lists: i) parental 1 values (numeric); ii) parental 2 values (numeric); iii) real names of the individuals. Also save a .txt file with new pedigree file.

Usage

datatreat(data = NULL, n.max = 50, unk = 0, save = FALSE)

Arguments

data

name of the pedigree data frame. Default=NULL.

n.max

max number of iteractions to get the chronological order. Default = 50

unk

the code of the data missing. Default=0.

save

if TRUE, save the genealogy in a .txt file

Value

list with parental 1, parental 2, and real names of the individuals (key) also saves a txt file with the new chronological pedigree.

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

Examples

data(ped.mrode)
datatreat(ped.mrode)

Add new crosses to a current A matrix

Description

Expand a current A matrix with a new pedigree. The parents in the new pedigree should also be in the A matrix.

Usage

expandAmatrix(newPedigree = NULL, A = NULL, returnAll = TRUE)

Arguments

newPedigree

pedigree data name (3-column way format). Unknown value should be equal 0.

A

numerator relationship matrix output from Amatrix function.

returnAll

if TRUE returns old A with new A, if FALSE returns only new A

Value

Matrix with the Relationship between the individuals.

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

Examples

data(ped.sol)
ped.initial = ped.sol[1:1120,]
ped.new = ped.sol[-c(1:1120),]
#Computing additive relationship matrix:
A = Amatrix(ped.initial, ploidy=2)
Anew = expandAmatrix(ped.new, A)

#Comparing with one-step building..
Afull = Amatrix(ped.sol, ploidy=2)
test = Anew-Afull
which(test!=0)

Filter the pedigree to keep only the genealogy of a subset of individuals

Description

Filter the pedigree to keep only the genealogy of a subset of individuals

Usage

filterpedigree(inds, data)

Arguments

inds

vector with strings of individuals to keep their genealogy in the matrix

data

name of the pedigree data frame. Default=NULL.

Value

a data frame with pedigree containing the genealogy of the selected individuals

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

Examples

data(ped.sol)
new.ped.sol = filterpedigree(inds = c("MSW168-2","W14090-3","W14090-4"),data=ped.sol)

Transform a matrix in 3 columns

Description

Given any square matrix transform it in a 3 columns way (row, column, value) mainly to be used in outsourcing data processing (as ASREML-standalone)

Usage

formatmatrix(
  data = NULL,
  save = TRUE,
  return = FALSE,
  name = deparse(substitute(data)),
  round.by = 12,
  exclude.0 = TRUE
)

Arguments

data

matrix (nxn).

save

if TRUE save the output in a file. Default=TRUE.

return

if TRUE return the output in a object. Default=FALSE.

name

name of the csv file to be saved. Default=data name.

round.by

select the number of digits after 0 you want in your data. Default = 12

exclude.0

if TRUE, remove all lines equal to zero (ASREML option). Default = TRUE

Value

a object or a csv file with a table with 3 columns representing the matrix.

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

Examples

#Example with random matrix
data<-matrix(c(1,0.1,0,0.1,1,0,0,0,1.1),3)
formatmatrix(data=data,save=FALSE,return=TRUE,exclude.0=TRUE)

#Example with pedigree matrix
#Reading the example data
data(ped.mrode)
#Making Relationship Matrix
Amrode<-Amatrix(ped.mrode)
#Inverting the Matrix
Amrode.inv<-solve(Amrode)
#Making the 3 columns format
Amrode.inv.ASREML<-formatmatrix(Amrode,save=FALSE,return=TRUE,exclude.0=TRUE)
#Printing it
Amrode.inv.ASREML

Survying on missing data

Description

This function verify which rows in a pedigree data has missing parental or conflictuos data

Usage

missingdata(data, unk = 0)

Arguments

data

data name from a pedigree list

unk

unknown value of your data

Value

list with $conflict: rows of the data which are at least one parental name equal to the individual. $missing.sire: rows of the data which arie missing data sire (Parental 1) information. $missing.dire: same as above for dire (Parental 2). $summary.missing: summary of the missing data. 2 columns, 1st for the name of the parental listed, 2nd for the how many times appeared in the data.

Author(s)

Rodrigo R Amadeu, rramadeu@gmail.com

Examples

data(ped.mrode)
missingdata(ped.mrode)

Pedigree Data

Description

Data from pedigree example proposed by Mrode 2005

Usage

data(ped.mrode)

Format

table

References

R. A. Mrode, R. Thompson. Linear Models for the Prediction of Animal Breeding Values. CABI, 2005.

Examples

data(ped.mrode)

Pedigree data for autopolyploid examples

Description

Dataset extract from supplementary material from Endelman et al. (2018). Pedigree data frame of Potato population, missing data as 0.

Usage

data(ped.sol)

Format

data.frame

References

Endelman, JB, et al., 2018 Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato. Genetics, 209(1) pp. 77-87.

Examples

data(ped.sol)

Molecular data for diploid examples

Description

Dataset extract from supplementary material from Resende et al. (2012). SNP marker matrix from Pine tree coded as 0,1, and 2, and missing value as -9.

Usage

data(snp.pine)

Format

matrix

References

Resende, MF, et al., 2012 Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda l.). Genetics 190: 1503–1510.

Examples

data(snp.pine)

Molecular data for autopolyploid examples

Description

Dataset extract from supplementary material from Endelman et al. (2018). SNP marker matrix from Pine tree coded as 0,1,2,3,4 and missing value as -9.

Usage

data(snp.sol)

Format

data.frame

References

Endelman, JB, et al., 2018 Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato. Genetics, 209(1) pp. 77-87.

Examples

data(snp.sol)