Type: | Package |
Title: | Latent Class Analysis (LCA) with Familial Dependence in Extended Pedigrees |
Version: | 1.3 |
Date: | 2018-07-05 |
Author: | Arafat TAYEB <arafat.tayeb@ircm.qc.ca>, Alexandre BUREAU <alexandre.bureau@msp.ulaval.ca> and Aurelie Labbe <aurelie.labbe@mcgill.ca> |
Maintainer: | Alexandre BUREAU <alexandre.bureau@msp.ulaval.ca> |
Depends: | R (≥ 2.1.0) |
Imports: | boot, mvtnorm, rms, kinship2 |
Description: | Latent Class Analysis of phenotypic measurements in pedigrees and model selection based on one of two methods: likelihood-based cross-validation and Bayesian Information Criterion. Computation of individual and triplet child-parents weights in a pedigree is performed using an upward-downward algorithm. The model takes into account the familial dependence defined by the pedigree structure by considering that a class of a child depends on his parents classes via triplet-transition probabilities of the classes. The package handles the case where measurements are available on all subjects and the case where measurements are available only on symptomatic (i.e. affected) subjects. Distributions for discrete (or ordinal) and continuous data are currently implemented. The package can deal with missing data. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
LazyLoad: | yes |
URL: | https://CRAN.R-project.org/package=LCAextend |
Repository: | CRAN |
NeedsCompilation: | no |
Packaged: | 2018-07-06 19:14:03 UTC; alexandrebureau |
Date/Publication: | 2018-07-07 15:40:21 UTC |
computes cumulative logistic coefficients using probabilities
Description
computes cumulative logistic coefficients using probabilities.
Usage
alpha.compute(p)
Arguments
p |
a vector of probabilities (positive entries summing to 1). |
Details
If p
has one value (equal to 1) alpha.compute
returns NA
, if it has S (S>=2)
values, alpha.compute
returns S-1
coefficients
alpha
such that if Y
is a random variable taking values in {1,...,S}
with probabilities p
, coefficients alpha[i]
are given by:
p_1+...+p_i=P(Y\leq i)=\frac{\exp(\alpha_1+...+\alpha_i)}{1+\exp(\alpha_1+...+\alpha_i)},
for all i<=S-1
.
Value
The function returns alpha
: a vector of S-1
cumulative logistic coefficients.
See Also
alpha.compute
is the inverse function of p.compute
Examples
# a vector of probability
p <- c(0,0.2,0,0,0.3,0.4,0.1,0,0)
alpha.compute(p)
#gives -Inf -1.38 0 0 1.38 0 2.19 Inf Inf
p.compute(alpha.compute(rep(1/5,5)))
associates to a function of density parameter optimization an attribute to distinguish between ordinal and normal cases
Description
associates to a function of density parameter optimization an attribute to distinguish between ordinal and normal cases. This is an internal function not meant to be called by the user.
Usage
attrib.dens(optim.param)
Arguments
optim.param |
the function used to estimate the parameters of the measurements. |
Details
Available optim.param
functions are optim.noconst.ordi
, optim.const.ordi
for ordinal measurements and optim.indep.norm
, optim.diff.norm
, optim.equal.norm
and optim.gene.norm
for multinormal measurements. The attribution uses the internal function attr
and the attribute name used is type
. The user can make his own optima.param
function and has to associate an attribute type
to it to be used instead of the available ones.
Value
The function returns the same optim.param
with an attribute type
taking values in ordi
or norm
.
Examples
optim.param <- optim.indep.norm
optim.param <- attrib.dens(optim.param)
computes the multinormal density of a given continuous measurement vector for all classes
Description
computes the density of an individual's continuous measurement vector for all latent classes, eventually taking covariates into account. This is an internal function not meant to be called by the user.
Usage
dens.norm(y.x, param, var.list = NULL)
Arguments
y.x |
a vector |
param |
a list of the multinormal density parameters: means |
var.list |
a list of integers indicating which covariates (taken from |
Details
For each class k
, the function computes the multinormal density with means param$mu[[k]]
and variances-covariances matrix
param$sigma[[k]]
for the individual's measurement
vector. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
Value
The function returns a vector dens
of length K
, where
dens[k]
is the density of the measurements if the individual belongs to class k
.
Examples
#data
data(ped.cont)
status <- ped.cont[,6]
y <- ped.cont[status==2,7:ncol(ped.cont)]
#param
data(param.cont)
#the function applied for measurement of the first individual in the ped.ordi
dens.norm(y.x=y[1,],param.cont)
computes the probability of a given discrete measurement vector for all classes under a product of multinomial
Description
computes the probability of an individual's discrete measurement vector for all latent classes under a multinomial distribution product, eventually taking covariates into account. This is an internal function not meant to be called by the user.
Usage
dens.prod.ordi(y.x, param, var.list = NULL)
Arguments
y.x |
a vector |
param |
a list of the parameters alpha (cumulative logistic coefficients), see |
var.list |
a list of integers indicating which covariates (taken from |
Details
If there are K
latent classes, d
measurements and each measurement has S[j]
possible values, alpha
is a list of d
elements, each is a K
times S[j]+length{var.list[[j]]}
matrix. For a class C=k
, dens[k]=
\displaystyle\prod_{j=1}^d{P(Y_j=y_j|C=k,X_j=x_j)}
, where P(Y_j=y_j|C=k,X_j=x_j)
is
computed from the cumulative logistic coefficients alpha[[j]][k,]
and
covariates x[var.list[[j]]]
,
Value
The function returns a vector dens
of length K
, where
dens[k]
is the probability of measurement vector y
with covariates x
,
if the individual belongs to class k
.
See Also
See Also init.ordi
,
Examples
#data
data(ped.ordi)
status <- ped.ordi[,6]
y <- ped.ordi[status==2,7:ncol(ped.ordi)]
#param
data(param.ordi)
#the function applied for measurement of the first individual in the ped.ordi
dens.prod.ordi(y.x=y[1,],param.ordi)
performs the downward step of the peeling algorithm and computes unnormalized triplet and individual weights
Description
computes the probability of measurements above connectors and their classes given the model parameters, and returns the unnormalized triplet and individual weights. This is an internal function not meant to be called by the user.
Usage
downward(id, dad, mom, status, probs, fyc, peel, res.upward)
Arguments
id |
individual ID of the pedigree, |
dad |
dad ID, |
mom |
mom ID, |
status |
symptom status: (2: symptomatic, 1: without symptoms, 0: missing), |
probs |
a list of probability parameters of the model, |
fyc |
a matrix of |
peel |
a list of pedigree peeling containing connectors by peeling order and couples of parents, |
res.upward |
result of the upward step of the peeling algorithm, see |
Details
This function computes the probability of observations above connectors and their classes using the function downward.connect
, for each connector,
if Y_above(i)
is the observations above connector i
and S_i
and C_i
are his status and his class respectively, the functions computes
P(Y_above(i),S_i,C_i)
by computing a downward step for the parent of connector i
who is also a connector. These quantities are used by the function weight.nuc
to compute the unnormalized triplet weights ww
and the unnormalized
individual weights w
.
Value
The function returns a list of 2 elements:
ww |
unnormalized triplet weights, an array of |
w |
unnormalized individual weights, an array of |
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also downward.connect
.
Examples
#data
data(ped.cont)
data(peel)
fam <- ped.cont[,1]
id <- ped.cont[fam==1,2]
dad <- ped.cont[fam==1,3]
mom <- ped.cont[fam==1,4]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
peel <- peel[[1]]
#standardize id to be 1, 2, 3, ...
id.origin <- id
standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0)
id <- apply(t(id),2,standard)
dad <- apply(t(dad),2,standard)
mom <- apply(t(mom),2,standard)
peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard),
apply(t(peel$couple[,2]),2,standard))
for(generat in 1:peel$generation)
peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard)
#probs and param
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,param.cont,NULL))
#the upward step
res.upward <- upward(id,dad,mom,status,probs,fyc,peel)
#the function
downward(id,dad,mom,status,probs,fyc,peel,res.upward)
performs a downward step for a connector
Description
computes the probability of the measurements above a connector and the connector latent class given the model parameters. This is an internal function not meant to be called by the user.
Usage
downward.connect(connect, parent1, parent2, bro.connect, status,
probs, fyc, p.ybarF.c, res.upward)
Arguments
connect |
a connector in the pedigree (individual with parents and children in the pedigree), |
parent1 |
one of the connector parent who is also a connector, |
parent2 |
the other parent of the connector (not a connector), |
bro.connect |
siblings of the connector, |
status |
a vector of symptom status, |
probs |
a list of all probability parameters of the model, |
fyc |
a matrix of |
p.ybarF.c |
a array of dimension |
res.upward |
the result of the upward step of the peeling algorithm, see |
Details
If Y_above(i)
is the measurements above connector i
and S_i
and C_i
are his status and his class respectively, the function computes
P(Y_above(i),S_i,C_i)
by computing a downward step for the parent of connector i
who is also a connector.
Value
The function returns p.ybarF.c
updated for connector i
.
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also downward
Examples
#data
data(ped.cont)
data(peel)
fam <- ped.cont[,1]
id <- ped.cont[fam==1,2]
dad <- ped.cont[fam==1,3]
mom <- ped.cont[fam==1,4]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
peel <- peel[[1]]
#standardize id to be 1, 2, 3, ...
id.origin <- id
standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0)
id <- apply(t(id),2,standard)
dad <- apply(t(dad),2,standard)
mom <- apply(t(mom),2,standard)
peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard),
apply(t(peel$couple[,2]),2,standard))
for(generat in 1:peel$generation)
peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard)
#the 2nd connector
generat <- peel$generation-1
connect <- peel$peel.connect[generat,]
connect <- connect[connect>0][1]
parent1.connect <- intersect(peel$peel.connect[generat+1,],c(dad[id==connect],
mom[id==connect]))
parent2.connect <- setdiff(c(dad[id==connect],mom[id==connect]),parent1.connect)
bro.connect <- union(id[dad==parent1.connect],id[mom==parent1.connect])
bro.connect <- setdiff(bro.connect,connect)
#probs and param
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,param.cont,NULL))
#probability of the observations below
p.ybarF.c <- array(1,dim=c(length(id),2,length(probs$p)+1))
#the upward step
res.upward <- upward(id,dad,mom,status,probs,fyc,peel)
#the function
downward.connect(connect,parent1.connect,parent2.connect,bro.connect,status,
probs,fyc,p.ybarF.c,res.upward)
performs the E step of the EM algorithm for a single pedigree for both cases with and without familial dependence
Description
computes triplet and individual weights the E step of the EM algorithm for all pedigrees in the data, in both cases with and without familial dependence. This is an internal function not meant to be called by the user.
Usage
e.step(ped, probs, param, dens, peel, x = NULL, var.list = NULL,
famdep = TRUE)
Arguments
ped |
a matrix representing pedigrees and measurements: |
probs |
a list of probability parameters of the model, see below for more details, |
param |
a list of measurement distribution parameters of the model, see below for more details, |
dens |
distribution of the mesurements, used in the model (multinormal, multinomial,...) |
peel |
a list of pedigree peeling containing connectors by peeling order and couples of parents, |
x |
covariates, if any. Default is |
var.list |
a list of integers indicating which covariates (taken from |
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
Details
probs
is a list of initial probability parameters:
For models with familial dependence:
p
a probability vector, each
p[c]
is the probability that an symptomatic founder is in classc
forc>=1
,p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension
K
timesK+1
timesK+1
, whereK
is the number of latent classes of the model, and is such thatp.trans[c_i,c_1,c_2]
is the conditional probability that a symptomatic individuali
is in classc_i
given that his parents are in classesc_1
andc_2
,p0connect
a vector of length
K
, wherep0connect[c]
is the probability that a connector without symptoms is in class0
, given that one of his parents is in classc>=1
and the other in class 0,p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each
p[c]
is the probability that an symptomatic individual is in classc
forc>=1
,p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
param
is a list of measurement density parameters: the coefficients alpha
(cumulative logistic coefficients see alpha.compute
) in
the case of discrete or ordinal data, and means mu
and variances-covariances matrices sigma
in the case of continuous data,
Value
The function returns a list of 3 elements:
ww |
triplet posterior probabilities, an array of |
w |
individual posterior probabilities, an array of |
ll |
log-likelihood of the considered model and parameters. |
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also weight.famdep
, lca.model
.
Examples
#data
data(ped.cont)
data(peel)
#probs and probs
data(probs)
data(param.cont)
#the function
e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,var.list=NULL,
famdep=TRUE)
computes initial values for the EM algorithm in the case of continuous measurements
Description
computes initial values of means and variance-covariance matrices for the EM algorithm in the case of continuous measurements and multinormal model.
Usage
init.norm(y, K, x = NULL, var.list = NULL)
Arguments
y |
a |
K |
number of latent classes of the model, |
x |
a matrix of covariates if any, default is |
var.list |
a list of integers indicating which covariates (taken from |
Details
The function allocates every individual to a class by a simple clustering of the data and evaluates the means and variance-covariance matrices of measurements in each class. Treatment of covariates is not yet implemented, and any provided covariate value will be ignored.
Value
The function returns a list of 2 elements mu
and sigma
of length K
each, mu[k]
is the means vector
(of length d
) of measurements in class k
and sigma[k]
is the variances-covariances matrix
(of dimension d
times d
) of measurements in class k
.
Examples
#data
data(ped.cont)
status <- ped.cont[,6]
y <- ped.cont[status==2,7:ncol(ped.cont)]
#the function
init.norm(y,K=3)
computes the initial values for EM algorithm in the case of ordinal measurements
Description
computes the initial values of cumulative logistic coefficients alpha for the EM algorithm in the case of ordinal measurements and a product multinomial model.
Usage
init.ordi(y, K, x = NULL, var.list = NULL)
Arguments
y |
a |
K |
number of latent classes of the model, |
x |
a matrix of covariates if any, default is |
var.list |
list of integers indicating which covariates (taken from |
Details
The function allocates every individual to a class and evaluates the cumulative logistic coefficients for each measurement and each class. Regression coefficients for the covariates are set to 0.
Value
The function returns a list of one element alpha
which is a list of d
elements, each element alpha[[j]]
is a K
times
S-1
matrix, where S
is the number of values of the measurement y[,j]
, a row alpha[[j]][k,]
gives the the cumulative logistic
coefficients of class k
and measurement j
using alpha.compute
.
See Also
Examples
#data
data(ped.ordi)
status <- ped.ordi[,6]
y <- ped.ordi[,7:ncol(ped.ordi)]
#the function
init.ordi(y[status==2,],K=3)
initializes the transition probabilities
Description
initializes the marginal transition probabilities with or without parental constraint.
Usage
init.p.trans(K, trans.const = TRUE)
Arguments
K |
number of latent classes, |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject can be only one
of his parents classes. Default is |
Details
All non-zero transition probabilities are set to be equal. The parental constraint indicator determines which transition probabilities are non-zero.
Value
the function returns p.trans
an array of dimension K
times K+1
times K+1
: p.trans[c_i,c_1,c_2]
is the probability that
the subject i
is assigned to class c_i
and his parents to classes c_1
and c_2
.
Examples
init.p.trans(3) #parental constraint is TRUE,
init.p.trans(3,trans.const=FALSE) #parental constraint is FALSE.
fits latent class models for phenotypic measurements in pedigrees with or without familial dependence using an Expectation-Maximization (EM) algorithm
Description
This is the main function for fitting latent class models. It performs some checks of the pedigrees (it exits if an individual has only one
parent in the pedigree, if no children is in the pedigree or if there
are not enough individuals for parameters estimation) and of the
initial values (positivity of probabilites and their summation to
one). For models with familial dependence, the child latent class
depends on his parents classes via
triplet-transition probabilities. In the case of models without
familial dependence, it performs the classical Latent
Class Analysis (LCA) where all individuals are supposed independent
and the pedigree structure is meaningless. The EM algorithm stops when
the difference between log-likelihood is smaller then tol
that
is fixed by the user.
Usage
lca.model(ped, probs, param, optim.param, fit = TRUE,
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001,
x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)
Arguments
ped |
a matrix or data frame representing pedigrees and measurements: |
probs |
a list of initial probability parameters (see below
for more details). The function |
param |
a list of initial measurement distribution parameters (see below for more details). The function |
optim.param |
a variable indicating how measurement distribution parameter optimization is performed (see below for more details), |
fit |
a logical variable, if |
optim.probs.indic |
a vector of logical values indicating which probability parameters to estimate, |
tol |
a small number governing the stopping rule of the EM algorithm. Default is 0.001, |
x |
a matrix of covariates (optional), default is |
var.list |
a list of integers indicating the columns of
|
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
modify.init |
a function to modify initial values of the EM algorithm, or |
Details
The symptom status vector (column 6 of ped
) takes value 1 for
subjects that have been
examined and show no symptoms (i.e. completely unaffected
subjects). When applying the LCA to
measurements available on all subjects, the status vector must take the
value of 2 for every individual with measurements.
probs
is a list of initial probability parameters:
For models with familial dependence:
p
a probability vector, each
p[c]
is the probability that an symptomatic founder is in classc
forc>=1
,p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension
K
timesK+1
timesK+1
, whereK
is the number of latent classes of the model, and is such thatp.trans[c_i,c_1,c_2]
is the conditional probability that a symptomatic individuali
is in classc_i
given that his parents are in classesc_1
andc_2
,p0connect
a vector of length
K
, wherep0connect[c]
is the probability that a connector without symptoms is in class0
, given that one of his parents is in classc>=1
and the other in class 0,p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each
p[c]
is the probability that an symptomatic individual is in classc
forc>=1
,p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
param
is a list of measurement distribution parameters: the coefficients alpha
(cumulative logistic coefficients see alpha.compute
) in
the case of discrete or ordinal data, and means mu
and variances-covariances matrices sigma
in the case of continuous data,
optim.param
is a variable indicating how the measurement distribution parameter estimation of the M step is performed. Two possibilities,
optim.noconst.ordi
and optim.const.ordi
, are now available in the case of discrete or ordinal measurements, and four possibilities
optim.indep.norm
(measurements are independent, diagonal variance-covariance matrix),
optim.diff.norm
(general variance-covariance matrix but equal for all classes),
optim.equal.norm
(variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and
optim.gene.norm
(general variance-covariance matrices for all classes), are now available in the case of continuous measurements,
One of the allowed values of optim.param
must be entered without quotes.
optim.probs.indic
is a vector of logical values of length 4 for
models with familial dependence and 2 for models without familial
dependence.
For models with familial dependence:
optim.probs.indic[1]
indicates whether
p0
will be estimated or not,optim.probs.indic[2]
indicates whether
p0connect
will be estimated or not,optim.probs.indic[3]
indicates whether
p.found
will be estimated or not,optim.probs.indic[4]
indicates whether
p.connect
will be estimated or not.
For models without familial dependence:
optim.probs.indic[1]
indicates whether
p0
will be estimated or not,optim.probs.indic[2]
indicates whether
p.aff
will be estimated or not.
All defaults are TRUE
. If the dataset contains only nuclear families, there is no information to estimate p0connect and p.connect, and these parameters will not be estimated, irrespective of the indicator value.
Value
The function returns a list of 4 elements:
param |
the Maximum Likelihood Estimator (MLE) of the
measurement distribution parameters if |
probs |
the MLE of probability parameters if |
When measurements are available on all subjects, the probability parameters p0
and p0connect
are degenerated to 0 and
p.afound
, p.child
and p.aff
to 1 in the output.
weight |
an array of dimension |
ll |
the maximum log-likelihood value (log-ML) if |
References
TAYEB, A. LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended
Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2,
LABBE, A., BUREAU, A. et MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
Examples
#data
data(ped.ordi)
fam <- ped.ordi[,1]
#probs and param
data(param.ordi)
data(probs)
#the function applied only to two first families of ped.ordi
lca.model(ped.ordi[fam%in%1:2,],probs,param.ordi,optim.noconst.ordi,
fit=TRUE,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),tol=0.001,x=NULL,
var.list=NULL,famdep=TRUE,modify.init=NULL)
selects a latent class model for pedigree data
Description
Performs selection of a latent class model for phenotypic measurements
in pedigrees based on one of
two possible methods: likelihood-based cross-validation or Bayesian
Information Criterion (BIC) selection. This is the top-level
function to perform a Latent Class Analysis (LCA), which calls the
model fitting function
lca.model
. Model selection is performed among models within one of two
types: with and without familial dependence. Two families of
distributions are currently implemented: product multinomial for discrete (or
ordinal) data and mutivariate
normal for continuous data.
Usage
model.select(ped, distribution, trans.const = TRUE, optim.param,
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE),
famdep = TRUE, selec = "bic", H = 5, K.vec = 1:7,
tol = 0.001, x = NULL, var.list = NULL)
Arguments
ped |
a matrix containing variables coding the pedigree
structure and the phenotype measurements: |
distribution |
a character variable taking the value |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject must be one
of his parents classes. Default is |
optim.param |
a variable indicating how the measurement distribution parameter optimization is performed (see below for more details), |
optim.probs.indic |
a vector of logical values indicating which probability parameters to estimate (see below for more details), |
famdep |
a logical variable indicating if the familial dependence model is used or not. Default is |
selec |
a character variables taking the value |
H |
an integer giving the number of equal parts into which data will be splitted for the likelihood-based cross-validation model selection (see below for more details), |
K.vec |
a vector of integers, the number of latent classes of
candidate models, if |
tol |
a small number governing the stopping rule of the EM algorithm. Default is 0.001, |
x |
a matrix of covariates (optional), default is |
var.list |
a list of integers indicating the columns of
|
Details
In the case of cross-validation based-likelihood method, data is
splitted into H
parts: H-1
parts as a training set and one part as a
test set. For each model, a validation log-likelihood is obtained by
evaluating the log-likelihood of the test set data using the parameter
values estimated in the training set. This is repeated H
times
using a different part as training set each time, and a total
validation log-likelihood is obtained by summation over the H
test sets. The best model is the one having the largest
validation log-likelihood. In the case of BIC selection method, the
BIC is computed for each candidate model. The model with the smallest
BIC is selected.
The symptom status vector (column 6 of ped
) takes value 1 for
subjects that have been
examined and show no symptoms (i.e. completely unaffected
subjects). When applying the LCA to
measurements available on all subjects, the status vector must take the
value of 2 for every individual with measurements. If covariates are used, covariate values must be provided for subjects with symptom status 0 (missing) but not for subjects with symptom status 1 (if covariate values are provided, they will be ignored).
optim.param
is a variable indicating how the measurement
distribution parameter optimization of the M step is performed. Two
possibilities,
optim.noconst.ordi
and optim.const.ordi
, are now available in the case of discrete or ordinal measurements, and four possibilities,
optim.indep.norm
(measurements are independent, diagonal variance-covariance matrix),
optim.diff.norm
(general variance-covariance matrix but equal for all classes),
optim.equal.norm
(variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and
optim.gene.norm
(general variance-covariance matrices for all classes), in the case of continuous measurements.
One of the allowed values of optim.param
must be entered without quotes.
optim.probs.indic
is a vector of logical values of length 4 for
models with familial dependence and 2 for models without familial
dependence indicating which probability parameters to estimate. See the
help page for lca.model
for a definition of the parameters.
For models with familial dependence:
optim.probs.indic[1]
indicates whether
p0
will be estimated or not,optim.probs.indic[2]
indicates whether
p0connect
will be estimated or not,optim.probs.indic[3]
indicates whether
p.found
will be estimated or not,optim.probs.indic[4]
indicates whether
p.connect
will be estimated or not.
For models without familial dependence:
optim.probs.indic[1]
indicates whether
p0
will be estimated or not,optim.probs.indic[2]
indicates whether
p.aff
will be estimated or not.
All defaults are TRUE
.
Value
The function returns a list of 5 elements, the first 3 elements are common for BIC and cross-validation model selection methods and are:
param |
the Maximum Likelihood Estimator (MLE) of the measurement distribution parameters of the selected model, |
probs |
the Maximum Likelihood Estimator (MLE) of the probability parameters of the selected model, |
weight |
an array of dimension |
If the cross-validation selection method is used, the function returns also
ll |
the value of the maximum log-likelihood (log-ML) of the selected model, |
ll.valid |
the total cross-validation log-likelihood of all candidate models, |
and if the Bayesian Information Criterion selection method is used, the function returns also
ll |
the value of maximum log-likelihood (log-ML) of all candidate models, |
bic |
the Bayesian Information Criterion
|
References
TAYEB, A. LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended
Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2,
LABBE, A., BUREAU, A. et MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
See Also
See also lca.model
.
Examples
#data
data(ped.cont)
fam <- ped.cont[,1]
#the function applied for the two first families of ped.cont
model.select(ped.cont[fam%in%1:2,],distribution="normal",trans.const=TRUE,
optim.indep.norm,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),
famdep=TRUE,selec="bic",K.vec=1:3,tol=0.001,x=NULL,var.list=NULL)
computes the number of parameters of a model
Description
computes the number of free parameters of a model, depending in the number of classes, the type of parameter optimization and the used of familial dependence, to be used in BIC model selection. This is an internal function not meant to be called by the user.
Usage
n.param(y, K, trans.const = TRUE, optim.param,
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), famdep = TRUE)
Arguments
y |
a matrix of measurements, |
K |
an integer, the number of latent classes of a candiate model, |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject can be only one
of his parents classes. Default is |
optim.param |
a function used for parameter optimization, see |
optim.probs.indic |
a vector of logical values indicating which probability parameters to be updated, see |
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
Value
The function returns the number of free parameters (of the measurement distribution and the probabilities of the latent classes).
See Also
See also model.select
.
Examples
data(ped.cont)
y <- ped.cont[,7:ncol(ped.cont)]
n.param(y,K=3,trans.const=TRUE,optim.indep.norm,
optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),famdep=TRUE)
performs the M step for the measurement distribution parameters in multinomial case, with an ordinal constraint on the parameters
Description
Estimates the cumulative logistic coefficients alpha
in the
case of multinomial (or ordinal) data with an ordinal constraint on
the parameters.
Usage
optim.const.ordi(y, status, weight, param, x = NULL, var.list = NULL)
Arguments
y |
a matrix of discrete (or ordinal) measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
Details
the constraint on the parameters is that, for a symptom j
, the rows alpha[[j]][k,]
are equal for all classes k
except the first values.
Therefore, maximum likelihood estimators are not explicit and the
function lrm
of the package rms
is used to perform a
numerical optimization.
Value
The function returns a list of estimated parameters param
satisfying the constraint.
Examples
#data
data(ped.ordi)
status <- ped.ordi[,6]
y <- ped.ordi[,7:ncol(ped.ordi)]
data(peel)
#probs and param
data(probs)
data(param.ordi)
#e step
weight <- e.step(ped.ordi,probs,param.ordi,dens.prod.ordi,peel,x=NULL,
var.list=NULL,famdep=TRUE)$w
weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.ordi),
ncol=length(probs$p))
#the function
optim.const.ordi(y[status==2,],status,weight,param.ordi,x=NULL,
var.list=NULL)
performs the M step for measurement density parameters in multinormal case
Description
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements with
a general variance-covariance matrix identical for all classes.
Usage
optim.diff.norm(y, status, weight, param, x = NULL, var.list = NULL)
Arguments
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
Details
The values of explicit estimators are computed for both mu
and
sigma
. The variance-covariance matrices sigma
are
identical for all classes. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
Value
The function returns a list of estimated parameters param
.
Examples
#data
data(ped.cont)
status <- ped.cont[,6]
y <- ped.cont[,7:ncol(ped.cont)]
data(peel)
#probs and param
data(probs)
data(param.cont)
#e step
weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,
var.list=NULL,famdep=TRUE)$w
weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont),
ncol=length(probs$p))
#the function
optim.diff.norm(y[status==2,],status,weight,param.cont,x=NULL,
var.list=NULL)
performs the M step for measurement density parameters in multinormal case
Description
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements
with equal variance for all measurements and equal covariance between
all pairs of measurements within each class. The variance and
covariance parameters are however distinct for each class.
Usage
optim.equal.norm(y, status, weight, param, x = NULL, var.list = NULL)
Arguments
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
Details
The values of explicit estimators are computed for both mu
and
sigma
. The variance-covariance matrices sigma
are
distinct for each class. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
Value
The function returns a list of estimated parameters param
.
Examples
#data
data(ped.cont)
status <- ped.cont[,6]
y <- ped.cont[,7:ncol(ped.cont)]
data(peel)
#probs and param
data(probs)
data(param.cont)
#e step
weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,
var.list=NULL,famdep=TRUE)$w
weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont),
ncol=length(probs$p))
#the function
optim.equal.norm(y[status==2,],status,weight,param.cont,x=NULL,
var.list=NULL)
performs the M step for measurement density parameters in multinormal case
Description
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements with
general variance-covariance matrices distinct for each class.
Usage
optim.gene.norm(y, status, weight, param, x = NULL, var.list = NULL)
Arguments
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
Details
The values of explicit estimators are computed for both mu
and
sigma
. This is the general case, the variance-covariance
matrices sigma
of the different classes are distinct and
unconstrained. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
Value
The function returns a list of estimated parameters param
.
Examples
#data
data(ped.cont)
status <- ped.cont[,6]
y <- ped.cont[,7:ncol(ped.cont)]
data(peel)
#probs and param
data(probs)
data(param.cont)
#e step
weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,
var.list=NULL,famdep=TRUE)$w
weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont),
ncol=length(probs$p))
#the function
optim.gene.norm(y[status==2,],status,weight,param.cont,x=NULL,
var.list=NULL)
performs the M step for measurement density parameters in multinormal case
Description
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements with
diagonal variance-covariance matrices for each class, i.e. measurements are supposed independent.
Usage
optim.indep.norm(y, status, weight, param, x = NULL, var.list = NULL)
Arguments
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
Details
The values of explicit estimators are computed for both mu
and
sigma
. All variance-covariance matrices sigma
are
diagonal, i.e. measurements are supposed independent. Treatment of
covariates is not yet implemented, and any
provided covariate value will be ignored.
Value
The function returns a list of estimated parameters param
.
Examples
#data
data(ped.cont)
status <- ped.cont[,6]
y <- ped.cont[,7:ncol(ped.cont)]
data(peel)
#probs and param
data(probs)
data(param.cont)
#e step
weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,
var.list=NULL,famdep=TRUE)$w
weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont),
ncol=length(probs$p))
#the function
optim.indep.norm(y[status==2,],status,weight,param.cont,x=NULL,
var.list=NULL)
performs the M step for the measurement distribution parameters in multinomial case without constraint on the parameters
Description
Estimates the cumulative logistic coefficients alpha
in the case of multinomial (or ordinal) data without constraint on the coefficients.
Usage
optim.noconst.ordi(y, status, weight, param, x = NULL, var.list = NULL)
Arguments
y |
a matrix of discrete (or ordinal) measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement distribution parameters, here is a list |
x |
a matrix of covariates (optional). Default is |
var.list |
a list of integers indicating which covariates (taken from |
Details
The values of explicit estimators are computed by logistic transformation of weighted empirical frequencies.
Value
the function returns a list of estimated parameters param
.
Examples
#data
data(ped.ordi)
status <- ped.ordi[,6]
y <- ped.ordi[,7:ncol(ped.ordi)]
data(peel)
#probs and param
data(probs)
data(param.ordi)
#e step
weight <- e.step(ped.ordi,probs,param.ordi,dens.prod.ordi,peel,x=NULL,
var.list=NULL,famdep=TRUE)$w
weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.ordi),
ncol=length(probs$p))
#the function
optim.noconst.ordi(y[status==2,],status,weight,param.ordi,x=NULL,
var.list=NULL)
performs the M step of the EM algorithm for the probability parameters
Description
estimates the probability parameters (p
, p.trans
, p0
,...) in the M step of the EM algorithm in both cases
with and without familial dependence. This is an internal
function not meant to be called by the user.
Usage
optim.probs(ped, probs, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE),
res.weight, famdep = TRUE)
Arguments
ped |
a matrix of pedigrees data, see |
probs |
all probability parameters to be optimized, |
optim.probs.indic |
a vector of logical values indicating which probability parameters to be updated, |
res.weight |
a matrix of |
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
Details
explicit estimators are computed in function of the weights.
Value
the function returns the estimated probs
of all probability parameters.
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
Examples
#data
data(ped.cont)
data(peel)
#probs and param
data(probs)
data(param.cont)
#e step
weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,
var.list=NULL,famdep=TRUE)
#the function
optim.probs(ped.cont,probs,weight,optim.probs.indic=
c(TRUE,TRUE,TRUE,TRUE),famdep=TRUE)
computes the probability vector using logistic coefficients
Description
computes the probability vector using cumulative logistic coefficients
Usage
p.compute(alpha,decal)
Arguments
alpha |
a vector of cumulative logistic coefficients, the first value can
be |
decal |
offset term to be applied to sums of logistic coefficients |
Details
If alpha
has S-1
values, p.compute
returns p
of length S
. If Y
is a random variable taking values in {1,...,S}
with
probabilities p
, coefficients alpha[i]
are given by:
p_1+...+p_i=P(Y\leq i)=\frac{\exp(\alpha_1+...+\alpha_i)}{(1+\exp(\alpha_1+...+\alpha_i)}
for all i<=S-1
.
Value
p
: a probability vector
See Also
p.compute
is the inverse function of alpha.compute
Examples
# a vector of probability
p <- c(0,0.2,0,0,0.3,0.4,0.1,0,0)
alpha <- alpha.compute(p)
#gives alpha= -Inf -1.38 0 0 1.38 0 2.19 Inf Inf
p.compute(alpha) #gives p
computes the posterior probability of observations of a child
Description
computes the posterior probability of measurements of a child for each class and each symptom status of the subject given the classes of both of his parents. This is an internal function not meant to be called by the user.
Usage
p.post.child(child, c.connect, c.spouse, status, probs, fyc)
Arguments
child |
a child in the pedigree, |
c.connect |
the class of one parent (who is a connector) of the child, |
c.spouse |
the class of the other parent of the child, |
status |
the symptom status vector of the whole pedigree, |
probs |
a list of all probability parameters of the model, |
fyc |
a matrix of |
Value
the function returns p.child
a matrix of 2 times K+1
entries such that p.child[s,k]
is the posterior probability of the measurements Y_child
under status S_child=s
and when he is assigned to class k
and his parents are assigned to classes c.connect
and c.spouse
.
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
Examples
#data
data(ped.cont)
fam <- ped.cont[,1]
dad <- ped.cont[fam==1,3]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
#a child
child <- which(dad!=0)[1]
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=nrow(y),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,
param.cont,NULL))
#the function
p.post.child(child,c.connect=1,c.spouse=3,status,probs,fyc)
computes the posterior probability of observations of a founder
Description
computes the posterior probability of measurements of a founder for each class and each symptom status of the founder. This is an internal function not meant to be called by the user.
Usage
p.post.found(found, status, probs, fyc)
Arguments
found |
a founder in the pedigree (individual without parents in the pedigree), |
status |
the symptom status vector of the whole pedigree, |
probs |
a list of all probability parameters of the model, |
fyc |
a matrix of |
Value
the function returns p.found
a matrix of 2 times K+1
entries: p.found[s,k]
is the posterior probability of the observations Y_found
under status S_found=s
and when he is assigned to class k
.
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2001, DOI: 10.1007/s00180-010-0224-2.
Examples
#data
data(ped.cont)
fam <- ped.cont[,1]
dad <- ped.cont[fam==1,3]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
#a founder
found <- which(dad==0)[1]
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=nrow(y),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,
param.cont,NULL))
#the function
p.post.found(found,status,probs,fyc)
parameters to be used for examples in the case of continuous measurements
Description
means and variance-covariance matrices for each class to be used in examples with continuous measurements.
Usage
data(param.cont)
Details
ped.param
is a list of 2 elements:
mu
a list of
K=3
(the number of latent classes) entries, each represents the means of the measurement multinormal density in the class,sigma
a list of
K=3
entries, each is the variance-covariance matrix of the measurement multinormal density in the class.
The dimension (the number of multinormal measurements) used in the dataset is 4.
See Also
See also init.norm
Examples
data(param.cont)
parameters to be used for examples in the case of discrete or ordinal measurements
Description
list of cumulative logistic coefficients for each measurement and each class to be used in examples for discrete or ordinal models.
Usage
data(param.ordi)
Details
ped.param
is a list of 1 element:
alpha
a list of d=4
(the number of measurements) entries, each is a matrix of K=3
(the number of classes) times S[j]
(the number of possible values of measurement j
), a row alpha[[j]][k,]
contains the logistic coefficients of the measurement j
for class k
.
See Also
See also init.ordi
Examples
data(param.ordi)
pedigrees with continuous data to be used for examples
Description
data set of 48 pedigrees: a matrix of pedigrees data with continuous observations to be used for examples.
Usage
data(ped.cont)
Details
ped
is a matrix of 10 columns:
ped[,1]
family ID,
ped[,2]
subject ID,
ped[,3]
father ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,4]
mother ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,5]
subject sex: 1 male, 2 female,
ped[,6]
symptom status (2: symptomatic, 1: without symptoms, 0: missing),
ped[,7:10]
continuous observations, NA
for
missing and without symptoms,
Examples
data(ped.cont)
pedigrees with discrete or ordinal data to be used for examples
Description
data set of 48 pedigrees: a matrix of pedigrees data with discrete or ordinal observations to be used for examples.
Usage
data(ped.ordi)
Details
ped
is a matrix of 10 columns:
ped[,1]
family ID,
ped[,2]
subject ID,
ped[,3]
father ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,4]
mother ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,5]
subject sex: 1 male, 2 female,
ped[,6]
symptom status (2: symptomatic, 1: without symptoms, 0: missing),
ped[,7:10]
discrete or ordinal observations,
NA
for missing and without symptoms,
Examples
data(ped.ordi)
peeling order of pedigrees and couples in pedigrees
Description
peel
is a list of 48 entries, each gives the peeling order of the pedigrees and lists the couples in the 48 pedigrees of ped.cont
and peed.ordi
.
Usage
data(peel)
Value
For a pedigree f
in the data ped.cont
or ped.ordi
,
peel[[f]]
is a list of three entries:
generation
the number of generations in the pedigree,
peel.connect
a matrix of
generation
rows, each giving the connectors of the generation in the order of peeling,couple
a matrix of two columns, giving the couples in the pedigree.
See Also
See also ped.cont
and ped.ordi
.
Examples
data(peel)
probabilities parameters to be used for examples
Description
a list of probability parameters such as the probability that a founder is assigned to each class, the transition probabilities and the probability that a child is symptomatic.
Usage
data(probs)
Details
probs
a list of probability parameters:
For models with familial dependence:
p
a probability vector, each
p[c]
is the probability that an symptomatic founder is in classc
forc>=1
,p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension
K
timesK+1
timesK+1
, whereK
is the number of latent classes of the model, and is such thatp.trans[c_i,c_1,c_2]
is the conditional probability that a symptomatic individuali
is in classc_i
given that his parents are in classesc_1
andc_2
,p0connect
a vector of length
K
, wherep0connect[c]
is the probability that a connector without symptoms is in class0
, given that one of his parents is in classc>=1
and the other in class 0,p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each
p[c]
is the probability that an symptomatic individual is in classc
forc>=1
,p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
Examples
data(probs)
performs the upward step of the peeling algorithm of a pedigree
Description
computes the probability of observations below connectors conditionally to their classes given the model parameters. This is an internal function not meant to be called by the user.
Usage
upward(id, dad, mom, status, probs, fyc, peel)
Arguments
id |
individual ID of the pedigree, |
dad |
dad ID, |
mom |
mom ID, |
status |
symptom status: (2: symptomatic, 1: without symptoms, 0: missing), |
probs |
a list of probability parameters of the model, |
fyc |
a matrix of |
peel |
a list of pedigree peeling result containing connectors by peeling order and couples of parents. |
Details
This function computes the probability of observations below connectors conditionally to their classes using the function upward.connect
Value
The function returns a list of 2 elements:
sum.child |
an array of dimension |
p.yF.c |
an array of dimension |
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also upward.connect
Examples
#data
data(ped.cont)
data(peel)
fam <- ped.cont[,1]
id <- ped.cont[fam==1,2]
dad <- ped.cont[fam==1,3]
mom <- ped.cont[fam==1,4]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
peel <- peel[[1]]
#standardize id to be 1, 2, 3, ...
id.origin <- id
standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0)
id <- apply(t(id),2,standard)
dad <- apply(t(dad),2,standard)
mom <- apply(t(mom),2,standard)
peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard),
apply(t(peel$couple[,2]),2,standard))
for(generat in 1:peel$generation)
peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard)
#probs and param
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,
param.cont,NULL))
#the function
upward(id,dad,mom,status,probs,fyc,peel)
performs the upward step for a connector
Description
computes the probability of the measurements below a connector conditionally to the connector latent class given the model parameters. This is an internal function not meant to be called by the user.
Usage
upward.connect(connect, spouse.connect, children.connect, status,
probs, p.yF.c, fyc, sum.child)
Arguments
connect |
a connector in the pedigree, |
spouse.connect |
spouse of the connector, |
children.connect |
children of the connector, |
status |
a vector of symptom status of the whole pedigree, |
probs |
a list of probability parameters of the model, |
p.yF.c |
an array of dimension |
fyc |
a matrix of |
sum.child |
an array of dimension |
Details
If Y_above(i)
is the observations below connector i
and C_i
is his class, the functions computes P(Y_below(i)|C_i)
.
Value
The function returns a list of 2 elements:
sum.child |
an array of dimension |
p.yF.c |
a array of dimension |
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also upward
Examples
#data
data(ped.cont)
data(peel)
fam <- ped.cont[,1]
id <- ped.cont[fam==1,2]
dad <- ped.cont[fam==1,3]
mom <- ped.cont[fam==1,4]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
peel <- peel[[1]]
#standardize id to be 1, 2, 3, ...
id.origin <- id
standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0)
id <- apply(t(id),2,standard)
dad <- apply(t(dad),2,standard)
mom <- apply(t(mom),2,standard)
peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard),
apply(t(peel$couple[,2]),2,standard))
for(generat in 1:peel$generation)
peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard)
#a nuclear family
#connector in the pedigree 1
connect <- peel$peel.connect[1,1]
#soupse of connector connect
spouse.connect <- peel$couple[peel$couple[,1]==connect,2]
#children of connector connect
children.connect <- union(id[dad==connect],id[mom==connect])
#probs and param
data(probs)
data(param.cont)
#probabilitiy of observations above
p.yF.c <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
#densities of the observations
fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,
param.cont,NULL))
#sums over childs
sum.child <- array(0,c(length(id),length(probs$p)+1,length(probs$p)+1))
#the function
upward.connect(connect,spouse.connect,children.connect,status,probs,
p.yF.c,fyc,sum.child)
performs the computation of triplet and individual weights for a pedigree under familial dependence
Description
computes the triplet and the individual weights of the E step of the EM algorithm for a pedigree in the case of familial dependence. It returns also the overall log-likelihood of the observations. This is an internal function not meant to be called by the user.
Usage
weight.famdep(id, dad, mom, status, probs, fyc, peel)
Arguments
id |
individual ID of the pedigree, |
dad |
dad ID, |
mom |
mom ID, |
status |
symptom status: (2: symptomatic, 1: without symptoms, 0: missing), |
probs |
list of probability parameters of the model, |
fyc |
a matrix of |
peel |
a list of pedigree peeling containing connectors by peeling order and couples of parents |
Details
the function calls the functions upward
and
downward
which perform the required probability
computations by processing the pedigree by nuclear family (or
equivalently by connector) following the peeling order.
Value
the function returns a list of 3 elements:
ww |
triplet weights: an array of |
w |
individual weights: an array of |
ll |
log-likelihood. |
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also upward
, downward
, e.step
.
Examples
#data
data(ped.cont)
data(peel)
fam <- ped.cont[,1]
id <- ped.cont[fam==1,2]
dad <- ped.cont[fam==1,3]
mom <- ped.cont[fam==1,4]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
peel <- peel[[1]]
#probs and param
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,
param.cont,NULL))
#the function
weight.famdep(id,dad,mom,status,probs,fyc,peel)
performs the computation of unnormalized triplet and individuals weights for a nuclear family in the pedigree
Description
the weighting algorithm proceeds by nuclear family, the function weight.nuc
computes the unnormalized triplet and individuals weights for a
nuclear family in the pedigree. This is an internal
function not meant to be called by the user.
Usage
weight.nuc(connect, spouse.connect, children.connect, status,
probs, fyc, p.ybarF.c, ww, w, res.upward)
Arguments
connect |
a connector in the pedigree, |
spouse.connect |
spouse of the connector, |
children.connect |
children of the connector, |
status |
vector of symptom status of the whole pedigree, |
probs |
all probability parameters of the model, |
fyc |
a matrix of |
p.ybarF.c |
a array of dimension |
ww |
unnormalized triplet weights, an array of |
w |
unnormalized individual weights, an array of |
res.upward |
result of the upward step of the weighting algorithm, see |
Details
updated ww
and w
are computed for the current nuclear family.
Value
the function returns a list of 2 elements:
ww |
unnormalized triplet weights, an array of |
w |
unnormalized individual weights, an array of |
References
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See Also
See also downward
Examples
#data
data(ped.cont)
data(peel)
fam <- ped.cont[,1]
id <- ped.cont[fam==1,2]
dad <- ped.cont[fam==1,3]
mom <- ped.cont[fam==1,4]
status <- ped.cont[fam==1,6]
y <- ped.cont[fam==1,7:ncol(ped.cont)]
peel <- peel[[1]]
#standardize id to be 1, 2, 3, ...
id.origin <- id
standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0)
id <- apply(t(id),2,standard)
dad <- apply(t(dad),2,standard)
mom <- apply(t(mom),2,standard)
peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard),
apply(t(peel$couple[,2]),2,standard))
for(generat in 1:peel$generation)
peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard)
#the first nuclear family
generat <- peel$generation
connect <- peel$peel.connect[generat,]
connect <- connect[connect>0]
spouse.connect <- peel$couple[peel$couple[,1]==connect,2]
children.connect <- union(id[dad==connect],id[mom==connect])
#probs and param
data(probs)
data(param.cont)
#densities of the observations
fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1)
fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,
param.cont,NULL))
#triplet and individual weights
ww <- array(0,dim=c(length(id),rep(2,3),rep(length(probs$p)+1,3)))
w <- array(0,dim=c(length(id),2,length(probs$p)+1))
#probability of the observations below
p.ybarF.c <- array(1,dim=c(length(id),2,length(probs$p)+1))
p.ybarF.c[connect,,] <- p.post.found(connect,status,probs,fyc)
#the upward step
res.upward <- upward(id,dad,mom,status,probs,fyc,peel)
#the function
weight.nuc(connect,spouse.connect,children.connect,status,probs,fyc,
p.ybarF.c,ww,w,res.upward)