Type: | Package |
Title: | Gross Flows Estimation under Complex Surveys |
Version: | 0.1.1 |
Author: | Acero William <wfaceror@unal.edu.co>, Gutierrez Andres <andres.gutierrez@un.org>, Trujillo Leonardo <ltrujillo@unal.edu.co> |
Maintainer: | Acero William <wfaceror@unal.edu.co> |
Description: | The philosophy in the package is described in Stasny (1988) <doi:10.2307/1391558> and Gutierrez, A., Trujillo, L. & Silva, N. (2014), <ISSN:1492-0921> to estimate the gross flows under complex surveys using a Markov chain approach with non response. |
Depends: | R (≥ 3.3.1) |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Repository: | CRAN |
Imports: | dplyr, data.table, TeachingSampling |
NeedsCompilation: | no |
Packaged: | 2023-10-25 10:37:36 UTC; willamacero |
Date/Publication: | 2023-10-25 11:20:07 UTC |
Create a database for gross flows.
Description
Create a database based on \xi
model.
Usage
createBase(x)
Arguments
x |
A matrix that contains information of the observable process. |
Value
createBase
returns data.table,data.frame
that contains the data base based on \xi
model.
Examples
candidates_t0 <- c("Candidate1","Candidate2","Candidate3","Candidate4",
"Candidate5","WhiteVote", "NoVote")
candidates_t1 <- c("Candidate3","Candidate5","WhiteVote", "NoVote")
N <- 100000
nCanT0 <- length(candidates_t0)
nCanT1 <- length(candidates_t1)
eta <- matrix(c(0.10, 0.10, 0.20, 0.17, 0.28, 0.1, 0.05),
byrow = TRUE, nrow = nCanT0)
P <- matrix(c(0.10, 0.60, 0.15, 0.15,
0.30, 0.10, 0.25, 0.35,
0.34, 0.25, 0.16, 0.25,
0.25, 0.05, 0.35, 0.35,
0.10, 0.25, 0.45, 0.20,
0.12, 0.36, 0.22, 0.30,
0.10, 0.15, 0.30, 0.45),
byrow = TRUE, nrow = nCanT0)
citaModel <- matrix(, ncol = nCanT1, nrow = nCanT0)
row.names(citaModel) <- candidates_t0
colnames(citaModel) <- candidates_t1
for(ii in 1:nCanT0){
citaModel[ii,] <- c(rmultinom(1, size = N * eta[ii,], prob = P[ii,]))
}
# # Model I
psiI <- 0.9
rhoRRI <- 0.9
rhoMMI <- 0.5
citaModI <- matrix(nrow = nCanT0 + 1, ncol = nCanT1 + 1)
rownames(citaModI) <- c(candidates_t0, "Non_Resp")
colnames(citaModI) <- c(candidates_t1, "Non_Resp")
citaModI[1:nCanT0, 1:nCanT1] <- P * c(eta) * rhoRRI * psiI
citaModI[(nCanT0 + 1), (nCanT1 + 1)] <- rhoMMI * (1-psiI)
citaModI[1:nCanT0, (nCanT1 + 1)] <- (1-rhoRRI) * psiI * rowSums(P * c(eta))
citaModI[(nCanT0 + 1), 1:nCanT1 ] <- (1-rhoMMI) * (1-psiI) * colSums(P * c(eta))
citaModI <- round_preserve_sum(citaModI * N)
DBmodCitaI <- createBase(citaModI)
DBmodCitaI
Gross Flows estimation
Description
Gross Flows under complex electoral surveys.
Usage
estGF(
sampleBase = NULL,
niter = 100,
model = NULL,
colWeights = NULL,
nonrft = FALSE
)
Arguments
sampleBase |
An object of class "data.frame" containing the information of electoral candidates. The data must contain the samplings weights. |
niter |
The number of iterations for the |
model |
A character indicating the model to be used in estimating estimated gross flows. The models available are: "I","II","III","IV" (see also "Details"). |
colWeights |
The column name containing the sampling weights to be used in the fitting process. |
nonrft |
A logical value indicating a non response for first time. |
Details
The population size N
must satisfy the condition:
N = \sum_{j}\sum_{i} N_{ij} + \sum_{j} C_{j} + \sum_{i} R_{i} + M
where, N_{ij}
is the amount of people interviewed who have classification i
at first time and classification j
at second time, R_{i}
is the amount of people who did not respond at second time, but did at first time, C_{j}
is the amount of people who did not respond at first time, but they did at second time and M
is the number of people who did not respond at any time or could not be reached.
Let \eta_{i}
the initial probability that a person has classification i
in the first time, and let p_{ij}
the vote transition probability for the cell (i,j)
, where \sum_{i} \eta_{i} = 1
and \sum_{j} p_{ij} = 1
.
Thus, four possibles models for the gross flows are given by:
-
Model I: This model assumes that a person's initial probability of being classified as
i
at first time is the same for everyone, that is,\psi(i,j) = \psi
. Besides, transition probabilities between respond and non response not depend of the classification(i,j)
, that is\rho_{MM}(i,j) = \rho_{MM}
and\rho_{RR}(i,j) = \rho_{RR}
. -
Model II: Unlike 'Model I', this model assumes that person initial probability that person has classification
(i,j)
, only depends of his classification at first time, that is\psi(i,j) = \psi(i)
. -
Model III: Unlike 'Model I', this model assumes that transition probabilities between response and non response only depends of probability classification at first time, that is
\rho_{MM}(i,j) = \rho_{MM}(i)
and\rho_{RR}(i,j) = \rho_{RR}(i)
. -
Model IV: Unlike 'Model I', this model assumes that transition probabilities between response and non response only depends of probability classification at second time, that is
\rho_{MM}(i,j) = \rho_{MM}(j)
and\rho_{RR}(i,j) = \rho_{RR}(j)
.
Value
estGF
returns a list containing:
-
Est.CIV: a data.frame containing the gross flows estimation.
-
Params.Model: a list that contains the
\hat{\eta}_{i}
,\hat{p}_{ij}
,\hat{\psi}(i,j)
,\hat{\rho}_{RR}(i,j)
,\hat{\rho}_{MM}(i,j)
parameters for the estimated model. -
Sam.Est: a list containing the sampling estimators
\hat{N}_{ij}
,\hat{R}_{i}
,\hat{C}_{j}
,\hat{M}
,\hat{N}
.
References
Stasny, E. (1987), ‘Some markov-chain models for nonresponse in estimating gross’, Journal of Oficial Statistics 3, pp. 359-373.
Sarndal, C.-E., Swensson, B. & Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York, USA.
Gutierrez, A., Trujillo, L. & Silva, N. (2014), ‘The estimation of gross ows in complex surveys with random nonresponse’, Survey Methodology 40(2), pp. 285-321.
Examples
library(TeachingSampling)
library(data.table)
# Colombia's electoral candidates in 2014
candidates_t0 <- c("Clara","Enrique","Santos","Martha","Zuluaga","WhiteVote", "NoVote")
candidates_t1 <- c("Santos","Zuluaga","WhiteVote", "NoVote")
N <- 100000
nCanT0 <- length(candidates_t0)
nCanT1 <- length(candidates_t1)
# Initial probabilities
eta <- matrix(c(0.10, 0.10, 0.20, 0.17, 0.28, 0.1, 0.05),
byrow = TRUE, nrow = nCanT0)
# Transition probabilities
P <- matrix(c(0.10, 0.60, 0.15, 0.15,
0.30, 0.10, 0.25,0.35,
0.34, 0.25, 0.16, 0.25,
0.25,0.05, 0.35,0.35,
0.10, 0.25, 0.45,0.20,
0.12, 0.36, 0.22, 0.30,
0.10,0.15, 0.30,0.45),
byrow = TRUE, nrow = nCanT0)
citaMod <- matrix(, ncol = nCanT1, nrow = nCanT0)
row.names(citaMod) <- candidates_t0
colnames(citaMod) <- candidates_t1
for(ii in 1:nCanT0){
citaMod[ii,] <- c(rmultinom(1, size = N * eta[ii,], prob = P[ii,]))
}
# # Model I
psiI <- 0.9
rhoRRI <- 0.9
rhoMMI <- 0.5
citaModI <- matrix(nrow = nCanT0 + 1, ncol = nCanT1 + 1)
rownames(citaModI) <- c(candidates_t0, "Non_Resp")
colnames(citaModI) <- c(candidates_t1, "Non_Resp")
citaModI[1:nCanT0, 1:nCanT1] <- P * c(eta) * rhoRRI * psiI
citaModI[(nCanT0 + 1), (nCanT1 + 1)] <- rhoMMI * (1-psiI)
citaModI[1:nCanT0, (nCanT1 + 1)] <- (1-rhoRRI) * psiI * rowSums(P * c(eta))
citaModI[(nCanT0 + 1), 1:nCanT1 ] <- (1-rhoMMI) * (1-psiI) * colSums(P * c(eta))
citaModI <- round_preserve_sum(citaModI * N)
DBcitaModI <- createBase(citaModI)
# Creating auxiliary information
DBcitaModI[,AuxVar := rnorm(nrow(DBcitaModI), mean = 45, sd = 10)]
# Selects a sample with unequal probabilities
res <- S.piPS(n = 3200, as.data.frame(DBcitaModI)[,"AuxVar"])
sam <- res[,1]
pik <- res[,2]
DBcitaModISam <- copy(DBcitaModI[sam,])
DBcitaModISam[,Pik := pik]
# Gross Flows estimation
estima <- estGF(sampleBase = DBcitaModISam, niter = 500, model = "I", colWeights = "Pik")
estima
Gross flows variance estimation.
Description
Gross flows variance estimation according to resampling method (Bootstrap or Jackknife).
Usage
reSamGF(
sampleBase = NULL,
nRepBoot = 500,
model = "I",
niter = 100,
type = "Bootstrap",
colWeights = NULL,
nonrft = FALSE
)
Arguments
sampleBase |
An object of class data.frame or data.table containing the sample selected to estimate the gross flows. |
nRepBoot |
The number of replicates for the bootstrap method. |
model |
A character indicating the model that will be used for estime the gross flows. The available models are: 'I','II','III','IV'. |
niter |
The number of iterations for the |
type |
A character indicating the resampling method ("Bootstrap" or "Jackknife") |
colWeights |
The data colum name containing the sampling weights to be used on the fitting process. |
nonrft |
a logical value indicating the non response for the first time. |
Details
The resampling methods for variance estimation are:
- Bootstrap:
This technique allows to estimate the sampling distribution of almost any statistic by using random sampling methods. Bootstrapping is the practice of estimating properties of an statistic (such as its variance) by measuring those properties from it's approximated sample.
- Jackknife:
The jackknife estimate of a parameter is found by systematically leaving out each observation from a dataset and calculating the estimate and then finding the average of these calculations. Given a sample of size n, the jackknife estimate is found by aggregating the estimates of each n-1-sized sub-sample.
Value
reSamGF
returns a list that contains the variance of each parameter of the selected model.
References
Efron, B. (1979), ‘Computers and the theory of statistics: Thinking the unthinkable’, SIAM review 21(4), pp. 460-480.
Quenouille, M. H. (1949), ‘Problems in plane sampling’, The Annals of Mathematical Statistics pp. 355-375.
Tukey, J. W. (1958), ‘Bias and confidence in not-quite large samples’, Annals of Mathematical Statistics 29, pp. 614.
Examples
library(TeachingSampling)
library(data.table)
# Colombia's electoral candidates in 2014
candidates_t0 <- c("Clara","Enrique","Santos","Martha","Zuluaga","Blanco", "NoVoto")
candidates_t1 <- c("Santos","Zuluaga","Blanco", "NoVoto")
N <- 100000
nCanT0 <- length(candidates_t0)
nCanT1 <- length(candidates_t1)
# Initial probabilities
eta <- matrix(c(0.10, 0.10, 0.20, 0.17, 0.28, 0.1, 0.05),
byrow = TRUE, nrow = nCanT0)
# Transition probabilities
P <- matrix(c(0.10, 0.60, 0.15, 0.15,
0.30, 0.10, 0.25, 0.35,
0.34, 0.25, 0.16, 0.25,
0.25, 0.05, 0.35, 0.35,
0.10, 0.25, 0.45, 0.20,
0.12, 0.36, 0.22, 0.30,
0.10, 0.15, 0.30, 0.45),
byrow = TRUE, nrow = nCanT0)
citaMod <- matrix(, ncol = nCanT1, nrow = nCanT0)
row.names(citaMod) <- candidates_t0
colnames(citaMod) <- candidates_t1
for(ii in 1:nCanT0){
citaMod[ii,] <- c(rmultinom(1, size = N * eta[ii,], prob = P[ii,]))
}
# # Model I
psiI <- 0.9
rhoRRI <- 0.9
rhoMMI <- 0.5
citaModI <- matrix(nrow = nCanT0 + 1, ncol = nCanT1 + 1)
rownames(citaModI) <- c(candidates_t0, "Non_Resp")
colnames(citaModI) <- c(candidates_t1, "Non_Resp")
citaModI[1:nCanT0, 1:nCanT1] <- P * c(eta) * rhoRRI * psiI
citaModI[(nCanT0 + 1), (nCanT1 + 1)] <- rhoMMI * (1-psiI)
citaModI[1:nCanT0, (nCanT1 + 1)] <- (1-rhoRRI) * psiI * rowSums(P * c(eta))
citaModI[(nCanT0 + 1), 1:nCanT1 ] <- (1-rhoMMI) * (1-psiI) * colSums(P * c(eta))
citaModI <- round_preserve_sum(citaModI * N)
DBcitaModI <- createBase(citaModI)
# Creating auxiliary information
DBcitaModI[,AuxVar := rnorm(nrow(DBcitaModI), mean = 45, sd = 10)]
# Selects a sample with unequal probabilities
res <- S.piPS(n = 1200, as.data.frame(DBcitaModI)[,"AuxVar"])
sam <- res[,1]
pik <- res[,2]
DBcitaModISam <- copy(DBcitaModI[sam,])
DBcitaModISam[,Pik := pik]
# Gross flows estimation
estima <- estGF(sampleBase = DBcitaModISam, niter = 500, model = "II", colWeights = "Pik")
# gross flows variance estimation
varEstima <- reSamGF(sampleBase = DBcitaModISam, type = "Bootstrap", nRepBoot = 100,
model = "II", niter = 101, colWeights = "Pik")
varEstima
Round preserve sum.
Description
Rounds a vector of numbers while preserving the sum of them.
Usage
round_preserve_sum(x, digits = 0)
Arguments
x |
A numeric vector. |
digits |
The number of digits to take in account in the rounding process. |
Value
round_preserve_sum
returns y
with round vector.
Source
https://www.r-bloggers.com/2016/07/round-values-while-preserve-their-rounded-sum-in-r/ and https://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sum
Examples
sum(c(0.333, 0.333, 0.334))
round(c(0.333, 0.333, 0.334), 2)
sum(round(c(0.333, 0.333, 0.334), 2))
round_preserve_sum(c(0.333, 0.333, 0.334), 2)
sum(round_preserve_sum(c(0.333, 0.333, 0.334), 2))