Title: | Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs |
---|---|
Description: | Calculate point estimates and their standard errors in complex household surveys using bootstrap replicates. Bootstrapping considers survey design with a rotating panel. A comprehensive description of the methodology can be found at <https://statistikat.github.io/surveysd/articles/methodology.html>. |
Authors: | Johannes Gussenbauer [aut, cre], Alexander Kowarik [aut], Gregor de Cillia [aut], Matthias Till [ctb] |
Maintainer: | Johannes Gussenbauer <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.3 |
Built: | 2025-01-31 05:55:39 UTC |
Source: | https://github.com/statistikat/surveysd |
Calculate point estimates as well as standard errors of variables in surveys. Standard errors are estimated using bootstrap weights (see draw.bootstrap and recalib). In addition the standard error of an estimate can be calculated using the survey data for 3 or more consecutive periods, which results in a reduction of the standard error.
calc.stError(
  dat,
  weights = attr(dat, "weights"),
  b.weights = attr(dat, "b.rep"),
  period = attr(dat, "period"),
  var = NULL,
  fun = weightedRatio,
  national = FALSE,
  group = NULL,
  fun.adjust.var = NULL,
  adjust.var = NULL,
  period.diff = NULL,
  period.mean = NULL,
  bias = FALSE,
  size.limit = 20,
  cv.limit = 10,
  p = NULL,
  add.arg = NULL,
  new_method = FALSE
)
dat |
either data.frame or data.table containing the survey data. Surveys can be a panel survey or rotating panel survey, but does not need to be. For rotating panel survey bootstrap weights can be created using draw.bootstrap and recalib. |
weights |
character specifying the name of the column in dat containing the sample weights. |
b.weights |
character vector specifying the names of the columns in dat containing the bootstrap weights. |
period |
character specifying the name of the column in dat containing the sample periods. |
var |
character vector containing variable names in dat on which fun is applied for each sample period. |
fun |
function which will be applied on var for each sample period and set of weights. Predefined options are weightedRatio and weightedSum; a user-defined function can also be supplied, see Details. |
national |
boolean, if TRUE point estimates resulting from fun will be divided by the point estimate at the national level. |
group |
character vector or list of character vectors containing variables in dat. For each list entry, dat is split into subgroups defined by the combinations of values of these variables as well as period, and point estimates are computed for each subgroup separately, see Details. |
fun.adjust.var |
can be either NULL or a function. Used when the values in var depend on the weights; the function is applied to adjust.var with each set of weights before fun is evaluated, see Details. |
adjust.var |
can be either NULL or a variable in dat, used as the first argument of fun.adjust.var, see Details. |
period.diff |
character vector defining periods for which the differences in the point estimate as well as their standard errors are calculated. Each entry must have the form "period1 - period2", see Details. |
period.mean |
odd integer, defining the range of periods over which the sample mean of point estimates is additionally calculated. |
bias |
boolean, if TRUE the sample mean over the results of the bootstrap replicates is returned. |
size.limit |
integer defining a lower bound on the number of observations in each subgroup of dat defined by period and the variables in group. A warning is issued for subgroups falling below size.limit, and they are listed in the function output, see Details. |
cv.limit |
non-negative value defining an upper bound for the standard error in relation to the point estimate. Estimates for which this ratio exceeds cv.limit are flagged in the function output, see Details. |
p |
numeric vector containing values between 0 and 1. Defines which quantiles of the distribution of var are additionally estimated. |
add.arg |
additional arguments which will be passed to fun. Can be either a named list or vector. The names of the object correspond to the function arguments and the values to column names in dat, see also examples. |
calc.stError
takes survey data (dat
) and returns point estimates
as well as their standard errors defined by fun
and var
for each sample
period in dat
. dat
must be household data where household members
correspond to multiple rows with the same household identifier. The data
should at least contain the following columns:
Column indicating the sample period;
Column indicating the household ID;
Column containing the household sample weights;
Columns which contain the bootstrap weights (see output of recalib);
Columns listed in var
as well as in group
For each variable in var
as well as sample period the function fun
is
applied using the original as well as the bootstrap sample weights.
The point estimate is then selected as the result of fun
when using the
original sample weights and its standard error is estimated with the result
of fun
using the bootstrap sample weights. fun
can be any function which returns a double or integer and uses sample
weights as its second argument. The predefined options are weightedRatio
and weightedSum
.
For the option weightedRatio a weighted ratio (in %) is calculated for var equal to 1, e.g. sum(weight[var==1])/sum(weight[!is.na(var)])*100.
Additionally, using the option national=TRUE, the weighted ratio (in %) is divided by the weighted ratio at the national level for each period.
If group
is not NULL
but a vector of variables from dat
then fun
is
applied on each subset of dat
defined by all combinations of values in
group
.
For instance if group = "sex"
with "sex" having the values "Male" and
"Female" in dat
the point estimate and standard error are calculated on the
subsets of dat
with only "Male" or "Female" value for "sex". This is done
for each value of period
. For variables in group that contain NAs in dat, the rows with missing values are discarded.
When group
is a list of character vectors, subsets of dat
and the
following estimation of the point estimate, including the estimate for the
standard error, are calculated for each list entry.
The optional parameters fun.adjust.var
and adjust.var
can be used if the
values in var
are dependent on the weights
. As is, for instance, the case for the poverty threshold calculated from EU-SILC.
In such a case an additional function can be supplied using fun.adjust.var
as well as its first argument adjust.var
, which needs to be part of the
data set dat
. Then, before applying fun
on variable var
for all period
and groups, the function fun.adjust.var
is applied to
adjust.var
using each of the bootstrap weights separately (NOTE: the weight is
used as the second argument of fun.adjust.var
).
Thus creating i=1,...,length(b.weights)
additional variables.
For applying fun on var the estimates for the bootstrap replicates will now use each of the corresponding new additional variables. So instead of

fun(var, weights, ...), fun(var, b.weights[1], ...), ..., fun(var, b.weights[length(b.weights)], ...)

the function fun will be applied in the way

fun(var, weights, ...), fun(var.1, b.weights[1], ...), ..., fun(var.length(b.weights), b.weights[length(b.weights)], ...)

where var.1, var.2, ... correspond to the estimates resulting from fun.adjust.var and adjust.var.
NOTE: This procedure is especially useful if var depends on weights and fun is applied on subgroups of the data set. In that case it is not possible to capture this procedure with fun and var alone; see the examples for a more hands-on explanation.
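A minimal sketch of such an adjustment, reusing the dat_boot_calib object from the examples below; the helper povmd and the use of the eqIncome column are illustrative, not part of the package:

# illustrative helper: poverty indicator from 60% of the weighted median income
povmd <- function(x, w) {
  md <- laeken::weightedMedian(x, w) * 0.6
  as.integer(x < md)
}

# the indicator is recomputed from eqIncome with each bootstrap weight
# before fun is applied on the recomputed variable
err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio,
  fun.adjust.var = povmd, adjust.var = "eqIncome", group = "gender")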
When defining period.diff
the difference of point estimates between periods
as well as their standard errors are calculated.
The entries in period.diff
must have the form of "period1 - period2"
which means that the results of the point estimates for period2
will be
subtracted from the results of the point estimates for period1
.
Specifying period.mean
leads to an improvement in standard error by
averaging the results for the point estimates, using the bootstrap weights,
over period.mean
periods.
Setting, for instance, period.mean = 3 results in averaging the results over each consecutive set of 3 periods.
Estimating the standard error over these averages gives an improved estimate
of the standard error for the central period, which was used for
averaging.
The averaging of the results is also applied in differences of point
estimates. For instance defining period.diff = "2015-2009"
and
period.mean = 3
the differences in point estimates of 2015 and 2009, 2016 and 2010 as well as
2014 and 2008 are calculated and finally the average over these 3
differences is calculated.
The periods set in period.diff
are always used as the middle periods around
which the mean over period.mean
years is built.
Setting bias
to TRUE
returns the calculation of a mean over the results
from the bootstrap replicates. In the output the corresponding columns are labeled with _mean at the end.
If fun
needs more arguments they can be supplied in add.arg
. This can
either be a named list or vector.
The parameter size.limit
indicates a lower bound of the sample size for
subsets in dat
created by group
. If the sample size of a subset falls
below size.limit
a warning will be displayed.
In addition all subsets for which this is the case can be selected from the
output of calc.stError
with $smallGroups
.
With the parameter cv.limit
one can set an upper bound on the coefficient
of variation. Estimates which exceed this bound are flagged with TRUE
and
are available in the function output with $cvHigh
.
cv.limit must be a positive integer and is treated internally as a percentage; e.g. for cv.limit=1 the estimate will be flagged if the coefficient of variation exceeds 1%.
When specifying period.mean
, the decrease in standard error for choosing
this method is internally calculated and a rough estimate for an implied
increase in sample size is available in the output with $stEDecrease
.
The rough estimate for the increase in sample size uses the fact that for a sample of size n, the sample estimate for the standard error of most point estimates converges with a factor 1/sqrt(n) against the true standard error.
Returns a list containing:
Estimates
: data.table containing period differences and/or k period
averages for estimates of
fun
applied to var
as well as the corresponding standard errors, which
are calculated using the bootstrap weights. In addition the sample size,
n
, and population size for each group is added to the output.
smallGroups
: data.table containing groups for which the number of
observations falls below size.limit
.
cvHigh
: data.table containing a boolean variable which indicates for each
estimate whether the ratio of the estimated standard error to the point estimate exceeds cv.limit
.
stEDecrease
: data.table indicating for each estimate the theoretical
increase in sample size which is gained when averaging over k periods. Only
returned if period.mean
is not NULL
.
Johannes Gussenbauer, Alexander Kowarik, Statistics Austria
# Import data and calibrate
library(surveysd)
library(data.table)
setDTthreads(1)
set.seed(1234)
eusilc <- demo.eusilc(n = 4, prettyNames = TRUE)

dat_boot <- draw.bootstrap(eusilc, REP = 3, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region")

# estimate weightedRatio for povertyRisk per period
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk",
                        fun = weightedRatio)
err.est$Estimates

# calculate weightedRatio for povertyRisk and fraction of one-person
# households per period
dat_boot_calib[, onePerson := .N == 1, by = .(year, hid)]
err.est <- calc.stError(dat_boot_calib, var = c("povertyRisk", "onePerson"),
                        fun = weightedRatio)
err.est$Estimates

# estimate weightedRatio for povertyRisk per period and gender and
# period x region x gender
group <- list("gender", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk",
                        fun = weightedRatio, group = group)
err.est$Estimates

# use average over 3 periods for standard error estimation
# and calculate estimate for difference of
# period 2011 and 2012 including standard errors
period.diff <- c("2012-2011")
err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio,
  period.diff = period.diff,  # <- take difference of periods 2012 and 2011
  period.mean = 3)            # <- average over 3 periods
err.est$Estimates

# for more examples see
# https://statistikat.github.io/surveysd/articles/error_estimation.html
Customize weight-updating within factor levels in case of numerical calibration. The functions described here serve as inputs for ipf.
computeLinear(curValue, target, x, w, boundLinear = 10)
computeLinearG1_old(curValue, target, x, w, boundLinear = 10)
computeLinearG1(curValue, target, x, w, boundLinear = 10)
computeFrac(curValue, target, x, w)
curValue |
Current summed up value. Same as sum(x*w). |
target |
Target value. An element of conP in ipf. |
x |
Vector of numeric values to be calibrated against the target value. |
w |
Vector of weights. |
boundLinear |
The output f is bounded according to boundLinear: 1/boundLinear <= f <= boundLinear. |
computeFrac provides the "standard" IPU updating scheme, given as

f = target/curValue,

which means that each weight inside the level will be multiplied by the same factor when doing the actual update step (w := f*w). computeLinear on the other hand calculates f as

f_i = a*x_i + b,

where a and b are chosen so that f satisfies the following two equations:

sum(f_i*w_i*x_i) = target
sum(f_i*w_i) = sum(w_i)

computeLinearG1 calculates f in the same way as computeLinear, but if f_i*w_i < 1, f_i will be set to 1/w_i.
A weight multiplier f
These functions calculate the arithmetic and geometric mean of the weight for each class. geometric_mean
and
arithmetic_mean
return a numeric
vector of the same length as w
which stores the averaged weight for each
observation. geometric_mean_reference
returns the same value by reference, i.e. the input value w
gets
overwritten by the updated weights. See examples.
geometric_mean_reference(w, classes)
w |
A numeric vector. All entries should be positive. |
classes |
A factor variable. Must have the same length as w. |
## Not run:
## create random data
nobs <- 10
classLabels <- letters[1:3]
dat <- data.frame(
  weight = exp(rnorm(nobs)),
  household = factor(sample(classLabels, nobs, replace = TRUE))
)
dat

## calculate weights with geometric_mean
geom_weight <- geometric_mean(dat$weight, dat$household)
cbind(dat, geom_weight)

## calculate weights with arithmetic_mean
arith_weight <- arithmetic_mean(dat$weight, dat$household)
cbind(dat, arith_weight)

## calculate weights "by reference"
geometric_mean_reference(dat$weight, dat$household)
dat
## End(Not run)
Create a dummy dataset to be used for demonstrating the functionalities of
the surveysd
package based on laeken::eusilc. Please refer to the
documentation page of the original data for details about the variables.
demo.eusilc(n = 8, prettyNames = FALSE)
n |
Number of years to generate. Should be at least 1 |
prettyNames |
Create easy-to-read names for certain variables. Recommended for demonstration purposes. Otherwise, use the original codes documented in laeken::eusilc. |
If prettyNames
is TRUE
, the following variables will be available in an
easy-to-read manner.
hid
Household id. Consistent with respect to the reference period
(year
)
hsize
Size of the household, derived from hid
and period
region
Federal state of Austria where the household is located
pid
Personal id. Consistent with respect to the reference period (year
)
age
Age-class of the respondent
gender
A person's gender ("male"
, "female"
)
ecoStat
Economic status
("part time"
, "full time"
, "unemployed"
, ...)
citizenship
Citizenship ("AT"
, "EU"
, "other"
)
pWeight
Personal sample weight inside the reference period
year
Simulated reference period
povertyRisk
Logical variable determining whether a respondent is at risk
of poverty
demo.eusilc(n = 1, prettyNames = TRUE)[, c(1:8, 26, 28:30)]
Draw bootstrap replicates from survey data with rotating panel design. Survey information, like ID, sample weights, strata and population totals per strata, should be specified to ensure meaningful survey bootstrapping.
draw.bootstrap(
  dat,
  REP = 1000,
  hid = NULL,
  weights,
  period = NULL,
  strata = NULL,
  cluster = NULL,
  totals = NULL,
  single.PSU = c("merge", "mean"),
  boot.names = NULL,
  split = FALSE,
  pid = NULL,
  new.method = TRUE
)
dat |
either data.frame or data.table containing the survey data with rotating panel design. |
REP |
integer indicating the number of bootstrap replicates. |
hid |
character specifying the name of the column in dat containing the household ID. |
weights |
character specifying the name of the column in dat containing the sample weights. |
period |
character specifying the name of the column in dat containing the sample periods. |
strata |
character vector specifying the name(s) of the column in dat by which the population was stratified. If multiple column names are given, each additional entry specifies the stratification variable of the next lower sampling stage, see Details. |
cluster |
character vector specifying cluster in the data. If not already specified in cluster, the household ID is taken as the lowest level cluster. |
totals |
character specifying the name of the column in dat containing the number of PSUs per strata at the first stage. Optional for single stage sampling, in which case the number of PSUs is calculated from strata and weights, see Details. |
single.PSU |
either "merge" or "mean" defining how single PSUs need to be dealt with. For "merge", single PSUs at each stage are merged with the strata or cluster with the next smallest number of PSUs; for "mean", single PSUs are assigned the mean over all bootstrap replicates at the stage which did not contain single PSUs. |
boot.names |
character indicating the leading string of the column names for each bootstrap replicate. If NULL defaults to "w". |
split |
logical, if TRUE split households are considered using pid, for more information see Details. |
pid |
column in dat specifying the personal identifier. This identifier needs to be unique for each person throughout the entire data set. |
new.method |
if TRUE, bootstrap replicates are drawn such that they are never negative, even if the whole population is in the sample for some strata. This is still experimental. |
draw.bootstrap
takes dat
and draws REP
bootstrap replicates
from it.
dat
must be household data where household members correspond to multiple
rows with the same household
identifier. For most practical applications, the following columns should be
available in the dataset
and passed via the corresponding parameters:
Column indicating the sample period (parameter period
).
Column indicating the household ID (parameter hid
)
Column containing the household sample weights (parameter weights
);
Columns by which population was stratified during the sampling process
(parameter: strata
).
For single stage sampling designs the argument totals is optional, meaning that a column containing the number of PSUs at the first stage does not need to be supplied.
For this case the number of PSUs is calculated and added to dat
using
strata
and weights
. By setting cluster
to NULL single stage sampling
design is always assumed and
if strata
contains multiple column names the combination of all those
column names will be used for stratification.
In the case of multi stage sampling design the argument totals
needs to be
specified and needs to have the same number of entries as strata
.
If cluster
is NULL
or does not contain hid
at the last stage, hid
will automatically be used as the final cluster. If, besides hid
,
clustering in additional stages is specified the number of column names in
strata
and cluster
(including hid
) must be the same. If for any stage
there was no clustering or stratification one can set "1" or "I" for this
stage.
For example strata=c("REGION","I"),cluster=c("MUNICIPALITY","HID")
would specify a 2-stage sampling design where at the first stage the municipalities were drawn stratified by regions and at the 2nd stage households are drawn in each municipality without stratification.
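A hedged sketch of the corresponding call; the column names (HID, WEIGHT, REGION, MUNICIPALITY) and the totals columns are illustrative, not part of the package:

dat_boot <- draw.bootstrap(
  dat, REP = 100, hid = "HID", weights = "WEIGHT",
  strata  = c("REGION", "I"),             # stratified at stage 1 only
  cluster = c("MUNICIPALITY", "HID"),     # clusters at both stages
  totals  = c("N.MUNICIPALITY", "N.HID")  # number of PSUs per stage
)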
Bootstrap replicates are drawn for each survey period consecutively (period
) using the
function rescaled.bootstrap.
Bootstrap replicates are drawn consistently in the sense that in each period and at each sampling stage, clusters are always selected within each stratum.
This ensures that the bootstrap replicates follow the same logic as the sampled households, making the bootstrap replicates more comparable to the actual sample units.
If split
is set to TRUE
and pid
is specified, the bootstrap replicates
are carried forward using the personal identifiers instead of the household
identifier.
This takes into account the issue of a household splitting up.
Any person in this new split household will get the same bootstrap replicate
as the person that came from another household in the survey.
People who enter an already existing household will also get the same bootstrap replicate as the other household members had in the previous periods.
Returns a data.table containing the original survey data as well as REP additional columns containing the bootstrap replicates, one for each repetition.
The columns of the bootstrap replicates are by default labeled "wNumber"
where Number goes from 1 to REP
. If the column names of the bootstrap
replicates should start with a different character or string the parameter
boot.names
can be used.
Johannes Gussenbauer, Alexander Kowarik, Statistics Austria
data.table
for more information on
data.table objects.
library(surveysd)
library(data.table)
setDTthreads(1)
set.seed(1234)
eusilc <- demo.eusilc(n = 3, prettyNames = TRUE)

## draw sample without stratification or clustering
dat_boot <- draw.bootstrap(eusilc, REP = 1, weights = "pWeight",
                           period = "year")

## use stratification w.r.t. region and clustering w.r.t. households
dat_boot <- draw.bootstrap(
  eusilc, REP = 1, hid = "hid", weights = "pWeight",
  strata = "region", period = "year")

## use multi-level clustering
dat_boot <- draw.bootstrap(
  eusilc, REP = 1, hid = "hid", weights = "pWeight",
  strata = c("region", "hsize"), period = "year")

# create split households
eusilc[, pidsplit := pid]
year <- eusilc[, unique(year)]
year <- year[-1]
leaf_out <- c()

for (y in year) {
  split.person <- eusilc[
    year == (y - 1) & !duplicated(hid) & !(hid %in% leaf_out),
    sample(pid, 20)
  ]
  overwrite.person <- eusilc[
    (year == (y)) & !duplicated(hid) & !(hid %in% leaf_out),
    .(pid = sample(pid, 20))
  ]
  overwrite.person[, c("pidsplit", "year_curr") := .(split.person, y)]

  eusilc[overwrite.person, pidsplit := i.pidsplit,
         on = .(pid, year >= year_curr)]
  leaf_out <- c(
    leaf_out,
    eusilc[pid %in% c(overwrite.person$pid, overwrite.person$pidsplit),
           unique(hid)])
}

dat_boot <- draw.bootstrap(
  eusilc, REP = 1, hid = "hid", weights = "pWeight",
  strata = c("region", "hsize"), period = "year", split = TRUE,
  pid = "pidsplit")

# split households were considered, e.g. household and
# split household were both selected or not selected
dat_boot[, data.table::uniqueN(w1), by = pidsplit][V1 > 1]
Generate a new household ID for survey data using a household ID and a personal ID. For surveys with rotating panel design containing households, household members can move from an existing household to a new one that was not originally in the sample. This leads to the creation of so-called split households. Using a personal ID (that stays fixed over the whole survey), an indicator for different time steps and a household ID, a new household ID is assigned to the original and the split household.
generate.HHID(dat, period = "RB010", pid = "RB030", hid = "DB030")
dat |
data.table or data.frame containing the survey data. |
period |
column name of dat containing an indicator for the time steps. |
pid |
column name of dat containing the personal identifier. This needs to be fixed for an individual throughout the whole survey. |
hid |
column name of dat containing the household ID. |
the survey data dat
as data.table object containing a new and
an old household ID. The new household ID which considers the split
households is now named hid
and the original household ID has a
trailing "_orig".
## Not run:
library(surveysd)
library(laeken)
library(data.table)

eusilc <- surveysd:::demo.eusilc(n = 4)

# create split households
eusilc[, rb030split := rb030]
year <- eusilc[, unique(year)]
year <- year[-1]
leaf_out <- c()
for (y in year) {
  split.person <- eusilc[
    year == (y - 1) & !duplicated(db030) & !db030 %in% leaf_out,
    sample(rb030, 20)]
  overwrite.person <- eusilc[
    year == (y) & !duplicated(db030) & !db030 %in% leaf_out,
    .(rb030 = sample(rb030, 20))]
  overwrite.person[, c("rb030split", "year_curr") := .(split.person, y)]
  eusilc[overwrite.person, rb030split := i.rb030split,
         on = .(rb030, year >= year_curr)]
  leaf_out <- c(
    leaf_out,
    eusilc[rb030 %in% c(overwrite.person$rb030, overwrite.person$rb030split),
           unique(db030)])
}

# pid which are in split households
eusilc[, .(uniqueN(db030)), by = list(rb030split)][V1 > 1]

eusilc.new <- generate.HHID(eusilc, period = "year", pid = "rb030split",
                            hid = "db030")

# no longer any split households in the data
eusilc.new[, .(uniqueN(db030)), by = list(rb030split)][V1 > 1]
## End(Not run)
Adjust sampling weights to given totals based on household-level and/or individual level constraints.
ipf(
  dat,
  hid = NULL,
  conP = NULL,
  conH = NULL,
  epsP = 1e-06,
  epsH = 0.01,
  verbose = FALSE,
  w = NULL,
  bound = 4,
  maxIter = 200,
  meanHH = TRUE,
  allPthenH = TRUE,
  returnNA = TRUE,
  looseH = FALSE,
  numericalWeighting = computeLinear,
  check_hh_vars = TRUE,
  conversion_messages = FALSE,
  nameCalibWeight = "calibWeight",
  minMaxTrim = NULL,
  print_every_n = 100
)
dat |
a data.table containing the data to be calibrated: household ids (optionally), base weights (optionally) and the household- and/or person-level calibration variables. |
hid |
name of the column containing the household ids within dat or NULL if such a variable does not exist. |
conP |
list or (partly) named list defining the constraints on person level. The list elements are contingency tables in array representation with dimnames corresponding to the names of the relevant calibration variables in dat. If a numerical variable is to be calibrated, the respective list element has to be named with the name of that numerical variable; otherwise the list element should not be named. |
conH |
list or (partly) named list defining the constraints on household level. The list elements are contingency tables in array representation with dimnames corresponding to the names of the relevant calibration variables in dat. If a numerical variable is to be calibrated, the respective list element has to be named with the name of that numerical variable; otherwise the list element should not be named. |
epsP |
numeric value or list (of numeric values and/or arrays) specifying the convergence limit(s) for conP. A list must contain the limits in the same order as the constraints in conP. |
epsH |
numeric value or list (of numeric values and/or arrays) specifying the convergence limit(s) for conH, analogous to epsP. |
verbose |
if TRUE, some progress information will be printed. |
w |
name of the column containing the base weights within dat or NULL if such a variable does not exist. |
bound |
numeric value specifying the multiplier for determining the weight trimming boundary if the change of the base weights should be restricted, i.e. if the weights should stay between 1/bound*w and bound*w. |
maxIter |
numeric value specifying the maximum number of iterations that should be performed. |
meanHH |
if TRUE, every person in a household is assigned the mean of the person weights corresponding to the household. If "geometric", the geometric mean is used instead of the arithmetic mean. |
allPthenH |
if TRUE, all the person level calibration steps are performed before the household level calibration steps (and meanHH, if specified). If FALSE, the household level calibration steps (and meanHH) are performed after each person level calibration step. |
returnNA |
if TRUE, the calibrated weight will be set to NA in case of no convergence. |
looseH |
if FALSE, the actual constraints conH are used for calibrating all the household weights. If TRUE, only the weights for which the lower and upper thresholds defined by conH and epsH are exceeded are calibrated, and they are calibrated against these thresholds rather than against conH itself. |
numericalWeighting |
function used for the weight update of numerical calibration variables, e.g. computeLinear or computeFrac. |
check_hh_vars |
if TRUE, a check for non-unique values inside of a household is performed for the variables in the household constraints. |
conversion_messages |
show a message, if inputs need to be reformatted. This can be useful for speed optimizations if ipf is called several times with similar inputs (for example bootstrapping) |
nameCalibWeight |
character defining the name of the variable for the newly generated calibrated weight. |
minMaxTrim |
numeric vector of length 2, first element a minimum value for weights to be trimmed to, second element a maximum value for weights to be trimmed to. |
print_every_n |
number of iteration steps after which a summary table is printed. The summary table shows all constraints which are not yet reached according to epsP and epsH. |
This function implements the weighting procedure described
here: doi:10.17713/ajs.v45i3.120.
Usage examples can be found in the corresponding vignette
(vignette("ipf")
).
conP
and conH
are contingency tables, which can be created with xtabs
.
The dimnames
of those tables should match the names and levels of the
corresponding columns in dat
.
maxIter
, epsP
and epsH
are the stopping criteria. epsP
and epsH
describe relative tolerances in the sense that

|w_{i+1}/w_i - 1| < eps

will be used as convergence criterion. Here i is the iteration step and w_i is the weight of a specific person at step i.
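A minimal sketch of that check (illustrative, not the package's internal code):

# relative change of every weight between two consecutive iterations
ipf_converged <- function(w_new, w_old, eps = 1e-6) {
  all(abs(w_new / w_old - 1) < eps)
}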
The algorithm
performs best if all variables occurring in the constraints (conP
and conH
)
as well as the household variable are coded as factor
-columns in dat
.
Otherwise, conversions will be necessary which can be monitored with the
conversion_messages
argument. Setting check_hh_vars
to FALSE
can also
increase the performance of the scheme.
The function will return the input data dat
with the calibrated
weights calibWeight
as an additional column as well as attributes. If no
convergence has been reached in maxIter
steps, and returnNA
is TRUE
(the default), the column calibWeight
will only consist of NA
s. The
attributes of the table are attributes derived from the data.table
class
as well as the following.
converged |
Did the algorithm converge in maxIter steps? |
iterations |
The number of iterations performed. |
conP , conH , epsP , epsH |
See Arguments. |
conP_adj , conH_adj |
Adjusted versions of conP and conH |
formP , formH |
Formulas that were used to calculate conP_adj and
conH_adj based on the output table.
|
Alexander Kowarik, Gregor de Cillia
## Not run:
# load data
eusilc <- demo.eusilc(n = 1, prettyNames = TRUE)

# personal constraints
conP1 <- xtabs(pWeight ~ age, data = eusilc)
conP2 <- xtabs(pWeight ~ gender + region, data = eusilc)
conP3 <- xtabs(pWeight*eqIncome ~ gender, data = eusilc)

# household constraints
conH1 <- xtabs(pWeight ~ hsize + region, data = eusilc[!duplicated(hid)])

# simple usage ------------------------------------------

calibweights1 <- ipf(
  eusilc,
  conP = list(conP1, conP2, eqIncome = conP3),
  bound = NULL,
  verbose = TRUE
)

# compare personal weight with the calibweight
calibweights1[, .(hid, pWeight, calibWeight)]

# advanced usage ----------------------------------------

# use an array of tolerances
epsH1 <- conH1
epsH1[1:4, ] <- 0.005
epsH1[5, ] <- 0.2

# create an initial weight for the calibration
eusilc[, regSamp := .N, by = region]
eusilc[, regPop := sum(pWeight), by = region]
eusilc[, baseWeight := regPop/regSamp]

calibweights2 <- ipf(
  eusilc,
  conP = list(conP1, conP2),
  conH = list(conH1),
  epsP = 1e-6,
  epsH = list(epsH1),
  bound = 4,
  w = "baseWeight",
  verbose = TRUE
)

# show an adjusted version of conP and the original
attr(calibweights2, "conP_adj")
attr(calibweights2, "conP")
## End(Not run)
C++ routines to invoke a single iteration of the Iterative proportional updating (IPU) scheme. Targets and classes
are assumed to be one dimensional in the ipf_step
functions. combine_factors
aggregates several vectors of
type factor into a single one to allow multidimensional ipu-steps. See examples.
ipf_step_ref(w, classes, targets)
ipf_step(w, classes, targets)
ipf_step_f(w, classes, targets)
combine_factors(dat, targets)
w |
a numeric vector of weights. All entries should be positive. |
classes |
a factor variable. Must have the same length as w. |
targets |
key figure to target with the ipu scheme. A numeric vector of the same length as the number of levels in classes. |
dat |
a data.frame containing the factor variables to be combined. |
ipf_step
returns the adjusted weights. ipf_step_ref
does the same, but updates w
by reference rather than
returning. ipf_step_f
returns a multiplier: adjusted weights divided by unadjusted weights. combine_factors
is
designed to make ipf_step
work with contingency tables produced by xtabs.
############# one-dimensional ipu ##############

## create random data
nobs <- 10
classLabels <- letters[1:3]
dat <- data.frame(
  weight = exp(rnorm(nobs)),
  household = factor(sample(classLabels, nobs, replace = TRUE))
)
dat

## create targets (same length as classLabels!)
targets <- 3:5

## calculate weights
new_weight <- ipf_step(dat$weight, dat$household, targets)
cbind(dat, new_weight)

## check solution
xtabs(new_weight ~ dat$household)

## calculate weights "by reference"
ipf_step_ref(dat$weight, dat$household, targets)
dat

############# multidimensional ipu ##############

## load data
factors <- c("time", "sex", "smoker", "day")
tips <- data.frame(sex = c("Female", "Male", "Male"),
                   day = c("Sun", "Mon", "Tue"),
                   time = c("Dinner", "Lunch", "Lunch"),
                   smoker = c("No", "Yes", "No"))
tips <- tips[factors]

## combine factors
con <- xtabs(~., tips)
cf <- combine_factors(tips, con)
cbind(tips, cf)[sample(nrow(tips), 10, replace = TRUE), ]

## adjust weights
weight <- rnorm(nrow(tips)) + 5
adjusted_weight <- ipf_step(weight, cf, con)

## check outputs
con2 <- xtabs(adjusted_weight ~ ., data = tips)
sum((con - con2)^2)
Compute the design effect due to unequal weighting.
kishFactor(w, na.rm = FALSE)
w |
a numeric vector with weights |
na.rm |
a logical value indicating whether NA values should be stripped before the computation proceeds. |
The factor is computed according to 'Weighting for Unequal P_i', Leslie Kish, Journal of Official Statistics, Vol. 8, No. 2, 1992.
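The Kish factor for a weight vector w of length n is commonly given as sqrt(n * sum(w^2) / sum(w)^2); a minimal sketch (not the package source):

kish <- function(w) sqrt(length(w) * sum(w^2) / sum(w)^2)
kish(rep(1, 10))  # equal weights give a factor of exactly 1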
The function will return the Kish factor.
Alexander Kowarik
kishFactor(rep(1, 10))
kishFactor(rlnorm(10))
Plot results of calc.stError()
## S3 method for class 'surveysd'
plot(
  x,
  variable = x$param$var[1],
  type = c("summary", "grouping"),
  groups = NULL,
  sd.type = c("dot", "ribbon"),
  ...
)
x |
object of class 'surveysd' output of function calc.stError |
variable |
Name of the variable for which standard errors have been calculated in x. |
type |
can be either "summary" or "grouping"; the default "summary" plots the results over all periods, while "grouping" plots the results separately for a grouping variable, see groups. |
groups |
If type = "grouping", the grouping variable(s), taken from the group argument of calc.stError, for which the results are plotted. |
sd.type |
can be either "dot" or "ribbon" and defines how standard errors are displayed if type = "grouping": as error bars around dots or as ribbons around the lines. |
... |
additional arguments supplied to plot. |
library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(n = 3, prettyNames = TRUE)

dat_boot <- draw.bootstrap(eusilc, REP = 3, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")

# calibrate weight for bootstrap replicates
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region")

# estimate weightedRatio for povertyRisk per period
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk",
                        fun = weightedRatio, group = group,
                        period.mean = NULL)

plot(err.est)

# plot results for gender
# dotted line is the result on the national level
plot(err.est, type = "grouping", groups = "gender")

# plot results for gender in each region
# with standard errors as ribbons
plot(err.est, type = "grouping", groups = c("gender", "region"),
     sd.type = "ribbon")
Predefined functions for weighted point estimates in package surveysd
.
weightedRatio(x, w)
weightedSum(x, w)
x |
numeric vector |
w |
weight vector |
Predefined functions are weighted ratio and weighted sum.
Each of the functions return a single numeric value
x <- 1:10
w <- 10:1
weightedRatio(x, w)

x <- 1:10
w <- 10:1
weightedSum(x, w)
Prints the results of a call to calc.stError. Shows the variables and function used, the number of point estimates as well as properties of the results.
## S3 method for class 'surveysd'
print(x, ...)
x |
an object of class surveysd. |
... |
additional parameters. |
Calibrate weights for bootstrap replicates by using iterative proportional updating to match population totals on various household and personal levels.
recalib(
  dat,
  hid = attr(dat, "hid"),
  weights = attr(dat, "weights"),
  b.rep = attr(dat, "b.rep"),
  period = attr(dat, "period"),
  conP.var = NULL,
  conH.var = NULL,
  conP = NULL,
  conH = NULL,
  epsP = 0.01,
  epsH = 0.02,
  ...
)
dat |
either data.frame or data.table containing the sample survey for various periods. |
hid |
character specifying the name of the column in dat containing the household ID. |
weights |
character specifying the name of the column in dat containing the sample weights. |
b.rep |
character specifying the names of the columns in dat containing the bootstrap weights. |
period |
character specifying the name of the column in dat containing the sample periods. |
conP.var |
character vector containing person-specific variables to which weights should be calibrated, or a list of such character vectors. Contingency tables for the population are calculated per period using weights. |
conH.var |
character vector containing household-specific variables to which weights should be calibrated, or a list of such character vectors. Contingency tables for the population are calculated per period using weights. |
conP |
list or (partly) named list defining the constraints on person level. The list elements are contingency tables in array representation with dimnames corresponding to the names of the relevant calibration variables in dat. If a numerical variable is to be calibrated, the respective list element has to be named with the name of that numerical variable; otherwise the list element should not be named. |
conH |
list or (partly) named list defining the constraints on household level. The list elements are contingency tables in array representation with dimnames corresponding to the names of the relevant calibration variables in dat. If a numerical variable is to be calibrated, the respective list element has to be named with the name of that numerical variable; otherwise the list element should not be named. |
epsP |
numeric value specifying the convergence limit for conP. |
epsH |
numeric value specifying the convergence limit for conH. |
... |
additional arguments passed on to function ipf(). |
recalib
takes survey data (dat
) containing the bootstrap replicates
generated by draw.bootstrap and calibrates weights for each bootstrap
replication according to population totals for person- or household-specific
variables. dat
must be household data where household members correspond to multiple
rows with the same household identifier. The data should at least contain
the following columns:
Column indicating the sample period;
Column indicating the household ID;
Column containing the household sample weights;
Columns which contain the bootstrap replicates (see output of draw.bootstrap);
Columns indicating person- or household-specific variables for which sample weight should be adjusted.
For each period and each variable in conP.var
and/or conH.var
contingency
tables are estimated to get margin totals on personal- and/or
household-specific variables in the population.
Afterwards the bootstrap replicates are multiplied with the original sample
weight and the resulting product is then adjusted using ipf() to match the
previously calculated contingency tables. In this process the columns of the
previously calcualted contingency tables. In this process the columns of the
bootstrap replicates are overwritten by the calibrated weights.
Returns a data.table containing the survey data as well as the calibrated weights for the bootstrap replicates. The original bootstrap replicates are overwritten by the calibrated weights. If calibration of a bootstrap replicate does not converge, the bootstrap weight is not returned and the numbering of the returned bootstrap weights is reduced by one.
Johannes Gussenbauer, Alexander Kowarik, Statistics Austria
ipf()
for more information on iterative
proportional fitting.
library(surveysd)
library(data.table)
setDTthreads(1)
set.seed(1234)
eusilc <- demo.eusilc(n = 3, prettyNames = TRUE)

dat_boot <- draw.bootstrap(eusilc, REP = 1, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")

# calibrate weight for bootstrap replicates
dat_boot_calib <- recalib(dat_boot, conP.var = "gender",
                          conH.var = "region", verbose = TRUE)

# calibrate on other variables
dat_boot_calib <- recalib(dat_boot, conP.var = c("gender", "age"),
                          conH.var = c("region", "hsize"), verbose = TRUE)

# supply contingency tables directly
conP1 <- xtabs(pWeight ~ age + year, data = eusilc)
conP2 <- xtabs(pWeight ~ gender + year, data = eusilc)
conH1 <- xtabs(pWeight ~ region + year,
               data = eusilc[!duplicated(paste(hid, year))])
conH2 <- xtabs(pWeight ~ hsize + year,
               data = eusilc[!duplicated(paste(hid, year))])
conP <- list(conP1, conP2)
conH <- list(conH1, conH2)

dat_boot_calib <- recalib(dat_boot, conP.var = NULL, conH.var = NULL,
                          conP = conP, conH = conH, verbose = TRUE)

# calibrate on gender x age
dat_boot_calib <- recalib(dat_boot, conP.var = list(c("gender", "age")),
                          conH.var = NULL, verbose = TRUE)

# identical
conP1 <- xtabs(pWeight ~ age + gender + year, data = eusilc)
conP <- list(conP1)
dat_boot_calib <- recalib(dat_boot, conP.var = NULL, conH.var = NULL,
                          conP = conP, conH = NULL, verbose = TRUE)
Draw bootstrap replicates from survey data using the rescaled bootstrap for stratified multistage sampling, presented by Preston, J. (2009).
rescaled.bootstrap(
  dat,
  REP = 1000,
  strata = "DB050>1",
  cluster = "DB060>DB030",
  fpc = "N.cluster>N.households",
  single.PSU = c("merge", "mean"),
  return.value = c("data", "replicates"),
  check.input = TRUE,
  period = NULL
)
dat |
either data frame or data table containing the survey sample |
REP |
integer indicating the number of bootstraps to be drawn |
strata |
string specifying the column name in dat that is used for stratification. For multistage sampling, multiple column names can be separated by ">", see Details. |
cluster |
string specifying the column name in dat that is used for clustering. For multistage sampling, multiple column names can be separated by ">", see Details. |
fpc |
string specifying the column name in dat that contains the number of PSUs at each stage. For multistage sampling, multiple column names can be separated by ">", see Details. |
single.PSU |
either "merge" or "mean" defining how single PSUs need to be dealt with. For "merge", single PSUs at each stage are merged with the strata or cluster with the next smallest number of PSUs; for "mean", single PSUs are assigned the mean over all bootstrap replicates at the stage which did not contain single PSUs. |
return.value |
either "data" or "replicates" specifying the return value of the function. For "data" the survey data is returned as class data.table including the bootstrap replicates; for "replicates" only the bootstrap replicates are returned. |
check.input |
logical, if TRUE the input will be checked before applying the bootstrap procedure. |
period |
optional character specifying the name of the column in dat containing the sample periods, see Details. |
For specifying multistage sampling designs the column names in
strata
,cluster
and fpc
need to be separated by ">".
For multistage sampling the strings are read from left to right meaning that
the column name before the first ">" is taken as the column for
stratification/clustering/number of PSUs at the first and the column after
the last ">" is taken as the column for stratification/clustering/number of
PSUs at the last stage.
If for some stages the sample was not stratified or clustered one must
specify this by "1" or "I", e.g. strata=c("strata1>I>strata3")
if there was
no stratification at the second stage or cluster=c("cluster1>cluster2>I")
if there were no clusters at the last stage.
The number of PSUs at each stage is not calculated internally and must be
specified for any sampling design.
For single stage sampling using stratification this can usually be done by
summing the sample weights of each PSU within each stratum.
Spaces in each of the strings will be removed, so if column names contain
spaces they should be renamed before calling this procedure!
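A hedged sketch of a two-stage specification; the column names (strata1, cluster1, hid, N.cluster1, N.hid) are illustrative:

boot <- rescaled.bootstrap(
  dat, REP = 100,
  strata  = "strata1>I",        # stratified at the first stage only
  cluster = "cluster1>hid",     # clusters at both stages
  fpc     = "N.cluster1>N.hid"  # number of PSUs per stage
)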
If period
is supplied the sampling of bootstrap replicates for each period
will follow the method proposed by Preston, J. (2009) with the condition
that records drawn at the previous period are automatically selected.
This can result in more than half of the records being selected
for a specific period
, strata
and cluster
. In that case
records will be de-selected such that floor(n/2)
records, with n
as the
total number of records, are selected for each period
, strata
and cluster
.
If NULL
(the default), it is assumed that all observations belong
to the same period and this parameter will be ignored.
returns the complete data set including the bootstrap replicates or
just the bootstrap replicates, depending on return.value="data"
or
return.value="replicates"
respectively.
Johannes Gussenbauer, Statistics Austria
Preston, J. (2009). Rescaled bootstrap for stratified multistage sampling. Survey Methodology. 35. 227-234.
library(surveysd)
library(data.table)
setDTthreads(1)
set.seed(1234)
eusilc <- demo.eusilc(n = 1, prettyNames = TRUE)

eusilc[, N.households := uniqueN(hid), by = region]
eusilc.bootstrap <- rescaled.bootstrap(eusilc, REP = 10, strata = "region",
                                       cluster = "hid", fpc = "N.households")

eusilc[, new_strata := paste(region, hsize, sep = "_")]
eusilc[, N.households := uniqueN(hid), by = new_strata]
eusilc.bootstrap <- rescaled.bootstrap(eusilc, REP = 10,
                                       strata = "new_strata",
                                       cluster = "hid", fpc = "N.households")
Generate summary output for an ipf calibration.
## S3 method for class 'ipf'
summary(object, ...)
object |
an object of class ipf. |
... |
additional arguments. |
a list of summary outputs
Laura Gruber
## Not run:
# load data
eusilc <- demo.eusilc(n = 1, prettyNames = TRUE)

# personal constraints
conP1 <- xtabs(pWeight ~ age, data = eusilc)
conP2 <- xtabs(pWeight ~ gender + region, data = eusilc)
conP3 <- xtabs(pWeight*eqIncome ~ gender, data = eusilc)

# household constraints
conH1 <- xtabs(pWeight ~ hsize + region, data = eusilc)

# simple usage ------------------------------------------

calibweights1 <- ipf(
  eusilc,
  conP = list(conP1, conP2, eqIncome = conP3),
  bound = NULL,
  verbose = TRUE
)

output <- summary(calibweights1)

# the output can easily be exported to an Excel file, e.g. with
# library(openxlsx)
# write.xlsx(output, "SummaryIPF.xlsx")
## End(Not run)