| Title: | Constrained Regression for Survey Data |
|---|---|
| Description: | Domain mean estimation with monotonicity or block monotone constraints. See Xu X, Meyer MC and Opsomer JD (2021)<doi:10.1016/j.jspi.2021.02.004> for more details. |
| Authors: | Xiyue Liao [aut, cre] |
| Maintainer: | Xiyue Liao <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.15 |
| Built: | 2026-06-03 08:53:06 UTC |
| Source: | https://github.com/xliaosdsu/csurvey |
A symbolic routine to define that a vector of domain means follows a monotonic ordering in a predictor in a formula argument to csvy. This is the unsmoothed version.
block.Ord(x, order = NULL, numknots = 0, knots = 0, space = "E")block.Ord(x, order = NULL, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
order |
A |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
The vector x with five attributes, i.e., name: the name of x; shape: 9("block ordering"); numknots: the numknots argument in "block.Ord"; knots: the knots argument in "block.Ord"; space: the space argument in "block.Ord".
Xiyue Liao
The csvy function performs design-based domain mean estimation with monotonicity and block-monotone shape constraints.
For example, in a one dimensional situation, we assume that are non-decreasing over domains. If this monotonicity is not used in estimation,
the population domain means can be estimated by the Horvitz-Thompson estimator or the Hajek estimator. To use the monotonicity information, this csvy function starts from the
Hajek estimates and the isotonic estimator minimizes the weighted sum of squared deviations from the sample domain means over the set of ordered vectors; that is, is the minimizer of subject to , where is the diagonal matrix with elements , and and is a constraint matrix imposing the monotonicity constraint.
Domains can also be formed from multiple covariates. In that case, a grid will be used to represent the domains. For example, if there are two predictors and , and has values on domains: , has values on domains: , then the domains formed by and will be a by grid.
To get approximate confidence intervals or surfaces for the domain means, we apply the method in Meyer, M. C. (2018). is the estimated probability that the projection of onto lands on , and the values are obtained by simulating many normal random vectors with estimated domain means and covariance matrix , where is a matrix, and recording the resulting sets .
The user needs to provide a survey design, which is specified by the svydesign function in the survey package, and also a data frame containing the response, predictor(s), domain variable, sampling weights, etc. So far, only stratified sampling design with simple random sampling without replacement (STSI) is considered in the examples in this package.
Note that when there is any empty domain, the user must specify the total number of domains in the argument.
csvy(formula, design, family=stats::gaussian(), multicore=getOption("csurvey.multicore"), level=0.95, n.mix=100L, test=FALSE, subset=NULL) ## S3 method for class 'csvy' summary(object,...) ## S3 method for class 'csvy' vcov(object,...) ## S3 method for class 'csvy' coef(object,...) ## S3 method for class 'csvy' confint(object, parm=NULL, level = 0.95, type = c("link", "response"),...) ## S3 method for class 'csvy' predict(object, newdata = NULL, type = c("link", "response"), se.fit = TRUE, level = 0.95, n.mix = 100,...)csvy(formula, design, family=stats::gaussian(), multicore=getOption("csurvey.multicore"), level=0.95, n.mix=100L, test=FALSE, subset=NULL) ## S3 method for class 'csvy' summary(object,...) ## S3 method for class 'csvy' vcov(object,...) ## S3 method for class 'csvy' coef(object,...) ## S3 method for class 'csvy' confint(object, parm=NULL, level = 0.95, type = c("link", "response"),...) ## S3 method for class 'csvy' predict(object, newdata = NULL, type = c("link", "response"), se.fit = TRUE, level = 0.95, n.mix = 100,...)
formula |
A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor". The response is a vector of length Assume that
|
design |
A survey design, which must be specified by the svydesign routine in the survey package. |
family |
A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. There are four families used in csvy: Gaussian, binomial, poisson, and Gamma. |
multicore |
A parameter retrieving the current global option for "csurvey.multicore" and assigns its value to the multicore variable, allowing the function to respect a user-defined setting for parallel processing behavior across the package. |
level |
Confidence level of the approximate confidence surfaces. The default is 0.95. |
n.mix |
The number of simulations used to get the approximate confidence intervals or surfaces. If n.mix = 0, no simulation will be done and the face of the final projection will be used to compute the covariance matrix of the constrained estimate. The default is n.mix = 100L. |
test |
A logical scalar. If test == TRUE, then the p-value for the test |
subset |
Expression to select a subpopulation. |
... |
Extra arguments. |
The coef function returns estimated systematic component of a csvy object.
The confint function returns the confidence interval of a csvy object. If type = "response", then the interval is for the mean; if type = "link", then the interval is for the systematic component.
parm |
An argument in the generic confint function in the stats package. For now, this argument is not in use. |
The following arguments are used in the predict function.
object |
A csvy object. |
newdata |
A data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
type |
If the response is Gaussian, type = "response" and type = "link" give the predicted mean; if the response is binomial, poisson or Gamma, type = "response" gives the predicted mean, and type = "link" gives the predicted systematic component. |
se.fit |
Logical switch indicating if confidence intervals are required. |
For binomial and Poisson families use family=quasibinomial()
and family=quasipoisson() to avoid a warning about non-integer
numbers of successes. The ‘quasi’ versions of the family objects give
the same point estimates and standard errors and do not give the
warning.
predict gives fitted values and sampling variability for specific new
values of covariates. When newdata are the population mean it
gives the regression estimator of the mean, and when newdata are
the population totals and total is specified it gives the
regression estimator of the population total. Regression estimators of
mean and total can also be obtained with calibrate.
The output is a list of values used for estimation, inference and visualization. Main output include:
survey.design |
The survey design used in the model. |
etahat |
Estimated shape-constrained domain systematic component. |
etahatu |
Estimated unconstrained domain systematic component. |
muhat |
Estimated shape-constrained domain means. |
muhatu |
Estimated unconstrained domain means. |
lwr |
Approximate lower confidence band or surface for the shape-constrained domain mean estimate. |
upp |
Approximate upper confidence band or surface for the shape-constrained domain mean estimate. |
lwru |
Approximate lower confidence band or surface for the unconstrained domain mean estimate. |
uppu |
Approximate upper confidence band or surface for the unconstrained domain mean estimate. |
amat |
The |
grid |
A |
nd |
A vector of sample sizes in all domains. |
Ds |
A vector of the number of domains in each dimension. |
acov |
Constrained mixture covariance estimate of domain means. |
cov.un |
Unconstrained covariance estimate of domain means. |
CIC |
The cone information criterion proposed in Meyer(2013a). It uses the "null expected degrees of freedom" as a measure of the complexity of the model. See Meyer(2013a) for further details of cic. |
CIC.un |
The cone information criterion for the unconstrained estimator. |
zeros_ps |
Index of empty domain(s). |
nd |
Sample size of each domain. |
pval |
p-value of the one-sided test. |
family |
The family parameter defined in a csvy formula. |
df.residual |
The observed degree of freedom for the residuals of a csvy fit. |
df.null |
The degree of freedom for the null model of a csvy fit. |
domain |
Index of each domain in the data set contained in the survey.design object. |
null.deviance |
The deviance for the null model of a csvy fit. |
deviance |
The residual deviance of a csvy fit. |
Xiyue Liao
Xu, X. and Meyer, M. C. (2021) One-sided testing of population domain means in surveys.
Oliva, C., Meyer, M. C., and Opsomer, J.D. (2020) Estimation and inference of domain means subject to qualitative constraints. Survey Methodology
Meyer, M. C. (2018) A Framework for Estimation and Inference in Generalized Additive Models with Shape and Order Restrictions. Statistical Science 33(4) 595–614.
Wu, J., Opsomer, J.D., and Meyer, M. C. (2016) Survey estimation of domain means that respect natural orderings. Canadian Journal of Statistics 44(4) 431–444.
Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715.
Lumley, T. (2004) Analysis of complex survey samples. Journal of Statistical Software 9(1) 1–19.
plotpersp, to create a 3D Plot for a csvy Object
incr, to specify an increasing shape-restriction in a csvy Formula
decr, to specify an decreasing shape-restriction in a csvy Formula
# Example 1: monotonic in 1 predictor data(nhdat2, package = 'csurvey') #specify the design: dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) #uncomment to use parallel computing: #options(csurvey.multicore=TRUE) #mixture-variance-covariance matrix is simulated set.seed(1) ans <- csvy(chol ~ incr(age), design = dstrat, n.mix=5) #check the constrained fit vs the unconstrained fit summary(ans) plot(ans, type = 'both') ## Not run: # Example 2: monotonic in 2 predictors and unconstrained in a 3rd predictor data(nhdat2, package = 'csurvey') #specify the design: dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) #use parallel computing: #options(csurvey.multicore=TRUE) #mixture-variance-covariance matrix is simulated #Average cholestorel level increases in age, waist, and unconstrained in income set.seed(1) ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat, test=FALSE, n.mix=5) #visualize the constrained estimation with confidence bands plot(ans, x1='icat', x2='wcat') #create control object ctl <- plot_csvy_control(ribbon_fill = "pink",x1lab = 'income',x2lab = 'waist') plot(ans, x1='icat', x2='wcat', control=ctl) ## End(Not run) # Example 3: example with a binomial response library(NHANES) library(survey) data(NHANES) nh <- subset(NHANES, !is.na(Education) & !is.na(BMI) & !is.na(Weight)) nh$DiabetesBin <- as.integer(nh$Diabetes == "Yes") nh$BMIgroup <- cut(nh$BMI, breaks = c(0, 18.5, 25, 30, 35, 40, Inf), labels = FALSE) # specify the design dsgn <- svydesign(ids = ~1, strata = ~BMIgroup, weights = ~Weight, data = nh) ans <- csvy(DiabetesBin ~ decr(Education) * incr(BMIgroup), design = dsgn, family = quasibinomial(link='logit'), n.mix=5) summary(ans) plot(ans, x1 = 'BMIgroup', x2 = 'Education') ctl <- plot_csvy_control( x1size = 1.5, x2size = 2, angle = 45, hjust = .3) plot(ans, x1 = 'Education', x2 = 'BMIgroup', control = ctl)# Example 1: monotonic in 1 predictor data(nhdat2, package = 'csurvey') #specify the design: dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) #uncomment to use parallel computing: #options(csurvey.multicore=TRUE) #mixture-variance-covariance matrix is simulated set.seed(1) ans <- csvy(chol ~ incr(age), design = dstrat, n.mix=5) #check the constrained fit vs the unconstrained fit summary(ans) plot(ans, type = 'both') ## Not run: # Example 2: monotonic in 2 predictors and unconstrained in a 3rd predictor data(nhdat2, package = 'csurvey') #specify the design: dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) #use parallel computing: #options(csurvey.multicore=TRUE) #mixture-variance-covariance matrix is simulated #Average cholestorel level increases in age, waist, and unconstrained in income set.seed(1) ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat, test=FALSE, n.mix=5) #visualize the constrained estimation with confidence bands plot(ans, x1='icat', x2='wcat') #create control object ctl <- plot_csvy_control(ribbon_fill = "pink",x1lab = 'income',x2lab = 'waist') plot(ans, x1='icat', x2='wcat', control=ctl) ## End(Not run) # Example 3: example with a binomial response library(NHANES) library(survey) data(NHANES) nh <- subset(NHANES, !is.na(Education) & !is.na(BMI) & !is.na(Weight)) nh$DiabetesBin <- as.integer(nh$Diabetes == "Yes") nh$BMIgroup <- cut(nh$BMI, breaks = c(0, 18.5, 25, 30, 35, 40, Inf), labels = FALSE) # specify the design dsgn <- svydesign(ids = ~1, strata = ~BMIgroup, weights = ~Weight, data = nh) ans <- csvy(DiabetesBin ~ decr(Education) * incr(BMIgroup), design = dsgn, family = quasibinomial(link='logit'), n.mix=5) summary(ans) plot(ans, x1 = 'BMIgroup', x2 = 'Education') ctl <- plot_csvy_control( x1size = 1.5, x2size = 2, angle = 45, hjust = .3) plot(ans, x1 = 'Education', x2 = 'BMIgroup', control = ctl)
A structured subset of the 2009 to 2010 NHANES data designed for illustrating constrained survey estimation methods.
data(nhdat)data(nhdat)
A data frame with 1,680 observations on the following 7 variables:
idCluster identifier derived from NHANES sequence number (SEQN).
cholBinary indicator of high total cholesterol: 1 if total cholesterol > 200 mg/dL, 0 otherwise. Derived from LBXTC in TCHOL_F.XPT.
wcatFour-level ordinal variable for waist to height ratio category, based on BMXWAIST and BMXHT from BMX_F.XPT.
genderGender of the participant (1 = male, 2 = female), from RIAGENDR in DEMO_F.XPT.
ageAge in years (continuous), from RIDAGEYR in DEMO_F.XPT.
wtSampling weight within strata, based on WTINT2YR from DEMO_G.XPT.
strStratum identifier, based on SDMVPSU from DEMO_G.XPT.
This subset includes participants aged 21 to 45 years, selected for illustrating the estimation of the probability of high cholesterol using order constrained survey methods.
National Center for Health Statistics. NHANES 2009 to 2010 Public Use Data Files.\ https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2009
A structured subset of the 2009 to 2010 NHANES data designed for illustrating constrained survey estimation methods.
data(nhdat2)data(nhdat2)
A data frame with 1,933 observations and the following 8 variables:
idAn identification vector specifying cluster ids from largest level to smallest level, derived from NHANES sequence number SEQN.
cholTotal cholesterol, measured in mg/dL. This variable is derived from LBXTC in the laboratory file TCHOL_F.XPT.
wcatA 4 level ordinal categorical variable representing waist to height ratio categories, computed from BMXWAIST (waist circumference in cm) and BMXHT (height in cm) in the body measures file BMX_F.XPT.
icatA 4 level ordinal categorical variable. It categorizes income based on the ratio of family income to the federal poverty level (INDFMPIR),
with cutpoints at 0.75, 1.38, and 3.5 to reflect meaningful policy thresholds. It is derived from INDFMIN2 in the demographics file DEMO_F.XPT.
genderGender of the participant, with values 1 (male) and 2 (female), derived from RIAGENDR in DEMO_F.XPT.
ageAge in years (continuous), derived from RIDAGEYR in the demographics file DEMO_F.XPT.
wtSampling weight within each stratum, derived from (WTINT2YR) from DEMO_G.XPT.
strStratum identifier, derived from (SDMVPSU) from DEMO_G.XPT.
This subset includes participants aged 21 through 45, selected to demonstrate estimation of domain means using order constrained methods.
National Center for Health Statistics. NHANES 2009 to 2010 Public Use Data Files. https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2009
Creates a list of graphical options to customize plots generated by plot.csvy.
This includes labels, text sizes, colors, shapes, themes, and other display features.
plot_csvy_control( x1lab = NULL, x1_labels = TRUE, x2lab = NULL, x2_labels = TRUE, x3lab = NULL, x3_labels = TRUE, x4_vals = NULL, x4_labels = NULL, ynm = NULL, ci = TRUE, legend = TRUE, ylab = TRUE, x1size = 3.8, x2size = 3.8, constrained_color = "cornflowerblue", unconstrained_color = "#A3C99A", constrained_shape = 16, unconstrained_shape = 18, ribbon_fill = "lightblue", line_color = "black", base_theme = ggplot2::theme_minimal(), subtitle.size = 12, angle = 0, hjust = 0.1 )plot_csvy_control( x1lab = NULL, x1_labels = TRUE, x2lab = NULL, x2_labels = TRUE, x3lab = NULL, x3_labels = TRUE, x4_vals = NULL, x4_labels = NULL, ynm = NULL, ci = TRUE, legend = TRUE, ylab = TRUE, x1size = 3.8, x2size = 3.8, constrained_color = "cornflowerblue", unconstrained_color = "#A3C99A", constrained_shape = 16, unconstrained_shape = 18, ribbon_fill = "lightblue", line_color = "black", base_theme = ggplot2::theme_minimal(), subtitle.size = 12, angle = 0, hjust = 0.1 )
x1lab |
Character. Label for the first covariate (x-axis). Default is |
x1_labels |
Logical or Character vector. Custom tick labels for the first covariate. Default is |
x2lab |
Character. Label for the second covariate (x-axis). Default is |
x2_labels |
Logical or Character vector. Custom tick labels for the second covariate. Default is |
x3lab |
Character. Label for the third covariate, if used (for subtitles or grouping). Default is |
x3_labels |
Logical or Character vector. Custom labels for the third covariate. Default is |
x4_vals |
Character vector. For models with more than three predictors, specifies the category to use for each additional predictor. Defaults to NULL, using the mode of each. |
x4_labels |
Character vector. Custom labels for the fourth covariate. Default is |
ynm |
Character. Label for the response. Default is |
ci |
Logical. If TRUE, confidence bands are displayed. Defaults to TRUE. |
legend |
Logical. If TRUE, legend for constrained fit or unconstrained fit will be shown. Defaults to TRUE. |
ylab |
Logical. If TRUE, the response name will be shown on the y-axis. Defaults to TRUE. |
x1size |
Numeric. Font size for annotation labels on the x1 axis. Default is |
x2size |
Numeric. Font size for annotation labels on the x2 axis. Default is |
constrained_color |
Character. Color used to display fitted values and intervals from the constrained model. Default is |
unconstrained_color |
Character. Color used to display fitted values and intervals from the unconstrained model. Default is |
constrained_shape |
Integer. Shape code (used by |
unconstrained_shape |
Integer. Shape code for points from unconstrained fits. Default is |
ribbon_fill |
Character. Fill color for the confidence ribbon around the fitted lines. Default is |
line_color |
Character. Color of the lines connecting the fitted values. Default is |
base_theme |
A |
subtitle.size |
Numeric. Font size for the subtitle text in the plot. Default is |
angle |
Numeric. Angle (in degrees) to rotate x-axis labels (typically for x1). Default is |
hjust |
Numeric. Horizontal justification for rotated x-axis labels. Default is |
A named list of graphical control parameters to be passed to the control argument in plot.csvy.
data(nhdat2, package = 'csurvey') dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) ans <- csvy(chol ~ incr(age), design = dstrat, n.mix=5) #check the constrained fit vs the unconstrained fit with default aesthetics ctl <- plot_csvy_control() plot(ans, type = 'both', control = ctl) #check the fit with user-defined aethetics ctl <- list(x1lab = "Age Group", x2lab = "Region", constrained_color = "cornflowerblue", unconstrained_color = "gray80", x1size = 4.5) plot(ans, type = 'both', control = ctl)data(nhdat2, package = 'csurvey') dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) ans <- csvy(chol ~ incr(age), design = dstrat, n.mix=5) #check the constrained fit vs the unconstrained fit with default aesthetics ctl <- plot_csvy_control() plot(ans, type = 'both', control = ctl) #check the fit with user-defined aethetics ctl <- list(x1lab = "Age Group", x2lab = "Region", constrained_color = "cornflowerblue", unconstrained_color = "gray80", x1size = 4.5) plot(ans, type = 'both', control = ctl)
"csvy" object.
Supports both single-factor and two-factor visualization.
Aesthetic settings can be customized using plot_csvy_control().Plot method for csvy objects
Generates a diagnostic or summary plot from a fitted "csvy" object.
Supports both single-factor and two-factor visualization.
Aesthetic settings can be customized using plot_csvy_control().
## S3 method for class 'csvy' plot( x, x1 = NULL, x2 = NULL, domains = NULL, type = c("constrained", "unconstrained", "both"), control = plot_csvy_control(), ... )## S3 method for class 'csvy' plot( x, x1 = NULL, x2 = NULL, domains = NULL, type = c("constrained", "unconstrained", "both"), control = plot_csvy_control(), ... )
x |
An object of class |
x1 |
Optional. Name of the first factor to display in two-factor plots. Defaults to the first added variable. |
x2 |
Optional. Name of the second factor to display in two-factor plots. Defaults to the second added variable. |
domains |
Optional. A data frame containing some domain(s) to be emphasized on the plot. Defaults to be NULL. |
type |
Character string, either |
control |
A list of display options returned by |
... |
Additional arguments passed to |
A ggplot2 object.
plot_csvy_control for a full list of customizable settings.
# plot.csvy(fit) # plot.csvy(fit, x1 = "education", x2 = "region", control = plot_csvy_control(x1lab = "Education"))# plot.csvy(fit) # plot.csvy(fit, x1 = "education", x2 = "region", control = plot_csvy_control(x1lab = "Education"))
plotpersp.csvy
Constructs a list of control parameters for use with plotpersp.csvy.
It extends the default settings from plotpersp_control with options specific to csvy plots.
plotpersp_csvy_control(surface = c("C", "U"), x1nm = NULL, x2nm = NULL, categ = NULL, col = NULL, random = FALSE, ngrid = 12, xlim = NULL, ylim = NULL, zlim = NULL, xlab = NULL, ylab = NULL, zlab = NULL, th = NULL, ltheta = NULL, main = NULL, sub = NULL, ticktype = "simple", ci = c("none", "lwr", "up", "both"), cex = 1, categnm = NULL, type = c("response", "link"), cex.main = 0.8, box = TRUE, axes = TRUE, nticks = 5, palette = NULL, NCOL = NULL, transpose = FALSE)plotpersp_csvy_control(surface = c("C", "U"), x1nm = NULL, x2nm = NULL, categ = NULL, col = NULL, random = FALSE, ngrid = 12, xlim = NULL, ylim = NULL, zlim = NULL, xlab = NULL, ylab = NULL, zlab = NULL, th = NULL, ltheta = NULL, main = NULL, sub = NULL, ticktype = "simple", ci = c("none", "lwr", "up", "both"), cex = 1, categnm = NULL, type = c("response", "link"), cex.main = 0.8, box = TRUE, axes = TRUE, nticks = 5, palette = NULL, NCOL = NULL, transpose = FALSE)
surface |
Plot the constrained ( |
x1nm, x2nm
|
Character strings naming the predictor variables for x and y axes. |
categ |
Optional character string naming a categorical covariate to stratify plots. |
col |
Color(s) for surfaces. Can be a palette or custom colors. |
random |
If |
ngrid |
Number of grid points along each axis. |
xlim, ylim, zlim
|
Optional limits for x, y, z axes. |
xlab, ylab, zlab
|
Axis labels. |
th, ltheta
|
Viewing and lighting angles for the plot. |
main, sub
|
Plot title and subtitle. |
ticktype |
Type of ticks: |
ci |
Confidence interval display mode: |
cex |
Scaling factor for labels. |
categnm |
Labels for each level of the categorical covariate. |
type |
Scale of the surface: |
cex.main |
Scaling factor for main title text. |
box, axes
|
Logical flags to show box and axes. |
nticks |
Number of tick marks along axes. |
palette |
Vector of colors for multi-surface plots. |
NCOL |
Number of columns in multi-panel layout. |
transpose |
Logical; if |
A named list of graphical settings for use in plotpersp.csvy.
plotpersp.csvy, plotpersp_control, persp
ctrl <- plotpersp_csvy_control(col = "topo", ci = "both", transpose = TRUE)ctrl <- plotpersp_csvy_control(col = "topo", ci = "both", transpose = TRUE)