Title: | Constrained Generalized Additive Model |
---|---|
Description: | A constrained generalized additive model is fitted by the cgam routine. Given a set of predictors, each of which may have a shape or order restrictions, the maximum likelihood estimator for the constrained generalized additive model is found using an iteratively re-weighted cone projection algorithm. The ShapeSelect routine chooses a subset of predictor variables and describes the component relationships with the response. For each predictor, the user needs only specify a set of possible shape or order restrictions. A model selection method chooses the shapes and orderings of the relationships as well as the variables. The cone information criterion (CIC) is used to select the best combination of variables and shapes. A genetic algorithm may be used when the set of possible models is large. In addition, the cgam routine implements a two-dimensional isotonic regression using warped-plane splines without additivity assumptions. It can also fit a convex or concave regression surface with triangle splines without additivity assumptions. See Liao X, Meyer MC (2019)<doi:10.18637/jss.v089.i05> for more details. |
Authors: | Mary Meyer [aut], Xiyue Liao [aut, cre] |
Maintainer: | Xiyue Liao <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.21 |
Built: | 2024-11-15 07:28:13 UTC |
Source: | https://github.com/cran/cgam |
The is a subroutine which only works for the ShapeSelect routine. It returns an object of the cgam class given the variables and their shapes chosen by the ShapeSelect routine.
best.fit(x)
best.fit(x)
x |
x is an object of the ShapeSelect class. |
object |
The best fit returned by the ShapeSelect routine, which is an object of the cgam class. |
Xiyue Liao
## Not run: library(MASS) data(Rubber) # do a variable and shape selection with four possible shapes # increasing, decreasing, convex and concave ans <- ShapeSelect(loss ~ shapes(hard, set = c("incr", "decr", "conv", "conc")) + shapes(tens, set = c("incr", "decr", "conv", "conc")), data = Rubber, genetic = TRUE) # check the best fit, which is an object of the cgam class bf <- best.fit(ans) class(bf) plotpersp(bf) ## End(Not run)
## Not run: library(MASS) data(Rubber) # do a variable and shape selection with four possible shapes # increasing, decreasing, convex and concave ans <- ShapeSelect(loss ~ shapes(hard, set = c("incr", "decr", "conv", "conc")) + shapes(tens, set = c("incr", "decr", "conv", "conc")), data = Rubber, genetic = TRUE) # check the best fit, which is an object of the cgam class bf <- best.fit(ans) class(bf) plotpersp(bf) ## End(Not run)
The partial linear generalized additive model is fitted using the method of maximum likelihood, where shape or order restrictions can be imposed on the non-parametrically modelled predictors with optional smoothing, and no restrictions are imposed on the optional parametrically modelled covariate.
cgam(formula, cic = FALSE, nsim = 100, family = gaussian, cpar = 1.5, data = NULL, weights = NULL, sc_x = FALSE, sc_y = FALSE, pnt = TRUE, pen = 0, var.est = NULL, gcv = FALSE, pvf = TRUE)
cgam(formula, cic = FALSE, nsim = 100, family = gaussian, cpar = 1.5, data = NULL, weights = NULL, sc_x = FALSE, sc_y = FALSE, pnt = TRUE, pen = 0, var.est = NULL, gcv = FALSE, pvf = TRUE)
formula |
A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor". The response is a vector of length Assume that
|
cic |
Logical flag indicating if or not simulations are used to get the cic value. The default is cic = FALSE |
nsim |
The number of simulations used to get the cic parameter. The default is nsim = 100. |
family |
A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. There are four families used in csvy: Gaussian, binomial, poisson, and Gamma. Note that if family = Ord is specified, a proportional odds regression model with shape constraints is fitted. This is under development. |
cpar |
A multiplier to estimate the model variance, which is defined as |
data |
An optional data frame, list or environment containing the variables in the model. The default is data = NULL. |
weights |
An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL. |
sc_x |
Logical flag indicating if or not continuous predictors are normalized. The default is sc_x = FALSE. |
sc_y |
Logical flag indicating if or not the response variable is normalized. The default is sc_y = FALSE. |
pen |
User-defined penalty parameter. It must be non-negative. It will only be used in a warped-plane spline fit or a triangle spline fit. The default is pen = 0. |
pnt |
Logical flag indicating if or not penalized constrained regression splines are used. It will only be used in a warped-plane spline fit or a triangle spline fit. The default is pnt = TRUE. |
var.est |
To do a monotonic variance function estimation, the user can set var.est = s.incr(x) or var.est = s.decr(x). See |
gcv |
Logical flag indicating if or not gcv is used to choose a penalty term in warped-plane surface fit. The default is gcv = FALSE. |
pvf |
Logical flag indicating if or not simulations are used to find the p-value of the test of linear vs double monotone in warped plane surface fit. |
We consider generalized partial linear models with independent observations from an exponential family of the form , where the specifications of the functions
and
determine the sub-family of models. The mean vector
has values
, and is related to a design matrix of predictor variables through a monotonically increasing link function
, where
is the systematic component and describes the relationship with the predictors. The relationship between
and
is determined by the link function
.
For the additive model, the systematic component is specified for each observation by , where the functions
describe the relationships of the non-parametrically modelled predictors
,
is a parameter vector, and
contains the values of variables to be modelled parametrically. The non-parametric components are modelled with shape or order assumptions with optional smoothing, and the solution is obtained through an iteratively re-weighted cone projection, with no back-fitting of individual components.
Suppose that is a
by
vector. The matrix form of the systematic component and the predictor is
, where
is the individual component for the
th non-parametrically modelled predictor,
, and
is an
by
design matrix for the parametrically modelled covariate.
To model the component , smooth regression splines or non-smooth ordinal basis functions can be used. The constraints for the component
are in
. In the first case,
=
, where
and
, where
has regression splines as columns and
is the linear space contained in
, and in the second case,
=
and
, for inequality constraint matrix
and equality constraint matrix
.
The set is a convex cone and the set
is also a convex cone with a finite set of edges, where the edges are the generators of
, and
is the column space of the design matrix
for the parametrically modelled covariate.
An iteratively re-weighted cone projection algorithm is used to fit the generalized regression model over the cone .
See references cited in this section and the official manual (https://cran.r-project.org/package=coneproj) for the R package coneproj for more details.
etahat |
The fitted systematic component |
muhat |
The fitted mean value, obtained by transforming the systematic component |
vcoefs |
The estimated coefficients for the basis spanning the null space of the constraint set. |
xcoefs |
The estimated coefficients for the edges corresponding to the smooth predictors with no shape constraint and shape-restricted predictors. |
zcoefs |
The estimated coefficients for the parametrically modelled covariate, i.e., the estimation for the vector |
ucoefs |
The estimated coefficients for the edges corresponding to the predictors with an umbrella-ordering constraint. |
tcoefs |
The estimated coefficients for the edges corresponding to the predictors with a tree-ordering constraint. |
coefs |
The estimated coefficients for the basis spanning the null space of the constraint set and edges corresponding to the shape-restricted and order-restricted predictors. |
cic |
The cone information criterion proposed in Meyer(2013a). It uses the "null expected degrees of freedom" as a measure of the complexity of the model. See Meyer(2013a) for further details of cic. |
d0 |
The dimension of the linear space contained in the cone generated by all constraint conditions. |
edf0 |
The estimated "null expected degrees of freedom". It is a measure of the complexity of the model. See Meyer (2013a) and Meyer (2013b) for further details. |
edf |
The constrained effective degrees of freedom. |
etacomps |
The fitted systematic component value for non-parametrically modelled predictors. It is a matrix of which each row is the fitted systematic component value for a non-parametrically modelled predictor. If there are more than one such predictors, the order of the rows is the same as the order that the user defines such predictors in the formula argument of cgam. |
y |
The response variable. |
xmat_add |
A matrix whose columns represent the shape-restricted predictors and smooth predictors with no shape constraint. |
zmat |
A matrix whose columns represent the basis for the parametrically modelled covariate. The user can choose to include a constant vector in it or not. It must have full column rank. |
ztb |
A list keeping track of the order of the parametrically modelled covariate. |
tr |
A matrix whose columns represent the predictors with a tree-ordering constraint. |
umb |
A matrix whose columns represent the predictors with an umbrella-ordering constraint. |
tree.delta |
A matrix whose rows are the edges corresponding to the predictors with a tree-ordering constraint. |
umbrella.delta |
A matrix whose rows are the edges corresponding to the predictors with an umbrella-ordering constraint. |
bigmat |
A matrix whose rows are the basis spanning the null space of the constraint set and the edges corresponding to the shape-restricted and order-restricted predictors. |
shapes |
A vector including the shape and partial-ordering constraints in a cgam fit. |
shapesx |
A vector including the shape constraints in a cgam fit. |
prior.w |
User-defined weights. |
wt |
The weights in the final iteration of the iteratively re-weighted cone projections. |
wt.iter |
Logical flag indicating if or not iteratively re-weighted cone projections may be used. If the response is gaussian, then wt.iter = FALSE; if the response is binomial or poisson, then wt.iter = TRUE. |
family |
The family parameter defined in a cgam formula. |
SSE0 |
The sum of squared residuals for the linear part. |
SSE1 |
The sum of squared residuals for the full model. |
pvals.beta |
The approximate p-values for the estimation of the vector |
se.beta |
The standard errors for the estimation of the vector |
null_df |
The degree of freedom for the null model of a cgam fit, i.e., the model only containing a constant vector. |
df |
The degree of freedom for the null space of a cgam fit. |
resid_df_obs |
The observed degree of freedom for the residuals of a cgam fit. |
null_deviance |
The deviance for the null model of a cgam fit, i.e., the model only containing a constant vector. |
deviance |
The residual deviance of a cgam fit. |
tms |
The terms objects extracted by the generic function terms from a cgam fit. |
capm |
The number of edges corresponding to the shape-restricted predictors. |
capms |
The number of edges corresponding to the smooth predictors with no shape constraint. |
capk |
The number of non-constant columns of zmat. |
capt |
The number of edges corresponding to the tree-ordering predictors. |
capu |
The number of edges corresponding to the umbrella-ordering predictors. |
xid1 |
A vector keeping track of the beginning position of the set of edges in bigmat for each shape-restricted predictor and smooth predictor with no shape constraint in xmat. |
xid2 |
A vector keeping track of the end position of the set of edges in bigmat for each shape-restricted predictor and smooth predictor with no shape constraint in xmat. |
tid1 |
A vector keeping track of the beginning position of the set of edges in bigmat for each tree-ordering factor in tr. |
tid2 |
A vector keeping track of the end position of the set of edges in bigmat for each tree-ordering factor in tr. |
uid1 |
A vector keeping track of the beginning position of the set of edges in bigmat for each umbrella-ordering factor in umb. |
uid2 |
A vector keeping track of the end position of the set of edges in bigmat for each umbrella-ordering factor in umb. |
zid |
A vector keeping track of the positions of the parametrically modelled covariate. |
vals |
A vector storing the levels of each variable used as a factor. |
zid1 |
A vector keeping track of the beginning position of the levels of each variable used as a factor. |
zid2 |
A vector keeping track of the end position of the levels of each variable used as a factor. |
nsim |
The number of simulations used to get the cic parameter. |
xnms |
A vector storing the names of the shape-restricted predictors and the smooth predictors with no shape constraint in xmat. |
ynm |
The name of the response variable. |
znms |
A vector storing the names of the parametrically modelled covariate. |
is_param |
A logical scalar showing if or not a variable is a parametrically modelled covariate, which could be a linear term or a factor. |
is_fac |
A logical scalar showing if or not a variable is a factor. |
knots |
A list storing the knots used for each shape-restricted predictor and smooth predictor with no shape constraint. For a smooth, constrained and a smooth, unconstrainted predictor, knots is a vector of more than |
numknots |
A vector storing the number of knots for each shape-restricted predictor and smooth predictor with no shape constraint. For a smooth, constrained and a smooth, unconstrainted predictor, numknots > |
sps |
A character vector storing the space parameter to create knots for each shape-restricted predictor. |
ms |
The centering terms used to make edges for shape-restricted predictors. |
cpar |
The cpar argument in the cgam formula |
vh |
The estimated monotonic variance function. |
kts.var |
The knots used in monotonic variance function estimation. |
call |
The matched call. |
Mary C. Meyer and Xiyue Liao
Liao, X. and Meyer, M. C. (2019) cgam: An R Package for the Constrained Generalized Additive Model. Journal of Statistical Software 89(5), 1–24.
Meyer, M. C. (2018) A Framework for Estimation and Inference in Generalized Additive Models with Shape and Order Restrictions. Statistical Science 33(4), 595–614.
Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715.
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. and M. Woodroofe (2000) On the degrees of freedom in shape-restricted regression. Annals of Statistics 28, 1083–1104.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
Mammen, E. and K. Yu (2007) Additive isotonic regression. IMS Lecture Notes-Monograph Series Asymptotics: Particles, Process, and Inverse Problems 55, 179–195.
Huang, J. (2002) A note on estimating a partly linear model under monotonicity constraints. Journal of Statistical Planning and Inference 107, 343–351.
Cheng, G.(2009) Semiparametric additive isotonic regression. Journal of Statistical Planning and Inference 139, 1980–1991.
Bacchetti, P. (1989) Additive isotonic models. Journal of the American Statistical Association 84(405), 289–294.
# Example 1. data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with no restriction with lm() fit.lm <- lm(y ~ x + I(x^2) + I(x^3)) # regress y on x under the restriction: "increasing and convex" fit.cgam <- cgam(y ~ incr.conv(x)) # make a plot to compare the two fits par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, fit.cgam$muhat, col = 2, lty = 2) lines(x, fitted(fit.lm), col = 1, lty = 1) legend("topleft", bty = "n", c("constrained cgam fit", "unconstrained lm fit"), lty = c(2, 1), col = c(2, 1)) # Example 2. ## Not run: library(gam) data(kyphosis) # regress Kyphosis on Age, Number, and Start under the restrictions: # "concave", "increasing and concave", and "decreasing and concave" fit <- cgam(Kyphosis ~ conc(Age) + incr.conc(Number) + decr.conc(Start), family = binomial(), data = kyphosis) ## End(Not run) # Example 3. library(MASS) data(Rubber) # regress loss on hard and tens under the restrictions: # "decreasing" and "decreasing" fit.cgam <- cgam(loss ~ decr(hard) + decr(tens), data = Rubber) # "smooth and decreasing" and "smooth and decreasing" fit.cgam.s <- cgam(loss ~ s.decr(hard) + s.decr(tens), data = Rubber) summary(fit.cgam.s) anova(fit.cgam.s) # make a 3D plot based on fit.cgam and fit.cgam.s plotpersp(fit.cgam, th = 120, main = "3D Plot of a Cgam Fit") plotpersp(fit.cgam.s, tens, hard, data = Rubber, th = 120, main = "3D Plot of a Smooth Cgam Fit") # Example 4. monotonic variance estimation n <- 400 x <- runif(n) sig <- .1 + exp(15*x-8)/(1+exp(15*x-8)) e <- rnorm(n) mu <- 10*x^2 y <- mu + sig*e fit <- cgam(y ~ s.incr.conv(x), var.est = s.incr(x)) est.var <- fit$vh muhat <- fit$muhat par(mfrow = c(1, 2)) plot(x, y) points(sort(x), muhat[order(x)], type = "l", lwd = 2, col = 2) lines(sort(x), (mu)[order(x)], col = 4) plot(sort(x), est.var[order(x)], col=2, lwd=2, type="l", lty=2, ylab="Variance", ylim=c(0, max(c(est.var, sig^2)))) points(sort(x), (sig^2)[order(x)], col=1, lwd=2, type="l") # Example 5. monotonic variance estimation with the lidar data set in SemiPar library(SemiPar) data(lidar) fit <- cgam(logratio ~ s.decr(range), var.est=s.incr(range), data=lidar) muhat <- fit$muhat est.var <- fit$vh logratio <- lidar$logratio range <- lidar$range pfit <- predict(fit, newData=data.frame(range=range), interval="confidence", level=0.95) upp <- pfit$upper low <- pfit$lower par(mfrow = c(1, 2)) plot(range, logratio) points(sort(range), muhat[order(range)], type = "l", lwd = 2, col = 2) lines(sort(range), upp[order(range)], type = "l", lwd = 2, col = 4) lines(sort(range), low[order(range)], type = "l", lwd = 2, col = 4) title("Smoothly Decreasing Fit with a Point-Wise Confidence Interval", cex.main=0.5) plot(range, est.var, col=2, lwd=2, type="l",lty=2, ylab="variance") title("Smoothly Increasing Variance", cex.main=0.5)
# Example 1. data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with no restriction with lm() fit.lm <- lm(y ~ x + I(x^2) + I(x^3)) # regress y on x under the restriction: "increasing and convex" fit.cgam <- cgam(y ~ incr.conv(x)) # make a plot to compare the two fits par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, fit.cgam$muhat, col = 2, lty = 2) lines(x, fitted(fit.lm), col = 1, lty = 1) legend("topleft", bty = "n", c("constrained cgam fit", "unconstrained lm fit"), lty = c(2, 1), col = c(2, 1)) # Example 2. ## Not run: library(gam) data(kyphosis) # regress Kyphosis on Age, Number, and Start under the restrictions: # "concave", "increasing and concave", and "decreasing and concave" fit <- cgam(Kyphosis ~ conc(Age) + incr.conc(Number) + decr.conc(Start), family = binomial(), data = kyphosis) ## End(Not run) # Example 3. library(MASS) data(Rubber) # regress loss on hard and tens under the restrictions: # "decreasing" and "decreasing" fit.cgam <- cgam(loss ~ decr(hard) + decr(tens), data = Rubber) # "smooth and decreasing" and "smooth and decreasing" fit.cgam.s <- cgam(loss ~ s.decr(hard) + s.decr(tens), data = Rubber) summary(fit.cgam.s) anova(fit.cgam.s) # make a 3D plot based on fit.cgam and fit.cgam.s plotpersp(fit.cgam, th = 120, main = "3D Plot of a Cgam Fit") plotpersp(fit.cgam.s, tens, hard, data = Rubber, th = 120, main = "3D Plot of a Smooth Cgam Fit") # Example 4. monotonic variance estimation n <- 400 x <- runif(n) sig <- .1 + exp(15*x-8)/(1+exp(15*x-8)) e <- rnorm(n) mu <- 10*x^2 y <- mu + sig*e fit <- cgam(y ~ s.incr.conv(x), var.est = s.incr(x)) est.var <- fit$vh muhat <- fit$muhat par(mfrow = c(1, 2)) plot(x, y) points(sort(x), muhat[order(x)], type = "l", lwd = 2, col = 2) lines(sort(x), (mu)[order(x)], col = 4) plot(sort(x), est.var[order(x)], col=2, lwd=2, type="l", lty=2, ylab="Variance", ylim=c(0, max(c(est.var, sig^2)))) points(sort(x), (sig^2)[order(x)], col=1, lwd=2, type="l") # Example 5. monotonic variance estimation with the lidar data set in SemiPar library(SemiPar) data(lidar) fit <- cgam(logratio ~ s.decr(range), var.est=s.incr(range), data=lidar) muhat <- fit$muhat est.var <- fit$vh logratio <- lidar$logratio range <- lidar$range pfit <- predict(fit, newData=data.frame(range=range), interval="confidence", level=0.95) upp <- pfit$upper low <- pfit$lower par(mfrow = c(1, 2)) plot(range, logratio) points(sort(range), muhat[order(range)], type = "l", lwd = 2, col = 2) lines(sort(range), upp[order(range)], type = "l", lwd = 2, col = 4) lines(sort(range), low[order(range)], type = "l", lwd = 2, col = 4) title("Smoothly Decreasing Fit with a Point-Wise Confidence Interval", cex.main=0.5) plot(range, est.var, col=2, lwd=2, type="l",lty=2, ylab="variance") title("Smoothly Increasing Variance", cex.main=0.5)
This routine is an addition to the main routine cgam in this package. A random-intercept component is included in a cgam model.
cgamm(formula, nsim = 0, family = gaussian(), cpar = 1.2, data = NULL, weights = NULL, sc_x = FALSE, sc_y = FALSE, bisect = TRUE, reml = TRUE, nAGQ = 1L)
cgamm(formula, nsim = 0, family = gaussian(), cpar = 1.2, data = NULL, weights = NULL, sc_x = FALSE, sc_y = FALSE, bisect = TRUE, reml = TRUE, nAGQ = 1L)
formula |
A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor + (1|id)", where id is the label for a group effect. For now, only gaussian responses are considered and this routine only includes a random-intercept effect. See |
nsim |
The number of simulations used to get the cic parameter. The default is nsim = 0. |
family |
A parameter indicating the error distribution and link function to be used in the model. For now, the options are family = gaussian() and family = binomial(). |
cpar |
A multiplier to estimate the model variance, which is defined as |
data |
An optional data frame, list or environment containing the variables in the model. The default is data = NULL. |
weights |
An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL. |
sc_x |
Logical flag indicating if or not continuous predictors are normalized. The default is sc_x = FALSE. |
sc_y |
Logical flag indicating if or not the response variable is normalized. The default is sc_y = FALSE. |
bisect |
If bisect = TRUE, a 95 percent confidence interval will be found for the variance ratio parameter by a bisection method. |
reml |
If reml = TRUE, restricted maximum likelihood (REML) method will be used to find estimates instead of maximum likelihood estimation (MLE). |
nAGQ |
Integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed. |
muhat |
The fitted fixed-effect term. |
ahat |
A vector of estimated random-effect terms. |
sig2hat |
Estimate of the variance ( |
siga2hat |
Estimate of the variance ( |
thhat |
Estimate of the ratio ( |
pv.siga2 |
|
ci.siga2 |
95 percent confidence interval for the variance of within-cluster error terms. |
ci.th |
95 percent confidence interval for ratio of two variances. |
ci.rho |
95 percent confidence interval for intraclass correlation coefficient. |
ci.sig2 |
95 percent confidence interval for the variance of between-cluster error terms. |
call |
The matched call. |
Xiyue Liao
# Example 1 (family = gaussian). # simulate a balanced data set with 30 clusters # each cluster has 30 data points n <- 30 m <- 30 # the standard deviation of between cluster error terms is 1 # the standard deviation of within cluster error terms is 2 sige <- 1 siga <- 2 # generate a continuous predictor x <- 1:(m*n) for(i in 1:m) { x[(n*(i-1)+1):(n*i)] <- round(runif(n), 3) } # generate a group factor group <- trunc(0:((m*n)-1)/n)+1 # generate the fixed-effect term mu <- 10*exp(10*x-5)/(1+exp(10*x-5)) # generate the random-intercept term asscosiated with each group avals <- rnorm(m, 0, siga) # generate the response y <- 1:(m*n) for(i in 1:m){ y[group == i] <- mu[group == i] + avals[i] + rnorm(n, 0, sige) } # use REML method to fit the model ans <- cgamm(y ~ s.incr(x) + (1|group), reml=TRUE) summary(ans) anova(ans) muhat <- ans$muhat plot(x, y, col = group, cex = .6) lines(sort(x), mu[order(x)], lwd = 2) lines(sort(x), muhat[order(x)], col = 2, lty = 2, lwd = 2) legend("topleft", bty = "n", c("true fixed-effect term", "cgamm fit"), col = c(1, 2), lty = c(1, 2), lwd = c(2, 2)) # Example 2 (family = binomial). # simulate a balanced data set with 20 clusters # each cluster has 20 data points n <- 20 m <- 20# N <- n*m # siga is the sd for the random intercept siga <- 1 # generate a group factor group <- trunc(0:((m*n)-1)/n)+1 group <- factor(group) # generate the random-intercept term asscosiated with each group avals <- rnorm(m,0,siga) # generate the fixed-effect mean term: mu, systematic term: eta and the response: y x <- runif(m*n) mu <- 1:(m*n) y <- 1:(m*n) eta <- 2 * (1 + tanh(7 * (x - .8))) - 2 eta0 <- eta for(i in 1:m){eta[group == i] <- eta[group == i] + avals[i]} for(i in 1:m){mu[group == i] <- 1 - 1 / (1 + exp(eta[group == i]))} for(i in 1:m){y[group == i] <- rbinom(n, size = 1, prob = mu[group == i])} dat <- data.frame(x = x, y = y, group = group) ansc <- cgamm(y ~ s.incr.conv(x) + (1|group), family = binomial(link = "logit"), reml = FALSE, data = dat) summary(ansc) anova(ansc)
# Example 1 (family = gaussian). # simulate a balanced data set with 30 clusters # each cluster has 30 data points n <- 30 m <- 30 # the standard deviation of between cluster error terms is 1 # the standard deviation of within cluster error terms is 2 sige <- 1 siga <- 2 # generate a continuous predictor x <- 1:(m*n) for(i in 1:m) { x[(n*(i-1)+1):(n*i)] <- round(runif(n), 3) } # generate a group factor group <- trunc(0:((m*n)-1)/n)+1 # generate the fixed-effect term mu <- 10*exp(10*x-5)/(1+exp(10*x-5)) # generate the random-intercept term asscosiated with each group avals <- rnorm(m, 0, siga) # generate the response y <- 1:(m*n) for(i in 1:m){ y[group == i] <- mu[group == i] + avals[i] + rnorm(n, 0, sige) } # use REML method to fit the model ans <- cgamm(y ~ s.incr(x) + (1|group), reml=TRUE) summary(ans) anova(ans) muhat <- ans$muhat plot(x, y, col = group, cex = .6) lines(sort(x), mu[order(x)], lwd = 2) lines(sort(x), muhat[order(x)], col = 2, lty = 2, lwd = 2) legend("topleft", bty = "n", c("true fixed-effect term", "cgamm fit"), col = c(1, 2), lty = c(1, 2), lwd = c(2, 2)) # Example 2 (family = binomial). # simulate a balanced data set with 20 clusters # each cluster has 20 data points n <- 20 m <- 20# N <- n*m # siga is the sd for the random intercept siga <- 1 # generate a group factor group <- trunc(0:((m*n)-1)/n)+1 group <- factor(group) # generate the random-intercept term asscosiated with each group avals <- rnorm(m,0,siga) # generate the fixed-effect mean term: mu, systematic term: eta and the response: y x <- runif(m*n) mu <- 1:(m*n) y <- 1:(m*n) eta <- 2 * (1 + tanh(7 * (x - .8))) - 2 eta0 <- eta for(i in 1:m){eta[group == i] <- eta[group == i] + avals[i]} for(i in 1:m){mu[group == i] <- 1 - 1 / (1 + exp(eta[group == i]))} for(i in 1:m){y[group == i] <- rbinom(n, size = 1, prob = mu[group == i])} dat <- data.frame(x = x, y = y, group = group) ansc <- cgamm(y ~ s.incr.conv(x) + (1|group), family = binomial(link = "logit"), reml = FALSE, data = dat) summary(ansc) anova(ansc)
This data set contains 9167 records of different species of live trees for 345 sampled forested plots measured in 2015.
data("COforest")
data("COforest")
A data frame with 9167 observations on the following 19 variables.
PLT_CN
Unique identifier of plot
STATECD
State code using Bureau of Census Federal Information Processing Standards (FIPS)
COUNTYCD
County code (FIPS)
ELEV_PUBLIC
Elevation (ft) extracted spatially using LON_PUBLIC/LAT_PUBLIC
LON_PUBLIC
Fuzzed longitude in decimal degrees using NAD83 datum
LAT_PUBLIC
Fuzzed latitude in decimal degrees using NAD83 datum
ASPECT
a numeric vector
SLOPE
a numeric vector
SUBP
Subplot number
TREE
Tree number within subplot
STATUSCD
Tree status (0:no status; 1:live tree; 2:dead tree; 3:removed)
SPCD
Species code
DIA
Current diameter (in)
HT
Total height (ft): estimated when broken or missing
ACTUALHT
Actual height (ft): measured standing or down
HTCD
Height method code (1:measured; 2:actual measured-length estimated; 3:actual and length estimated; 4:modeled
TREECLCD
Tree class code (2:growing-stock; 3:rough cull; 4:rotten cull)
CR
Compacted crown ratio (percentage)
CCLCD
Crown class (1:open grown; 2:domimant; 3:co-dominant; 4:intermediate; 5:overtopped)
It is provided by Forest Inventory Analysis (FIA) National Program.
X. Liao and M. Meyer (2019). Estimation and Inference in Mixed-Effect Regression Models using Shape Constraints, with Application to Tree Height Estimation. (to appear in Journal of the Royal Statistical Society. Series C: Applied Statistics)
## Not run: library(dplyr) library(tidyr) data(COforest) #re-grouping classes of CCLCD: #combine dominant (2) and co-dominant (3) #combine intermediate (4) and overtopped (5) COforest = COforest mutate(CCLCD = replace(CCLCD, CCLCD == 5, 4)) #make a list of species, each element is a small data frame for one species species = COforest #get the subset for quaking aspen, which is the 4th element in the species list sub = species$data[[4]] #for quaking aspen, there are only two crown classes: dominant/co-dominant #and intermediate/overtopped table(sub$CCLCD) # 3 4 #1400 217 #for quaking aspen, there are only two tree clases: growing-stock and rough cull table(sub$TREECLCD) #2 3 #1591 26 #fit the model ansc = cgamm(log(HT)~s.incr.conc(DIA)+factor(CCLCD)+factor(TREECLCD) +(1|PLT_CN), reml=TRUE, data=sub) #check which classes are significant summary(ansc) #fixed-effect 95 newData = data.frame(DIA=sub$DIA,CCLCD=sub$CCLCD,TREECLCD=sub$TREECLCD) pfit = predict(ansc, newData,interval='confidence') lower = pfit$lower upper = pfit$upper #we need to use exp(muhat) later in the plot muhat = pfit$fit #get x and y x = sub$DIA y = sub$HT #get TREECLCD and CCLCD z1 = sub$TREECLCD z2 = sub$CCLCD #plot fixed-effect confidence intervals plot(x, y, xlab='DIA (m)', ylab='HT (m)', ylim=c(min(y),max(exp(upper))+10),type='n') lines(sort(x[z2==3&z1==2]), (exp(pfit$fit)[z2==3&z1==2])[order(x[z2==3&z1==2])], col='slategrey', lty=1, lwd=2) lines(sort(x[z2==3&z1==2]), (exp(pfit$lower)[z2==3&z1==2])[order(x[z2==3&z1==2])], col='slategrey', lty=1, lwd=2) lines(sort(x[z2==3&z1==2]), (exp(pfit$upper)[z2==3&z1==2])[order(x[z2==3&z1==2])], col='slategrey', lty=1, lwd=2) lines(sort(x[z2==4&z1==2]), (exp(pfit$fit)[z2==4&z1==2])[order(x[z2==4&z1==2])], col="blueviolet", lty=2, lwd=2) lines(sort(x[z2==4&z1==2]), (exp(pfit$lower)[z2==4&z1==2])[order(x[z2==4&z1==2])], col="blueviolet", lty=2, lwd=2) lines(sort(x[Cz2==4&z1==2]), (exp(pfit$upper)[Cz2==4&z1==2])[order(x[Cz2==4&z1==2])], col="blueviolet", lty=2, lwd=2) lines(sort(x[Cz2==3&z1==3]), (exp(pfit$fit)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])], col=3, lty=3, lwd=2) lines(sort(x[Cz2==3&z1==3]), (exp(pfit$lower)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])], col=3, lty=3, lwd=2) lines(sort(x[Cz2==3&z1==3]), (exp(pfit$upper)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])], col=3, lty=3, lwd=2) lines(sort(x[Cz2==4&z1==3]), (exp(pfit$fit)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])], col=2, lty=4, lwd=2) lines(sort(x[Cz2==4&z1==3]), (exp(pfit$lower)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])], col=2, lty=4, lwd=2) lines(sort(x[Cz2==4&z1==3]), (exp(pfit$upper)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])], col=2, lty=4, lwd=2) legend('bottomright', bty='n', cex=.9,c('dominant/co-dominant and growing stock', 'intermediate/overtopped and growing stock','dominant/co-dominant and rough cull', 'intermediate/overtopped and rough cull'), col=c('slategrey',"blueviolet",3,2), lty=c(1,2,3,4),lwd=c(2,2,2,2), pch=c(24,23,22,21)) title('Quaking Aspen fits with 95 ## End(Not run)
## Not run: library(dplyr) library(tidyr) data(COforest) #re-grouping classes of CCLCD: #combine dominant (2) and co-dominant (3) #combine intermediate (4) and overtopped (5) COforest = COforest mutate(CCLCD = replace(CCLCD, CCLCD == 5, 4)) #make a list of species, each element is a small data frame for one species species = COforest #get the subset for quaking aspen, which is the 4th element in the species list sub = species$data[[4]] #for quaking aspen, there are only two crown classes: dominant/co-dominant #and intermediate/overtopped table(sub$CCLCD) # 3 4 #1400 217 #for quaking aspen, there are only two tree clases: growing-stock and rough cull table(sub$TREECLCD) #2 3 #1591 26 #fit the model ansc = cgamm(log(HT)~s.incr.conc(DIA)+factor(CCLCD)+factor(TREECLCD) +(1|PLT_CN), reml=TRUE, data=sub) #check which classes are significant summary(ansc) #fixed-effect 95 newData = data.frame(DIA=sub$DIA,CCLCD=sub$CCLCD,TREECLCD=sub$TREECLCD) pfit = predict(ansc, newData,interval='confidence') lower = pfit$lower upper = pfit$upper #we need to use exp(muhat) later in the plot muhat = pfit$fit #get x and y x = sub$DIA y = sub$HT #get TREECLCD and CCLCD z1 = sub$TREECLCD z2 = sub$CCLCD #plot fixed-effect confidence intervals plot(x, y, xlab='DIA (m)', ylab='HT (m)', ylim=c(min(y),max(exp(upper))+10),type='n') lines(sort(x[z2==3&z1==2]), (exp(pfit$fit)[z2==3&z1==2])[order(x[z2==3&z1==2])], col='slategrey', lty=1, lwd=2) lines(sort(x[z2==3&z1==2]), (exp(pfit$lower)[z2==3&z1==2])[order(x[z2==3&z1==2])], col='slategrey', lty=1, lwd=2) lines(sort(x[z2==3&z1==2]), (exp(pfit$upper)[z2==3&z1==2])[order(x[z2==3&z1==2])], col='slategrey', lty=1, lwd=2) lines(sort(x[z2==4&z1==2]), (exp(pfit$fit)[z2==4&z1==2])[order(x[z2==4&z1==2])], col="blueviolet", lty=2, lwd=2) lines(sort(x[z2==4&z1==2]), (exp(pfit$lower)[z2==4&z1==2])[order(x[z2==4&z1==2])], col="blueviolet", lty=2, lwd=2) lines(sort(x[Cz2==4&z1==2]), (exp(pfit$upper)[Cz2==4&z1==2])[order(x[Cz2==4&z1==2])], col="blueviolet", lty=2, lwd=2) lines(sort(x[Cz2==3&z1==3]), (exp(pfit$fit)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])], col=3, lty=3, lwd=2) lines(sort(x[Cz2==3&z1==3]), (exp(pfit$lower)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])], col=3, lty=3, lwd=2) lines(sort(x[Cz2==3&z1==3]), (exp(pfit$upper)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])], col=3, lty=3, lwd=2) lines(sort(x[Cz2==4&z1==3]), (exp(pfit$fit)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])], col=2, lty=4, lwd=2) lines(sort(x[Cz2==4&z1==3]), (exp(pfit$lower)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])], col=2, lty=4, lwd=2) lines(sort(x[Cz2==4&z1==3]), (exp(pfit$upper)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])], col=2, lty=4, lwd=2) legend('bottomright', bty='n', cex=.9,c('dominant/co-dominant and growing stock', 'intermediate/overtopped and growing stock','dominant/co-dominant and rough cull', 'intermediate/overtopped and rough cull'), col=c('slategrey',"blueviolet",3,2), lty=c(1,2,3,4),lwd=c(2,2,2,2), pch=c(24,23,22,21)) title('Quaking Aspen fits with 95 ## End(Not run)
A symbolic routine to define that the systematic component is concave in a predictor in a formula argument to cgam. This is the unsmoothed version.
conc(x, numknots = 0, knots = 0, space = "E")
conc(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "conc"; the shape attribute is 4("concave"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be concave, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 4("concave"); numknots: the numknots argument in "conc"; knots: the knots argument in "conc"; space: the space argument in "conc".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- - x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "concave" ans <- cgam(y ~ conc(x)) # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "concave fit", col = 2, lty = 1)
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- - x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "concave" ans <- cgam(y ~ conc(x)) # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "concave fit", col = 2, lty = 1)
A symbolic routine to define that the systematic component is convex in a predictor in a formula argument to cgam. This is the unsmoothed version.
conv(x, numknots = 0, knots = 0, space = "E")
conv(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "conv"; the shape attribute is 3("convex"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be convex, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 3("convex"); numknots: the numknots argument in "conv"; knots: the knots argument in "conv" ; space: the space argument in "conv".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "convex" ans <- cgam(y ~ conv(x)) # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "convex fit", col = 2, lty = 1)
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "convex" ans <- cgam(y ~ conv(x)) # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "convex fit", col = 2, lty = 1)
This data set is used for several examples in the cgam package.
data(cubic)
data(cubic)
A data frame with 50 observations on the following 2 variables.
x
The predictor vector.
y
The response vector.
STAT640 HW 14 given by Dr. Meyer.
A symbolic routine to define that the systematic component is decreasing in a predictor in a formula argument to cgam. This is the unsmoothed version.
decr(x, numknots = 0, knots = 0, space = "E")
decr(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"decr" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "decr"; the shape attribute is 2("decreasing"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be decreasing, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 2("decreasing"); numknots: the numknots argument in "decr"; knots: the knots argument in "decr"; space: the space argument in "decr".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "decreasing" ans <- cgam(y ~ decr(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("bottomright", bty = "n", "decreasing fit", col = 2, lty = 1)
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "decreasing" ans <- cgam(y ~ decr(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("bottomright", bty = "n", "decreasing fit", col = 2, lty = 1)
A symbolic routine to define that the systematic component is decreasing and concave in a predictor in a formula argument to cgam. This is the unsmoothed version.
decr.conc(x, numknots = 0, knots = 0, space = "E")
decr.conc(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"decr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "decr.conc"; the shape attribute is 8("decreasing and concave"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be decreasing and concave, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "decr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 8("decreasing and concave"); numknots: the numknots argument in "decr.conc"; knots: the knots argument in "decr.conc"; space: the space argument in "decr.conc".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
data(cubic) # extract x x <- cubic$x # extract y y <- - cubic$y # regress y on x with the shape restriction: "decreasing" and "concave" ans <- cgam(y ~ decr.conc(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "decreasing and concave fit", col = 2, lty = 1)
data(cubic) # extract x x <- cubic$x # extract y y <- - cubic$y # regress y on x with the shape restriction: "decreasing" and "concave" ans <- cgam(y ~ decr.conc(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "decreasing and concave fit", col = 2, lty = 1)
A symbolic routine to define that the systematic component is decreasing and convex in a predictor in a formula argument to cgam. This is the unsmoothed version.
decr.conv(x, numknots = 0, knots = 0, space = "E")
decr.conv(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"decr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "decr.conv"; the shape attribute is 6("decreasing and convex"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be decreasing and convex, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "decr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 6("decreasing and convex"); numknots: the numknots argument in "decr.conv"; knots: the knots argument in "decr.conv"; space: the space argument in "decr.conv".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "decreasing" and "convex" ans <- cgam(y ~ decr.conv(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("bottomright", bty = "n", "decreasing and convex fit", col = 2, lty = 1)
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "decreasing" and "convex" ans <- cgam(y ~ decr.conv(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("bottomright", bty = "n", "decreasing and convex fit", col = 2, lty = 1)
A symbolic routine to indicate that a predictor is included as a non-parametrically modeled predictor in a formula argument to ShapeSelect.
in.or.out(z)
in.or.out(z)
z |
A non-parametrically modelled predictor which has the same length as the response vector. |
To include a categorical predictor, in.or.out(factor(z)) is used, and to include a linear predictor z, in.or.out(z) is used. If in.or.out is not used, the user can include z in a model by adding z or factor(z) in a ShapeSelect formula.
The vector z with three attributes, i.e., nm: the name of z; shape: 1 or 0 (in or out of the model); type: "fac" or "lin", i.e., z is modelled as a categorical predictor or a linear predictor.
Xiyue Liao
## Not run: n <- 100 # x is a continuous predictor x <- runif(n) # generate z and to include it as a categorical predictor z <- rep(0:1, 50) # y is generated as correlated to both x and z # the relationship between y and x is smoothly increasing-convex y <- x^2 + 2 * I(z == 1) + rnorm(n, sd = 1) # call ShapeSelect to find the best model by the genetic algorithm # factor(z) may be in or out of the model fit <- ShapeSelect(y ~ shapes(x) + in.or.out(factor(z)), genetic = TRUE) # factor(z) isn't chosen and is included in the model fit <- ShapeSelect(y ~ shapes(x) + factor(z), genetic = TRUE) ## End(Not run)
## Not run: n <- 100 # x is a continuous predictor x <- runif(n) # generate z and to include it as a categorical predictor z <- rep(0:1, 50) # y is generated as correlated to both x and z # the relationship between y and x is smoothly increasing-convex y <- x^2 + 2 * I(z == 1) + rnorm(n, sd = 1) # call ShapeSelect to find the best model by the genetic algorithm # factor(z) may be in or out of the model fit <- ShapeSelect(y ~ shapes(x) + in.or.out(factor(z)), genetic = TRUE) # factor(z) isn't chosen and is included in the model fit <- ShapeSelect(y ~ shapes(x) + factor(z), genetic = TRUE) ## End(Not run)
A symbolic routine to define that the systematic component is increasing in a predictor in a formula argument to cgam. This is the unsmoothed version.
incr(x, numknots = 0, knots = 0, space = "E")
incr(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"incr" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "incr"; the shape attribute is 1("increasing"), and according to the value of the vector itself and its attributes, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be increasing, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 1("increasing"); numknots: the numknots argument in "incr"; knots: the knots argument in "incr"; space: the space argument in "incr".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "increasing" ans <- cgam(y ~ incr(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "increasing fit", col = 2, lty = 1)
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "increasing" ans <- cgam(y ~ incr(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "increasing fit", col = 2, lty = 1)
A symbolic routine to define that the systematic component is increasing and concave in a predictor in a formula argument to cgam. This is the unsmoothed version.
incr.conc(x, numknots = 0, knots = 0, space = "E")
incr.conc(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"incr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "incr.conc"; the shape attribute is 7("increasing and concave"), and according to the value of the vector itself and its attributes, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be increasing and concave, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "incr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 7("increasing and concave"); numknots: the numknots argument in "incr.conc"; knots: the knots argument in "incr.conc"; space: the space argument in "incr.conc".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
data(cubic) # extract x x <- - cubic$x # extract y y <- - cubic$y # regress y on x with the shape restriction: "increasing" and "concave" ans <- cgam(y ~ incr.conc(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "increasing and concave fit", col = 2, lty = 1)
data(cubic) # extract x x <- - cubic$x # extract y y <- - cubic$y # regress y on x with the shape restriction: "increasing" and "concave" ans <- cgam(y ~ incr.conc(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "increasing and concave fit", col = 2, lty = 1)
A symbolic routine to define that the systematic component is increasing and convex in a predictor in a formula argument to cgam. This is the unsmoothed version.
incr.conv(x, numknots = 0, knots = 0, space = "E")
incr.conv(x, numknots = 0, knots = 0, space = "E")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to smoothly constrain a predictor. The value should be |
knots |
The knots used to smoothly constrain a predictor. The value should be |
space |
A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E". |
"incr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "incr.conv"; the shape attribute is 5("increasing and convex"), and according to the value of the vector itself and its attributes, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component and "x" to be increasing and convex, will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "incr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 5("increasing and convex"); numknots: the numknots argument in "incr.conv"; knots: the knots argument in "incr.conv"; space: the space argument in "incr.conv".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "increasing" and "convex" ans <- cgam(y ~ incr.conv(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "increasing and convex fit", col = 2, lty = 1)
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "increasing" and "convex" ans <- cgam(y ~ incr.conv(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "increasing and convex fit", col = 2, lty = 1)
The date set is from a study of mental health for a random sample of 40 adult residents of Alachua County, Florida. Mental impairment is an ordinal response with 4 categories: well, mild symptom formation, moderate symptom formation, and impaired, which are recorded as 1, 2, 3, and 4. Life event index is a composite measure of the number and severity of important life events that occurred with the past three years, e.g., birth of a child, new job, divorce, or death of a family member. It is an integer from 0 to 9. Another covariate is socio-economic status and it is measured as binary: high = 1, low = 0.
data(mental)
data(mental)
mental
Mental impairment. It is an ordinal response with categories recorded as
,
,
, and
.
ses
Socio-economic status measured as binary: high = , low =
.
life
Life event index. It is an integer from to
.
Agresti, A. (2010) Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ: Wiley.
# proportional odds model example data(mental) # model the relationship between the latent variable and life event index as increasing # socio-economic status is included as a binary covariate fit.incr <- cgam(mental ~ incr(life) + ses, data = mental, family = Ord) # check the estimated probabilities P(mental = k), k = 1, 2, 3, 4 probs.incr <- fitted(fit.incr) head(probs.incr)
# proportional odds model example data(mental) # model the relationship between the latent variable and life event index as increasing # socio-economic status is included as a binary covariate fit.incr <- cgam(mental ~ incr(life) + ses, data = mental, family = Ord) # check the estimated probabilities P(mental = k), k = 1, 2, 3, 4 probs.incr <- fitted(fit.incr) head(probs.incr)
This is a subroutine to specify an ordered catergorical family in a cgam formula. It set things up to a routine called cgam.polr. This is learned from the polr routine in the MASS package, which fits a logistic or probit regression model to an ordered categorical response. Currently only the logistic regression model is allowed.
Ord(link = "identity")
Ord(link = "identity")
link |
The link function. Users don't need specify this term. |
See the polr section in the official manual of the MASS package (https://cran.r-project.org/package=MASS) for details.
muhat |
The estimated expected value of a latent variable. |
zeta |
Estimated cut-points defining the intervals of a latent variable such that the latent variable is between two adjacent cut-points is equivalent to that the ordered categorical response is in a category. |
Xiyue Liao
Agresti, A. (2002) Categorical Data. Second edition. Wiley.
## Not run: # Example 1. # generate the predictor and the latenet variable n <- 500 set.seed(123) x <- runif(n, 0, 1) yst <- 5*x^2 + rlogis(n) # generate observed ordered response, which has levels 1, 2, 3. cts <- quantile(yst, probs = seq(0, 1, length = 4)) yord <- cut(yst, breaks = cts, include.lowest = TRUE, labels = c(1:3), Ord = TRUE) y <- as.numeric(levels(yord))[yord] # regress y on x under the shape-restriction: the latent variable is "increasing-convex" # w.r.t x ans <- cgam(y ~ s.incr.conv(x), family = Ord) # check the estimated cut-points ans$zeta # check the estimated expected value of the latent variable head(ans$muhat) # check the estimated probabilities P(y = k), k = 1, 2, 3 head(fitted(ans)) # check the estimated latent variable plot(x, yst, cex = 1, type = "n", ylab = "latent variable") cols <- topo.colors(3) for (i in 1:3) { points(x[y == i], yst[y == i], col = cols[i], pch = i, cex = 0.7) } for (i in 1:2) { abline(h = (ans$zeta)[i], lty = 4, lwd = 1) } lines(sort(x), (5*x^2)[order(x)], lwd = 2) lines(sort(x), (ans$muhat)[order(x)], col = 2, lty = 2, lwd = 2) legend("topleft", bty = "n", col = c(1, 2), lty = c(1, 2), c("true latent variable", "increasing-convex fit"), lwd = c(1, 1)) ## End(Not run) ## Not run: # Example 2. mental impairment data set # mental impairment is an ordinal response with 4 categories recorded as 1, 2, 3, and 4 # two covariates are life event index and socio-economic status (high = 1, low = 0) data(mental) table(mental$mental) # model the relationship between the latent variable and life event index as increasing # socio-economic status is included as a binary covariate fit.incr <- cgam(mental ~ incr(life) + ses, data = mental, family = Ord) # check the estimated probabilities P(mental = k), k = 1, 2, 3, 4 probs.incr <- fitted(fit.incr) head(probs.incr) ## End(Not run)
## Not run: # Example 1. # generate the predictor and the latenet variable n <- 500 set.seed(123) x <- runif(n, 0, 1) yst <- 5*x^2 + rlogis(n) # generate observed ordered response, which has levels 1, 2, 3. cts <- quantile(yst, probs = seq(0, 1, length = 4)) yord <- cut(yst, breaks = cts, include.lowest = TRUE, labels = c(1:3), Ord = TRUE) y <- as.numeric(levels(yord))[yord] # regress y on x under the shape-restriction: the latent variable is "increasing-convex" # w.r.t x ans <- cgam(y ~ s.incr.conv(x), family = Ord) # check the estimated cut-points ans$zeta # check the estimated expected value of the latent variable head(ans$muhat) # check the estimated probabilities P(y = k), k = 1, 2, 3 head(fitted(ans)) # check the estimated latent variable plot(x, yst, cex = 1, type = "n", ylab = "latent variable") cols <- topo.colors(3) for (i in 1:3) { points(x[y == i], yst[y == i], col = cols[i], pch = i, cex = 0.7) } for (i in 1:2) { abline(h = (ans$zeta)[i], lty = 4, lwd = 1) } lines(sort(x), (5*x^2)[order(x)], lwd = 2) lines(sort(x), (ans$muhat)[order(x)], col = 2, lty = 2, lwd = 2) legend("topleft", bty = "n", col = c(1, 2), lty = c(1, 2), c("true latent variable", "increasing-convex fit"), lwd = c(1, 1)) ## End(Not run) ## Not run: # Example 2. mental impairment data set # mental impairment is an ordinal response with 4 categories recorded as 1, 2, 3, and 4 # two covariates are life event index and socio-economic status (high = 1, low = 0) data(mental) table(mental$mental) # model the relationship between the latent variable and life event index as increasing # socio-economic status is included as a binary covariate fit.incr <- cgam(mental ~ incr(life) + ses, data = mental, family = Ord) # check the estimated probabilities P(mental = k), k = 1, 2, 3, 4 probs.incr <- fitted(fit.incr) head(probs.incr) ## End(Not run)
This data set is used for the routine plotpersp. It contains 314 observations of blood plasma, beta carotene measurements along with several covariates. High levels of blood plasma and beta carotene are believed to be protective against cancer, and it is of interest to determine the relationships with covariates.
data(plasma)
data(plasma)
logplasma
A numeric vector of the logarithm of plasma levels.
betacaro
A numeric vector of dietary beta carotene consumed mcg per day.
bmi
A numeric vector of BMI values.
cholest
A numeric vector of cholesterol consumed mg per day.
dietfat
A numeric vector of the logarithm of grams of diet fat consumed per day.
fiber
A numeric vector of grams of fiber consumed per day.
retinol
A numeric vector of retinol consumed per day.
smoke
A numeric vector of smoking status (1=Never, 2=Former, 3=Current Smoker).
vituse
A numeric vector of vitamin use (1=Yes, fairly often, 2=Yes, not often, 3=No).
Nierenberg, D.,Stukel, T.,Baron, J.,Dain, B.,and Greenberg, E. (1989) Determinants of plasma levels of beta-carotene and retinol. American Journal of Epidemiology 130, 511–521.
data(plasma)
data(plasma)
Given an object of the cgam class, which has at least two non-parametrically modelled predictors, this routine will make a 3D plot of the fit with a set of two non-parametrically modelled predictors in the formula being the x and y labs. If there are more than two non-parametrically modelled predictors, any other such predictor will be evaluated at the largest value which is smaller than or equal to its median value.
If there is any categorical covariate and if the user specifies the argument categ to be a character representing a categorical covariate in the formula, then a 3D plot with multiple parallel surfaces, which represent the levels of a categorical covariate in an ascending order, will be created; otherwise, a 3D plot with only one surface will be created. Each level of a categorical covariate will be evaluated at its mode.
This routine is extended to make a 3D plot for an object fitted by warped-plane splines or triangle splines. Note that two non-parametrically modelled predictors specified in this routine must both be modelled as addtive components, or a pair of predictors forming an isotonic or convex surface without additivity assumption.
This routine is an extension of the generic R graphics routine persp. See the documentation below for more details.
plotpersp(object,...)
plotpersp(object,...)
object |
An object of the cgam class with at least two non-parametrically modelled predictors. |
... |
Arguments to be passed to the S3 method for the cgam class:
|
The routine plotpersp returns a 3D plot of an object of the cgam class. The lab and
lab represent a set of non-parametrically modelled predictors used in a cgam formula, and the
lab represents the estimated mean value or the estimated systematic component value.
Mary C. Meyer and Xiyue Liao
# Example 1. data(FEV) # extract the variables y <- FEV$FEV age <- FEV$age height <- FEV$height sex <- FEV$sex smoke <- FEV$smoke fit <- cgam(y ~ incr(age) + incr(height) + factor(sex) + factor(smoke), nsim = 0) fit.s <- cgam(y ~ s.incr(age) + s.incr(height) + factor(sex) + factor(smoke), nsim = 0) plotpersp(fit, age, height, ngrid = 10, main = "Cgam Increasing Fit", sub = "Categorical Variable: Sex", categ = "factor(sex)") plotpersp(fit.s, age, height, ngrid = 10, main = "Cgam Smooth Increasing Fit", sub = "Categorical Variable: Smoke", categ = "factor(smoke)") # Example 2. data(plasma) # extract the variables y <- plasma$logplasma bmi <- plasma$bmi logdietfat <- plasma$logdietfat cholest <- plasma$cholest fiber <- plasma$fiber betacaro <- plasma$betacaro retinol <- plasma$retinol smoke <- plasma$smoke vituse <- plasma$vituse fit <- cgam(y ~ s.decr(bmi) + s.decr(logdietfat) + s.decr(cholest) + s.incr(fiber) + s.incr(betacaro) + s.incr(retinol) + factor(smoke) + factor(vituse)) plotpersp(fit, bmi, logdietfat, ngrid = 15, th = 120, ylab = "log(dietfat)", zlab = "est mean of log(plasma)", main = "Cgam Fit with the Plasma Data Set", sub = "Categorical Variable: Vitamin Use", categ = "factor(vituse)") # Example 3. data(plasma) addl <- 1:314*0 + 1 addl[runif(314) < .3] <- 2 addl[runif(314) > .8] <- 4 addl[runif(314) > .8] <- 3 ans <- cgam(logplasma ~ s.incr(betacaro, 5) + s.decr(bmi) + s.decr(logdietfat) + as.factor(addl), data = plasma) plotpersp(ans, betacaro, logdietfat, th = 240, random = TRUE, categ = "as.factor(addl)", data = plasma) # Example 4. ## Not run: n <- 100 set.seed(123) x1 <- sort(runif(n)) x2 <- sort(runif(n)) y <- 4 * (x1 - x2) + rnorm(n, sd = .5) # regress y on x1 and x2 under the shape-restriction: "decreasing-increasing" # with a penalty term = .1 ans <- cgam(y ~ s.decr.incr(x1, x2), pen = .1) # plot the constrained surface plotpersp(ans) ## End(Not run)
# Example 1. data(FEV) # extract the variables y <- FEV$FEV age <- FEV$age height <- FEV$height sex <- FEV$sex smoke <- FEV$smoke fit <- cgam(y ~ incr(age) + incr(height) + factor(sex) + factor(smoke), nsim = 0) fit.s <- cgam(y ~ s.incr(age) + s.incr(height) + factor(sex) + factor(smoke), nsim = 0) plotpersp(fit, age, height, ngrid = 10, main = "Cgam Increasing Fit", sub = "Categorical Variable: Sex", categ = "factor(sex)") plotpersp(fit.s, age, height, ngrid = 10, main = "Cgam Smooth Increasing Fit", sub = "Categorical Variable: Smoke", categ = "factor(smoke)") # Example 2. data(plasma) # extract the variables y <- plasma$logplasma bmi <- plasma$bmi logdietfat <- plasma$logdietfat cholest <- plasma$cholest fiber <- plasma$fiber betacaro <- plasma$betacaro retinol <- plasma$retinol smoke <- plasma$smoke vituse <- plasma$vituse fit <- cgam(y ~ s.decr(bmi) + s.decr(logdietfat) + s.decr(cholest) + s.incr(fiber) + s.incr(betacaro) + s.incr(retinol) + factor(smoke) + factor(vituse)) plotpersp(fit, bmi, logdietfat, ngrid = 15, th = 120, ylab = "log(dietfat)", zlab = "est mean of log(plasma)", main = "Cgam Fit with the Plasma Data Set", sub = "Categorical Variable: Vitamin Use", categ = "factor(vituse)") # Example 3. data(plasma) addl <- 1:314*0 + 1 addl[runif(314) < .3] <- 2 addl[runif(314) > .8] <- 4 addl[runif(314) > .8] <- 3 ans <- cgam(logplasma ~ s.incr(betacaro, 5) + s.decr(bmi) + s.decr(logdietfat) + as.factor(addl), data = plasma) plotpersp(ans, betacaro, logdietfat, th = 240, random = TRUE, categ = "as.factor(addl)", data = plasma) # Example 4. ## Not run: n <- 100 set.seed(123) x1 <- sort(runif(n)) x2 <- sort(runif(n)) y <- 4 * (x1 - x2) + rnorm(n, sd = .5) # regress y on x1 and x2 under the shape-restriction: "decreasing-increasing" # with a penalty term = .1 ans <- cgam(y ~ s.decr.incr(x1, x2), pen = .1) # plot the constrained surface plotpersp(ans) ## End(Not run)
Predicted values based on a cgam object
## S3 method for class 'cgam' predict(object, newData, interval = c("none", "confidence", "prediction"), type = c("response", "link"), level = 0.95, n.mix = 500,...)
## S3 method for class 'cgam' predict(object, newData, interval = c("none", "confidence", "prediction"), type = c("response", "link"), level = 0.95, n.mix = 500,...)
object |
A cgam object. |
newData |
A data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
interval |
Type of interval calculation. A prediction interval is only implemented for Gaussian response for now. |
type |
If the response is Gaussian, type = "response" gives the predicted mean; if the response is binomial, type = "response" gives the predicted probabilities, and type = "link" gives the predicted systematic component. |
level |
Tolerance/confidence level. |
n.mix |
Number of simulations to get the mixture distribution. The default is n.mix = 500. |
... |
Further arguments passed to the routine. |
Constrained spline estimators can be characterized by projections onto a polyhedral convex cone. Point-wise confidence intervals for constrained splines are constructed by estimating the probabilities that the projection lands on each of the faces of the cone, and using a mixture of covariance matrices to estimate the standard error of the function estimator at any design point.
Note that currently predict.cgam only works when all components in a cgam formula are additive.
See references cited in this section for more details.
fit |
A vector of predictions. |
lower |
A vector of lower bound if interval is set to be "confidence". |
upper |
A vector of upper bound if interval is set to be "confidence". |
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Constrained partial linear regression splines. Statistical Sinica in press.
Meyer, M. C. (2017) Confidence intervals for regression functions using constrained splines with application to estimation of tree height
Meyer, M. C. (2012) Constrained penalized splines. Canadian Journal of Statistics 40(1), 190–206.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
# Example 1. # generate data n <- 100 set.seed(123) x <- runif(n) y <- 4*x^3 + rnorm(n) # regress y on x under the shape-restriction: "increasing-convex" fit <- cgam(y ~ s.incr.conv(x)) # make a data frame x0 <- seq(min(x), max(x), by = 0.05) new.Data <- data.frame(x = x0) # predict values in new.Data based on the cgam fit without a confidence interval pfit <- predict(fit, new.Data) # or pfit <- predict(fit, new.Data, interval = "none") # make a plot to check the prediction plot(x, y, main = "Predict Method for CGAM") lines(sort(x), (fitted(fit)[order(x)])) points(x0, pfit$fit, col = 2, pch = 20) # predict values in new.Data based on the cgam fit with a 95 percent confidence interval pfit <- predict(fit, new.Data, interval = "confidence", level = 0.95) # make a plot to check the prediction plot(x, y, main = "Pointwise Confidence Bands (Gaussian Response)") lines(sort(x), (fitted(fit)[order(x)])) lines(sort(x0), (pfit$lower)[order(x0)], col = 2, lty = 2) lines(sort(x0), (pfit$upper)[order(x0)], col = 2, lty = 2) points(x0, pfit$fit, col = 2, pch = 20) # Example 2. binomial response n <- 200 x <- seq(0, 1, length = n) eta <- 4*x - 2 mu <- exp(eta)/(1+exp(eta)) set.seed(123) y <- 1:n*0 y[runif(n)<mu] = 1 fit <- cgam(y ~ s.incr.conv(x), family = binomial) muhat <- fitted(fit) # predict values in new.Data based on the cgam fit with a 95 percent confidence interval xinterp <- seq(min(x), max(x), by = 0.05) new.Data <- data.frame(x = xinterp) pfit <- predict(fit, new.Data, interval = "confidence", level = 0.95) pmu <- pfit$fit lwr <- pfit$lower upp <- pfit$upper # make a plot to check the prediction plot(x, y, type = "n", ylim = c(0, 1), main = "Pointwise Confidence Bands (Binomial Response)") rug(x[y == 0]) rug(x[y == 1], side = 3) lines(x, mu) lines(x, muhat, col = 5, lty = 2) points(xinterp, pmu, pch = 20) lines(xinterp, upp, col = 5) points(xinterp, upp, pch = 20) lines(xinterp, lwr, col = 5) points(xinterp, lwr, pch = 20)
# Example 1. # generate data n <- 100 set.seed(123) x <- runif(n) y <- 4*x^3 + rnorm(n) # regress y on x under the shape-restriction: "increasing-convex" fit <- cgam(y ~ s.incr.conv(x)) # make a data frame x0 <- seq(min(x), max(x), by = 0.05) new.Data <- data.frame(x = x0) # predict values in new.Data based on the cgam fit without a confidence interval pfit <- predict(fit, new.Data) # or pfit <- predict(fit, new.Data, interval = "none") # make a plot to check the prediction plot(x, y, main = "Predict Method for CGAM") lines(sort(x), (fitted(fit)[order(x)])) points(x0, pfit$fit, col = 2, pch = 20) # predict values in new.Data based on the cgam fit with a 95 percent confidence interval pfit <- predict(fit, new.Data, interval = "confidence", level = 0.95) # make a plot to check the prediction plot(x, y, main = "Pointwise Confidence Bands (Gaussian Response)") lines(sort(x), (fitted(fit)[order(x)])) lines(sort(x0), (pfit$lower)[order(x0)], col = 2, lty = 2) lines(sort(x0), (pfit$upper)[order(x0)], col = 2, lty = 2) points(x0, pfit$fit, col = 2, pch = 20) # Example 2. binomial response n <- 200 x <- seq(0, 1, length = n) eta <- 4*x - 2 mu <- exp(eta)/(1+exp(eta)) set.seed(123) y <- 1:n*0 y[runif(n)<mu] = 1 fit <- cgam(y ~ s.incr.conv(x), family = binomial) muhat <- fitted(fit) # predict values in new.Data based on the cgam fit with a 95 percent confidence interval xinterp <- seq(min(x), max(x), by = 0.05) new.Data <- data.frame(x = xinterp) pfit <- predict(fit, new.Data, interval = "confidence", level = 0.95) pmu <- pfit$fit lwr <- pfit$lower upp <- pfit$upper # make a plot to check the prediction plot(x, y, type = "n", ylim = c(0, 1), main = "Pointwise Confidence Bands (Binomial Response)") rug(x[y == 0]) rug(x[y == 1], side = 3) lines(x, mu) lines(x, muhat, col = 5, lty = 2) points(xinterp, pmu, pch = 20) lines(xinterp, upp, col = 5) points(xinterp, upp, pch = 20) lines(xinterp, lwr, col = 5) points(xinterp, lwr, pch = 20)
A symbolic routine to define that the systematic component is smooth in a predictor in a formula argument to cgam. This is the smooth version.
s(x, numknots = 0, knots = 0, space = "Q")
s(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s"; the shape attribute is 17("smooth"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 17("smooth"); numknots: the numknots argument in "s"; knots: the knots argument in "s"; space: the space argument in "s".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
s.incr
, s.decr
, s.conc
, s.conv
, s.incr.conc
, s.incr.conv
, s.decr.conc
, s.decr.conv
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- - x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "smooth" ans <- cgam(y ~ s(x)) knots <- ans$knots[[1]] # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth fit", col = 2, lty = 1) legend(1.6, 1.8, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- - x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "smooth" ans <- cgam(y ~ s(x)) knots <- ans$knots[[1]] # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth fit", col = 2, lty = 1) legend(1.6, 1.8, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that the systematic component is smooth and concave in a predictor in a formula argument to cgam. This is the smooth version.
s.conc(x, numknots = 0, knots = 0, space = "Q")
s.conc(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conc"; the shape attribute is 12("smooth and concave"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 12("smooth and concave"); numknots: the numknots argument in "s.conc"; knots: the knots argument in "s.conc"; space: the space argument in "s.conc".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- - x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "smooth and concave" ans <- cgam(y ~ s.conc(x)) knots <- ans$knots[[1]] # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and concave fit", col = 2, lty = 1) legend(1.6, 1.8, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- - x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "smooth and concave" ans <- cgam(y ~ s.conc(x)) knots <- ans$knots[[1]] # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and concave fit", col = 2, lty = 1) legend(1.6, 1.8, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that a surface is concave in two predictors in a formula argument to cgam.
s.conc.conc(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
s.conc.conc(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
x1 |
A numeric predictor which has the same length as the response vector. |
x2 |
A numeric predictor which has the same length as the response vector. |
numknots |
A vector of the number of knots used to constrain |
knots |
A list of two vectors of knots used to constrain |
space |
A vector of the character specifying the method to create knots for |
"s.conc.conc" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and cvs.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conc.conc"; the shape attribute is "tri_cvs"(doubly-concave); the cvs values for both vectors are FALSE. According to the value of the vector itself and its shape, numknots, knots, space and cvs attributes, the cone edges will be made by triangle spline basis functions in Meyer (2017). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.conc.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called trispl.fit
See references cited in this section for more details.
The vectors and
. Each of them has six attributes, i.e., name: names of
and
; shape: "tri_cvs"(doubly-concave); numknots: the numknots argument in "s.conc.conc"; knots: the knots argument in "s.conc.conc"; space: the space argument in "s.conc.conc"; cvs: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to
and
, which are both FALSE.
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.
# generate data n <- 200 set.seed(123) x1 <- runif(n); x2 <- runif(n) y <- -(x1 - 1)^2 - (x2 - 3)^2 + rnorm(n) # regress y on x1 and x2 under the shape-restriction: "doubly-concave" ans <- cgam(y ~ s.conc.conc(x1, x2), nsim = 0) # make a 3D plot of the constrained surface plotpersp(ans)
# generate data n <- 200 set.seed(123) x1 <- runif(n); x2 <- runif(n) y <- -(x1 - 1)^2 - (x2 - 3)^2 + rnorm(n) # regress y on x1 and x2 under the shape-restriction: "doubly-concave" ans <- cgam(y ~ s.conc.conc(x1, x2), nsim = 0) # make a 3D plot of the constrained surface plotpersp(ans)
A symbolic routine to define that the systematic component is smooth and convex in a predictor in a formula argument to cgam. This is the smooth version.
s.conv(x, numknots = 0, knots = 0, space = "Q")
s.conv(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conv"; the shape attribute is 11("smooth and convex"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 11("smooth and convex"); numknots: the numknots argument in "s.conv"; knots: the knots argument in "s.conv"; space: the space argument in "s.conv".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "smooth and convex" ans <- cgam(y ~ s.conv(x)) knots <- ans$knots[[1]] # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and convex fit", col = 2, lty = 1) legend(1.6, -1, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
# generate y x <- seq(-1, 2, by = 0.1) n <- length(x) y <- x^2 + rnorm(n, .3) # regress y on x under the shape-restriction: "smooth and convex" ans <- cgam(y ~ s.conv(x)) knots <- ans$knots[[1]] # make a plot plot(x, y) lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and convex fit", col = 2, lty = 1) legend(1.6, -1, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that a surface is convex in two predictors in a formula argument to cgam.
s.conv.conv(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
s.conv.conv(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
x1 |
A numeric predictor which has the same length as the response vector. |
x2 |
A numeric predictor which has the same length as the response vector. |
numknots |
A vector of the number of knots used to constrain |
knots |
A list of two vectors of knots used to constrain |
space |
A vector of the character specifying the method to create knots for |
"s.conv.conv" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and cvs.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conv.conv"; the shape attribute is "tri_cvs"(doubly-convex); the cvs values for both vectors are TRUE. According to the value of the vector itself and its shape, numknots, knots, space and cvs attributes, the cone edges will be made by triangle spline basis functions in Meyer (2017). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.conv.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called trispl.fit
See references cited in this section for more details.
The vectors and
. Each of them has six attributes, i.e., name: names of
and
; shape: "tri_cvs"(doubly-convex); numknots: the numknots argument in "s.conv.conv"; knots: the knots argument in "s.conv.conv"; space: the space argument in "s.conv.conv"; cvs: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to
and
, which are both TRUE.
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.
# generate data n <- 200 set.seed(123) x1 <- runif(n); x2 <- runif(n) y <- (x1 - 1)^2 + (x2 - 3)^2 + rnorm(n) # regress y on x1 and x2 under the shape-restriction: "doubly-convex" ans <- cgam(y ~ s.conv.conv(x1, x2), nsim = 0) # make a 3D plot of the constrained surface plotpersp(ans)
# generate data n <- 200 set.seed(123) x1 <- runif(n); x2 <- runif(n) y <- (x1 - 1)^2 + (x2 - 3)^2 + rnorm(n) # regress y on x1 and x2 under the shape-restriction: "doubly-convex" ans <- cgam(y ~ s.conv.conv(x1, x2), nsim = 0) # make a 3D plot of the constrained surface plotpersp(ans)
A symbolic routine to define that the systematic component is smooth and decreasing in a predictor in a formula argument to cgam. This is the smooth version.
s.decr(x, numknots = 0, knots = 0, var.knots = 0, space = "Q", db.exp = FALSE)
s.decr(x, numknots = 0, knots = 0, var.knots = 0, space = "Q", db.exp = FALSE)
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
var.knots |
The knots used in variance function estimation. User-defined knots will be used when given. The default is var.knots = |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
db.exp |
The parameter will be used in variance function estimation. If db.exp = TRUE, then the errors are assumed to follow a normal distribution; otherwise, the errors are assumed to follow a double-exponential distribution. The default is db.exp = FALSE. |
"s.decr" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots, space, var.knots and db.exp.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr"; the shape attribute is 10("smooth and decreasing"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by I-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
var.knots and db.exp will be used for monotonic variance function estimation.
See references cited in this section for more details.
The vector x with five attributes, i.e, name: the name of x; shape: 10("smooth and decreasing"); numknots: the numknots argument in "s.decr"; knots: the knots argument in "s.decr"; space: the space argument in "s.decr".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x under the shape-restriction: "smooth and decreasing" ans <- cgam(y ~ s.decr(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and decreasing fit", col = 2, lty = 1) legend(-.3, 8, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x under the shape-restriction: "smooth and decreasing" ans <- cgam(y ~ s.decr(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and decreasing fit", col = 2, lty = 1) legend(-.3, 8, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that the systematic component is smooth, decreasing and concave in a predictor in a formula argument to cgam. This is the smooth version.
s.decr.conc(x, numknots = 0, knots = 0, space = "Q")
s.decr.conc(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s.decr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.conc"; the shape attribute is 16("smooth, decreasing and concave"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.decr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 16("smooth, decreasing and concave"); numknots: the numknots argument in "s.decr.conc"; knots: the knots argument in "s.decr.conc"; space: the space argument in "s.decr.conc".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
data(cubic) # extract x x <- cubic$x # extract y y <- - cubic$y # regress y on x under the shape-restriction: "smooth, decreasing and concave" ans <- cgam(y ~ s.decr.conc(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, decreasing and concave fit", col = 2, lty = 1) legend(1.7, 4, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
data(cubic) # extract x x <- cubic$x # extract y y <- - cubic$y # regress y on x under the shape-restriction: "smooth, decreasing and concave" ans <- cgam(y ~ s.decr.conc(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, decreasing and concave fit", col = 2, lty = 1) legend(1.7, 4, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that the systematic component is smooth, decreasing and convex in a predictor in a formula argument to cgam. This is the smooth version.
s.decr.conv(x, numknots = 0, knots = 0, space = "Q")
s.decr.conv(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s.decr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.conv"; the shape attribute is 15("smooth, decreasing and convex"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.decr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 15("smooth, decreasing and convex"); numknots: the numknots argument in "s.decr.conv"; knots: the knots argument in "s.decr.conv"; space: the space argument in "s.decr.conv".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x under the shape-restriction: "smooth, decreasing and convex" ans <- cgam(y ~ s.decr.conv(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, decreasing and convex fit", col = 2, lty = 1) legend(-.3, 9.2, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
data(cubic) # extract x x <- - cubic$x # extract y y <- cubic$y # regress y on x under the shape-restriction: "smooth, decreasing and convex" ans <- cgam(y ~ s.decr.conv(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, decreasing and convex fit", col = 2, lty = 1) legend(-.3, 9.2, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that a surface is decreasing in two predictors in a formula argument to cgam.
s.decr.decr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
s.decr.decr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
x1 |
A numeric predictor which has the same length as the response vector. |
x2 |
A numeric predictor which has the same length as the response vector. |
numknots |
A vector of the number of knots used to constrain |
knots |
A list of two vectors of knots used to constrain |
space |
A vector of the character specifying the method to create knots for |
"s.decr.decr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.decr"; the shape attribute is "wps_dd"(doubly-decreasing); the decreasing values for both vectors are TRUE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.decr.decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.
See references cited in this section for more details.
The vectors and
. Each of them has six attributes, i.e., name: names of
and
; shape: "wps_dd"(doubly-decreasing); numknots: the numknots argument in "s.decr.decr"; knots: the knots argument in "s.decr.decr"; space: the space argument in "s.decr.decr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to
and
, which are both TRUE.
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.
s.incr.incr
, s.decr.incr
, s.incr.decr
, cgam
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- -4 * (x1 + x2 - x1 * x2) + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "doubly-decreasing" # using the penalized estimator ans <- cgam(y ~ s.decr.decr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- -4 * (x1 + x2 - x1 * x2) + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "doubly-decreasing" # using the penalized estimator ans <- cgam(y ~ s.decr.decr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
A symbolic routine to define that a surface is decreasing in one predictor and increasing in another in a formula argument to cgam.
s.decr.incr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
s.decr.incr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
x1 |
A numeric predictor which has the same length as the response vector. |
x2 |
A numeric predictor which has the same length as the response vector. |
numknots |
A vector of the number of knots used to constrain |
knots |
A list of two vectors of knots used to constrain |
space |
A vector of the character specifying the method to create knots for |
"s.decr.incr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.incr"; the shape attribute is "wps_di"(decreasing-increasing); the decreasing values for "x1" and "x2" are TRUE and FALSE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.decr.incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.
See references cited in this section for more details.
The vectors and
. Each of them has six attributes, i.e., name: names of
and
; shape: "wps_di"(decreasing-increasing); numknots: the numknots argument in "s.decr.incr"; knots: the knots argument in "s.decr.incr"; space: the space argument in "s.decr.incr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to
and
, which are TRUE and FALSE.
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.
s.incr.incr
, s.incr.decr
, s.decr.decr
, cgam
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- 4 * (x2 - x1) - x1 * x2 + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "decreasing-increasing" # using the penalized estimator ans <- cgam(y ~ s.decr.incr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- 4 * (x2 - x1) - x1 * x2 + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "decreasing-increasing" # using the penalized estimator ans <- cgam(y ~ s.decr.incr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
A symbolic routine to define that the systematic component is smooth and increasing in a predictor in a formula argument to cgam. This is the smooth version.
s.incr(x, numknots = 0, knots = 0, var.knots = 0, space = "Q", db.exp = FALSE)
s.incr(x, numknots = 0, knots = 0, var.knots = 0, space = "Q", db.exp = FALSE)
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
var.knots |
The knots used in variance function estimation. User-defined knots will be used when given. The default is var.knots = |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
db.exp |
The parameter will be used in variance function estimation. If db.exp = TRUE, then the errors are assumed to follow a normal distribution; otherwise, the errors are assumed to follow a double-exponential distribution. The default is db.exp = FALSE. |
"s.incr" returns the vector "x" and imposes on it seven attributes: name, shape, numknots, knots, space, var.knots and db.exp.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr"; the shape attribute is 9("smooth and increasing"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by I-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
var.knots and db.exp will be used for monotonic variance function estimation.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 9("smooth and increasing"); numknots: the numknots argument in "s.incr"; knots: the knots argument in "s.incr"; space: the space argument in "s.incr".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "smooth and increasing" ans <- cgam(y ~ s.incr(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and increasing fit", col = 2, lty = 1) legend(1.7, 9.2, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "smooth and increasing" ans <- cgam(y ~ s.incr(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth and increasing fit", col = 2, lty = 1) legend(1.7, 9.2, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that the systematic component is smooth, increasing and concave in a predictor in a formula argument to cgam. This is the smooth version.
s.incr.conc(x, numknots = 0, knots = 0, space = "Q")
s.incr.conc(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s.incr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.conc"; the shape attribute is 14("smooth, increasing and concave"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.incr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 14("smooth, increasing and concave"); numknots: the numknots argument in "s.incr.conc"; knots: the knots argument in "s.incr.conc"; space: the space argument in "s.incr.conc".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
data(cubic) # extract x x <- - cubic$x # extract y y <- - cubic$y # regress y on x with the shape restriction: "smooth, increasing and concave" ans <- cgam(y ~ s.incr.conc(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, increasing and concave fit", col = 2, lty = 1) legend(-.3, 4, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
data(cubic) # extract x x <- - cubic$x # extract y y <- - cubic$y # regress y on x with the shape restriction: "smooth, increasing and concave" ans <- cgam(y ~ s.incr.conc(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, increasing and concave fit", col = 2, lty = 1) legend(-.3, 4, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that the systematic component is smooth, increasing and convex in a predictor in a formula argument to cgam. This is the smooth version.
s.incr.conv(x, numknots = 0, knots = 0, space = "Q")
s.incr.conv(x, numknots = 0, knots = 0, space = "Q")
x |
A numeric predictor which has the same length as the response vector. |
numknots |
The number of knots used to constrain |
knots |
The knots used to constrain |
space |
A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal |
"s.incr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.conv"; the shape attribute is 13("smooth, increasing and convex"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.incr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.
See references cited in this section for more details.
The vector x with five attributes, i.e., name: the name of x; shape: 13("smooth, increasing and convex"); numknots: the numknots argument in "s.incr.conv"; knots: the knots argument in "s.incr.conv"; space: the space argument in "s.incr.conv".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "smooth, increasing and convex" ans <- cgam(y ~ s.incr.conv(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, increasing and convex fit", col = 2, lty = 1) legend(1.7, 9.2, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
data(cubic) # extract x x <- cubic$x # extract y y <- cubic$y # regress y on x with the shape restriction: "smooth, increasing and convex" ans <- cgam(y ~ s.incr.conv(x)) knots <- ans$knots[[1]] # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, xlab = "x", ylab = "y") lines(x, ans$muhat, col = 2) legend("topleft", bty = "n", "smooth, increasing and convex fit", col = 2, lty = 1) legend(1.7, 9.2, bty = "o", "knots", pch = "X") points(knots, 1:length(knots)*0+min(y), pch = "X")
A symbolic routine to define that a surface is decreasing in one predictor and increasing in another in a formula argument to cgam.
s.incr.decr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
s.incr.decr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
x1 |
A numeric predictor which has the same length as the response vector. |
x2 |
A numeric predictor which has the same length as the response vector. |
numknots |
A vector of the number of knots used to constrain |
knots |
A list of two vectors of knots used to constrain |
space |
A vector of the character specifying the method to create knots for |
"s.incr.decr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.decr"; the shape attribute is "wps_id"(increasing-decreasing); the decreasing values for "x1" and "x2" are TRUE and FALSE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.incr.decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.
See references cited in this section for more details.
The vectors and
. Each of them has six attributes, i.e., name: names of
and
; shape: "wps_id"(increasing-decreasing); numknots: the numknots argument in "s.incr.decr"; knots: the knots argument in "s.incr.decr"; space: the space argument in "s.incr.decr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to
and
, which are FALSE and TRUE.
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.
s.incr.incr
, s.decr.decr
, s.decr.incr
, cgam
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- 4 * (x1 - x2) - x1 * x2 + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "increasing-decreasing" # using the penalized estimator ans <- cgam(y ~ s.incr.decr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- 4 * (x1 - x2) - x1 * x2 + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "increasing-decreasing" # using the penalized estimator ans <- cgam(y ~ s.incr.decr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
A symbolic routine to define that a surface is increasing in two predictors in a formula argument to cgam.
s.incr.incr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
s.incr.incr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))
x1 |
A numeric predictor which has the same length as the response vector. |
x2 |
A numeric predictor which has the same length as the response vector. |
numknots |
A vector of the number of knots used to constrain |
knots |
A list of two vectors of knots used to constrain |
space |
A vector of the character specifying the method to create knots for |
"s.incr.incr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.
The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.incr"; the shape attribute is "wps_ii"(doubly-increasing); the decreasing values for both vectors are FALSE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.
Note that "s.incr.incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.
See references cited in this section for more details.
The vectors and
. Each of them has six attributes, i.e., name: names of
and
; shape: "wps_ii"(doubly-increasing); numknots: the numknots argument in "s.incr.incr"; knots: the knots argument in "s.incr.incr"; space: the space argument in "s.incr.incr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to
and
, which are both FALSE.
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.
s.decr.decr
, s.decr.incr
, cgam
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- 4 * (x1 + x2 - x1 * x2) + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "doubly-increasing" # using the penalized estimator ans <- cgam(y ~ s.incr.incr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
## Not run: # generate data n <- 100 set.seed(123) x1 <- runif(n) x2 <- runif(n) y <- 4 * (x1 + x2 - x1 * x2) + rnorm(n, sd = .2) # regress y on x1 and x2 under the shape-restriction: "doubly-increasing" # using the penalized estimator ans <- cgam(y ~ s.incr.incr(x1, x2), pnt = TRUE) # make a 3D plot of the constrained surface plotpersp(ans) ## End(Not run)
A symbolic routine to indicate that a predictor is included as a non-parametrically modelled predictor in a formula argument to ShapeSelect.
shapes(x, set = "s.9")
shapes(x, set = "s.9")
x |
A numeric predictor which has the same length as the response vector. |
set |
A character or a numeric vector indicating all possible shapes defined for x. For example, we are not only interested in modeling the relationship between the growth of an organism (dependent variable To be more specific, the user can choose to specify this argument as following
|
The default is set = "s.9".
The vector x with three attributes, i.e., nm: the name of x; shape: a numeric vector ranging from 0 to 16 to indicate possible shapes imposed on the relationship between the response and x; type: "nparam", i.e., x is non-parametrically modelled.
Xiyue Liao
## Not run: # Example 1. n <- 100 # generate predictors, x is non-parametrically modelled # and z is parametrically modelled x <- runif(n) z <- rep(0:1, 50) # E(y) is generated as correlated to both x and z # the relationship between E(y) and x is smoothly increasing-convex y <- x^2 + 2 * I(z == 1) + rnorm(n, sd = 1) # call ShapeSelect to find the best model by the genetic algorithm fit <- ShapeSelect(y ~ shapes(x) + in.or.out(factor(z)), genetic = TRUE) # Example 2. n <- 100 z <- rep(c("A","B"), n / 2) x <- runif(n) # y0 is generated as correlated to z with a tree-ordering in it # y0 is smoothly increasing-convex in x y0 <- x^2 + I(z == "B") * 1.5 y <- y0 + rnorm(n, 1) fit <- ShapeSelect(y ~ s.incr(x) + shapes(z, set = "tree"), genetic = FALSE) # check the best fit in terms of z fit$top ## End(Not run)
## Not run: # Example 1. n <- 100 # generate predictors, x is non-parametrically modelled # and z is parametrically modelled x <- runif(n) z <- rep(0:1, 50) # E(y) is generated as correlated to both x and z # the relationship between E(y) and x is smoothly increasing-convex y <- x^2 + 2 * I(z == 1) + rnorm(n, sd = 1) # call ShapeSelect to find the best model by the genetic algorithm fit <- ShapeSelect(y ~ shapes(x) + in.or.out(factor(z)), genetic = TRUE) # Example 2. n <- 100 z <- rep(c("A","B"), n / 2) x <- runif(n) # y0 is generated as correlated to z with a tree-ordering in it # y0 is smoothly increasing-convex in x y0 <- x^2 + I(z == "B") * 1.5 y <- y0 + rnorm(n, 1) fit <- ShapeSelect(y ~ s.incr(x) + shapes(z, set = "tree"), genetic = FALSE) # check the best fit in terms of z fit$top ## End(Not run)
The partial linear generalized additive model is considered, where the goal is to choose a subset of predictor variables and describe the component relationships with the response, in the case where there is very little a priori information. For each predictor, the user need only specify a set of possible shape or order restrictions. A model selection method chooses the shapes and orderings of the relationships as well as the variables. For each possible combination of shapes and orders for the predictors, the maximum likelihood estimator for the constrained generalized additive model is found using iteratively re-weighted cone projections. The cone information criterion is used to select the best combination of variables and shapes.
ShapeSelect(formula, family = gaussian, cpar = 2, data = NULL, weights = NULL, npop = 200, per.mutate = 0.05, genetic = FALSE)
ShapeSelect(formula, family = gaussian, cpar = 2, data = NULL, weights = NULL, npop = 200, per.mutate = 0.05, genetic = FALSE)
formula |
A formula object which includes a set of predictors to be selected. It has the form "response ~ predictor". The response is a vector of length |
family |
A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. There are three families used in ShapeSelect: gaussian, binomial and poisson. |
cpar |
A multiplier to estimate the model variance, which is defined as |
data |
An optional data frame, list or environment containing the variables in the model. The default is data = NULL. |
weights |
An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL. |
npop |
The population size used for the genetic algorithm. The default is npop = 200. |
per.mutate |
The percentage of mutation used for the genetic algorithm. The default is per.mutate = 0.05 |
genetic |
A logical scalar showing if or not the genetic algorithm is defined by the user to select the best model. The default is genetic = FALSE. |
Note that when the argument genetic is set to be FALSE, the routine will check to see if using the genetic algorithm is better than going through all models to find the best fit. The primary concern is running time. An interactive dialogue window may pop out to ask the user if they prefer to the genetic algorithm when it may take too long to do a brutal search, and if there are too many possible models to go through, like more than one million, the routine will implement the genetic algorithm anyway.
See references cited in this section for more details.
top |
The best model of the final population, which shows the variables chosen along with their best shapes. |
pop |
The final population ordered by its fitness values. It is a data frame, and each row of this data frame shows the shapes chosen for predictors in an individual model. Besides, the fitness value for each individual model is included in the last column of the data frame. For example, we have two continuous predictors |
fitness |
The sorted fitness values for the final population. |
tm |
Total cpu running time. |
xnms |
A vector storing the name of the nonparametrically-modelled predictor in a ShapeSelect formula. |
znms |
A vector storing the name of the parametrically-modelled predictor in a ShapeSelect formula, which is a categorical predictor or a linear term. |
trnms |
A vector storing the name of the treatment predictor in a ShapeSelect formula, which has three possible levels: no effect, tree ordering, unordered. |
zfacs |
A logical vector keeping track of if the parametrically-modelled predictor in a ShapeSelect formula is a categorical predictor or a linear term. |
mxf |
A vector keeping track of the largest fitness value in each generation. |
mnf |
A vector keeping track of the mean fitness value in each generation. |
GA |
A logical scalar showing if or not the genetic algorithm is actually implemented to select the best model. |
best.fit |
The best model fitted by the cgam routine, given the best variables with their shape constraints chosen by the ShapeSelect routine. |
call |
The matched call. |
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
Meyer, M. C. and M. Woodroofe (2000) On the degrees of freedom in shape-restricted regression. Annals of Statistics 28, 1083–1104.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.
Meyer, M. C. and Liao, X. (2016) Variable and shape selection for the generalized additive model. under preparation
## Not run: # Example 1. library(MASS) data(Rubber) # ShapeSelect can be used to go through all models to find the best model fit <- ShapeSelect(loss ~ shapes(hard, set = "s.9") + shapes(tens, set = "s.9"), data = Rubber, genetic = FALSE) # the user can also choose to find the best model by the genetic algorithm # given any total number of possible models fit <- ShapeSelect(loss ~ shapes(hard, set = "s.9") + shapes(tens, set = "s.9"), data = Rubber, genetic = TRUE) # check the best model fit$top # check the running time fit$tm # Example 2. # simulate a data set such that the mean is smoothly increasing-convex in x1 and x2 n <- 100 x1 <- runif(n) x2 <- runif(n) y0 <- x1^2 + x2 + x2^3 z <- rep(0:1, 50) for (i in 1:n) { if (z[i] == 1) y0[i] = y0[i] * 1.5 } # add some random errors and call the routine y <- y0 + rnorm(n) # include factor(z) in the formula and determine if factor(z) should be in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(factor(z))) # use the genetic algorithm fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(factor(z)), npop = 300, per.mutate=0.02) # include z as a linear term in the formula and determine if z should be in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(z)) # include z as a linear term in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + z) # include factor(z) in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + factor(z)) # check the best model bf <- fit$best.fit # make a 3D plot of the best fit plotpersp(bf, categ = "z") ## End(Not run)
## Not run: # Example 1. library(MASS) data(Rubber) # ShapeSelect can be used to go through all models to find the best model fit <- ShapeSelect(loss ~ shapes(hard, set = "s.9") + shapes(tens, set = "s.9"), data = Rubber, genetic = FALSE) # the user can also choose to find the best model by the genetic algorithm # given any total number of possible models fit <- ShapeSelect(loss ~ shapes(hard, set = "s.9") + shapes(tens, set = "s.9"), data = Rubber, genetic = TRUE) # check the best model fit$top # check the running time fit$tm # Example 2. # simulate a data set such that the mean is smoothly increasing-convex in x1 and x2 n <- 100 x1 <- runif(n) x2 <- runif(n) y0 <- x1^2 + x2 + x2^3 z <- rep(0:1, 50) for (i in 1:n) { if (z[i] == 1) y0[i] = y0[i] * 1.5 } # add some random errors and call the routine y <- y0 + rnorm(n) # include factor(z) in the formula and determine if factor(z) should be in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(factor(z))) # use the genetic algorithm fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(factor(z)), npop = 300, per.mutate=0.02) # include z as a linear term in the formula and determine if z should be in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(z)) # include z as a linear term in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + z) # include factor(z) in the model fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + factor(z)) # check the best model bf <- fit$best.fit # make a 3D plot of the best fit plotpersp(bf, categ = "z") ## End(Not run)
A symbolic routine to define that the systematic component has a tree-ordering in a predictor in a formula argument to cgam.
tree(x, pl = NULL)
tree(x, pl = NULL)
x |
A numeric vector which has the same length as the response vector. Note that the placebo level of x must be 0. |
pl |
The placebo level. |
"tree" returns the vector "x" and imposes on it two attributes: name and shape.
The name attribute is used in the subroutine plotpersp; the shape attribute is "tree", and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains that has a tree-ordering in "x" will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "tree" does not make the corresponding cone edges itself. It sets things up to a sub-routine called tree.fun in cgam which will make the cone edges. A tree-ordering is a partial ordering: For a categorical variable , if there are treatment levels
, where
is a placebo, we compare
with
, and not have any other comparable pairs.
See references cited in this section for more details.
The vector x with two attributes, i.e., name: the name of x; shape: "tree".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
# generate y set.seed(123) n <- 100 x <- rep(0:4, each = 20) z <- rep(c("a", "b"), 50) y <- x + I(z == "a") + rnorm(n, 1) xu <- unique(x) # regress y on x under the tree-ordering restriction fit.tree <- cgam(y ~ tree(x) + factor(z)) # make a plot plot(x, y, cex = .7) mua = unique(fit.tree$muhat)[unique(z) == "a"] points(xu, unique(fit.tree$muhat)[unique(z) == "a"], pch = '+', col = 4, cex = 3) legend(0,7.5, bty = "n", "tree-ordering fit: z = 'a'", col = 4, pch = '+', cex = 1.3) mub = unique(fit.tree$muhat)[unique(z) == "b"] points(xu, unique(fit.tree$muhat)[unique(z) == "b"], pch = '+', col = 2, cex = 3) legend(0,8.5, bty = "n", "tree-ordering fit: z = 'b'", col = 2, pch = '+', cex = 1.3)
# generate y set.seed(123) n <- 100 x <- rep(0:4, each = 20) z <- rep(c("a", "b"), 50) y <- x + I(z == "a") + rnorm(n, 1) xu <- unique(x) # regress y on x under the tree-ordering restriction fit.tree <- cgam(y ~ tree(x) + factor(z)) # make a plot plot(x, y, cex = .7) mua = unique(fit.tree$muhat)[unique(z) == "a"] points(xu, unique(fit.tree$muhat)[unique(z) == "a"], pch = '+', col = 4, cex = 3) legend(0,7.5, bty = "n", "tree-ordering fit: z = 'a'", col = 4, pch = '+', cex = 1.3) mub = unique(fit.tree$muhat)[unique(z) == "b"] points(xu, unique(fit.tree$muhat)[unique(z) == "b"], pch = '+', col = 2, cex = 3) legend(0,8.5, bty = "n", "tree-ordering fit: z = 'b'", col = 2, pch = '+', cex = 1.3)
A symbolic routine to define that the systematic component has an umbrella-ordering in a predictor in a formula argument to cgam.
umbrella(x)
umbrella(x)
x |
A numeric vector which has the same length as the response vector. |
"umbrella" returns the vector "x" and imposes on it two attributes: name and shape.
The name attribute is used in the subroutine plotpersp; the shape attribute is "umbrella", and to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains that has an umbrella-ordering in "x" will be made. The cone edges are a set of basis employed in the hinge algorithm.
Note that "umbrella" does not make the corresponding cone edges itself. It sets things up to a sub-routine called umbrella.fun in cgam which will make the cone edges. An umbrella-ordering is a partial ordering: Suppose we have a that is known to be a "mode" so that for
, we have a binary relation between
and
if
and for
we have the opposite binary relation if
, but if
and
, there is no such binary relation.
See references cited in this section for more details.
The vector x with two attributes, i.e., name: the name of x; shape: "umbrella".
Mary C. Meyer and Xiyue Liao
Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.
# generate y set.seed(123) n <- 20 x <- seq(-2, 2, length = n) y <- - x^2 + rnorm(n) # regress y on x under the umbrella-ordering restriction fit <- cgam(y ~ umbrella(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, ylab = "y") lines(x, fit$muhat, col = 2) legend("topleft", bty = "n", "umbrella-ordering fit", col = 2, lty = 1)
# generate y set.seed(123) n <- 20 x <- seq(-2, 2, length = n) y <- - x^2 + rnorm(n) # regress y on x under the umbrella-ordering restriction fit <- cgam(y ~ umbrella(x)) # make a plot par(mar = c(4, 4, 1, 1)) plot(x, y, cex = .7, ylab = "y") lines(x, fit$muhat, col = 2) legend("topleft", bty = "n", "umbrella-ordering fit", col = 2, lty = 1)