/ Home
S-Archive Download Script
reglm Generalized linear models with random effects
Fits a generalized linear model including random factors using the method of Schall (1991). Produces an object of class "glm" with some additional components.
reglm(formula = formula(data), random = NULL, family = gaussian(), data = sys.parent(), weights = NULL, dispersion = NULL, subset = NULL, na.action = na.fail, contrasts = NULL, start = NULL, mustart = NULL, betastart = NULL, sigmastart = NULL, control = reglm.control(...), ykeep = T, xkeep = F, ukeep = F, ...)
formula a formula expression as for glm, of the form response ~ predictors. This specifies the fixed part of the model. See the documentation of lm and formula for details. As for glm, this specifies the linear predictor for modelling the mean. A term of the form offset(expression) is allowed.
Almost all arguments for reglm are as for glm. The following arguments are new or altered. The first is the important one. The others are likely to be of only occasional use.
random a list object, the components of which are the factors to be treated as random. This specifies the random part of the model.
dispersion specifies a prior value for the dispersion of the glm. This argument defaults to 1 for the binomial and Poisson families. Same as the $scale command in GLIM.
mustart numeric vector giving starting values for the fitted values or expected responses (including the predicted values of the random effects). This is the same as the usual glm argument start except on the scale of the means rather than of the linear predictors. Must be of the same length as the response, or of length 1 if a constant starting vector is desired. Ignored if betastart is supplied.
betastart numeric vector giving starting values for the regression coefficients in the link-linear model for the mean (including predicted values of the random effects).
sigmastart numeric vector giving starting values for the variance components
control a list of iteration and algorithmic constants. See reglm.control for their names and default values. These can also be set as arguments to dglm itself. This argument is the same as control.glm except that the defaults have been changed.
ykeep logical flag: if TRUE, the vector of responses is returned. Same as y for glm.
xkeep logical flag: if TRUE, the model.matrix for the mean model is returned. Same as x for glm.
zkeep logical flag: if TRUE, the model.matrix for the variance components is returned.
an object of class glm is returned. The following components are additional to the usual glm components:
sigma vector of estimated variance components.
dispersion estimated dispersion. If the family had a prior dispersion, then the dispersion is estimated only at convergence of the iteration and returned to diagnose over- or under-dispersion.
prior.dispersion prior value of the dispersion for the glm. Same as the input argument dispersion if that was specified.
q vector of dimensions of the random factors..
edf vector of effective degrees of freedom used to estimate the variance components.
u design matrix for the random factors, if ukeep=T on input.
We assume the linear predictor for the generalized linear model is of the form

g(m) = Xb + U1b1 +  ... + Ucbc

where b is a vector of fixed coefficients and b1 ... bc are vectors of random effects. The random effects are assumed to have variances s1 ... sc. The design matrix X   is specified using the model formula and the design matrices U1 ... Uc are specified using the input argument random.

The parameters are estimated here using the REML method described in Schall (1991).

Schall, R (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719-727.

Candy, S. G. (2004). Modelling catch and effort data using generalized linear models, the Tweedie distribution, random vessel effects and random stratum-by-year effects. CCAMLR Science 11, 59-80.
Thanks for Dr Steve Candy, Australian Antarctic Division, for contributing a bug fix, 6 Nov 2004.
The following program is included in the reglm software distribution, but is not separately documented: reglm.control.
On output the regression parameters don't have labels.
Here we demonstrate the reglm function on the radiation of cancer cells data example from Schall (1991). Basic analysis of the data shows that there is strong evidence for differences between the occasions, and also good evidence for overdispersion between the dishes (observations) within occasions.
> radiatio <- read.table("radatio.txt",header=T)
> attach(radiatio)
> n <- rep(400,27)
We specify a binomial model with two random effects. The factor Occasion represents the radiation occasion. There is also a random effect at the observation level, represented by (1:27) since there are 27 observations, which picks up any over-dispersion in the data. Trace is turned on so that we can see the progress of the iteration. The variance components and dispersion are output at each iteration.
> out <- reglm(Survived/n~1,list(Occasion=Occasion,Units=(1:27)),
Iter 1 0.0000505604239786021 0.0000166989428102248 1
Iter 2 0.00283092775872843 0.000306570522656396 1
Iter 3 0.0914639350937484 0.00247127680331739 1
Iter 4 0.207417308402352 0.00381847728525275 1
Iter 5 0.222886503891888 0.00539847167615311 1
Iter 6 0.223151486991139 0.00690093186214274 1
Iter 7 0.222683195620167 0.00805926586181473 1
Iter 8 0.222303061660932 0.00882599204674674 1
Iter 9 0.222049310739124 0.00928435931728272 1
Iter 10 0.22189665203915 0.0095420092039735 1
Iter 11 0.221810447442714 0.00968186587710608 1
Iter 12 0.221763517700708 0.0097563502955682 1
Iter 13 0.221738481620298 0.00979561756518506 1
Iter 14 0.221725270519245 0.00981620792679515 1
Iter 15 0.221718339587564 0.00982697440004381 1
Iter 16 0.221714714509054 0.00983259577809794 1
Iter 17 0.221712821520788 0.00983552854676926 1
> out$sigma
[1] 0.221712822 0.009835529
> out$dispersion
[1] 0.9369549
The final variance components, and residual variance estimate, are the same as in Schall (1991). Note that the prior dispersion in this case is 1.
> out$q
[1]  9 27
> out$edf
[1] 7.744181 8.232174
The random effects have dimensions 9 and 27 respectively. After smoothing the random effects, the effective degrees of freedom are 7.7 and 8.2 respectively. Including Occasion as a fixed effect would have used 8 df. In this case the effect of Occasion is so strong that there is very little difference between the fixed and random models for it.
> out$rank-sum(out$q)+sum(out$edf)
[1] 16.97635
out$rank gives the total number of parameters in the linear predictor including both the fixed and random parameters. Therefore out$rank-sum(out$q) gives the number of fixed parameters, in this case 1. The total effective degrees of freedom used in modelling the means is just under 17.0, including the 1 degree of freedom for the fixed model.
> coef(out)[2:10]
[1]  0.69126760 -0.18631757 -0.70870278 -0.01694652  0.19448300 -0.02442938
[7]  0.60688086 -0.54311626 -0.01311896
> contrasts(Occasion) <- contr.sum(levels(Occasion))
> out.fixed <- glm(Survived/n~Occasion,family=binomial,weights=n)
> coef(out.fixed)[2:9]
 Occasion1  Occasion2  Occasion3   Occasion4 Occasion5   Occasion6
 0.7128812 -0.1915751 -0.7354824 -0.01624658 0.2027526 -0.02395969
 Occasion7  Occasion8
 0.6260501 -0.5620218
Note that the estimated coefficients for Occasion (there are 9 of them and they add to zero) are similar to the equivalent parameters from the fixed model, but shrunk somewhat towards zero. In the fixed model, the last coefficient is not included explicitly because the coefficients are constrained to add to zero.


S-Archive Download Script

Gordon Smyth. Copyright © 1996-2016. Last modified: 10 February 2004