/ Home |
Keywords: underdispersion relative to Poisson distribution, factorial experiment, dispersion modelling.
This data comes from an experiment on induction of flowering of cyclamen. Plants of 4 varieties of cyclamen were subject to a combination of 6 temperature regimens and 4 levels of fertilization. The temperature regimens are combinations of five temperatures during the day (14, 16, 18, 20 and 26 degrees C) and four temperatures during the night (14, 16, 18 and 20 C). Not all the combinations of temperatures are present. The response is the number of flowers, which vary from 4 to 26, with mode 8.
Variable | Description | ||
Variety | Variety of cyclamen | ||
Regimem | Temperature regimen (combination of the temperature during the day and the temperature during the night) | ||
Day | Temperature during the day (Centigrade) | ||
Night | Temperature during the night | ||
Fertilizer | Level of fertilization | ||
Flowers | Number of flowers | ||
Data File (tab-delimited text)
The data were supplied by Rodrigo Labouriau of the Biometrics Research Unit, Danish Institute for Agricultural Sciences. |
The flowers counts look distinctly under-dispersed relative to the Poisson distribution. There appear to be two few small counts, so that the counts are right skew relative to the Poisson distribution:
> cyclamen <- na.omit(read.table("cyclamen.txt",header=T)) > cyclamen$Variety <- factor(cyclamen$Variety) > cyclamen$Regimem <- factor(cyclamen$Regimem) > cyclamen$Fertilizer <- factor(cyclamen$Fertilizer) > attach(cyclamen) > cyclamen <- na.omit(cyclamen) > table(Day,Night) 14 16 18 20 26 14 320 0 0 320 319 16 0 320 0 0 0 18 0 0 319 0 0 20 0 0 0 320 0 > tapply(Regimem,list(Day,Night),mean) 14 16 18 20 26 14 1 NA NA 5 6 16 NA 2 NA NA NA 18 NA NA 3 NA NA 20 NA NA NA 4 NA > table(Flowers) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 26 16 107 233 324 348 259 245 153 107 57 29 22 9 3 1 3 1 1 > out <- glm(Flowers~(Variety+Regimem+Fertilizer)^2,family=poisson) > anova(out,test="Chi") Analysis of Deviance Table Poisson model Response: Flowers Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev Pr(Chi) NULL 1917 1256.219 Variety 3 26.6900 1914 1229.529 0.0000068 Regimem 5 121.3986 1909 1108.131 0.0000000 Fertilizer 3 1.8289 1906 1106.302 0.6086692 Variety:Regimem 15 20.4549 1891 1085.847 0.1551692 Variety:Fertilizer 9 6.3119 1882 1079.535 0.7083428 Regimem:Fertilizer 15 4.6705 1867 1074.865 0.9945677 > out <- glm(Flowers~Variety+Regimem,family=poisson) > r <- qres.pois(out) > qqnorm(r,ylab="Quantile Residuals") > abline(0,1)>
Can try to estimate a Tweedie power family variance. The MLE estimate for the variance power is 2.46 while the REML estimate is virtually the same at 2.47.
> m2loglik function(x) { dglm(Flowers ~ Variety + Regimem, family = tweedie(var.power = x, link.power = 0))$m2loglik } > nlmin(m2loglik,1) $x: [1] 2.462306 $converged: [1] T $conv.type: [1] "relative function convergence" > m2loglik(1) [1] 8496.399 > m2loglik(1.99999) [1] 8380.303 > m2loglik(2.46) [1] 8371.907 > m2loglik(3) [1] 8383.227 > coxreid function(x) { dglm(Flowers ~ Variety + Regimem, family = tweedie(var.power = x, link.power = 0), method = "reml")$m2loglik } > nlmin(coxreid,2.46) $x: [1] 2.465982 $converged: [1] T $conv.type: [1] "relative function convergence"
There is a dispersion effect for Variety. Varieties 3 and 4 have dispersion about 1/3 larger than the other two varieties.
> contrasts(Variety) <- contr.treatment(levels(Variety)) > contrasts(Regimem) <- contr.treatment(levels(Regimem)) > out <- dglm(Flowers~Variety+Regimem,~Variety,family=tweedie(var.power=2.47,link.power=0)) > summary(out) Call: dglm(formula = Flowers ~ Variety + Regimem, dformula = ~ Variety, family = twe edie( var.power = 2.47, link.power = 0)) Mean Coefficients: Value Std. Error t value (Intercept) 2.30746315559 0.01771684 130.241233799 Variety2 -0.09690147202 0.01569619 -6.173565921 Variety3 -0.00006442688 0.01700265 -0.003789225 Variety4 -0.04018161723 0.01688345 -2.379941069 Regimem2 -0.04336368856 0.02084727 -2.080065615 Regimem3 -0.16314064162 0.02057673 -7.928406427 Regimem4 -0.19195936859 0.02049633 -9.365549669 Regimem5 -0.07916040725 0.02076102 -3.812933519 Regimem6 -0.24318730741 0.02039299 -11.925042558 (Dispersion Parameters for Tweedie family estimated as below ) Scaled Null Deviance: 2180.945 on 1917 degrees of freedom Scaled Residual Deviance: 1918 on 1909 degrees of freedom Dispersion Coefficients: Value Std. Error t value (Intercept) -3.81004656 0.06454972 -59.0249875 Variety2 -0.04703883 0.09128709 -0.5152846 Variety3 0.21758630 0.09128709 2.3835385 Variety4 0.20655446 0.09138253 2.2603276 (Dispersion Parameter for Gamma family taken to be 2 ) Scaled Null Deviance: 2323.196 on 1917 degrees of freedom Scaled Residual Deviance: 2309.667 on 1914 degrees of freedom Minus Twice the Log-Likelihood: 8358.401 Number of Alternating Iterations: 3 > tapply(fitted(out$disp),Variety,mean) 1 2 3 4 0.02214715 0.02112949 0.02753051 0.02722847 > anova(out) Analysis of Deviance Table Tweedie double generalized linear model Response: Flowers DF Seq.Chisq P Adj.Chisq P1 Mean model 8 237.8387 0.000000000 244.6100 0.000000000 Dispersion model 3 13.5079 0.003657629 13.5079 0.003657629 > qqnorm(residuals(out)) > qqline(residuals(out))
> m2loglik function(x) { dglm(Flowers ~ Variety + Regimem, ~ Variety, family = tweedie(var.power = x, link.power = 0))$m2loglik } > nlmin(m2loglik,2.46) $x: [1] 2.441391 $converged: [1] T $conv.type: [1] "relative function convergence"
The following is the empirical lambda-gram for the data. The four values at lambda=100 are actually infinite. The data looks a lot like a Poisson with the first four values removed.
Home - About Us -
Contact Us Copyright © Gordon Smyth |