OzDASL: Number of Cyclamen Flowers

Keywords: underdispersion relative to Poisson distribution, factorial experiment, dispersion modelling.

Description

This data comes from an experiment on induction of flowering of cyclamen. Plants of 4 varieties of cyclamen were subject to a combination of 6 temperature regimens and 4 levels of fertilization. The temperature regimens are combinations of five temperatures during the day (14, 16, 18, 20 and 26 degrees C) and four temperatures during the night (14, 16, 18 and 20 C). Not all the combinations of temperatures are present. The response is the number of flowers, which vary from 4 to 26, with mode 8.

Download

Source

Analysis

The flowers counts look distinctly under-dispersed relative to the Poisson distribution. There appear to be two few small counts, so that the counts are right skew relative to the Poisson distribution:

> cyclamen <- na.omit(read.table("cyclamen.txt",header=T))
> cyclamen$Variety <- factor(cyclamen$Variety)
> cyclamen$Regimem <- factor(cyclamen$Regimem)
> cyclamen$Fertilizer <- factor(cyclamen$Fertilizer)
> attach(cyclamen)
> cyclamen <- na.omit(cyclamen)
> table(Day,Night)
    14  16  18  20  26
14 320   0   0 320 319
16   0 320   0   0   0
18   0   0 319   0   0
20   0   0   0 320   0
> tapply(Regimem,list(Day,Night),mean)
   14 16 18 20 26
14  1 NA NA  5  6
16 NA  2 NA NA NA
18 NA NA  3 NA NA
20 NA NA NA  4 NA
> table(Flowers)
  4   5   6   7   8   9  10  11  12 13 14 15 16 17 18 19 21 26
 16 107 233 324 348 259 245 153 107 57 29 22  9  3  1  3  1  1
> out <- glm(Flowers~(Variety+Regimem+Fertilizer)^2,family=poisson)
> anova(out,test="Chi")
Analysis of Deviance Table

Poisson model

Response: Flowers

Terms added sequentially (first to last)
                   Df Deviance Resid. Df Resid. Dev   Pr(Chi)
              NULL                  1917   1256.219
           Variety  3  26.6900      1914   1229.529 0.0000068
           Regimem  5 121.3986      1909   1108.131 0.0000000
        Fertilizer  3   1.8289      1906   1106.302 0.6086692
   Variety:Regimem 15  20.4549      1891   1085.847 0.1551692
Variety:Fertilizer  9   6.3119      1882   1079.535 0.7083428
Regimem:Fertilizer 15   4.6705      1867   1074.865 0.9945677
> out <- glm(Flowers~Variety+Regimem,family=poisson)
> r <- qres.pois(out)
> qqnorm(r,ylab="Quantile Residuals")
> abline(0,1)>

Can try to estimate a Tweedie power family variance. The MLE estimate for the variance power is 2.46 while the REML estimate is virtually the same at 2.47.

> m2loglik function(x) { dglm(Flowers ~ Variety + Regimem, family = tweedie(var.power = x, link.power = 0))$m2loglik } > nlmin(m2loglik,1) $x: [1] 2.462306 $converged: [1] T $conv.type: [1] "relative function convergence" > m2loglik(1) [1] 8496.399 > m2loglik(1.99999) [1] 8380.303 > m2loglik(2.46) [1] 8371.907 > m2loglik(3) [1] 8383.227 > coxreid function(x) { dglm(Flowers ~ Variety + Regimem, family = tweedie(var.power = x, link.power = 0), method = "reml")$m2loglik } > nlmin(coxreid,2.46) $x: [1] 2.465982 $converged: [1] T $conv.type: [1] "relative function convergence"

There is a dispersion effect for Variety. Varieties 3 and 4 have dispersion about 1/3 larger than the other two varieties.

> contrasts(Variety) <- contr.treatment(levels(Variety)) > contrasts(Regimem) <- contr.treatment(levels(Regimem)) > out <- dglm(Flowers~Variety+Regimem,~Variety,family=tweedie(var.power=2.47,link.power=0)) > summary(out) Call: dglm(formula = Flowers ~ Variety + Regimem, dformula = ~ Variety, family = twe edie( var.power = 2.47, link.power = 0)) Mean Coefficients: Value Std. Error t value (Intercept) 2.30746315559 0.01771684 130.241233799 Variety2 -0.09690147202 0.01569619 -6.173565921 Variety3 -0.00006442688 0.01700265 -0.003789225 Variety4 -0.04018161723 0.01688345 -2.379941069 Regimem2 -0.04336368856 0.02084727 -2.080065615 Regimem3 -0.16314064162 0.02057673 -7.928406427 Regimem4 -0.19195936859 0.02049633 -9.365549669 Regimem5 -0.07916040725 0.02076102 -3.812933519 Regimem6 -0.24318730741 0.02039299 -11.925042558 (Dispersion Parameters for Tweedie family estimated as below ) Scaled Null Deviance: 2180.945 on 1917 degrees of freedom Scaled Residual Deviance: 1918 on 1909 degrees of freedom Dispersion Coefficients: Value Std. Error t value (Intercept) -3.81004656 0.06454972 -59.0249875 Variety2 -0.04703883 0.09128709 -0.5152846 Variety3 0.21758630 0.09128709 2.3835385 Variety4 0.20655446 0.09138253 2.2603276 (Dispersion Parameter for Gamma family taken to be 2 ) Scaled Null Deviance: 2323.196 on 1917 degrees of freedom Scaled Residual Deviance: 2309.667 on 1914 degrees of freedom Minus Twice the Log-Likelihood: 8358.401 Number of Alternating Iterations: 3 > tapply(fitted(out$disp),Variety,mean) 1 2 3 4 0.02214715 0.02112949 0.02753051 0.02722847 > anova(out) Analysis of Deviance Table Tweedie double generalized linear model Response: Flowers DF Seq.Chisq P Adj.Chisq P1 Mean model 8 237.8387 0.000000000 244.6100 0.000000000 Dispersion model 3 13.5079 0.003657629 13.5079 0.003657629 > qqnorm(residuals(out)) > qqline(residuals(out))

> m2loglik function(x) { dglm(Flowers ~ Variety + Regimem, ~ Variety, family = tweedie(var.power = x, link.power = 0))$m2loglik } > nlmin(m2loglik,2.46) $x: [1] 2.441391 $converged: [1] T $conv.type: [1] "relative function convergence"

The following is the empirical lambda-gram for the data. The four values at lambda=100 are actually infinite. The data looks a lot like a Poisson with the first four values removed.


	Variable		Description

	Variety		Variety of cyclamen
	Regimem		Temperature regimen (combination of the temperature during the day and the temperature during the night)
	Day		Temperature during the day (Centigrade)
	Night		Temperature during the night
	Fertilizer		Level of fertilization
	Flowers		Number of flowers