/ Home

OzDASL

Number of Cyclamen Flowers

Keywords: underdispersion relative to Poisson distribution, factorial experiment, dispersion modelling.


Description

This data comes from an experiment on induction of flowering of cyclamen. Plants of 4 varieties of cyclamen were subject to a combination of 6 temperature regimens and 4 levels of fertilization. The temperature regimens are combinations of five temperatures during the day (14, 16, 18, 20 and 26 degrees C) and four temperatures during the night (14, 16, 18 and 20 C). Not all the combinations of temperatures are present. The response is the number of flowers, which vary from 4 to 26, with mode 8.


Variable Description

Variety Variety of cyclamen
Regimem Temperature regimen (combination of the temperature during the day and the temperature during the night)
Day Temperature during the day (Centigrade)
Night Temperature during the night
Fertilizer Level of fertilization
Flowers Number of flowers

Download

Data File (tab-delimited text)

Source

The data were supplied by Rodrigo Labouriau of the Biometrics Research Unit, Danish Institute for Agricultural Sciences.

Analysis

The flowers counts look distinctly under-dispersed relative to the Poisson distribution. There appear to be two few small counts, so that the counts are right skew relative to the Poisson distribution:

> cyclamen <- na.omit(read.table("cyclamen.txt",header=T))
> cyclamen$Variety <- factor(cyclamen$Variety)
> cyclamen$Regimem <- factor(cyclamen$Regimem)
> cyclamen$Fertilizer <- factor(cyclamen$Fertilizer)
> attach(cyclamen)
> cyclamen <- na.omit(cyclamen)
> table(Day,Night)
    14  16  18  20  26
14 320   0   0 320 319
16   0 320   0   0   0
18   0   0 319   0   0
20   0   0   0 320   0
> tapply(Regimem,list(Day,Night),mean)
   14 16 18 20 26
14  1 NA NA  5  6
16 NA  2 NA NA NA
18 NA NA  3 NA NA
20 NA NA NA  4 NA
> table(Flowers)
  4   5   6   7   8   9  10  11  12 13 14 15 16 17 18 19 21 26
 16 107 233 324 348 259 245 153 107 57 29 22  9  3  1  3  1  1
> out <- glm(Flowers~(Variety+Regimem+Fertilizer)^2,family=poisson)
> anova(out,test="Chi")
Analysis of Deviance Table

Poisson model

Response: Flowers

Terms added sequentially (first to last)
                   Df Deviance Resid. Df Resid. Dev   Pr(Chi)
              NULL                  1917   1256.219
           Variety  3  26.6900      1914   1229.529 0.0000068
           Regimem  5 121.3986      1909   1108.131 0.0000000
        Fertilizer  3   1.8289      1906   1106.302 0.6086692
   Variety:Regimem 15  20.4549      1891   1085.847 0.1551692
Variety:Fertilizer  9   6.3119      1882   1079.535 0.7083428
Regimem:Fertilizer 15   4.6705      1867   1074.865 0.9945677
> out <- glm(Flowers~Variety+Regimem,family=poisson)
> r <- qres.pois(out)
> qqnorm(r,ylab="Quantile Residuals")
> abline(0,1)>
 cyclam1.gif (4443 bytes)

Can try to estimate a Tweedie power family variance. The MLE estimate for the variance power is 2.46 while the REML estimate is virtually the same at 2.47.

> m2loglik
function(x)
{
	dglm(Flowers ~ Variety + Regimem, family = tweedie(var.power = x,
		link.power = 0))$m2loglik
}
> nlmin(m2loglik,1)
$x:
[1] 2.462306

$converged:
[1] T

$conv.type:
[1] "relative function convergence"

> m2loglik(1)
[1] 8496.399
> m2loglik(1.99999)
[1] 8380.303
> m2loglik(2.46)
[1] 8371.907
> m2loglik(3)
[1] 8383.227

> coxreid
function(x)
{
	dglm(Flowers ~ Variety + Regimem, family = tweedie(var.power = x, link.power
		 = 0), method = "reml")$m2loglik
}
> nlmin(coxreid,2.46)
$x:
[1] 2.465982

$converged:
[1] T

$conv.type:
[1] "relative function convergence"

There is a dispersion effect for Variety. Varieties 3 and 4 have dispersion about 1/3 larger than the other two varieties.

> contrasts(Variety) <- contr.treatment(levels(Variety))
> contrasts(Regimem) <- contr.treatment(levels(Regimem))
> out <- dglm(Flowers~Variety+Regimem,~Variety,family=tweedie(var.power=2.47,link.power=0))
> summary(out)

Call: dglm(formula = Flowers ~ Variety + Regimem, dformula =  ~ Variety, family = twe
edie(
	var.power = 2.47, link.power = 0))

Mean Coefficients:
                     Value Std. Error        t value
(Intercept)  2.30746315559 0.01771684  130.241233799
   Variety2 -0.09690147202 0.01569619   -6.173565921
   Variety3 -0.00006442688 0.01700265   -0.003789225
   Variety4 -0.04018161723 0.01688345   -2.379941069
   Regimem2 -0.04336368856 0.02084727   -2.080065615
   Regimem3 -0.16314064162 0.02057673   -7.928406427
   Regimem4 -0.19195936859 0.02049633   -9.365549669
   Regimem5 -0.07916040725 0.02076102   -3.812933519
   Regimem6 -0.24318730741 0.02039299  -11.925042558
(Dispersion Parameters for Tweedie family estimated as below )

    Scaled Null Deviance: 2180.945 on 1917 degrees of freedom
Scaled Residual Deviance: 1918 on 1909 degrees of freedom

Dispersion Coefficients:
                  Value Std. Error     t value
(Intercept) -3.81004656 0.06454972 -59.0249875
   Variety2 -0.04703883 0.09128709  -0.5152846
   Variety3  0.21758630 0.09128709   2.3835385
   Variety4  0.20655446 0.09138253   2.2603276
(Dispersion Parameter for Gamma family taken to be 2 )

    Scaled Null Deviance: 2323.196 on 1917 degrees of freedom
Scaled Residual Deviance: 2309.667 on 1914 degrees of freedom

Minus Twice the Log-Likelihood: 8358.401
Number of Alternating Iterations: 3

> tapply(fitted(out$disp),Variety,mean)
          1          2          3          4
 0.02214715 0.02112949 0.02753051 0.02722847

> anova(out)
Analysis of Deviance Table

Tweedie double generalized linear model

Response: Flowers

                 DF Seq.Chisq           P Adj.Chisq          P1
      Mean model  8  237.8387 0.000000000  244.6100 0.000000000
Dispersion model  3   13.5079 0.003657629   13.5079 0.003657629

> qqnorm(residuals(out))
> qqline(residuals(out))

cyclam2.gif (4356 bytes)

> m2loglik
function(x)
{
	dglm(Flowers ~ Variety + Regimem, ~ Variety,
	family = tweedie(var.power = x, link.power = 0))$m2loglik
}
> nlmin(m2loglik,2.46)
$x:
[1] 2.441391

$converged:
[1] T

$conv.type:
[1] "relative function convergence"

The following is the empirical lambda-gram for the data. The four values at lambda=100 are actually infinite. The data looks a lot like a Poisson with the first four values removed.

cyclam3.gif (3633 bytes)

 


Help

Home - About Us - Contact Us
Copyright © Gordon Smyth