OzDASL

Canadian Automobile Insurance Claims for 1957-1958

Keywords: Poisson regression, gamma regression, offset, Tweedie generalized linear model


Description

The data give the Canadian automobile insurance experience for policy years 1956 and 1957 as of June 30, 1959. They include virtually every insurance company operating in Canada and were collated by the Statistical Agency (Canadian Underwriters' Association - Statistical Department) acting under instructions from the Superintendent of Insurance. The data given here are for private passenger automobile liability for non-farmers for all of Canada excluding Saskatchewan.

The variable Merit measures the number of years since the last claim on the policy. The variable Class is a collation of age, sex, use and marital status. The variables Insured and Premium are two measures of the risk exposure of the insurance companies.


Variable   Description

Merit      Merit rating:
           3 - licensed and accident free 3 or more years
           2 - licensed and accident free 2 years
           1 - licensed and accident free 1 year
           0 - all others
Class      1 - pleasure, no male operator under 25
           2 - pleasure, non-principal male operator under 25
           3 - business use
           4 - unmarried owner or principal operator under 25
           5 - married owner or principal operator under 25
Insured    Earned car years
Premium    Earned premium in 1000's
           (adjusted to what the premium would have been had all cars been written at 01 rates)
Claims     Number of claims
Cost       Total cost of the claims in 1000's of dollars

Download

Data File (tab-delimited text file)

Source

Bailey, R. A., and Simon, LeRoy J. (1960). Two studies in automobile insurance ratemaking. ASTIN Bulletin, 192-217.

Analysis

One could apply Poisson regression to the number of claims and gamma regression to the cost per claim.

Alternatively the total cost could be modelled using a Tweedie generalized linear model.
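As a rough illustration of the Tweedie alternative, the sketch below fits the cost per insured car-year with a log link and exposure weights. It is not part of the original analysis. It assumes the statmod package is installed, the helper name fit_tweedie is hypothetical, and the variance power of 1.5 is an arbitrary illustrative choice rather than a value estimated from these data.

```r
# Sketch only: Tweedie GLM for cost per car-year with a log link.
# Assumes `d` has the columns Cost, Insured, Merit and Class described above.
# var.power = 1.5 is an illustrative assumption, not an estimated value.
fit_tweedie <- function(d, var.power = 1.5) {
  if (!requireNamespace("statmod", quietly = TRUE))
    stop("package 'statmod' is required for the tweedie() family")
  glm(Cost / Insured ~ Merit + Class,
      family = statmod::tweedie(var.power = var.power, link.power = 0),
      weights = Insured,   # exposure enters as a prior weight on the rate
      data = d)
}

# Usage, only if the data file is present in the working directory.
if (file.exists("carinsca.txt")) {
  carinsca <- read.table("carinsca.txt", header = TRUE)
  carinsca$Merit <- factor(carinsca$Merit)
  carinsca$Class <- factor(carinsca$Class)
  print(summary(fit_tweedie(carinsca)))
}
```

Modelling Cost/Insured with weights, rather than Cost with an offset, keeps the exposure adjustment correct for variance powers other than 1.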

Number of claims

> carinsca <- read.table("carinsca.txt",header=T)
> carinsca$Merit <- ordered(carinsca$Merit)
> carinsca$Class <- factor(carinsca$Class)
> options(contrasts=c("contr.treatment","contr.treatment"))
> attach(carinsca)
> out <- glm(Claims/Insured~Merit+Class,family="poisson",weights=Insured)
> summary(out,cor=F)

Call: glm(formula = Claims/Insured ~ Merit + Class, family = "poisson", weights = Insured)
Deviance Residuals:
       Min        1Q    Median       3Q      Max
 -10.79274 -3.007873 -1.575749 2.426679 11.62523

Coefficients:
                 Value  Std. Error    t value
(Intercept) -2.0357359 0.004311305 -472.18556
     Merit1 -0.1377590 0.007172219  -19.20730
     Merit2 -0.2206796 0.007997189  -27.59465
     Merit3 -0.4929506 0.004502371 -109.48689
     Class2  0.2998302 0.007258049   41.31003
     Class3  0.4690550 0.005039141   93.08233
     Class4  0.5258551 0.005364533   98.02439
     Class5  0.2155504 0.010734511   20.08013

(Dispersion Parameter for Poisson family taken to be 1 )

    Null Deviance: 33854.16 on 19 degrees of freedom

Residual Deviance: 579.5163 on 12 degrees of freedom

Number of Fisher Scoring Iterations: 3
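
Because the model uses a log link, the coefficients above are easier to read after exponentiating. The sketch below simply copies the printed estimates and converts them to claim rate ratios relative to the baseline levels (Merit 0 and Class 1); it does not refit the model.

```r
# Main-effects Poisson coefficients copied from the summary output above.
b <- c(Merit1 = -0.1377590, Merit2 = -0.2206796, Merit3 = -0.4929506,
       Class2 =  0.2998302, Class3 =  0.4690550, Class4 =  0.5258551,
       Class5 =  0.2155504)

# With a log link, exp(coefficient) is the claim rate relative to the
# baseline level (Merit 0 or Class 1), holding the other factor fixed.
rate_ratio <- exp(b)
print(round(rate_ratio, 3))
```

For example, the Merit3 ratio is roughly 0.61, so the fitted claim rate for drivers with three or more accident-free years is about 39% lower than for Merit 0, while the Class4 ratio is about 1.69.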

> tapply(residuals(out),list(Merit,Class),mean)
           1          2          3          4         5
0  10.255784 -6.0933143 -3.3741689 -10.792736 -2.775917
1   5.375534 -1.6555839 -2.6246195  -7.064295 -2.885774
2   0.885985 -0.6689448  0.7485641  -1.495913 -1.667528
3  -5.981040  3.8071118  2.3437884  11.625227  2.675351

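There is a strong monotonic effect for Merit and a strong effect for Class. Although the main effects are dominant, there is also very strong evidence for an interaction: the residual deviance of 579.5 on 12 degrees of freedom is large, and the residuals for Class 1 fall steadily from 10.3 at Merit 0 to -6.0 at Merit 3, while those for the other classes move from negative at Merit 0 to positive at Merit 3. Specifically it appears that the effect of Merit (the no-claim bonus) is greater for Class 1 than for the other classes.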

> out <- glm(Claims~offset(log(Insured))+Merit+Class+Merit:(Class==1),family="poisson")
> anova(out,test="F")
Analysis of Deviance Table

Poisson model

Response: Claims

Terms added sequentially (first to last)
                   Df Deviance Resid. Df Resid. Dev  F Value        Pr(F)
              NULL                    19   33854.16
             Merit  3 17754.11        16   16100.05 646.8548 0.0000000001
             Class  4 15520.53        12     579.52 424.1074 0.0000000003
        Class == 1  0     0.00        12     579.52
Merit:(Class == 1)  3   497.44         9      82.07  18.1239 0.0003728092
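
The first fit in this section models the rate Claims/Insured with weights = Insured, whereas the interaction fit models Claims with offset(log(Insured)). For a Poisson model with a log link these two forms give the same estimates and deviance, which is why the Merit and Class rows of the table reproduce the earlier residual deviance of 579.52. The sketch below checks the equivalence on simulated toy data; the data frame toy and its factor g are hypothetical and are not the carinsca data.

```r
set.seed(1)
# Hypothetical toy data (not the carinsca data): 20 cells with an exposure
# and a Poisson count whose rate depends on a two-level factor.
toy <- data.frame(g = factor(rep(c("a", "b"), each = 10)),
                  Insured = round(runif(20, 500, 5000)))
toy$Claims <- rpois(20, lambda = toy$Insured * ifelse(toy$g == "a", 0.10, 0.15))

# Form 1: model the rate, weighting each cell by its exposure.
# R warns about non-integer Poisson responses here, hence suppressWarnings().
fit_rate <- suppressWarnings(
  glm(Claims / Insured ~ g, family = poisson, weights = Insured, data = toy))

# Form 2: model the count, with log exposure as an offset.
fit_off <- glm(Claims ~ offset(log(Insured)) + g, family = poisson, data = toy)

# The coefficients and deviances agree up to convergence tolerance.
print(cbind(rate = coef(fit_rate), offset = coef(fit_off)))
print(c(rate = deviance(fit_rate), offset = deviance(fit_off)))
```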

Size of claims

> out <- glm(Cost/Claims~Merit+Class,family=Gamma(link="log"),weights=Claims)
> summary(out,cor=F)

Call: glm(formula = Cost/Claims ~ Merit + Class, family = Gamma(link = "log"),
	weights = Claims)
Deviance Residuals:
       Min        1Q     Median       3Q     Max
 -5.991365 -1.888093 -0.3277561 2.265134 6.32554

Coefficients:
                  Value Std. Error     t value
(Intercept) -1.17455633 0.01553958 -75.5848067
     Merit1 -0.06867160 0.02611050  -2.6300374
     Merit2 -0.07024798 0.02910711  -2.4134299
     Merit3 -0.05672978 0.01630784  -3.4786823
     Class2  0.08273552 0.02641340   3.1323310
     Class3  0.01583281 0.01833657   0.8634554
     Class4  0.15981475 0.01942488   8.2273208
     Class5 -0.08142413 0.03908080  -2.0834817

(Dispersion Parameter for Gamma family taken to be 13.25813 )

    Null Deviance: 1556.011 on 19 degrees of freedom

Residual Deviance: 156.9042 on 12 degrees of freedom

Number of Fisher Scoring Iterations: 3

> anova(out,test="F")
Analysis of Deviance Table

Gamma model

Response: Cost/Claims

Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev  F Value       Pr(F)
 NULL                    19   1556.011
Merit  3  293.305        16   1262.706  7.37423 0.004634306
Class  4 1105.802        12    156.904 20.85139 0.000024729
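
Since both models use a log link, the fitted claim frequency and the fitted cost per claim can be multiplied to give an expected cost per insured car-year. The sketch below does this with the main-effects coefficients copied from the two summaries above. It ignores the Merit by Class 1 interaction found for the claim counts, and the helper name pure_premium is hypothetical.

```r
# Coefficients copied from the two main-effects summaries above.
freq <- c("(Intercept)" = -2.0357359,
          Merit1 = -0.1377590, Merit2 = -0.2206796, Merit3 = -0.4929506,
          Class2 =  0.2998302, Class3 =  0.4690550, Class4 =  0.5258551,
          Class5 =  0.2155504)
sev  <- c("(Intercept)" = -1.17455633,
          Merit1 = -0.06867160, Merit2 = -0.07024798, Merit3 = -0.05672978,
          Class2 =  0.08273552, Class3 =  0.01583281, Class4 =  0.15981475,
          Class5 = -0.08142413)

# Expected cost per car-year = (claims per car-year) * (cost per claim).
# On the log scale the two linear predictors simply add.
pure_premium <- function(merit, class) {
  terms <- c("(Intercept)",
             if (merit > 0) paste0("Merit", merit),
             if (class > 1) paste0("Class", class))
  exp(sum(freq[terms]) + sum(sev[terms]))
}

pp <- outer(0:3, 1:5, Vectorize(pure_premium))
dimnames(pp) <- list(Merit = 0:3, Class = 1:5)

# Cost is recorded in 1000's of dollars, so multiply by 1000 for dollars.
print(round(1000 * pp, 1))
```

Under these assumptions the baseline cell (Merit 0, Class 1) comes out at roughly 40 dollars per car-year.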

 


Copyright © Gordon Smyth