Keywords: Poisson regression, gamma regression, offset, Tweedie generalized linear model
The data give the Canadian automobile insurance experience for policy years 1956 and 1957 as of June 30, 1959. The data include virtually every insurance company operating in Canada and were collated by the Statistical Agency (Canadian Underwriters' Association - Statistical Department) acting under instructions from the Superintendent of Insurance. The data given here are for private passenger automobile liability for non-farmers for all of Canada excluding Saskatchewan.
The variable Merit measures the number of years since the last claim on the policy. The variable Class is a collation of age, sex, use and marital status. The variables Insured and Premium are two measures of the risk exposure of the insurance companies.
| Variable | Description |
| Merit | Merit rating: 3 = licensed and accident free 3 or more years; 2 = licensed and accident free 2 years; 1 = licensed and accident free 1 year; 0 = all others |
| Class | 1 = pleasure, no male operator under 25; 2 = pleasure, non-principal male operator under 25; 3 = business use; 4 = unmarried owner or principal operator under 25; 5 = married owner or principal operator under 25 |
| Insured | Earned car years |
| Premium | Earned premium in 1000's (adjusted to what the premium would have been had all cars been written at 01 rates) |
| Claims | Number of claims |
| Cost | Total cost of the claim in 1000's of dollars |
Data File (tab-delimited text file)
Bailey, R. A., and Simon, LeRoy J. (1960). Two studies in automobile insurance ratemaking. ASTIN Bulletin, 192-217.
One could apply Poisson regression to the number of claims and gamma regression to the cost per claim.
Alternatively the total cost could be modelled using a Tweedie generalized linear model.
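The two approaches can be written out side by side. The notation below is an assumption introduced for illustration and is not taken from the source: for cell i, w_i is the exposure (Insured), N_i the number of claims, \bar{Y}_i the average cost per claim (Cost/Claims), S_i the total cost, and x_i the covariate vector for Merit and Class.

```latex
% Frequency: Poisson counts with exposure w_i and a log link
N_i \sim \mathrm{Poisson}(w_i \lambda_i), \qquad \log \lambda_i = x_i^\top \beta

% Severity: gamma mean cost per claim; averaging N_i claims divides the variance by N_i
E(\bar{Y}_i) = \mu_i, \qquad \operatorname{Var}(\bar{Y}_i) = \frac{\phi\,\mu_i^2}{N_i}, \qquad \log \mu_i = x_i^\top \gamma

% Total cost: expected value is frequency times severity
S_i = N_i \bar{Y}_i, \qquad E(S_i) = w_i \lambda_i \mu_i

% Tweedie family: power variance function; the compound Poisson-gamma case has 1 < p < 2
V(m) = m^{p}, \qquad 1 < p < 2
```

The division by N_i in the severity variance is why a gamma fit to Cost/Claims would use Claims as prior weights, and the exposure term w_i is why a Poisson fit needs either weights or an offset.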
Number of claims
> carinsca <- read.table("carinsca.txt",header=T)
> carinsca$Merit <- ordered(carinsca$Merit)
> carinsca$Class <- factor(carinsca$Class)
> options(contrasts=c("contr.treatment","contr.treatment"))
> attach(carinsca)
> out <- glm(Claims/Insured~Merit+Class,family="poisson",weights=Insured)
> summary(out,cor=F)
Call: glm(formula = Claims/Insured ~ Merit + Class, family = "poisson", weights = Insured)
Deviance Residuals:
Min 1Q Median 3Q Max
-10.79274 -3.007873 -1.575749 2.426679 11.62523
Coefficients:
Value Std. Error t value
(Intercept) -2.0357359 0.004311305 -472.18556
Merit1 -0.1377590 0.007172219 -19.20730
Merit2 -0.2206796 0.007997189 -27.59465
Merit3 -0.4929506 0.004502371 -109.48689
Class2 0.2998302 0.007258049 41.31003
Class3 0.4690550 0.005039141 93.08233
Class4 0.5258551 0.005364533 98.02439
Class5 0.2155504 0.010734511 20.08013
(Dispersion Parameter for Poisson family taken to be 1 )
Null Deviance: 33854.16 on 19 degrees of freedom
Residual Deviance: 579.5163 on 12 degrees of freedom
Number of Fisher Scoring Iterations: 3
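Because the Poisson family uses a log link by default, the coefficients printed above are log rate ratios relative to the baseline cell (Merit 0, Class 1). A minimal sketch of the conversion follows; the values are copied from the summary output, and the names `freq_coef` and `rate_ratio` are illustrative only.

```r
# Coefficients copied from the Poisson summary output above (log scale).
freq_coef <- c(Merit1 = -0.1377590, Merit2 = -0.2206796, Merit3 = -0.4929506,
               Class2 =  0.2998302, Class3 =  0.4690550, Class4 =  0.5258551,
               Class5 =  0.2155504)

# Exponentiating gives multiplicative effects on claims per car year,
# each relative to the baseline level of the same factor.
rate_ratio <- exp(freq_coef)
print(round(rate_ratio, 3))
```

For example, the Merit3 ratio is about 0.61, so under this main-effects model a policy that has been accident free for 3 or more years is estimated to have roughly 39% fewer claims per car year than a Merit 0 policy in the same class.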
> tapply(residuals(out),list(Merit,Class),mean)
1 2 3 4 5
0 10.255784 -6.0933143 -3.3741689 -10.792736 -2.775917
1 5.375534 -1.6555839 -2.6246195 -7.064295 -2.885774
2 0.885985 -0.6689448 0.7485641 -1.495913 -1.667528
3 -5.981040 3.8071118 2.3437884 11.625227 2.675351
There is a strong monotonic effect for Merit and a strong effect for Class. Although the main effects are dominant, there is also very strong evidence for an interaction. Specifically it appears that the effect of Merit (the no-claim bonus) is greater for Class 1 than for the other classes.
> out <- glm(Claims~offset(log(Insured))+Merit+Class+Merit:(Class==1),family="poisson")
> anova(out,test="F")
Analysis of Deviance Table
Poisson model
Response: Claims
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev F Value Pr(F)
NULL 19 33854.16
Merit 3 17754.11 16 16100.05 646.8548 0.0000000001
Class 4 15520.53 12 579.52 424.1074 0.0000000003
Class == 1 0 0.00 12 579.52
Merit:(Class == 1) 3 497.44 9 82.07 18.1239 0.0003728092
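The first fit models the rate Claims/Insured with Insured as weights, while the fit above models Claims with log(Insured) as an offset. The two forms should give the same Poisson coefficients and deviance, which is consistent with the residual deviance of 579.52 appearing in both outputs. The sketch below checks this on simulated stand-in data (not the Bailey and Simon figures); the data frame `toy` and its generating values are made up for illustration.

```r
# Simulated stand-in data (NOT the real insurance data): 4 merit levels x 5 classes.
set.seed(1)
toy <- expand.grid(Merit = factor(0:3), Class = factor(1:5))
toy$Insured <- round(runif(nrow(toy), 1000, 50000))
true_rate <- exp(-2 - 0.15 * as.integer(toy$Merit) + 0.1 * as.integer(toy$Class))
toy$Claims <- rpois(nrow(toy), toy$Insured * true_rate)

# Form 1: model the claim rate, weighting each cell by its exposure.
# R warns about the non-integer response; the warnings are suppressed here.
fit_rate <- suppressWarnings(
  glm(Claims / Insured ~ Merit + Class, family = poisson,
      weights = Insured, data = toy)
)

# Form 2: model the claim count with log exposure as an offset.
fit_offset <- glm(Claims ~ offset(log(Insured)) + Merit + Class,
                  family = poisson, data = toy)

# Compare coefficients and deviances side by side.
print(cbind(rate = coef(fit_rate), offset = coef(fit_offset)))
print(c(rate = deviance(fit_rate), offset = deviance(fit_offset)))
```

The offset form keeps the response as an integer count, which avoids the warnings and makes it straightforward to add further terms such as the interaction.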
Size of claims
> out <- glm(Cost/Claims~Merit+Class,family=Gamma(link="log"),weights=Claims)
> summary(out,cor=F)
Call: glm(formula = Cost/Claims ~ Merit + Class, family = Gamma(link = "log"),
weights = Claims)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.991365 -1.888093 -0.3277561 2.265134 6.32554
Coefficients:
Value Std. Error t value
(Intercept) -1.17455633 0.01553958 -75.5848067
Merit1 -0.06867160 0.02611050 -2.6300374
Merit2 -0.07024798 0.02910711 -2.4134299
Merit3 -0.05672978 0.01630784 -3.4786823
Class2 0.08273552 0.02641340 3.1323310
Class3 0.01583281 0.01833657 0.8634554
Class4 0.15981475 0.01942488 8.2273208
Class5 -0.08142413 0.03908080 -2.0834817
(Dispersion Parameter for Gamma family taken to be 13.25813 )
Null Deviance: 1556.011 on 19 degrees of freedom
Residual Deviance: 156.9042 on 12 degrees of freedom
Number of Fisher Scoring Iterations: 3
> anova(out,test="F")
Analysis of Deviance Table
Gamma model
Response: Cost/Claims
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev F Value Pr(F)
NULL 19 1556.011
Merit 3 293.305 16 1262.706 7.37423 0.004634306
Class 4 1105.802 12 156.904 20.85139 0.000024729
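As a rough illustration of how the two fits combine, the expected cost per car year in a cell is the fitted claim frequency times the fitted cost per claim. The sketch below uses the main-effects coefficients printed in the two summary outputs, so it ignores the Merit by Class 1 interaction found earlier. The helpers `lin_pred` and `cell_summary` are hypothetical names introduced here.

```r
# Main-effects coefficients copied from the two summary outputs (both log link).
freq_coef <- c("(Intercept)" = -2.0357359, Merit1 = -0.1377590, Merit2 = -0.2206796,
               Merit3 = -0.4929506, Class2 = 0.2998302, Class3 = 0.4690550,
               Class4 = 0.5258551, Class5 = 0.2155504)
sev_coef  <- c("(Intercept)" = -1.17455633, Merit1 = -0.06867160, Merit2 = -0.07024798,
               Merit3 = -0.05672978, Class2 = 0.08273552, Class3 = 0.01583281,
               Class4 = 0.15981475, Class5 = -0.08142413)

# Hypothetical helper: linear predictor for one Merit/Class cell.
# Baseline levels (Merit 0, Class 1) have no coefficient of their own.
lin_pred <- function(coefs, merit, class) {
  terms <- c("(Intercept)", paste0("Merit", merit), paste0("Class", class))
  sum(coefs[intersect(terms, names(coefs))])
}

# Hypothetical helper: frequency, severity and their product for one cell.
# Costs are in 1000's of dollars, as in the data.
cell_summary <- function(merit, class) {
  freq <- exp(lin_pred(freq_coef, merit, class))
  sev  <- exp(lin_pred(sev_coef, merit, class))
  c(claims_per_car_year = freq, cost_per_claim = sev, cost_per_car_year = freq * sev)
}

print(cell_summary(0, 1))  # baseline cell
print(cell_summary(3, 4))  # Merit 3, Class 4
```

A Tweedie model for Cost would estimate the cost per car year directly rather than as a product of two separate fits.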
Copyright © Gordon Smyth