Dunn, P. K., and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 236-244.
Peter K. Dunn and Gordon K. Smyth
Department of Mathematics, University of Queensland, Brisbane, Q 4072, Australia.
In this paper we give a general definition of residuals for regression models with independent responses. Our definition produces residuals which are exactly normal, apart from sampling variability in the estimated parameters, by inverting the fitted distribution function for each response value and finding the equivalent standard normal quantile. Our definition includes some randomization to achieve continuous residuals when the response variable is discrete. Quantile residuals are easily computed in computer packages such as SAS, S-Plus, GLIM or LispStat, and allow residual analyses to be carried out in many commonly occurring situations in which the customary definitions of residuals fail. Quantile residuals are applied in this paper to three example data sets.
Keywords: deviance residual; exponential regression; generalized linear model; logistic regression; normal probability plot; Pearson residual.
Residuals, and especially plots of residuals, play a central role in the checking of statistical models. In normal linear regression the residuals are normally distributed and can be standardized to have equal variances. In non-normal regression situations, such as logistic regression or log-linear analysis, the residuals, as usually defined, may be so far from normality and from having equal variances as to be of no practical use. A particular problem occurs when the response variable is discrete and takes on a small number of distinct values, as for Poisson data with mean not far from zero or binomial data with mean close to either zero or the number of trials. In such situations the residuals lie on nearly parallel curves corresponding to distinct response values, and these spurious curves distract the eye seriously from any meaningful message that might be contained in a residual plot.
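The parallel-curve effect can be illustrated with a small numerical sketch. It assumes a Poisson response and uses the usual Pearson residual, (y - mu) / sqrt(mu); the function name and the grid of fitted means are hypothetical choices made for illustration and are not taken from the paper's examples.

```python
import math


def pearson_residual_poisson(y, mu):
    """Pearson residual for a Poisson response y with fitted mean mu."""
    return (y - mu) / math.sqrt(mu)


# Hypothetical fitted means, all close to zero (0.2, 0.3, ..., 1.0).
fitted_means = [0.2 + 0.1 * i for i in range(9)]

# With means this small almost every observation is 0, 1 or 2, so the
# residuals can only fall on one of three curves, one per response value.
curves = {
    y: [pearson_residual_poisson(y, mu) for mu in fitted_means]
    for y in (0, 1, 2)
}

for y, values in curves.items():
    print(f"y = {y}:", [round(v, 2) for v in values])
```

Plotted against the fitted mean, each printed row traces one smooth curve, and the three curves never cross. The pattern reflects only the discreteness of the response and says nothing about whether the model fits.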
In this paper we give a general definition of residuals for regression models with independent responses. Our definition produces residuals which are exactly normal, apart from sampling variability in the estimated parameters, by inverting the fitted distribution function at each response value and finding the equivalent standard normal quantile. This approach is closely related to that of Cox and Snell (1968), but whereas Cox and Snell concentrate on mean and variance corrections we concentrate on the transformation to normality. Our definition includes some randomization to achieve continuous residuals when the response variable is discrete. Quantile residuals are easily computed in computer packages such as SAS, S-Plus, GLIM or LispStat, and allow residual analyses to be carried out in many commonly occurring situations in which the customary definitions of residuals fail.
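As a minimal sketch of the definition just described, consider a Poisson response whose mean is treated as known. For a discrete response the fitted distribution function jumps from a = F(y - 1) to b = F(y) at the observed value y; the sketch assumes the randomization consists of drawing u uniformly between a and b and reporting the standard normal quantile of u. All function names below are hypothetical, only the Python standard library is used, and the true mean stands in for the fitted mean so that the sampling variability in the estimated parameters plays no part.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); 0.0 when k is negative."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j  # P(Y = j) from P(Y = j - 1)
        total += term
    return min(total, 1.0)


def randomized_quantile_residual(y, mu, rng):
    """Randomized quantile residual for one Poisson observation (sketch)."""
    lower = poisson_cdf(y - 1, mu)  # a: limit of F from below at y
    upper = poisson_cdf(y, mu)      # b: F evaluated at y
    u = rng.uniform(lower, upper)
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) value by Knuth's multiplication method."""
    limit = math.exp(-mu)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(1996)  # fixed seed so the run is repeatable
true_mean = 0.5            # illustrative mean not far from zero
responses = [sample_poisson(true_mean, rng) for _ in range(5000)]
residuals = [randomized_quantile_residual(y, true_mean, rng) for y in responses]

print("mean of residuals:", round(fmean(residuals), 3))
print("sd of residuals:  ", round(pstdev(residuals), 3))
```

Although the responses here are almost all 0, 1 or 2, the residuals are continuous, and their sample mean and standard deviation should lie close to 0 and 1. For a continuous response no randomization is needed, since a and b coincide and the residual is simply the standard normal quantile of F(y).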
Special cases of quantile residuals have been used by Brillinger and Preisler (1983) and Brillinger (1996). For other work on residuals for non-normal regression models see Pierce and Schafer (1986) or McCullagh and Nelder (1989) and the references therein. In the discussion at the end of the paper we briefly indicate how quantile residuals may be extended to models with dependent responses.