The Journal of Mental Health Policy and Economics
Volume 5, Issue 1, 2002. Pages: 21-31

Published Online: 22 Jul 2002

A Comparison of Methods to Handle Skew Distributed Cost Variables in the Analysis of the Resource Consumption in Schizophrenia Treatment

Reinhold Kilian,1* Herbert Matschinger,2 Walter Löffler,3 Christiane Roick4 and Matthias C. Angermeyer5

1Ph.D., University of Leipzig, Department of Psychiatry, Leipzig, Germany,
2Ph.D., University of Leipzig, Department of Psychiatry, Leipzig, Germany,
3Ph.D., University of Leipzig, Department of Psychiatry, Leipzig, Germany,
4MD, University of Leipzig, Department of Psychiatry, Leipzig, Germany,
5MD, University of Leipzig, Department of Psychiatry, Leipzig, Germany

*Correspondence to: Reinhold Kilian, Universität Leipzig, Klinik und Poliklinik für Psychiatrie, Johannissalle 20, D-04317 Leipzig, Germany
Tel.: +49-341-9724532
Fax: +49-341-9724539
E-mail: kilr@medizin.uni-leipzig.de

Source of Funding: The study was funded by grants from the German Federal Ministry of Education and Research in the framework of the research association public health Saxony (grant no. 01EG9732/7). The study was co-funded indirectly via the University of Leipzig (project no. 932 000-10) by H. Lundbeck A/S, Copenhagen (Study No. 96503).


Background: Transformation of the dependent cost variable is often used to address heteroscedasticity and skewness in ordinary least squares (OLS) linear regression of health service cost data. However, transformation may complicate the interpretation of regression coefficients and the retransformation of predicted values to the original cost scale.
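
The retransformation problem can be stated compactly. The notation below (costs y_i, covariates x_i, coefficients β, errors ε_i) is introduced here for illustration and is not taken from the article. The smearing estimator in the last line is one widely used correction and is not necessarily among the corrections the authors applied.

```latex
\begin{align}
\ln y_i &= x_i^{\top}\beta + \varepsilon_i, \qquad \operatorname{E}[\varepsilon_i \mid x_i] = 0, \\
\operatorname{E}[y_i \mid x_i] &= \exp(x_i^{\top}\beta)\,\operatorname{E}[\exp(\varepsilon_i) \mid x_i] \;\ge\; \exp(x_i^{\top}\beta), \\
\hat{y}_i^{\text{smear}} &= \exp(x_i^{\top}\hat{\beta}) \cdot \frac{1}{n}\sum_{j=1}^{n}\exp(\hat{\varepsilon}_j).
\end{align}
```

The inequality in the second line follows from Jensen's inequality, so naive exponentiation of the fitted log costs tends to underestimate mean costs. If the errors are heteroscedastic, the factor E[exp(ε_i) | x_i] varies with x_i and a single smearing factor is no longer adequate.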

Aims of the study: The study compares the advantages and disadvantages of different methods of estimating regression-based cost functions, using data on the annual costs of schizophrenia treatment.

Methods: Annual costs of psychiatric service use and the clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. Clinical characteristics were assessed by means of the BPRS 4.0 and the GAF, and service needs by means of the CAN. Quality of life was measured with the WHOQOL-BREF. Three models were used to estimate service costs: a linear OLS regression model with non-parametric standard errors, a log-transformed OLS model, and a generalized linear model (GLM) with a log link and a gamma distribution. Robust non-parametric standard errors were estimated with the variance estimator by White and with a bootstrap estimator based on 2000 replications. Models were compared by their R2 and root mean squared error (RMSE). The RMSE of the log-transformed OLS model was computed with three different methods of bias correction. The 95% confidence intervals for the differences between the RMSE values were computed by bootstrapping. In a split-sample cross-validation procedure, costs for one half of the sample were forecast from a regression equation estimated on the other half.
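
The model comparison described in the Methods can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, there is a single hypothetical predictor, Duan's smearing factor stands in for the unnamed bias corrections, the bootstrap uses 500 rather than 2000 replications, and White's variance estimator is not shown.

```python
import math
import random

def fit_line(x, y):
    """Least-squares (intercept, slope) for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(y, pred):
    """Root mean squared error on the original cost scale."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

def predict_linear(x_fit, y_fit, x_new):
    """Linear OLS on untransformed costs."""
    a, b = fit_line(x_fit, y_fit)
    return [a + b * xi for xi in x_new]

def fit_log_ols(x, y):
    """OLS on log costs plus Duan's smearing factor (mean of exp(residuals))."""
    log_y = [math.log(yi) for yi in y]
    a, b = fit_line(x, log_y)
    smear = sum(math.exp(ly - (a + b * xi)) for xi, ly in zip(x, log_y)) / len(x)
    return a, b, smear

def predict_log_smear(x_fit, y_fit, x_new):
    """Retransform log-scale predictions with the smearing factor."""
    a, b, smear = fit_log_ols(x_fit, y_fit)
    return [math.exp(a + b * xi) * smear for xi in x_new]

def fit_gamma_glm(x, y, iters=50):
    """Gamma GLM with log link via Fisher scoring.

    For this family/link pair the working weights are constant, so each
    step is an unweighted regression of the working response z on x.
    """
    eta = [math.log(yi) for yi in y]  # start at mu = y
    for _ in range(iters):
        z = [e + yi / math.exp(e) - 1.0 for e, yi in zip(eta, y)]
        a, b = fit_line(x, z)
        eta = [a + b * xi for xi in x]
    return a, b

def predict_gamma_glm(x_fit, y_fit, x_new):
    a, b = fit_gamma_glm(x_fit, y_fit)
    return [math.exp(a + b * xi) for xi in x_new]

def bootstrap_rmse_diff(x, y, reps=500, seed=1):
    """Percentile 95% interval for RMSE(linear) - RMSE(log + smearing)."""
    rng = random.Random(seed)
    n = len(x)
    diffs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        diffs.append(rmse(yb, predict_linear(xb, yb, xb))
                     - rmse(yb, predict_log_smear(xb, yb, xb)))
    diffs.sort()
    return diffs[int(0.025 * reps)], diffs[int(0.975 * reps) - 1]

# Synthetic, right-skewed "costs" (NOT the study data): one hypothetical
# severity score x and multiplicative gamma noise with mean 1.
rng = random.Random(42)
x = [rng.uniform(0.0, 3.0) for _ in range(254)]
y = [1000.0 * math.exp(0.5 * xi) * rng.gammavariate(2.0, 0.5) for xi in x]

# Split-sample cross-validation: fit on one half, forecast the other half.
half = len(x) // 2
x_fit, y_fit, x_new, y_new = x[:half], y[:half], x[half:], y[half:]
cv_rmse = {
    "linear OLS": rmse(y_new, predict_linear(x_fit, y_fit, x_new)),
    "log OLS + smearing": rmse(y_new, predict_log_smear(x_fit, y_fit, x_new)),
    "gamma GLM, log link": rmse(y_new, predict_gamma_glm(x_fit, y_fit, x_new)),
}
ci_low, ci_high = bootstrap_rmse_diff(x, y)

for name, value in cv_rmse.items():
    print(f"{name}: cross-validated RMSE = {value:.1f}")
print(f"95% bootstrap interval for the RMSE difference: [{ci_low:.1f}, {ci_high:.1f}]")
```

Because the data are simulated, the ranking of the three models printed here says nothing about the study's findings; the sketch only shows how predictions from each model are brought back to the cost scale before the RMSE is compared.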

Results: All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. Only the log-transformed OLS model showed a significant negative impact of age, and only the GLM showed significant negative influences of employment status and partnership on costs. All three models yielded an R2 of about .31. The residuals of the linear OLS model deviated significantly from normality and homoscedasticity. The residuals of the log-transformed model were normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model achieved its lowest RMSE when the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. In the cross-validation procedure, the linear OLS model again provided the lowest RMSE, followed by the log-transformed OLS model with the heteroscedastic bias correction; the GLM again showed the weakest fit. None of the differences between the RMSE values resulting from the cross-validation procedure was significant.

Discussion: Comparison of the fit indices of the different regression models showed that the linear OLS model fitted better than the log-transformed model and the GLM, but the differences between the models’ RMSE were not significant. Because of the small number of cases in the study, the lack of significance is not sufficient proof that the differences between the RMSE of the different models are zero, and the superiority of the linear OLS model cannot be generalized. The absence of significant differences among the alternative estimators may reflect a sample too small to detect important differences. Further studies with larger numbers of cases are necessary to confirm the results.

Implications: Specifying an adequate regression model requires careful examination of the characteristics of the data. Estimating standard errors and confidence intervals with nonparametric methods that are robust against deviations from normality and homoscedasticity of the residuals is a suitable alternative to transforming the skewed dependent variable. Further studies with more adequate case numbers are needed to confirm the results.


Received 30 October 2001; accepted 14 June 2002

