In this paper, we propose a measure for detecting influential outliers in linear regression analysis. The performance of the proposed measure, called the Coefficient of Determination Ratio (CDR), is compared with five standard measures of influence: Cook’s distance, studentised deleted residuals, leverage values, the covariance ratio (CVR), and the standardized difference in fits (DFFITS). Two existing datasets, one artificial and one real, are used for the comparison and to illustrate the efficiency of the proposed measure. The proposed measure appears more responsive than the standard measures in detecting influential outliers in both simple and multiple linear regression analyses. The CDR thus provides a useful alternative to existing methods for detecting outliers in structured datasets.
Published in | American Journal of Theoretical and Applied Statistics (Volume 3, Issue 4) |
DOI | 10.11648/j.ajtas.20140304.14 |
Page(s) | 100-106 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2014. Published by Science Publishing Group |
Keywords | Coefficient of Determination Ratio, Cook’s Distance, DFFITS, CVR, Studentised Deleted Residuals, Leverage Values |
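The five standard measures named in the abstract can be computed from a single ordinary least squares fit. The following is a minimal sketch using the usual textbook case-deletion formulas, not the authors' code. The function name `influence_measures` and the small dataset are hypothetical and chosen only for illustration. The CDR itself is not defined in this section, so it is deliberately left out.

```python
import numpy as np

def influence_measures(X, y):
    """Textbook case-deletion diagnostics for an OLS fit with an intercept.

    Hypothetical helper for illustration; it does not implement the CDR.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])       # design matrix with intercept
    p = Xd.shape[1]                             # number of fitted coefficients

    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ y
    e = y - Xd @ beta                           # ordinary residuals
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)  # leverage: hat-matrix diagonal

    s2 = e @ e / (n - p)                        # residual variance, all cases
    # Residual variance with case i deleted, without refitting n times
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)

    r = e / np.sqrt(s2 * (1 - h))               # internally studentised residuals
    t = e / np.sqrt(s2_del * (1 - h))           # studentised deleted residuals

    return {
        "leverage": h,
        "deleted_residual": t,
        "cooks_distance": r**2 * h / (p * (1 - h)),
        "dffits": t * np.sqrt(h / (1 - h)),
        "covratio": (s2_del / s2) ** p / (1 - h),
    }

# Hypothetical artificial data: nine points near a line, one distant point
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 17.2, 18.9, 20.0])

out = influence_measures(x, y)
for name, values in out.items():
    print(name, np.round(values, 3))
```

In this toy dataset the last observation lies far from the other x-values and off the trend of the remaining points, so it has both the largest leverage and the largest Cook’s distance. Each measure is normally read against its own cut-off, which is one reason a comparison across measures, as described in the abstract, is informative.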
APA Style
Arimiyaw Zakaria, Nathaniel Kwamina Howard, Bismark Kwao Nkansah. (2014). On the Detection of Influential Outliers in Linear Regression Analysis. American Journal of Theoretical and Applied Statistics, 3(4), 100-106. https://doi.org/10.11648/j.ajtas.20140304.14
TY  - JOUR
T1  - On the Detection of Influential Outliers in Linear Regression Analysis
AU  - Arimiyaw Zakaria
AU  - Nathaniel Kwamina Howard
AU  - Bismark Kwao Nkansah
Y1  - 2014/07/30
PY  - 2014
N1  - https://doi.org/10.11648/j.ajtas.20140304.14
DO  - 10.11648/j.ajtas.20140304.14
T2  - American Journal of Theoretical and Applied Statistics
JF  - American Journal of Theoretical and Applied Statistics
JO  - American Journal of Theoretical and Applied Statistics
SP  - 100
EP  - 106
PB  - Science Publishing Group
SN  - 2326-9006
UR  - https://doi.org/10.11648/j.ajtas.20140304.14
VL  - 3
IS  - 4
ER  -