Detecting Outliers in Multivariate Data Using the Mahalanobis-Minimum Covariance Determinant (MMCD) Distance Method
Abstract
This study aims to detect outliers in multivariate data using the Mahalanobis-Minimum Covariance Determinant (MMCD) distance method and to compare it with the classical Mahalanobis distance method. In the MMCD distance method, the Minimum Covariance Determinant (MCD) estimator is used to estimate the center and the covariance of the data from the subset of observations whose covariance matrix has the smallest determinant. Outliers are then also detected with an MMCD distance in which the center of the data is replaced by the median, which is robust to outliers. For comparison, outliers are detected with the classical Mahalanobis distance using the arithmetic mean as the center, and with the Mahalanobis distance using the median as the center. An observation is identified as an outlier when its distance exceeds a predetermined cutoff value. On the same data, the MMCD distance method identifies fewer outliers than the classical Mahalanobis distance method. The MMCD distance with the mean as the center and the MCD-estimated covariance identifies 34 wines as outliers, and the MMCD distance with the MCD-estimated covariance and the median as the center identifies 35 wines.
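The four distances compared above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. For an observation x_i, a center c and a scatter matrix S, the squared distance is d_i² = (x_i − c)ᵀ S⁻¹ (x_i − c), and the variants differ only in the choice of c and S. The helper names (`mahalanobis_sq`, `chi2_median_approx`, `mcd`, `detect_outliers`) are hypothetical. The MCD estimate is a simplified version of the concentration-step (C-step) idea behind FAST-MCD: random h-subset starts, no reweighting step, and a median-based consistency rescaling. The median is taken coordinate-wise. The "mean" in the first MMCD variant is assumed to be the MCD location estimate. The 0.975 chi-square quantile is assumed as the cutoff, because the abstract does not state which cutoff was used.

```python
import numpy as np


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of every row of X from `center` under `cov`."""
    diff = X - center
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)


def chi2_median_approx(p):
    """Wilson-Hilferty approximation to the median of a chi-square(p) variable."""
    return p * (1.0 - 2.0 / (9.0 * p)) ** 3


def mcd(X, h=None, n_starts=30, max_csteps=50, seed=0):
    """Simplified MCD estimate (hypothetical helper, not the full FAST-MCD).

    From random h-subsets, repeat C-steps until the subset stops changing,
    then keep the subset whose covariance has the smallest determinant.
    """
    n, p = X.shape
    h = (n + p + 1) // 2 if h is None else h
    rng = np.random.default_rng(seed)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_csteps):
            center = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False)
            # C-step: keep the h points closest to the current estimate.
            new_idx = np.sort(np.argsort(mahalanobis_sq(X, center, cov))[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if det < best_det:
            best_det, best_idx = det, idx
    center = X[best_idx].mean(axis=0)
    cov = np.cov(X[best_idx], rowvar=False)
    # The raw h-subset covariance underestimates the spread; rescale it so the
    # median squared distance matches the (approximate) chi-square(p) median.
    cov *= np.median(mahalanobis_sq(X, center, cov)) / chi2_median_approx(p)
    return center, cov


def detect_outliers(X, cutoff_sq, seed=0):
    """Flag rows whose squared distance exceeds `cutoff_sq` for four center/scatter pairs."""
    mean, median = X.mean(axis=0), np.median(X, axis=0)
    cov = np.cov(X, rowvar=False)
    mcd_center, mcd_cov = mcd(X, seed=seed)
    variants = {
        "mahalanobis_mean": (mean, cov),      # classical: mean + sample covariance
        "mahalanobis_median": (median, cov),  # median center, sample covariance
        "mmcd": (mcd_center, mcd_cov),        # MCD location and MCD covariance
        "mmcd_median": (median, mcd_cov),     # median center, MCD covariance
    }
    return {name: mahalanobis_sq(X, c, S) > cutoff_sq for name, (c, S) in variants.items()}


# Synthetic example (not the wine data): 95 correlated inliers, 5 planted outliers.
rng = np.random.default_rng(42)
inliers = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=95)
planted = np.array([4.0, -4.0]) + rng.normal(scale=0.1, size=(5, 2))
X = np.vstack([inliers, planted])

# For p = 2 the 0.975 chi-square quantile is exactly -2 ln(0.025);
# for other p, scipy.stats.chi2.ppf(0.975, df=p) gives it.
cutoff_sq = -2.0 * np.log(0.025)
flags = detect_outliers(X, cutoff_sq)
for name, f in flags.items():
    print(name, int(f.sum()))
```

On this synthetic data the MCD-based variants should flag the five planted points. The counts will not match the 34 and 35 wines reported above, since the wine data are not reproduced here. `np.linalg.solve` is used rather than an explicit matrix inverse because it is numerically more stable.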
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.